mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 21:04:34 +00:00

787 Commits

Author SHA1 Message Date
Pablo Hoffman
c3d61dc999 changing extension in test from csv to txt (not all systems support the text/csv mimetype)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40787
2009-01-27 13:33:28 +00:00
Pablo Hoffman
134b867abe added support for unknown file extensions to ResponseTypes.from_filename
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40786
2009-01-27 13:08:07 +00:00
elpolilla
14e67d59c2 Added test for encoding in csviter
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40785
2009-01-27 12:35:09 +00:00
Pablo Hoffman
1e55e88111 fixed some typos in previous commit
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40784
2009-01-27 12:18:23 +00:00
Pablo Hoffman
11a9236125 added example about creating Requests with cookies
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40783
2009-01-27 12:17:40 +00:00
Pablo Hoffman
14ceca98bf added FormRequest example
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40782
2009-01-27 12:10:49 +00:00
Pablo Hoffman
e25bfa5d88 added more cases to ResponseTypes, and tests for ResponseTypes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40781
2009-01-27 11:27:09 +00:00
elpolilla
b6246cbc37 Fixed encoding-related bug in csviter
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40780
2009-01-27 11:10:42 +00:00
Pablo Hoffman
4013497edb more patches sent by Patrick
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40779
2009-01-26 23:42:51 +00:00
Pablo Hoffman
c1a1b8945a some doc fixes suggested by Patrick Mézard
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40778
2009-01-26 23:38:21 +00:00
Pablo Hoffman
57189e1b92 applied Patrick Mézard patch for loading local files
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40777
2009-01-26 23:31:04 +00:00
Pablo Hoffman
ce1700dd8e added libxml2 installation steps for MacOSX (thanks Patrick Mézard)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40776
2009-01-26 23:28:19 +00:00
Pablo Hoffman
279b9ac40d updated adaptors docs to reflect their instability and improved the ReST formatting (80-column lines, no ugly pipes at the beginning of lines)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40775
2009-01-26 23:22:53 +00:00
elpolilla
e8e87bcb52 Fixed small bug in adaptor pipelines that tried to work with None values
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40774
2009-01-26 16:20:58 +00:00
Pablo Hoffman
7ed88fd0f3 added Content-Disposition encoding discovery to ResponseTypes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40773
2009-01-26 15:36:53 +00:00
elpolilla
9de6ee5109 Added missing test for change in r769
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40772
2009-01-26 15:25:53 +00:00
elpolilla
91eff31f18 . Modified the default value of the BOT_NAME setting to the project's name
. Modified spider templates to use the already-generated example item instead of a ScrapedItem

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40771
2009-01-26 15:24:52 +00:00
elpolilla
f63a661320 Normalized the usage of ints for storing http status codes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40770
2009-01-26 15:03:47 +00:00
elpolilla
74661d54d0 Added application/xml mimetype to the known response types
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40769
2009-01-26 15:03:14 +00:00
elpolilla
b9d3a2cb96 Improved adaptors debugging in order to make it clearer for reading
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40768
2009-01-26 12:15:31 +00:00
Pablo Hoffman
bc4e80f640 reverted to IO-blocking MailSender implementation (using standard smtplib) until we fix some problems with deferreds left unexecuted when stopping the engine
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40767
2009-01-26 03:40:59 +00:00
Pablo Hoffman
5f0d5a1653 Big Response/Request refactoring:
- added Response subclasses: TextResponse, HtmlResponse, XmlResponse
- made Response.body a str
- added Response.body_as_unicode() method
- added encoding attribute for TextResponse and subclasses
- added headers_encoding() and body_encoding() to TextResponse and subclasses
- added ResponseTypes class to guess the Response class to use based on
  mimetype and other criteria

- added and improved several Request/Response tests
- updated request/response documentation to reflect the changes

Other changes not related to encoding:

- added lixbml2debug for debugging libxml2 memory leaks, which can be enabled
  by an environment variable
- added memoizemethod decorator (implemented using descriptors) to cache the
  result of methods
- moved DecompressionMiddleware to contrib_exp

--HG--
rename : scrapy/trunk/scrapy/tests/test_spiders/testplugin.py => scrapy/trunk/scrapy/tests/test_spiders/testspider.py
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40766
2009-01-26 02:57:03 +00:00
elpolilla
dbaf602730 . Updated some adaptors docstrings
. Turned Delist and Unquote adaptors into factory functions instead of classes
. Updated tests

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40765
2009-01-26 01:15:05 +00:00
Pablo Hoffman
33df3c81c2 explained what the heck is scrapy.contrib_exp
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40764
2009-01-24 16:59:08 +00:00
Daniel Grana
a07c95003f duplicatesfilter: first version of configurable duplicate requests filtering middleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40763
2009-01-23 03:42:05 +00:00
elpolilla
9209dbc882 Fixed syntax error in test
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40762
2009-01-22 17:13:27 +00:00
elpolilla
f6021cad2c Several modifications done:
. attribute method moved from ScrapedItem to RobustScrapedItem
. this method was also drastically changed. now it accepts *args and runs the adaptor pipeline for each value
. many adaptors were changed to work with single values instead of lists (due to the previous point's change)
. ExtractImageLinks adaptor was modified to make use of the HTMLImageLinkExtractor
. adaptors were moved from contrib to contrib_exp
. tests were updated

--HG--
rename : scrapy/trunk/scrapy/contrib/adaptors/misc.py => scrapy/trunk/scrapy/contrib_exp/adaptors/misc.py
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40761
2009-01-22 16:41:41 +00:00
elpolilla
4142fd0d5d Added get_base_url to utils.response and tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40760
2009-01-22 16:40:01 +00:00
elpolilla
86923cc694 Changes in HTMLImageLinkExtractor:
. Fixed little bug that triggered IndexErrors in some cases
. Added support for receiving selectors instead of just raw xpath expressions
. Re-enabled tests

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40759
2009-01-22 16:39:31 +00:00
Daniel Grana
eebc070fe4 tutorial: fix reference to scrapy-ctl. contributed by Patrick Mezard <pmezard@gmail.com>
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40758
2009-01-22 14:28:20 +00:00
Daniel Grana
8ff4dc4d02 RetryMiddleware: added ConnectionLost to retried exceptions
twisted >8.0 has a ConnectionClosed exception parent of ConnectionLost
and ConnectionDone, but twisted 2.5 hasn't.

I add ConnectionLost until we can move forward to twisted >8.0

this is the docstring of ConnectionLost, hopefully it is self-explanatory:
    """Connection to the other side was lost in a non-clean fashion"""

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40757
2009-01-22 03:19:53 +00:00
Pablo Hoffman
bef1fa967a removed Request.append_callback() method (it was just an alias to Request.deferred.addCallback). refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40756
2009-01-20 21:49:20 +00:00
Pablo Hoffman
677d0c366f renamed Request url_encoding constructor argument to encoding. added Request.body tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40755
2009-01-20 21:10:18 +00:00
Pablo Hoffman
12d0bd4dbb added Content-Length header population to Common downloader middleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40754
2009-01-20 21:00:56 +00:00
Ismael Carnales
1e002a0c98 make the install scrapy code steps a list, so it doesn't show as separate points in the doc overview (we need a style guide)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40753
2009-01-19 14:01:57 +00:00
Ismael Carnales
ddf0366037 removed $ from commands in install; it doesn't look as nice, but it's copy/paste compatible
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40752
2009-01-19 13:51:16 +00:00
Ismael Carnales
1cc95dc129 changed (and fixed) download links for windows libraries in install
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40751
2009-01-19 13:48:06 +00:00
Ismael Carnales
30e928568f corrected arch linux install information
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40750
2009-01-19 13:37:07 +00:00
Pablo Hoffman
4018f07d17 minor update to topics/settings.rst
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40749
2009-01-19 03:14:23 +00:00
Daniel Grana
a29fed066c docs: fix MailSender and Settings method references
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40748
2009-01-19 00:35:28 +00:00
Pablo Hoffman
3f13b388c7 added additional test to ResponseSoup extension
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40747
2009-01-18 19:31:35 +00:00
Pablo Hoffman
a413fc3bce some minor performance improvements in downloader handlers, added scrapy.optional_features set
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40746
2009-01-18 19:20:32 +00:00
Pablo Hoffman
0c9f7257a2 added Request.replace method, improved tests for replace/copy method in Request/Response classes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40745
2009-01-18 17:52:21 +00:00
Pablo Hoffman
314bbabb30 removed 'domain' from Request attributes and constructor arguments
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40744
2009-01-18 16:55:54 +00:00
Pablo Hoffman
d3c4d1f1e1 removed domain argument from Response constructor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40743
2009-01-18 16:38:01 +00:00
Pablo Hoffman
db91d26871 removed 'domain' argument from Response objects constructor. besides being a required first constructor argument, it wasn't actually needed and made the Response constructor more complex
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40742
2009-01-18 16:36:17 +00:00
Pablo Hoffman
654b49c86e added meta argument to Request & Response constructors
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40741
2009-01-17 23:57:53 +00:00
Pablo Hoffman
8ecc6808e0 removed Request.context attribute (use Request.meta instead)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40740
2009-01-17 23:09:53 +00:00
Pablo Hoffman
7e640da433 renamed to_string() Request and Response methods to httprepr(). removed __len__() from Request and Response
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40739
2009-01-17 22:11:54 +00:00
Pablo Hoffman
5dc1e7e5ca updated request/response reference doc
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40738
2009-01-17 21:05:08 +00:00