mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 19:47:36 +00:00

2525 Commits

Author SHA1 Message Date
Pablo Hoffman
279b9ac40d updated adaptors docs to reflect their instability and improved the ReST formatting (80-column lines, no ugly pipes at the beginning of lines)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40775
2009-01-26 23:22:53 +00:00
elpolilla
e8e87bcb52 Fixed small bug in adaptor pipelines that tried to work with None values
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40774
2009-01-26 16:20:58 +00:00
Pablo Hoffman
7ed88fd0f3 added Content-Disposition encoding discovery to ResponseTypes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40773
2009-01-26 15:36:53 +00:00
elpolilla
9de6ee5109 Added missing test for change in r769
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40772
2009-01-26 15:25:53 +00:00
elpolilla
91eff31f18 . Modified the default value of the BOT_NAME setting to the project's name
. Modified spider templates to use the already-generated example item instead of a ScrapedItem

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40771
2009-01-26 15:24:52 +00:00
elpolilla
f63a661320 Normalized the usage of ints for storing http status codes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40770
2009-01-26 15:03:47 +00:00
elpolilla
74661d54d0 Added application/xml mimetype to the known response types
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40769
2009-01-26 15:03:14 +00:00
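Commit 74661d54d0 adds application/xml to the known response types. The general idea of picking a Response class from a mimetype can be sketched as a lookup table with fallbacks. This is a minimal sketch, not the ResponseTypes code: the table contents, the `from_content_type` name and the use of class-name strings instead of real classes are all assumptions for illustration.

```python
# Hypothetical mimetype -> Response class-name table (assumed contents).
RESPONSE_CLASSES = {
    "text/html": "HtmlResponse",
    "application/xhtml+xml": "HtmlResponse",
    "text/xml": "XmlResponse",
    "application/xml": "XmlResponse",
}


def from_content_type(content_type):
    """Guess a Response class name from a Content-Type header value (sketch)."""
    # Drop parameters such as "; charset=utf-8" and normalize case.
    mimetype = content_type.split(";")[0].strip().lower()
    if mimetype in RESPONSE_CLASSES:
        return RESPONSE_CLASSES[mimetype]
    # Any other text/* type is still decodable text.
    if mimetype.startswith("text/"):
        return "TextResponse"
    # Unknown or binary content falls back to the plain Response class.
    return "Response"
```

With this table, `from_content_type("text/html; charset=utf-8")` picks `HtmlResponse`, while `image/png` falls back to `Response`.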
elpolilla
b9d3a2cb96 Improved adaptor debugging output to make it easier to read
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40768
2009-01-26 12:15:31 +00:00
Pablo Hoffman
bc4e80f640 reverted to IO-blocking MailSender implementation (using standard smtplib) until we fix some problems with deferreds left unexecuted when stopping the engine
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40767
2009-01-26 03:40:59 +00:00
Pablo Hoffman
5f0d5a1653 Big Response/Request refactoring:
- added Response subclasses: TextResponse, HtmlResponse, XmlResponse
- made Response.body a str
- added Response.body_as_unicode() method
- added encoding attribute for TextResponse and subclasses
- added headers_encoding() and body_encoding() to TextResponse and subclasses
- added ResponseTypes class to guess the Response class to use based on
  mimetype and other criteria

- added and improved several Request/Response tests
- updated request/response documentation to reflect the changes

Other changes not related to encoding:

- added libxml2debug for debugging libxml2 memory leaks, which can be enabled
  by an environment variable
- added memoizemethod decorator (implemented using descriptors) to cache the
  result of methods
- moved DecompressionMiddleware to contrib_exp

--HG--
rename : scrapy/trunk/scrapy/tests/test_spiders/testplugin.py => scrapy/trunk/scrapy/tests/test_spiders/testspider.py
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40766
2009-01-26 02:57:03 +00:00
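Commit 5f0d5a1653 mentions a memoizemethod decorator "implemented using descriptors" to cache method results. One way such a decorator can work is sketched below; it is an assumption-labeled illustration, not the Scrapy implementation. It is a non-data descriptor whose `__get__` returns a bound wrapper that consults a per-instance cache dict (stored under a hypothetical `_memo_<name>` attribute), so it only supports positional, hashable arguments and classes that have an instance `__dict__`.

```python
import functools


class memoizemethod:
    """Sketch: cache a method's result per instance and positional args."""

    def __init__(self, func):
        self.func = func
        # Hypothetical attribute name holding the per-instance cache.
        self.cache_name = "_memo_" + func.__name__
        functools.update_wrapper(self, func)

    def __get__(self, instance, owner):
        if instance is None:
            # Accessed on the class itself: return the descriptor.
            return self
        # One cache dict per instance, created on first access.
        cache = instance.__dict__.setdefault(self.cache_name, {})

        @functools.wraps(self.func)
        def wrapper(*args):
            # Positional args only; they must be hashable to serve as keys.
            if args not in cache:
                cache[args] = self.func(instance, *args)
            return cache[args]

        return wrapper
```

Usage, with a made-up class standing in for a response: after `@memoizemethod` is applied to a `body_as_unicode(self)` method, repeated calls decode the body only once per instance.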
elpolilla
dbaf602730 . Updated some adaptors docstrings
. Turned Delist and Unquote adaptors into factory functions instead of classes
. Updated tests

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40765
2009-01-26 01:15:05 +00:00
Pablo Hoffman
33df3c81c2 explained what the heck scrapy.contrib_exp is
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40764
2009-01-24 16:59:08 +00:00
Daniel Grana
a07c95003f duplicatesfilter: first version of configurable duplicate requests filtering middleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40763
2009-01-23 03:42:05 +00:00
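Commit a07c95003f introduces a configurable duplicate-requests filtering middleware. The core technique behind such filters is to fingerprint each request and drop any whose fingerprint was already seen. The sketch below assumes the fingerprint covers method, URL and body hashed with SHA-1. The class and method names are hypothetical and do not describe the duplicatesfilter code.

```python
import hashlib


class DuplicatesFilter:
    """Hypothetical in-memory duplicate request filter."""

    def __init__(self):
        self.seen = set()

    @staticmethod
    def fingerprint(method, url, body=b""):
        # NUL separators keep ("GE", "Thttp...") from colliding with ("GET", "http...").
        h = hashlib.sha1()
        h.update(method.upper().encode("ascii"))
        h.update(b"\0")
        h.update(url.encode("utf-8"))
        h.update(b"\0")
        h.update(body)
        return h.hexdigest()

    def should_filter(self, method, url, body=b""):
        """Return True if an equivalent request was already seen."""
        fp = self.fingerprint(method, url, body)
        if fp in self.seen:
            return True
        self.seen.add(fp)
        return False
```

A real filter would probably also normalize URLs (for example, query-argument order) before hashing. That step is left out here.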
elpolilla
9209dbc882 Fixed syntax error in test
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40762
2009-01-22 17:13:27 +00:00
elpolilla
f6021cad2c Several modifications done:
. attribute method moved from ScrapedItem to RobustScrapedItem
. this method was also drastically changed. now it accepts *args and runs the adaptor pipeline for each value
. many adaptors were changed to work with single values instead of lists (due to the previous point's change)
. ExtractImageLinks adaptor was modified to make use of the HTMLImageLinkExtractor
. adaptors were moved from contrib to contrib_exp
. tests were updated

--HG--
rename : scrapy/trunk/scrapy/contrib/adaptors/misc.py => scrapy/trunk/scrapy/contrib_exp/adaptors/misc.py
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40761
2009-01-22 16:41:41 +00:00
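Commit f6021cad2c changes the attribute method to accept *args and run the adaptor pipeline once per value, with adaptors working on single values. The following is a minimal sketch of that control flow. It assumes an adaptor is any callable that takes one value and returns the adapted value or None to drop it. The `run_pipeline` name is hypothetical.

```python
def run_pipeline(adaptors, *values):
    """Run every adaptor over each value in turn; drop values that become None."""
    results = []
    for value in values:
        for adaptor in adaptors:
            if value is None:
                # Stop early instead of passing None to the next adaptor.
                break
            value = adaptor(value)
        if value is not None:
            results.append(value)
    return results
```

For example, `run_pipeline([str.strip, str.upper], " a ", "b ")` returns `["A", "B"]`.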
elpolilla
4142fd0d5d Added get_base_url to utils.response and tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40760
2009-01-22 16:40:01 +00:00
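Commit 4142fd0d5d adds get_base_url to utils.response. The usual rule for a page's base URL is to use the `<base href>` value, resolved against the page URL, if one is present, and otherwise to use the page URL itself. This sketch applies that rule to raw HTML text with a simple regex. Its signature (HTML text plus URL rather than a response object) is an assumption, and a real HTML parser would be more robust.

```python
import re
from urllib.parse import urljoin

# Matches <base ... href="..."> (or single quotes), case-insensitively.
_BASE_HREF_RE = re.compile(
    r"""<base\s[^>]*href\s*=\s*["']([^"']+)["']""", re.IGNORECASE
)


def get_base_url(html, response_url):
    """Sketch: return the document's base URL."""
    match = _BASE_HREF_RE.search(html)
    if match:
        # A relative base href is resolved against the response URL.
        return urljoin(response_url, match.group(1))
    return response_url
```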
elpolilla
86923cc694 Changes in HTMLImageLinkExtractor:
. Fixed little bug that triggered IndexErrors in some cases
. Added support for receiving selectors instead of just raw xpath expressions
. Re-enabled tests

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40759
2009-01-22 16:39:31 +00:00
Daniel Grana
eebc070fe4 tutorial: fix reference to scrapy-ctl. contributed by Patrick Mezard <pmezard@gmail.com>
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40758
2009-01-22 14:28:20 +00:00
Daniel Grana
8ff4dc4d02 RetryMiddleware: added ConnectionLost to retried exceptions
twisted >8.0 has a ConnectionClosed exception that is the parent of
ConnectionLost and ConnectionDone, but twisted 2.5 doesn't.

I'm adding ConnectionLost until we can move forward to twisted >8.0.

This is the docstring of ConnectionLost; hopefully it is self-explanatory:
    """Connection to the other side was lost in a non-clean fashion"""

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40757
2009-01-22 03:19:53 +00:00
Pablo Hoffman
bef1fa967a removed Request.append_callback() method (it was just an alias to Request.deferred.addCallback). refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40756
2009-01-20 21:49:20 +00:00
Pablo Hoffman
677d0c366f renamed Request url_encoding constructor argument to encoding. added Request.body tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40755
2009-01-20 21:10:18 +00:00
Pablo Hoffman
12d0bd4dbb added Content-Length header population to Common downloader middleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40754
2009-01-20 21:00:56 +00:00
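Commit 12d0bd4dbb populates the Content-Length header in the Common downloader middleware. The essential rule is to set Content-Length to the body length in bytes unless the header is already present. Because HTTP header names are case-insensitive, the check should ignore case. This is a sketch of that rule on a plain dict of headers. The helper name and the dict representation are assumptions.

```python
def populate_content_length(headers, body):
    """Hypothetical helper: add Content-Length for a bytes body if missing."""
    if body is None:
        return headers
    # Header names are case-insensitive, so compare them lowercased.
    if not any(name.lower() == "content-length" for name in headers):
        headers["Content-Length"] = str(len(body))
    return headers
```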
Ismael Carnales
1e002a0c98 make the install scrapy code steps a list, so they don't show as separate points in the doc overview (we need a style guide)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40753
2009-01-19 14:01:57 +00:00
Ismael Carnales
ddf0366037 removed $ from commands in install; it doesn't look as nice but it is copy/paste compatible
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40752
2009-01-19 13:51:16 +00:00
Ismael Carnales
1cc95dc129 changed (and fixed) download links for windows libraries in install
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40751
2009-01-19 13:48:06 +00:00
Ismael Carnales
30e928568f corrected arch linux install information
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40750
2009-01-19 13:37:07 +00:00
Pablo Hoffman
4018f07d17 minor update to topics/settings.rst
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40749
2009-01-19 03:14:23 +00:00
Daniel Grana
a29fed066c docs: fix MailSender and Settings method references
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40748
2009-01-19 00:35:28 +00:00
Pablo Hoffman
3f13b388c7 added additional test to ResponseSoup extension
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40747
2009-01-18 19:31:35 +00:00
Pablo Hoffman
a413fc3bce some minor performance improvements in downloader handlers, added scrapy.optional_features set
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40746
2009-01-18 19:20:32 +00:00
Pablo Hoffman
0c9f7257a2 added Request.replace method, improved tests for replace/copy method in Request/Response classes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40745
2009-01-18 17:52:21 +00:00
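Commit 0c9f7257a2 adds a Request.replace method alongside copy. A common pattern for this is to build a new object of the same class, taking any attribute not passed as a keyword argument from the original. Defining copy as replace with no overrides keeps the subclass intact. The minimal `Request` class below and its attribute list are assumptions for illustration, not Scrapy's Request.

```python
class Request:
    """Hypothetical minimal request used to illustrate replace()/copy()."""

    def __init__(self, url, method="GET", body=b"", meta=None):
        self.url = url
        self.method = method
        self.body = body
        # Copy meta so the new object does not share the caller's dict.
        self.meta = dict(meta or {})

    def replace(self, **kwargs):
        """Return a new request of the same class with some attributes overridden."""
        for attr in ("url", "method", "body", "meta"):
            kwargs.setdefault(attr, getattr(self, attr))
        # type(self) keeps subclasses intact on replace() and copy().
        return type(self)(**kwargs)

    def copy(self):
        return self.replace()
```

For example, `Request("http://example.com/").replace(method="POST")` keeps the URL and changes only the method.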
Pablo Hoffman
314bbabb30 removed 'domain' from Request attributes and constructor arguments
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40744
2009-01-18 16:55:54 +00:00
Pablo Hoffman
d3c4d1f1e1 removed domain argument from Response constructor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40743
2009-01-18 16:38:01 +00:00
Pablo Hoffman
db91d26871 removed 'domain' argument from Response objects constructor. besides being a required first constructor argument, it wasn't actually needed and made the Response constructor more complex
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40742
2009-01-18 16:36:17 +00:00
Pablo Hoffman
654b49c86e added meta argument to Request & Response constructors
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40741
2009-01-17 23:57:53 +00:00
Pablo Hoffman
8ecc6808e0 removed Request.context attribute (use Request.meta instead)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40740
2009-01-17 23:09:53 +00:00
Pablo Hoffman
7e640da433 renamed to_string() Request and Response methods to httprepr(). removed __len__() from Request and Response
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40739
2009-01-17 22:11:54 +00:00
Pablo Hoffman
5dc1e7e5ca updated request/response reference doc
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40738
2009-01-17 21:05:08 +00:00
Pablo Hoffman
da6a24b662 More Request/Response cleanup:
 * made status attribute an int
 * made engine use __str__ to display crawled requests
 * HTTP cache now inherits Response class to change __str__
 * added tests to check that the class is preserved on .copy() (for both Requests and Responses)
 * removed custom cached attribute (and passed to a Response.meta item)
 * removed some custom (and seldom used) methods from Response class: version(), info()
 * reinforced the privacy of the ResponseBody class, by renaming it to _ResponseBody and added a warning that it may be removed in the future
 * added tests for Request & Response to_string() methods
 * fixed minor (and harmless) bug in to_string() methods

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40737
2009-01-17 20:40:07 +00:00
Pablo Hoffman
b1745f49f1 removed deprecated original_url attribute from Response objects (it can be accessed through Response.request.url)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40736
2009-01-17 15:57:28 +00:00
Pablo Hoffman
7b545381bd changed log message and increased log level, when spiders return objects which are not Request or ScrapedItem
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40735
2009-01-17 15:22:59 +00:00
Pablo Hoffman
6ba6238c83 Response class:
 * added meta and cache attributes to Response class
 * added tests for Response copy

Request class:
 * added meta attribute and renamed old _cache attribute to cache
 * moved depth and link_text to Request.meta
 * added tests for Request copy

* ResponseLibxml2 and ResponseSoup extensions now use Response.cache

Updated doc with changes

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40734
2009-01-15 03:24:48 +00:00
Pablo Hoffman
d26a54f541 added tests for ResponseSoup and ResponseLibxml2 extensions
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40733
2009-01-15 03:06:00 +00:00
Pablo Hoffman
604af8e74f doc: removed referer argument from Request constructor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40732
2009-01-15 00:20:24 +00:00
Pablo Hoffman
2a7b41cdb2 removed referer argument from Request constructor. refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40731
2009-01-15 00:10:31 +00:00
Daniel Grana
9513b1f465 Remove response references from pipelines. refs #51
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40730
2009-01-14 23:59:45 +00:00
Pablo Hoffman
eef01a9fdd removed Request.method magic in Request constructor. refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40729
2009-01-14 23:50:23 +00:00
Pablo Hoffman
ae95c1df68 removed unused (and broken) prepend_callback Request method
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40728
2009-01-14 23:31:24 +00:00
Pablo Hoffman
3f89fc10b7 shortened some line widths
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40727
2009-01-14 23:23:10 +00:00
Pablo Hoffman
1272b138ea moved HTTP auth functionality out of Request class and into scrapy.utils.request.request_authenticate function, added tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40726
2009-01-14 22:02:58 +00:00
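Commit 1272b138ea moves HTTP auth into a scrapy.utils.request.request_authenticate function. HTTP Basic authentication sends `Authorization: Basic <base64(username:password)>`. This sketch takes a plain headers dict rather than a Request object. That signature is an assumption and may not match the real function's.

```python
import base64


def request_authenticate(headers, username, password):
    """Sketch: set an HTTP Basic Authorization header on a headers dict."""
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    headers["Authorization"] = "Basic " + token
    return headers
```

The standard example credentials `Aladdin` / `open sesame` produce the header value `Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==`.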