mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 23:44:01 +00:00

858 Commits

Author SHA1 Message Date
Daniel Grana
eebc070fe4 tutorial: fix reference to scrapy-ctl. contributed by Patrick Mezard <pmezard@gmail.com>
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40758
2009-01-22 14:28:20 +00:00
Daniel Grana
8ff4dc4d02 RetryMiddleware: added ConnectionLost to retried exceptions
twisted >8.0 has a ConnectionClosed exception that is the parent of ConnectionLost
and ConnectionDone, but twisted 2.5 doesn't.

Adding ConnectionLost until we can move forward to twisted >8.0.

This is the docstring of ConnectionLost; hopefully it is self-explanatory:
    """Connection to the other side was lost in a non-clean fashion"""

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40757
2009-01-22 03:19:53 +00:00
Pablo Hoffman
bef1fa967a removed Request.append_callback() method (it was just an alias to Request.deferred.addCallback). refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40756
2009-01-20 21:49:20 +00:00
Pablo Hoffman
677d0c366f renamed Request url_encoding constructor argument to encoding. added Request.body tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40755
2009-01-20 21:10:18 +00:00
Pablo Hoffman
12d0bd4dbb added Content-Length header population to Common downloader middleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40754
2009-01-20 21:00:56 +00:00
Ismael Carnales
1e002a0c98 make the install scrapy code steps a list, so they don't show as separate points in the doc overview (we need a style guide)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40753
2009-01-19 14:01:57 +00:00
Ismael Carnales
ddf0366037 removed $ from commands in install; it doesn't look as nice but it is copy/paste compatible
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40752
2009-01-19 13:51:16 +00:00
Ismael Carnales
1cc95dc129 changed (and fixed) download links for windows libraries in install
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40751
2009-01-19 13:48:06 +00:00
Ismael Carnales
30e928568f corrected arch linux install information
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40750
2009-01-19 13:37:07 +00:00
Pablo Hoffman
4018f07d17 minor update to topics/settings.rst
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40749
2009-01-19 03:14:23 +00:00
Daniel Grana
a29fed066c docs: fix MailSender and Settings method references
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40748
2009-01-19 00:35:28 +00:00
Pablo Hoffman
3f13b388c7 added additional test to ResponseSoup extension
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40747
2009-01-18 19:31:35 +00:00
Pablo Hoffman
a413fc3bce some minor performance improvements in downloader handlers, added scrapy.optional_features set
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40746
2009-01-18 19:20:32 +00:00
Pablo Hoffman
0c9f7257a2 added Request.replace method, improved tests for replace/copy method in Request/Response classes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40745
2009-01-18 17:52:21 +00:00
Pablo Hoffman
314bbabb30 removed 'domain' from Request attributes and constructor arguments
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40744
2009-01-18 16:55:54 +00:00
Pablo Hoffman
d3c4d1f1e1 removed domain argument from Response constructor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40743
2009-01-18 16:38:01 +00:00
Pablo Hoffman
db91d26871 removed 'domain' argument from Response objects constructor. besides being a required first constructor argument, it wasn't actually needed and made the Response constructor more complex
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40742
2009-01-18 16:36:17 +00:00
Pablo Hoffman
654b49c86e added meta argument to Request & Response constructors
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40741
2009-01-17 23:57:53 +00:00
Pablo Hoffman
8ecc6808e0 removed Request.context attribute (use Request.meta instead)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40740
2009-01-17 23:09:53 +00:00
Pablo Hoffman
7e640da433 renamed to_string() Request and Response methods to httprepr(). removed __len__() from Request and Response
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40739
2009-01-17 22:11:54 +00:00
Pablo Hoffman
5dc1e7e5ca updated request/response reference doc
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40738
2009-01-17 21:05:08 +00:00
Pablo Hoffman
da6a24b662 More Request/Response cleanup:
* made status attribute an int
 * made engine use __str__ to display crawled requests
 * HTTP cache now inherits Response class to change __str__
 * added tests to check that the class is preserved on .copy() (for both Requests and Responses)
 * removed custom cached attribute (and passed to a Response.meta item)
 * removed some custom (and seldom used) methods from Response class: version(), info()
 * reinforced the privacy of the ResponseBody class, by renaming it to _ResponseBody and added a warning that it may be removed in the future
 * added tests for Request & Response to_string() methods
 * fixed minor (and harmless) bug in to_string() methods

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40737
2009-01-17 20:40:07 +00:00
Pablo Hoffman
b1745f49f1 removed deprecated original_url attribute from Response objects (it can be accessed through Response.request.url)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40736
2009-01-17 15:57:28 +00:00
Pablo Hoffman
7b545381bd changed log message and increased log level, when spiders return objects which are not Request or ScrapedItem
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40735
2009-01-17 15:22:59 +00:00
Pablo Hoffman
6ba6238c83 Response class:
* added meta and cache attributes to Response class
 * added tests for Response copy

Request class:
 * added meta attribute and renamed old _cache attribute to cache
 * moved depth and link_text to Request.meta
 * added tests for Request copy

* ResponseLibxml2 and ResponseSoup extensions now use Response.cache

Updated doc with changes

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40734
2009-01-15 03:24:48 +00:00
Pablo Hoffman
d26a54f541 added tests for ResponseSoup and ResponseLibxml2 extensions
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40733
2009-01-15 03:06:00 +00:00
Pablo Hoffman
604af8e74f doc; removed referer argument from Request constructor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40732
2009-01-15 00:20:24 +00:00
Pablo Hoffman
2a7b41cdb2 removed referer argument from Request constructor. refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40731
2009-01-15 00:10:31 +00:00
Daniel Grana
9513b1f465 Remove response references from pipelines. refs #51
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40730
2009-01-14 23:59:45 +00:00
Pablo Hoffman
eef01a9fdd removed Request.method magic in Request constructor. refs #48
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40729
2009-01-14 23:50:23 +00:00
Pablo Hoffman
ae95c1df68 removed unused (and broken) prepend_callback Request method
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40728
2009-01-14 23:31:24 +00:00
Pablo Hoffman
3f89fc10b7 shortened some line widths
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40727
2009-01-14 23:23:10 +00:00
Pablo Hoffman
1272b138ea moved HTTP auth functionality out of Request class and into scrapy.utils.request.request_authenticate function, added tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40726
2009-01-14 22:02:58 +00:00
Andres Moreira
8fc4719d0c Added DNS cache support for the crawler, which improves page download performance by reducing DNS lookups.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40725
2009-01-14 18:59:52 +00:00
samus_
7be6ff0727 typo
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40724
2009-01-14 12:12:37 +00:00
samus_
baf9a8d846 renamed expiration setting to the same used by the image pipeline
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40723
2009-01-14 12:09:06 +00:00
Pablo Hoffman
0ea37f51db * moved request fingerprinting from Request class to scrapy.utils.request - closes #50
* cleaned up fingerprint tests suite (only left relevant tests)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40722
2009-01-14 01:17:40 +00:00
Pablo Hoffman
519458bdae added documentation for settings: ENGINE_DEBUG, DOWNLOADER_DEBUG
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40721
2009-01-14 00:19:22 +00:00
Pablo Hoffman
64d1f67c57 decreased logging level of RequestLimitMiddleware to DEBUG
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40720
2009-01-13 21:47:49 +00:00
Pablo Hoffman
4ed811b4d3 added DOWNLOAD_DELAY to default_settings and documentation, fixed some typos in settings reference
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40719
2009-01-13 14:43:38 +00:00
Pablo Hoffman
0e78003c92 removed my email from CLOSEDOMAIN_NOTIFY setting
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40718
2009-01-13 13:50:51 +00:00
Pablo Hoffman
e316722bb1 updated doc: ref/emails.rst and topics/downloader-middleware.rst
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40717
2009-01-13 11:55:20 +00:00
Pablo Hoffman
468bfeb278 removed unused imports
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40716
2009-01-13 10:10:19 +00:00
Pablo Hoffman
95d99d51b9 renamed old SchedulerStats web console module to ScheduleQueue and made it work with the new PriorityQueue
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40715
2009-01-13 02:49:50 +00:00
Pablo Hoffman
deb960526b removed unused (Django) classes from scrapy.utils.datatypes: MergeDict, SortedDict, DotExpandedDict, FileDict. And also removed unused class gzStringIO
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40714
2009-01-13 01:46:45 +00:00
Pablo Hoffman
ff637d9a0a added __len__ to PriorityQueue/Stack, and changed __iter__ implementation to return (item, priority) tuples, added more test cases
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40713
2009-01-13 01:37:49 +00:00
Pablo Hoffman
2434000cda * ported PriorityQueue and PriorityStack to use heapq instead of queue.Queue +
bisect which was up to 5x slower!
 * added test case for PriorityStack (only PriorityQueue had one before)
 * changed Priority{Stack,Queue} API to just push(), pop(), and made them
   iterable

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40712
2009-01-13 01:14:40 +00:00
samus_
8b28d365b1 removed extra return
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40711
2009-01-12 22:43:10 +00:00
Andres Moreira
eca60c7c4d Small change in canonicalize_url improved its performance a bit
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40710
2009-01-12 17:18:54 +00:00
Pablo Hoffman
30e44c9a58 added settings: REQUEST_HEADER_ACCEPT, REQUEST_HEADER_ACCEPT_LANGUAGE. started built-in downloader middleware reference
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40709
2009-01-12 00:53:37 +00:00