Pablo Hoffman
cb9c937f50
minor code rearrangement for consistency
2011-07-26 18:49:01 -03:00
Pablo Hoffman
dd020e184f
removed rather useless (and some deprecated) docstrings
2011-07-26 18:45:51 -03:00
Pablo Hoffman
70493c754d
retry middleware: added TCPTimedOutError to exceptions to retry
2011-07-25 14:52:24 -03:00
Pablo Hoffman
6f0e492390
fixed bug causing scraper KeyErrors on some ConnectionLost errors. closes #334
2011-07-25 12:24:26 -03:00
Pablo Hoffman
6e50f94406
engine: make it more explicit that we don't need to return the value of nextcall.schedule()
2011-07-25 10:47:54 -03:00
Pablo Hoffman
ea3bf6d95d
more core refactoring including moving engine next request call logic to a separate class
2011-07-25 10:46:00 -03:00
Pablo Hoffman
209ecdf471
updated settings.py for djangoitem tests to new django multi-db format
2011-07-25 00:49:54 -03:00
Pablo Hoffman
2ac08a713d
downloader: renamed SpiderInfo to Slot, for consistency with engine and scraper names
2011-07-22 02:06:10 -03:00
Pablo Hoffman
d6b83fee3e
scraper: renamed SpiderInfo to Slot, for consistency with engine names
2011-07-22 02:01:05 -03:00
Pablo Hoffman
f19442425a
forked UnicodeDammit from BeautifulSoup to explicitly disable usage of chardet library
2011-07-20 17:41:53 -03:00
Pablo Hoffman
7d18fe18e2
added missing import
2011-07-20 17:05:21 -03:00
Pablo Hoffman
0e008268e1
removed SimpledbStatsCollector from scrapy code, it was moved to https://github.com/scrapinghub/scaws
2011-07-20 10:38:16 -03:00
Pablo Hoffman
b6b0a54d9f
removed FAQ entry
2011-07-20 01:31:36 -03:00
Pablo Hoffman
cc6ef3beb2
engine: renamed slot.requests to slot.start_requests
2011-07-20 01:18:34 -03:00
Pablo Hoffman
de0bf22010
speed up consumption of spider start requests, while the engine has capacity to process them
2011-07-20 01:18:00 -03:00
Pablo Hoffman
9f742fc97c
removed unused import from 'crawl' spider template
2011-07-20 01:04:16 -03:00
Pablo Hoffman
e3f640c7bf
added FAQ entry about scrapy deploy issue on Mac + Python 2.5
2011-07-19 19:53:32 -03:00
Pablo Hoffman
75e2c3eb33
moved spider queues to scrapyd
...
--HG--
rename : scrapy/spiderqueue.py => scrapyd/spiderqueue.py
rename : scrapy/tests/test_spiderqueue.py => scrapyd/tests/test_spiderqueue.py
2011-07-19 19:39:27 -03:00
Pablo Hoffman
d97d6d20c6
removed no longer used settings
2011-07-19 19:31:19 -03:00
Pablo Hoffman
442c0bdc18
removed SQSSpiderQueue from base scrapy code, it was moved to https://github.com/scrapinghub/scaws
2011-07-19 14:11:55 -03:00
Daniel Grana
bdd627fe1d
allow overriding store_uri by extending ImagePipeline
...
--HG--
extra : rebase_source : 5c561b8282f733ab0f26607059dd96d858154426
2011-07-15 15:17:38 -03:00
Pablo Hoffman
84f518fc5e
More core changes:
...
* removed execution queue (replaced by newer spider queues)
* added real support for returning iterators in Spider.start_requests()
* removed support for passing urls to 'scrapy crawl' command
2011-07-15 15:18:39 -03:00
Daniel Grana
4dadeb7ccb
fix issue with responses preventing spiders from being idle in engine counts
2011-07-15 13:57:24 -03:00
Pablo Hoffman
d207c0afe4
fixed bug in engine.download() method
2011-07-15 12:55:07 -03:00
Pablo Hoffman
830255eea3
removed deprecated commands: queue, runserver
2011-07-14 01:41:24 -03:00
Pablo Hoffman
359129adf9
fixed python path handling in cmdline/commands tests so that it works with new w3lib library
2011-07-14 01:40:31 -03:00
Pablo Hoffman
dbad1373f1
Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12
2011-07-13 18:44:54 -03:00
Pablo Hoffman
18cb4ff1d8
added natty to list of supported Ubuntu distros
2011-07-13 18:43:52 -03:00
Pablo Hoffman
39a2ea97c8
redirect mw: added REDIRECT_ENABLED setting and documented the other settings
2011-07-13 14:18:15 -03:00
Pablo Hoffman
0b6c7ce9b8
improved propagation of download errors to the spiders, and removed code that is no longer needed, to simplify
2011-07-13 14:10:05 -03:00
Pablo Hoffman
804c0279ec
setup.py: only add lxml requirement if libxml2 is not available
2011-07-13 13:04:42 -03:00
Pablo Hoffman
541ed3913b
retry middleware: added RETRY_ENABLED setting and documented the other settings more properly, also improved messages when no longer retrying requests
2011-07-13 11:55:05 -03:00
Pablo Hoffman
763f3dc628
minor update to doc
2011-07-12 19:56:39 -03:00
Pablo Hoffman
bfda9ec319
added clarification about scrapy versioning including the recently adopted odd/even versioning scheme
...
--HG--
rename : docs/api-stability.rst => docs/versioning.rst
2011-07-12 19:53:23 -03:00
Pablo Hoffman
4fde1ef94d
added CloseSpider exception, to manually close spiders
2011-07-12 14:24:10 -03:00
Pablo Hoffman
4bb409923c
improved encoding detection by adding support for HTML5 meta charset
2011-07-12 09:52:50 -03:00
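The HTML5 meta charset change above can be illustrated with a minimal sketch, assuming the detector scans the start of the body with a regular expression. The helper `html_body_declared_encoding` and its `max_bytes` limit are hypothetical and are not Scrapy's actual code; the sketch only shows how a single pattern can accept both the HTML5 `<meta charset="...">` form and the older `http-equiv` form.

```python
import re

# Hypothetical helper, not Scrapy's implementation.
# Matches the HTML5 form <meta charset="utf-8"> and also the older
# <meta http-equiv="Content-Type" content="text/html; charset=utf-8">.
_META_CHARSET_RE = re.compile(
    rb"<meta\s[^>]*?charset\s*=\s*[\"']?\s*([a-zA-Z0-9_.:-]+)",
    re.IGNORECASE,
)


def html_body_declared_encoding(body: bytes, max_bytes: int = 4096):
    """Return the charset declared in a <meta> tag near the top of body, or None."""
    # Only the first max_bytes are scanned, since the declaration should
    # appear early in the document head.
    match = _META_CHARSET_RE.search(body[:max_bytes])
    if match is None:
        return None
    return match.group(1).decode("ascii").lower()
```

For example, `html_body_declared_encoding(b'<meta charset="UTF-8">')` yields `"utf-8"`, and a body without any meta declaration yields `None`.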
Pablo Hoffman
67213ce673
logformatter: support non-ascii characters in custom implementations of Item.__str__()
2011-07-12 01:16:06 -03:00
Pablo Hoffman
31a375bde7
Close the scheduler after closing the scraper and downloader. This shouldn't have any real effect in practice, but it feels more appropriate to close the components in this order
2011-07-10 04:18:50 -03:00
Pablo Hoffman
90b1ae694c
get_engine_status(): preserve test order defined in code
2011-07-10 04:10:20 -03:00
Pablo Hoffman
409aaade0b
Refactored close spider behaviour so that the engine now waits for all
...
downloading (and enqueued for download) requests to finish and their responses
to be processed in the scraper/spiders, before closing the spider.
This will be required in the future to avoid losing requests when we add
scheduler persistence, and it's also more correct behaviour overall.
The closing process has also been refactored to remove unneeded closing state
from downloader and leave it only in the engine.
Finally, some unused methods have been removed too, like spider_is_open() for
engine and scheduler.
2011-07-08 11:40:19 -03:00
Pablo Hoffman
574b070bb4
fixed minor bug in sitemap parser
2011-07-08 09:33:56 -03:00
Pablo Hoffman
ab9b786791
Updated CAMELCASE_EXCLUDE_CHARS to also exclude digits (patch by Adam Wentz)
2011-07-06 20:11:11 -03:00
Pablo Hoffman
7abc4b4c5a
fixed typo
2011-07-06 01:35:21 -03:00
Pablo Hoffman
949e11ee31
SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza)
2011-07-06 01:33:46 -03:00
Pablo Hoffman
5707051352
fixed httpcompression middleware tests
2011-07-04 21:31:05 -03:00
Pablo Hoffman
81fbe8c9a4
added x-gzip to supported encoding declarations in httpcompression middleware
2011-07-04 21:27:24 -03:00
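The x-gzip change above amounts to treating `x-gzip` as an alias of `gzip` when decoding a response body (RFC 7230 section 4.2.3 says a recipient SHOULD consider "x-gzip" equivalent to "gzip"). The following is a minimal sketch using only the standard library; `decode_body` is a hypothetical name and this is not Scrapy's HttpCompressionMiddleware.

```python
import gzip
import zlib


# Hypothetical sketch, not Scrapy's HttpCompressionMiddleware.
def decode_body(body: bytes, content_encoding: str) -> bytes:
    encoding = content_encoding.strip().lower()
    # "x-gzip" is a legacy alias and is decoded exactly like "gzip".
    if encoding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            # zlib-wrapped deflate stream (what the spec calls "deflate")
            return zlib.decompress(body)
        except zlib.error:
            # some servers send a raw deflate stream without the zlib header
            return zlib.decompress(body, -zlib.MAX_WBITS)
    # unknown or identity encodings are passed through unchanged
    return body
```

Keeping the alias check in one tuple means any further legacy names can be added in a single place.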
Pablo Hoffman
a5223881ee
removed debugging code
2011-06-30 02:28:53 -03:00
Pablo Hoffman
5275343fa1
use handle_httpstatus_all=True in scrapy shell
2011-06-28 17:27:40 -03:00
Pablo Hoffman
7cd559eca5
SitemapSpider: ignore non-xml responses. fixes #331
2011-06-27 10:02:16 -03:00
Pablo Hoffman
db5cae7c03
SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330
2011-06-23 18:18:29 -03:00