Pablo Hoffman
18cb4ff1d8
added natty to list of supported Ubuntu distros
2011-07-13 18:43:52 -03:00
Pablo Hoffman
39a2ea97c8
redirect mw: added REDIRECT_ENABLED setting and documented the other settings
2011-07-13 14:18:15 -03:00
Pablo Hoffman
0b6c7ce9b8
improved download errors propagation to the spiders, and removed no longer needed code to simplify
2011-07-13 14:10:05 -03:00
Pablo Hoffman
804c0279ec
setup.py: only add lxml requirement if libxml2 is not available
2011-07-13 13:04:42 -03:00
Pablo Hoffman
541ed3913b
retry middleware: added RETRY_ENABLED setting and documented the other settings more properly, also improved messages when no longer retrying requests
2011-07-13 11:55:05 -03:00
Pablo Hoffman
763f3dc628
minor update to doc
2011-07-12 19:56:39 -03:00
Pablo Hoffman
bfda9ec319
added clarification about scrapy versioning including the recently adopted odd/even versioning scheme
--HG--
rename : docs/api-stability.rst => docs/versioning.rst
2011-07-12 19:53:23 -03:00
Pablo Hoffman
4fde1ef94d
added CloseSpider exception, to manually close spiders
2011-07-12 14:24:10 -03:00
Pablo Hoffman
4bb409923c
improved encoding detection by adding support for HTML5 meta charset
2011-07-12 09:52:50 -03:00
Pablo Hoffman
67213ce673
logformatter: support non-ascii characters in custom implementations of Item.__str__()
2011-07-12 01:16:06 -03:00
Pablo Hoffman
31a375bde7
Close the scheduler after closing the scraper and downloader. This shouldn't have any real effect in practice, but it feels more appropriate to close the components in this order
2011-07-10 04:18:50 -03:00
Pablo Hoffman
90b1ae694c
get_engine_status(): preserve test order defined in code
2011-07-10 04:10:20 -03:00
Pablo Hoffman
409aaade0b
Refactored close spider behaviour so that the engine now waits for all
downloading (and enqueued for download) requests to finish and their responses
to be processed in the scraper/spiders, before closing the spider.
This will be required in the future to avoid losing requests when we add
scheduler persistence, and it's also a more correct behaviour overall.
The closing process has also been refactored to remove unneeded closing state
from the downloader and leave it only in the engine.
Finally, some unused methods have been removed too, like spider_is_open() for
engine and scheduler.
2011-07-08 11:40:19 -03:00
Pablo Hoffman
574b070bb4
fixed minor bug in sitemap parser
2011-07-08 09:33:56 -03:00
Pablo Hoffman
ab9b786791
Updated CAMELCASE_EXCLUDE_CHARS to also exclude digits (patch by Adam Wentz)
2011-07-06 20:11:11 -03:00
Pablo Hoffman
7abc4b4c5a
fixed typo
2011-07-06 01:35:21 -03:00
Pablo Hoffman
949e11ee31
SitemapSpider: added support for parsing gzipped sitemaps (patch contributed by Rolando Espinoza)
2011-07-06 01:33:46 -03:00
Pablo Hoffman
5707051352
fixed httpcompression middleware tests
2011-07-04 21:31:05 -03:00
Pablo Hoffman
81fbe8c9a4
added x-gzip to supported encoding declarations in httpcompression middleware
2011-07-04 21:27:24 -03:00
Pablo Hoffman
a5223881ee
removed debugging code
2011-06-30 02:28:53 -03:00
Pablo Hoffman
5275343fa1
use handle_httpstatus_all=True in scrapy shell
2011-06-28 17:27:40 -03:00
Pablo Hoffman
7cd559eca5
SitemapSpider: ignore non-xml responses. fixes #331
2011-06-27 10:02:16 -03:00
Pablo Hoffman
db5cae7c03
SitemapSpider: added support for filtering which sitemaps to follow (patch contributed by Rolando Espinoza). closes #330
2011-06-23 18:18:29 -03:00
Pablo Hoffman
d97a9d8731
improved error messages of ItemLoader.load_item() so that they show the field name and value of the output processor that failed
2011-06-23 12:39:51 -03:00
Pablo Hoffman
fbafb295e8
removed DEFAULT_ITEM_CLASS setting from settings in new project template
2011-06-23 11:34:28 -03:00
Pablo Hoffman
d197895d8f
removed deprecated code
2011-06-21 18:06:04 -03:00
Pablo Hoffman
d8775a7575
removed old deprecated FileExportPipeline
2011-06-21 18:01:05 -03:00
Pablo Hoffman
0305ffdd6c
sitemaps: support trailing spaces in <loc> elements
2011-06-20 21:22:16 -03:00
Pablo Hoffman
2e74ccaa7e
dropped InitSpider super class from CrawlSpider and Feed spiders, to avoid potentially confusing code, as it's also not needed
2011-06-20 13:10:13 -03:00
Pablo Hoffman
03bc218987
fixed bug in get_engine_status() function
2011-06-20 11:09:01 -03:00
Pablo Hoffman
03a92a8b03
slightly improved version of scrapyd script
2011-06-20 11:04:38 -03:00
Pablo Hoffman
5de5cac43e
added quick script to launch scrapyd
2011-06-20 10:48:34 -03:00
Pablo Hoffman
841007b5c5
added envvar SCRAPY_VERSION_FROM_HG=1 to extras/makedeb.py script
2011-06-18 03:31:47 -03:00
Pablo Hoffman
7e5e00cea5
Added public engine.download() method to use the downloader bypassing the scheduler. Changed media pipeline to use engine.download() to prevent deadlocks.
2011-06-18 02:52:21 -03:00
Pablo Hoffman
dd90e83eae
get_engine_status(): also look up open spiders in scraper component
2011-06-18 02:48:01 -03:00
Pablo Hoffman
e575e015c1
LogStats extension: fixed KeyError bug caused by spiders that don't scrape any items
2011-06-17 16:50:02 -03:00
Pablo Hoffman
cfc93ba9db
added SitemapSpider to basic spider assertion tests
2011-06-16 10:20:28 -03:00
Pablo Hoffman
25b0ca3125
minor sorting of imports
2011-06-16 10:19:27 -03:00
Pablo Hoffman
59acb129e5
scrapyd activate_egg(): don't override SCRAPY_SETTINGS_MODULE envvar if already set
2011-06-15 19:35:03 -03:00
Pablo Hoffman
cd52a7c83b
removed debugging print
2011-06-15 12:35:54 -03:00
Pablo Hoffman
57c43fdce6
added SitemapSpider, with tests and doc
2011-06-15 11:54:34 -03:00
Pablo Hoffman
91dc46539f
added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)
2011-06-14 00:50:05 -03:00
Pablo Hoffman
d2a9c0fdcd
issue deprecation warning when using CLOSESPIDER_ITEMPASSED setting
2011-06-13 22:34:01 -03:00
Pablo Hoffman
841e9913db
renamed CLOSESPIDER_ITEMPASSED setting to CLOSESPIDER_ITEMCOUNT, to follow the refactoring done in r2630
2011-06-13 16:58:51 -03:00
Pablo Hoffman
5dea6be513
use log for dumping stack trace and engine status, in StackTraceDump extension
2011-06-13 14:28:03 -03:00
Pablo Hoffman
72cf5a97c3
added -e|--edit option to genspider command
2011-06-13 09:54:06 -03:00
Pablo Hoffman
80b557849a
fixed test broken in previous commit
2011-06-12 02:55:21 -03:00
Pablo Hoffman
0d5399d0bf
fixed scrapyd tests on win32. closes #295
2011-06-12 02:46:41 -03:00
Pablo Hoffman
c434d11f09
added Darian Moody to AUTHORS
2011-06-12 01:42:30 -03:00
Darian Moody
6873d5b952
Added tests for last commit; they now make sure that
custom primary keys are editable from the Scrapy Item.
---
scrapy/tests/test_djangoitem/__init__.py | 15 ++++++++++++++-
scrapy/tests/test_djangoitem/models.py | 7 +++++++
2 files changed, 21 insertions(+), 1 deletions(-)
2011-06-12 01:41:10 -03:00