scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 14:44:08 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	7fe7c3f3b1	MemoryUsage extension: close the spiders (instead of stopping the engine) when the limit is exceeded, providing a descriptive reason for the close. Also fixed default value of MEMUSAGE_ENABLED setting to match the documentation.	2012-02-23 17:05:06 -02:00
Pablo Hoffman	7b8942a648	updated StackTraceDump extension doc	2012-02-16 15:14:17 -02:00
Pablo Hoffman	ea77342b55	updated versioning doc according to recent changes	2012-01-05 11:50:28 -02:00
Pablo Hoffman	0b0bce7f3c	scrapyd: added cancel.json and listjobs.json api methods to documentation	2012-01-05 11:23:25 -02:00
Pablo Hoffman	8f42633a94	scrapyd: added clarification about how to disable items feeds generation	2012-01-05 11:20:50 -02:00
Pablo Hoffman	dbda33efa6	scrapyd: added support for storing items by default. Items are stored the same way as logs, in jsonlines format. Also renamed logs_to_keep setting to jobs_to_keep.	2012-01-03 23:08:54 -02:00
Pablo Hoffman	0be421fbf0	fixed reference to tutorial directory	2011-12-23 18:57:11 -02:00
Pablo Hoffman	41fd3c4f6c	doc: removed duplicated callback argument from Request.replace()	2011-12-23 15:55:46 -02:00
Pablo Hoffman	0eeff76227	fixed formatting of scrapyd doc	2011-12-20 03:18:37 -02:00
Daniel Graña	bcb31988f2	change tutorial to follow changes on dmoz site	2011-12-14 13:03:31 -02:00
Pablo Hoffman	992af8d38f	ubuntu repos: added support for oneiric release	2011-10-25 14:26:38 -02:00
Pablo Hoffman	c38c49d56a	fixed PickleItemExporter bug, added unittest, and added pickle to supported feed exports formats	2011-10-25 02:36:51 -02:00
Pablo Hoffman	8bdf288428	made scrapyd doc more version agnostic	2011-10-23 05:29:54 -02:00
Pablo Hoffman	ade5efdc61	added -o option to scrapy crawl, a convenient shortcut for using feed exports	2011-10-22 20:53:49 -02:00
Pablo Hoffman	431441cb52	updated documentation to remove references to old issue tracker and mercurial repos	2011-09-25 13:06:24 -03:00
Pablo Hoffman	ce03ccd4ec	updated documentation about DEPTH_PRIORITY and DFO/BFO crawls	2011-09-23 13:22:25 -03:00
Julien Duponchelle	b7c436343a	scrapy deploy support git version	2011-09-21 22:17:08 +02:00
Pablo Hoffman	ab1c9cfc56	removed documentation header notifying about other documentation versions, as that's provided by readthedocs already	2011-09-14 02:39:32 -03:00
Daniel Grana	5f1b1c05f8	Do not filter requests with dont_filter attribute set in OffsiteMiddleware	2011-09-08 15:18:10 -03:00
Pablo Hoffman	bff3d31469	scrapyd: updated schedule.json response format	2011-09-04 09:29:24 -03:00
Pablo Hoffman	a1dbc62b45	removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)	2011-09-02 18:27:39 -03:00
Pablo Hoffman	40f7075f11	added initial documentation about suspend and resume crawls	2011-09-02 13:12:27 -03:00
Pablo Hoffman	27dd68a690	added SpiderState extension	2011-09-02 13:06:59 -03:00
Pablo Hoffman	6a31ab667d	minor fix to doc	2011-09-01 15:08:23 -03:00
Pablo Hoffman	d98b058c21	no longer recommend using lambdas in the doc, as they're not friendly with scheduler persistence	2011-09-01 15:06:49 -03:00
Pablo Hoffman	76af0cdd44	updated documentation and code to use -s instead of --set option	2011-09-01 14:35:37 -03:00
Pablo Hoffman	98b68ca89d	scrapyd: documented support for passing setting to spiders in schedule.json	2011-08-27 01:31:12 -03:00
Pablo Hoffman	5c6b0631e2	minor doc fix	2011-08-19 11:42:03 -03:00
Pablo Hoffman	9d97e73a24	fixed priority handling on the new scheduler so that it's backwards compatible (i.e. bigger priorities are higher). also fixed a few documentation bugs related to request priorities	2011-08-19 08:26:41 -03:00
Pablo Hoffman	a3697421c0	some minor updates to documentation	2011-08-11 09:19:59 -03:00
Pablo Hoffman	5da6ffb57b	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-08-11 09:11:19 -03:00
Pablo Hoffman	bc2d2183e9	fixed import in doc	2011-08-11 09:11:08 -03:00
Pablo Hoffman	19e6da59d8	added new downloader middleware: ChunkedTransferMiddleware	2011-08-09 03:03:25 -03:00
Pablo Hoffman	984be35461	Some telnet console changes: * renamed manager alias to crawler * added aliases: spider, slot * fixed est() function	2011-08-08 15:01:08 -03:00
Pablo Hoffman	f7c0aeccc6	added note about engine_started signal	2011-08-07 03:57:09 -03:00
Pablo Hoffman	9f60c27612	added setting to support disabling DNS cache: DNSCACHE_ENABLED	2011-08-05 20:41:59 -03:00
Pablo Hoffman	cb95d7a5af	added marshal to formats supported by feed exports	2011-08-03 16:16:48 -03:00
Pablo Hoffman	549725215e	Initial support for a persistent scheduler, to support pausing and resuming crawls. * requests are serialized (using marshal by default) and stored on disk, using one queue per priority * request priorities must be integers now * breadth-first and depth-first crawling orders can now be configured through a new DEPTH_PRIORITY setting (see doc). backwards compatibility with SCHEDULER_ORDER was kept. * requests that can't be serialized (for example, non-serializable callbacks) are always kept in memory queues * adapted crawl spider to work with persistent scheduler	2011-08-02 11:57:55 -03:00
Pablo Hoffman	f354a49d0f	added FAQ about preventing bots getting banned	2011-07-28 00:40:30 -03:00
Pablo Hoffman	ce7a787970	Big downloader refactoring to support real concurrency limits per domain/ip, instead of global limits per spider which were a bit useless. This removes the setting CONCURRENT_REQUESTS_PER_SPIDER and adds three new settings: * CONCURRENT_REQUESTS * CONCURRENT_REQUESTS_PER_DOMAIN * CONCURRENT_REQUESTS_PER_IP (overrides per domain) The AutoThrottle extension had to be disabled, but will be ported and re-enabled soon.	2011-07-27 13:38:09 -03:00
Pablo Hoffman	c59340150f	Added cached DNS resolver based on old caching resolver extension from scrapy.contrib.resolver. This new one is not an extension, it comes builtin and always enabled.	2011-07-27 03:45:15 -03:00
Pablo Hoffman	2ac08a713d	downloader: renamed SpiderInfo to Slot, for consistency with engine and scraper names	2011-07-22 02:06:10 -03:00
Pablo Hoffman	0e008268e1	removed SimpledbStatsCollector from scrapy code, it was moved to https://github.com/scrapinghub/scaws	2011-07-20 10:38:16 -03:00
Pablo Hoffman	b6b0a54d9f	removed FAQ entry	2011-07-20 01:31:36 -03:00
Pablo Hoffman	e3f640c7bf	added FAQ entry about scrapy deploy issue on Mac + Python 2.5	2011-07-19 19:53:32 -03:00
Pablo Hoffman	84f518fc5e	More core changes: * removed execution queue (replaced by newer spider queues) * added real support for returning iterators in Spider.start_requests() * removed support for passing urls to 'scrapy crawl' command	2011-07-15 15:18:39 -03:00
Pablo Hoffman	dbad1373f1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-07-13 18:44:54 -03:00
Pablo Hoffman	18cb4ff1d8	added natty to list of supported ubuntu distros	2011-07-13 18:43:52 -03:00
Pablo Hoffman	39a2ea97c8	redirect mw: added REDIRECT_ENABLED setting and documented the other settings	2011-07-13 14:18:15 -03:00
Pablo Hoffman	541ed3913b	retry middleware: added RETRY_ENABLED setting and documented the other settings more properly, also improved messages when no longer retrying requests	2011-07-13 11:55:05 -03:00

531 Commits