scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 01:23:57 +00:00

Author	SHA1	Message	Date
Daniel Graña	ccde3317d7	Merge pull request #816 from Curita/api-cleanup GSoC API cleanup	2014-09-01 21:55:36 -03:00
Daniel Graña	a9292cfab7	jsonrpc webservice moved to https://github.com/scrapy/scrapy-jsonrpc repository	2014-08-15 23:28:13 -03:00
Julia Medina	d7038b2a13	SpiderManager interface cleanup	2014-08-12 14:02:55 -03:00
Julia Medina	8a3a6236d9	Settings topic updated	2014-06-10 11:26:50 -03:00
Alexander Chekunkov	fa5a67729b	DOWNLOADER setting	2014-06-02 13:11:02 +03:00
Alexander Chekunkov	1fba64d34e	DOWNLOADER setting	2014-06-02 13:05:22 +03:00
Alexander Chekunkov	4aa6179af7	added short RFPDupeFilter.request_fingerprint interface description	2014-04-26 15:46:53 +03:00
Alexander Chekunkov	baaa077725	added note about RFPDupeFilter.request_fingerprint overriding to the settings documentation	2014-04-26 15:43:12 +03:00
Julia Medina	b9e2aad874	Doc for disabling download handler	2014-03-12 23:21:33 -03:00
Pablo Hoffman	6d8b7b29ef	remove unused setting: DOWNLOADER_DEBUG	2014-02-27 12:16:05 -02:00
Paul Tremberth	41765ca18d	DupeFilter: add setting for verbose logging + stats counter for filtered requests	2014-02-17 13:42:42 +01:00
Rolando Espinoza	28f946b05f	DOC Use pipelines module name instead of pipieline following default project files.	2014-02-15 11:01:26 -04:00
Mikhail Korobov	9a999daa2a	DOWNLOAD_DELAY docs clarification: * delay is enforced per website, not per spider; * document download_delay attribute (it was previously documented only in FAQ about 999 error codes); * document how CONCURRENT_REQUESTS_PER_IP affects download delays.	2013-12-28 06:30:34 +06:00
Pablo Hoffman	e8ee449a2a	Merge pull request #432 from darkrho/crawl-url Removed URL reference in crawl command and .tld suffix in docs for spider names	2013-10-21 09:40:58 -07:00
Rolando Espinoza La fuente	34543c2b2e	DOCS removed .tld suffix for spider names for the sake of consistency.	2013-10-19 23:03:20 -04:00
Pablo Hoffman	12280c2a95	fix sphinx references in doc	2013-09-25 15:13:17 -03:00
Pablo Hoffman	fc388f4636	Make ITEM_PIPELINE setting a dict This is for consistency with how spider and downloader middlewares are defined. ITEM_PIPELINE_BASE was also added and both remain empty. Backwards compatibility is kept (with a warning) with list-based ITEM_PIPELINES.	2013-09-23 17:50:43 -03:00
Pablo Hoffman	22edc44c6c	doc: remove links to diveintopython.org, which is no longer available. closes #246	2013-02-14 11:09:40 -02:00
Chris Tilden	aae6aed4fb	fixes spelling errors in documentation	2013-01-22 14:52:18 -08:00
Daniel Graña	076ba40404	update DOWNLOADER_MIDDLEWARES_BASE setting documentation	2013-01-08 10:50:27 -02:00
Pablo Hoffman	7a7c5d1334	removed reference to global scrapy stats from settings doc	2012-11-03 17:05:01 -02:00
Pablo Hoffman	1f89eb59fe	fixed doc reference to topics-contracts	2012-10-09 16:02:12 -02:00
Pablo Hoffman	c380910b40	Merge pull request #167 from alexcepoi/sep-017 Spider contracts (SEP-017)	2012-09-28 13:57:07 -07:00
Pablo Hoffman	b46b5a6ef0	Documented AutoThrottle extension and added to extensions available by default. Also deprecated concurrency and delay settings, in favour of using the standard Scrapy ones.	2012-09-20 18:52:57 -03:00
Pablo Hoffman	c7f8219901	- removed scrapy.conf singleton from scrapy.log, scrapy.responsetypes, scrapy.http.response.text, scrapy.selector - fixed bug with scrapy.conf.settings backwards compatibility support - added facility to notify (and provide some guidelines) about deprecated/obsolete settings	2012-09-19 03:03:34 -03:00
Alex Cepoi	bf8dc61fb7	SEP-017 contracts: pretty-printing and docs	2012-09-10 23:17:27 +02:00
Pablo Hoffman	babfc6e79b	Updated documentation after singleton removal changes. Also removed some unused code and made some minor additional refactoring.	2012-08-28 18:35:57 -03:00
Pablo Hoffman	27018fced7	changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting	2012-03-23 13:45:21 -03:00
Pablo Hoffman	35fb01156e	removed some obsolete remaining code related to sqlite support in scrapy	2012-03-16 11:55:55 -03:00
Pablo Hoffman	ce03ccd4ec	updated documentation about DEPTH_PRIORITY and DFO/BFO crawls	2011-09-23 13:22:25 -03:00
Pablo Hoffman	a1dbc62b45	removed CONCURRENT_SPIDERS setting (use scrapyd maxproc instead)	2011-09-02 18:27:39 -03:00
Pablo Hoffman	27dd68a690	added SpiderState extension	2011-09-02 13:06:59 -03:00
Pablo Hoffman	76af0cdd44	updated documentation and code to use -s instead of --set option	2011-09-01 14:35:37 -03:00
Pablo Hoffman	9d97e73a24	fixed priority handling on the new scheduler so that it's backwards compatible (ie. bigger priorities are higher). also fixed a few documentation bugs related to requests priority	2011-08-19 08:26:41 -03:00
Pablo Hoffman	a3697421c0	some minor updates to documentation	2011-08-11 09:19:59 -03:00
Pablo Hoffman	19e6da59d8	added new downloader middleware: ChunkedTransferMiddleware	2011-08-09 03:03:25 -03:00
Pablo Hoffman	9f60c27612	added setting to support disabling DNS cache: DNSCACHE_ENABLED	2011-08-05 20:41:59 -03:00
Pablo Hoffman	549725215e	Initial support for a persistent scheduler, to support pausing and resuming crawls. * requests are serialized (using marshal by default) and stored on disk, using one queue per priority * request priorities must be integers now * breadth-first and depth-first crawling orders can now be configured through a new DEPTH_PRIORITY setting (see doc). backwards compatibility with SCHEDULER_ORDER was kept. * requests that can't be serialized (for example, non serializable callbacks) are always kept in memory queues * adapted crawl spider to work with persistent scheduler	2011-08-02 11:57:55 -03:00
Pablo Hoffman	ce7a787970	Big downloader refactoring to support real concurrency limits per domain/ip, instead of global limits per spider which were a bit useless. This removes the setting CONCURRENT_REQUESTS_PER_SPIDER and adds three new settings: * CONCURRENT_REQUESTS * CONCURRENT_REQUESTS_PER_DOMAIN * CONCURRENT_REQUESTS_PER_IP (overrides per domain) The AutoThrottle extension had to be disabled, but will be ported and re-enabled soon.	2011-07-27 13:38:09 -03:00
Pablo Hoffman	91dc46539f	added LogStats extension for periodically logging basic stats (like crawled pages and scraped items)	2011-06-14 00:50:05 -03:00
Pablo Hoffman	9d9c8877da	added 'scrapy edit' command	2011-06-05 22:02:56 -03:00
Pablo Hoffman	2fa0f75f2d	added COOKIES_ENABLED setting to support disabling the cookies middleware	2011-05-27 00:35:34 -03:00
Pablo Hoffman	503f302010	removed remaining references to scheduler middleware from doc, as it will be removed on next release	2011-05-18 19:48:48 -03:00
Pablo Hoffman	3fd17432cf	fixed outdated documentation	2011-05-18 14:46:20 -03:00
Pablo Hoffman	495152bd50	disabled verbose depth stats collection by default, added DEPTH_STATS_VERBOSE setting to enable it	2011-05-18 11:04:48 -03:00
Pablo Hoffman	accb6ed830	dump stats to log by default (ie. change default value of STATS_DUMP to True)	2011-05-17 22:42:05 -03:00
Pablo Hoffman	b76c5c597f	* Added support for project data storage (closes #276) * Documented project file structure * Moved default location of SQLite database to project data storage dir (closes #277)	2010-10-31 03:25:37 -02:00
Pablo Hoffman	9599bde3e9	Removed RequestLimitMiddleware	2010-09-22 16:09:13 -03:00
Pablo Hoffman	ed4aec187f	Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245	2010-09-22 16:09:13 -03:00
Pablo Hoffman	b6c2b55e5b	Splitted settings classes from settings singleton. Closes #244 --HG-- rename : scrapy/conf/__init__.py => scrapy/conf.py rename : scrapy/conf/default_settings.py => scrapy/settings/default_settings.py rename : scrapy/tests/test_conf.py => scrapy/tests/test_settings.py	2010-09-22 15:47:33 -03:00

103 Commits