scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 22:43:57 +00:00

Author	SHA1	Message	Date
grammy-jiang	cb76b88331	fix a mistake in topic spider-middleware.rst	2018-04-04 05:56:05 -04:00
Jesse Bakker	0b14cb44aa	Added from_crawler to middleware docs	2017-11-23 15:25:43 +01:00
djunzu	8288f78a39	Add note about request.meta['depth'] in DepthMiddleware	2017-10-16 21:34:37 -02:00
Paul Tremberth	bc200d1155	Rename setting to REFERRER_POLICY (with 2 Rs)	2017-03-01 17:51:23 +01:00
Paul Tremberth	537683f945	Add autoclass directives to document built-in policies	2017-03-01 17:51:23 +01:00
Paul Tremberth	3dc09eeceb	Use table for referrer policy options	2017-03-01 17:51:23 +01:00
Paul Tremberth	605935f015	Edit text	2017-03-01 17:51:23 +01:00
Paul Tremberth	eb07285a63	Reword warning on no-referrer-when-downgrade policy	2017-03-01 17:51:23 +01:00
Paul Tremberth	03ff19d188	Update docs for new "referrer_policy" Request.meta key	2017-03-01 17:51:23 +01:00
Paul Tremberth	e249abc32b	Update docs	2017-03-01 17:50:39 +01:00
Paul Tremberth	c86f568b9c	Update docs with "strict-..." policies	2017-03-01 17:50:39 +01:00
Paul Tremberth	c9c59db489	Update documentation about REFERER_POLICY setting	2017-03-01 17:50:39 +01:00
Takehiro Shiozaki	fcb3daf4fa	fix typo	2017-02-06 14:03:41 +09:00
Jose Ricardo	e12e364a40	Add details to the spider middlewares docs Document the effects of the middleware order in a more detailed way.	2016-10-18 12:29:30 -02:00
nyov	5876b9aa30	Update documentation links	2016-03-03 16:28:33 +00:00
Νικόλαος-Διγενής Καραγιάννης	1cffa99e0d	tests+doc for subdomains in offsite middleware	2016-01-26 12:49:43 +02:00
Jakob de Maeyer	e66f649894	Bring back _BASE settings	2015-11-11 17:39:56 +01:00
Jakob de Maeyer	26586ef5a6	Deprecate _BASE settings, unify _BASE backwards-compatibility	2015-10-27 12:43:23 +01:00
Julia Medina	d3f576a816	Move scrapy/spider.py to scrapy/spiders/__init__.py	2015-05-09 04:20:09 -03:00
Julia Medina	180272c092	Move scrapy/contrib/spidermiddleware to scrapy/spidermiddlewares	2015-04-29 21:26:35 -03:00
Pablo Hoffman	bb4c922d85	Merge pull request #1081 from scrapy/dict-items Allow spiders to return dicts.	2015-03-27 15:19:27 -03:00
Mikhail Korobov	817dbc6cbd	DOC mention dicts in documentation; explain better what are Items for	2015-03-19 05:16:14 +05:00
Shadab Zafar	5a58d64131	Fix some redirection links in documentation Fixes #606	2015-03-18 19:41:26 -03:00
Mikhail Korobov	baf5c59386	Merge pull request #1071 from eliasdorneles/updating-request-meta-special-keys updating list of Request.meta special keys	2015-03-13 16:38:19 +05:00
Elias Dorneles	f7031c08ff	updating list of Request.meta special keys	2015-03-10 22:29:07 -03:00
Mikhail Korobov	283d6a5344	DOC a couple more references are fixed	2015-01-19 22:07:03 +05:00
Mikhail Korobov	73e6b35622	DOC fix a reference	2015-01-19 22:02:46 +05:00
Mikhail Korobov	e435b3e3a3	DOC simplify extension docs	2014-09-21 00:19:24 +06:00
Mikhail Korobov	2d3803672b	DOC use top-level shortcuts in docs	2014-04-15 01:09:35 +06:00
Nikolaos-Digenis Karagiannis	4335420f40	SpiderMW doc typo: SWP request, response	2014-03-06 16:09:37 +02:00
Mikhail Korobov	a27d91f0a6	Rename BaseSpider to Spider. See GH-495.	2013-12-30 19:46:41 +06:00
Pablo Hoffman	f87be371a2	better names for HANDLE_* settings, and added doc	2013-11-21 14:33:17 -02:00
Steven Almeroth	f62b6660d4	doc: fix typo in spider middleware	2013-03-02 19:46:31 -06:00
Chris Tilden	aae6aed4fb	fixes spelling errors in documentation	2013-01-22 14:52:18 -08:00
Pablo Hoffman	be206ca5ab	added process_start_requests method to spider middlewares	2012-08-31 16:41:50 -03:00
Pablo Hoffman	4ec99117d3	fixed minor doc typo	2012-08-30 11:56:30 -03:00
stav	f1802289cd	small doc typo change to get the fork rolling	2012-04-11 12:05:39 -05:00
Pablo Hoffman	8933e2f2be	added REFERER_ENABLED setting, to control referer middleware	2012-03-22 16:35:14 -03:00
Pablo Hoffman	ce03ccd4ec	updated documentation about DEPTH_PRIORITY and DFO/BFO crawls	2011-09-23 13:22:25 -03:00
Daniel Grana	5f1b1c05f8	Do not filter requests with dont_filter attribute set in OffsiteMiddleware	2011-09-08 15:18:10 -03:00
Pablo Hoffman	549725215e	Initial support for a persistent scheduler, to support pausing and resuming crawls. * requests are serialized (using marshal by default) and stored on disk, using one queue per priority * request priorities must be integers now * breadh-first and depth-first crawling orders can now be configured through a new DEPTH_PRIORITY setting (see doc). backwards compatilibty with SCHEDULER_ORDER was kept. * requests that can't be serialized (for example, non serializable callbacks) are always kept in memory queues * adapted crawl spider to work with persitent scheduler	2011-08-02 11:57:55 -03:00
Pablo Hoffman	e6091df551	fixed doc typo	2011-05-30 09:04:31 -03:00
Pablo Hoffman	9599bde3e9	Removed RequestLimitMiddleware	2010-09-22 16:09:13 -03:00
Pablo Hoffman	7f21a6384f	Documented handle_httpstatus_list request.meta key	2010-09-09 21:50:40 -03:00
Pablo Hoffman	e9ebebb230	Removed UrlFilterMiddleware from scrapy.contrib - see this snippet for an alternative: http://snippets.scrapy.org/snippets/12/	2010-09-07 17:51:02 -03:00
Pablo Hoffman	7b9fa7fbaa	Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225	2010-09-04 02:23:04 -03:00
Pablo Hoffman	9aefa242d5	Applied documentation patch provided by Lucian Ursu (closes #207 )	2010-08-21 01:26:35 -03:00
Daniel Grana	c925c9e9a0	Notify spider when requests are ignored by HttpErrorMiddleware, and generally when any call to process_spider_input raises an exception	2010-05-12 16:41:06 -03:00
Rolando Espinoza La fuente	db5c3df679	SEP12 implementation * Rename BaseSpider.domain_name to BaseSpider.name This patch implements the domain_name to name change in BaseSpider class and change all spider instantiations to use the new attribute. * Add allowed_domains to spider This patch implements the merging of spider.domain_name and spider.extra_domain_names in spider.allowed_domains for offsite checking purposes. Note that spider.domain_name is not touched by this patch, only not used. * Remove spider.domain_name references from scrapy.stats * Rename domain_stats to spider_stats in MemoryStatsCollector * Use ``spider`` instead of ``domain`` in SimpledbStatsCollector * Rename domain_stats_history table to spider_data_history and rename domain field to spider in MysqlStatsCollector * Refactor genspider command The new signature for genspider is: genspider [options] <domain_name>. Genspider uses domain_name for spider name and for the module name. * Remove spider.domain_name references * Update crawl command signature <spider|url> * docs: updated references to domain_name * examples/experimental: use spider.name * genspider: require <name> <domain> * spidermanager: renamed crawl_domain to crawl_spider_name * spiderctl: updated references of domain to spider * added backward compatiblity with legacy spider's attributes 'domain_name' and 'extra_domain_names'	2010-04-01 18:27:22 -03:00
Pablo Hoffman	415dec4e16	made offsite middleware log messages when filtering out requests	2009-11-12 10:17:21 -02:00

63 Commits