Mikhail Korobov
b245d592aa
Update faq.rst
spider.DOWNLOAD_DELAY is deprecated
2013-04-18 02:42:15 +06:00
Pablo Hoffman
bb20907254
minor update to faq
2013-03-14 16:43:00 -03:00
Pablo Hoffman
098ccff862
added FAQ about error: "cannot import name crawler"
2013-03-14 12:57:59 -03:00
Pablo Hoffman
6ab8afb992
improve documentation about removing namespaces
2013-01-18 12:35:30 -02:00
Pablo Hoffman
1e2ee76df2
add documentation topics: Broad Crawls & Common Practices
2012-12-26 14:02:13 -02:00
Pablo Hoffman
1f0d167037
doc: removed broken proxyhub link from FAQ
2012-11-22 15:10:26 -02:00
Pablo Hoffman
fff2871828
added doc section (and FAQ) about spider arguments
2012-09-04 14:49:30 -03:00
Pablo Hoffman
e1be9c01bc
updated FAQ about bot bans
2012-06-08 18:33:53 -03:00
Pablo Hoffman
e521da2e2f
Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25
2012-03-01 08:18:12 -02:00
Pablo Hoffman
ade5efdc61
added -o option to scrapy crawl, a convenient shortcut for using feed exports
2011-10-22 20:53:49 -02:00
Pablo Hoffman
431441cb52
updated documentation to remove references to old issue tracker and mercurial repos
2011-09-25 13:06:24 -03:00
Pablo Hoffman
ce03ccd4ec
updated documentation about DEPTH_PRIORITY and DFO/BFO crawls
2011-09-23 13:22:25 -03:00
Pablo Hoffman
76af0cdd44
updated documentation and code to use -s instead of --set option
2011-09-01 14:35:37 -03:00
Pablo Hoffman
549725215e
Initial support for a persistent scheduler, to support pausing and resuming crawls.
* requests are serialized (using marshal by default) and stored on disk, using
one queue per priority
* request priorities must be integers now
* breadth-first and depth-first crawling orders can now be configured
through a new DEPTH_PRIORITY setting (see doc). backwards compatibility with
SCHEDULER_ORDER was kept.
* requests that can't be serialized (for example, non-serializable callbacks)
are always kept in memory queues
* adapted crawl spider to work with persistent scheduler
2011-08-02 11:57:55 -03:00
Pablo Hoffman
f354a49d0f
added FAQ about preventing bots getting banned
2011-07-28 00:40:30 -03:00
Pablo Hoffman
b6b0a54d9f
removed FAQ entry
2011-07-20 01:31:36 -03:00
Pablo Hoffman
e3f640c7bf
added FAQ entry about scrapy deploy issue on Mac + Python 2.5
2011-07-19 19:53:32 -03:00
Pablo Hoffman
763f3dc628
minor update to doc
2011-07-12 19:56:39 -03:00
Pablo Hoffman
4fde1ef94d
added CloseSpider exception, to manually close spiders
2011-07-12 14:24:10 -03:00
Pablo Hoffman
bf73002428
removed googledir example, replaced by dirbot project on github. updated docs accordingly
2011-04-28 02:28:39 -03:00
Pablo Hoffman
3ee2c94e93
Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it
2011-04-06 14:54:48 -03:00
Daniel Grana
c55355642c
fix FAQ typos reported by marlun_ at #scrapy IRC channel
2011-02-16 08:57:42 -02:00
Pablo Hoffman
16d9a33951
added FAQ entry about working with big data feeds
2011-02-15 07:24:52 -02:00
Pablo Hoffman
ac007802d6
Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280
2010-11-17 21:32:23 -02:00
Pablo Hoffman
5a5364d0c1
Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6
2010-11-16 03:31:04 -02:00
Pablo Hoffman
ed4aec187f
Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245
2010-09-22 16:09:13 -03:00
Pablo Hoffman
3c5ab10688
Added FAQ entry about __VIEWSTATE parameter
2010-09-06 13:17:08 -03:00
Pablo Hoffman
2f12618890
Post reference to Scrapyd in FAQ
2010-09-05 04:35:27 -03:00
Pablo Hoffman
6da1162839
minor fixes to FAQ
2010-08-22 19:08:45 -03:00
Pablo Hoffman
b3753d34eb
Added FAQ entry about feed exports
2010-08-22 05:59:30 -03:00
Pablo Hoffman
1d3b9e2ca8
Scrapy shell refactoring
2010-08-20 11:26:14 -03:00
Pablo Hoffman
30e2404d8f
updated FAQ entry to recommend using higher download delays
2010-08-19 17:59:52 -03:00
Pablo Hoffman
3d8151bb26
Added FAQ entry about response code 999
2010-08-19 16:51:51 -03:00
Pablo Hoffman
34554da201
Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
3e3a66620b
Added support for returning deferreds from (some) signal handlers. Closes #193
2010-08-14 21:10:37 -03:00
Pablo Hoffman
c7d9f6e270
Added JSON item exporter with doc and unittests (closes #192), and also:
* put all json exporters in scrapy.contrib.exporters and deprecated
scrapy.contrib.exporters.jsonlines to reduce module nesting
* use JSON exporter with EXPORT_FORMAT=json in file export pipeline
2010-08-07 15:52:59 -03:00
Pablo Hoffman
115e9f2162
Added FAQ entry about running Scrapy deployment.
2010-06-14 18:21:12 -03:00
Pablo Hoffman
6084be3b2e
added iter_all() function to scrapy.util.trackref module and improved memory leaks documentation. also added a new FAQ entry about memory issues
2009-11-28 16:21:59 -02:00
Pablo Hoffman
db7fec1fef
fixed doc typo
2009-11-12 12:17:39 -02:00
Pablo Hoffman
415dec4e16
made offsite middleware log messages when filtering out requests
2009-11-12 10:17:21 -02:00
Pablo Hoffman
937acd91d1
improved documentation of http proxy middleware
2009-10-07 21:00:34 -02:00
Pablo Hoffman
e8960bf616
added runspider command to run spiders directly, without having to create a project
2009-09-14 22:05:14 -03:00
Pablo Hoffman
f85813cd94
added FAQ entry about scrapy recipes and community spiders
2009-09-10 18:32:50 -03:00
Pablo Hoffman
44783a3a06
minor improvements to FAQ entry
2009-08-26 00:18:58 -03:00
Pablo Hoffman
0363040884
doc: added FAQ entry about Accept-Language
2009-08-24 13:56:44 -03:00
Pablo Hoffman
0186c6937a
HTTP auth middleware: added doc and unittest
2009-08-24 08:07:20 -03:00
Pablo Hoffman
ef6f04eb06
moved doc about debugging memory leaks to its own topic and added doc about trackref module
2009-08-21 16:07:16 -03:00
Ismael Carnales
33089d287d
merged topics and reference doc
2009-08-18 14:05:15 -03:00
Pablo Hoffman
d95e99f585
Added documentation for Items and Loaders, removed obsolete Item Adaptors documentation
--HG--
rename : docs/experimental/topics/newitem/index.rst => docs/experimental/newitem.rst
2009-08-07 03:50:09 -03:00
Pablo Hoffman
0c4c153819
improved Scrapy documentation index for better usability
2009-07-01 09:51:57 -03:00