mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 23:24:01 +00:00

52 Commits

Author SHA1 Message Date
Mikhail Korobov
b245d592aa Update faq.rst
spider.DOWNLOAD_DELAY is deprecated
2013-04-18 02:42:15 +06:00
Pablo Hoffman
bb20907254 minor update to faq 2013-03-14 16:43:00 -03:00
Pablo Hoffman
098ccff862 added FAQ about error: "cannot import name crawler" 2013-03-14 12:57:59 -03:00
Pablo Hoffman
6ab8afb992 improve documentation about removing namespaces 2013-01-18 12:35:30 -02:00
Pablo Hoffman
1e2ee76df2 add documentation topics: Broad Crawls & Common Practices 2012-12-26 14:02:13 -02:00
Pablo Hoffman
1f0d167037 doc: removed broken proxyhub link from FAQ 2012-11-22 15:10:26 -02:00
Pablo Hoffman
fff2871828 added doc section (and FAQ) about spider arguments 2012-09-04 14:49:30 -03:00
Pablo Hoffman
e1be9c01bc updated FAQ about bot bans 2012-06-08 18:33:53 -03:00
Pablo Hoffman
e521da2e2f Dropped support for Python 2.5. See: http://blog.scrapy.org/scrapy-dropping-support-for-python-25 2012-03-01 08:18:12 -02:00
Pablo Hoffman
ade5efdc61 added -o option to scrapy crawl, a convenient shortcut for using feed exports 2011-10-22 20:53:49 -02:00
Pablo Hoffman
431441cb52 updated documentation to remove references to old issue tracker and mercurial repos 2011-09-25 13:06:24 -03:00
Pablo Hoffman
ce03ccd4ec updated documentation about DEPTH_PRIORITY and DFO/BFO crawls 2011-09-23 13:22:25 -03:00
Pablo Hoffman
76af0cdd44 updated documentation and code to use -s instead of --set option 2011-09-01 14:35:37 -03:00
Pablo Hoffman
549725215e Initial support for a persistent scheduler, to support pausing and resuming
crawls.

* requests are serialized (using marshal by default) and stored on disk, using
  one queue per priority
* request priorities must be integers now
* breadth-first and depth-first crawling orders can now be configured
  through a new DEPTH_PRIORITY setting (see doc). backwards compatibility with
  SCHEDULER_ORDER was kept.
* requests that can't be serialized (for example, non serializable callbacks)
  are always kept in memory queues
* adapted crawl spider to work with persistent scheduler
2011-08-02 11:57:55 -03:00
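
The persistent-scheduler commit above describes requests being serialized with marshal and stored on disk with one queue per integer priority, while requests that cannot be serialized stay in memory. A minimal sketch of that idea follows. It is not Scrapy's implementation: the class name PriorityDiskQueues, the file naming, the length-prefixed record layout, and the dict-shaped requests are all assumptions made for illustration.

```python
import marshal
import os
import tempfile


class PriorityDiskQueues:
    """Hypothetical helper: one append-only file per integer priority."""

    def __init__(self, directory):
        self.directory = directory
        # Fallback for requests that marshal cannot serialize.
        self.memory = []

    def _path(self, priority):
        return os.path.join(self.directory, "p%d.queue" % priority)

    def push(self, request, priority):
        if not isinstance(priority, int):
            raise TypeError("priority must be an integer")
        try:
            data = marshal.dumps(request)
        except ValueError:
            # marshal rejects unsupported types (e.g. function objects),
            # so such requests are kept in memory only.
            self.memory.append((priority, request))
            return
        # Length-prefixed record so several requests share one file.
        with open(self._path(priority), "ab") as f:
            f.write(len(data).to_bytes(4, "big") + data)

    def load(self, priority):
        """Read back every request stored on disk for one priority."""
        path = self._path(priority)
        if not os.path.exists(path):
            return []
        items = []
        with open(path, "rb") as f:
            while True:
                header = f.read(4)
                if not header:
                    break
                size = int.from_bytes(header, "big")
                items.append(marshal.loads(f.read(size)))
        return items


workdir = tempfile.mkdtemp()
queues = PriorityDiskQueues(workdir)
queues.push({"url": "http://example.com/a", "callback": "parse"}, 0)
queues.push({"url": "http://example.com/b", "callback": "parse"}, 1)
# A lambda callback is not marshal-serializable, so it stays in memory.
queues.push({"url": "http://example.com/c", "callback": lambda r: r}, 0)

# A fresh object pointed at the same directory sees the on-disk requests,
# which is what makes pausing and resuming possible in this sketch.
restored = PriorityDiskQueues(workdir)
```

In this sketch only the two requests with string callbacks survive a restart; the one held in memory is lost, which mirrors the limitation the commit message notes for non-serializable callbacks.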
Pablo Hoffman
f354a49d0f added FAQ about preventing bots getting banned 2011-07-28 00:40:30 -03:00
Pablo Hoffman
b6b0a54d9f removed FAQ entry 2011-07-20 01:31:36 -03:00
Pablo Hoffman
e3f640c7bf added FAQ entry about scrapy deploy issue on Mac + Python 2.5 2011-07-19 19:53:32 -03:00
Pablo Hoffman
763f3dc628 minor update to doc 2011-07-12 19:56:39 -03:00
Pablo Hoffman
4fde1ef94d added CloseSpider exception, to manually close spiders 2011-07-12 14:24:10 -03:00
Pablo Hoffman
bf73002428 removed googledir example, replaced by dirbot project on github. updated docs accordingly 2011-04-28 02:28:39 -03:00
Pablo Hoffman
3ee2c94e93 Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it 2011-04-06 14:54:48 -03:00
Daniel Grana
c55355642c fix FAQ typos reported by marlun_ at #scrapy IRC channel 2011-02-16 08:57:42 -02:00
Pablo Hoffman
16d9a33951 added FAQ entry about working with big data feeds 2011-02-15 07:24:52 -02:00
Pablo Hoffman
ac007802d6 Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280 2010-11-17 21:32:23 -02:00
Pablo Hoffman
5a5364d0c1 Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6 2010-11-16 03:31:04 -02:00
Pablo Hoffman
ed4aec187f Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245 2010-09-22 16:09:13 -03:00
Pablo Hoffman
3c5ab10688 Added FAQ entry about __VIEWSTATE parameter 2010-09-06 13:17:08 -03:00
Pablo Hoffman
2f12618890 Post reference to Scrapyd in FAQ 2010-09-05 04:35:27 -03:00
Pablo Hoffman
6da1162839 minor fixes to FAQ 2010-08-22 19:08:45 -03:00
Pablo Hoffman
b3753d34eb Added FAQ entry about feed exports 2010-08-22 05:59:30 -03:00
Pablo Hoffman
1d3b9e2ca8 Scrapy shell refactoring 2010-08-20 11:26:14 -03:00
Pablo Hoffman
30e2404d8f updated FAQ entry to recommend using higher download delays 2010-08-19 17:59:52 -03:00
Pablo Hoffman
3d8151bb26 Added FAQ entry about response code 999 2010-08-19 16:51:51 -03:00
Pablo Hoffman
34554da201 Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
3e3a66620b Added support for returning deferreds from (some) signal handlers. Closes #193 2010-08-14 21:10:37 -03:00
Pablo Hoffman
c7d9f6e270 Added JSON item exporter with doc and unittests (closes #192), and also:
* put all json exporters in scrapy.contrib.exporters and deprecated
  scrapy.contrib.exporters.jsonlines to reduce module nesting
* use JSON exporter with EXPORT_FORMAT=json in file export pipeline
2010-08-07 15:52:59 -03:00
Pablo Hoffman
115e9f2162 Added FAQ entry about running Scrapy deployment. 2010-06-14 18:21:12 -03:00
Pablo Hoffman
6084be3b2e added iter_all() function to scrapy.util.trackref module and improved memory leaks documentation. also added a new FAQ entry about memory issues 2009-11-28 16:21:59 -02:00
Pablo Hoffman
db7fec1fef fixed doc typo 2009-11-12 12:17:39 -02:00
Pablo Hoffman
415dec4e16 made offsite middleware log messages when filtering out requests 2009-11-12 10:17:21 -02:00
Pablo Hoffman
937acd91d1 improved documentation of http proxy middleware 2009-10-07 21:00:34 -02:00
Pablo Hoffman
e8960bf616 added runspider command to run spiders directly, without having to create a project 2009-09-14 22:05:14 -03:00
Pablo Hoffman
f85813cd94 added FAQ entry about scrapy recipes and community spiders 2009-09-10 18:32:50 -03:00
Pablo Hoffman
44783a3a06 minor improvements to FAQ entry 2009-08-26 00:18:58 -03:00
Pablo Hoffman
0363040884 doc: added FAQ entry about Accept-Language 2009-08-24 13:56:44 -03:00
Pablo Hoffman
0186c6937a HTTP auth middleware: added doc and unittest 2009-08-24 08:07:20 -03:00
Pablo Hoffman
ef6f04eb06 moved doc about debugging memory leaks to its own topic and added doc about trackref module 2009-08-21 16:07:16 -03:00
Ismael Carnales
33089d287d merged topics and reference doc 2009-08-18 14:05:15 -03:00
Pablo Hoffman
d95e99f585 Added documentation for Items and Loaders, removed obsolete Item Adaptors documentation
--HG--
rename : docs/experimental/topics/newitem/index.rst => docs/experimental/newitem.rst
2009-08-07 03:50:09 -03:00
Pablo Hoffman
0c4c153819 improved Scrapy documentation index for better usability 2009-07-01 09:51:57 -03:00