mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 18:04:11 +00:00

669 Commits

Author SHA1 Message Date
Paul Tremberth
41765ca18d DupeFilter: add setting for verbose logging + stats counter for filtered requests 2014-02-17 13:42:42 +01:00
Rolando Espinoza
28f946b05f DOC Use pipelines module name instead of pipeline, following default project files. 2014-02-15 11:01:26 -04:00
Daniel Graña
6ca49ce76a Merge pull request #594 from dangra/593-engineslots
fix a reference to nonexistent engine.slots.
2014-02-14 09:39:30 -08:00
Daniel Graña
b58285b635 fix a reference to nonexistent engine.slots. closes #593 2014-02-14 15:31:05 -02:00
Nikolaos-Digenis Karagiannis
43a797e2f7 downloaderMW doc typo (spiderMW doc copy remnant) 2014-02-11 22:30:00 +02:00
tracicot
b2f4b296df Correct typos 2014-02-10 11:46:23 -02:00
Rolando Espinoza
a6279fe95b DOC Fixed HTTPCACHE_STORAGE typo in the default value, which is now Filesystem instead of Dbm. 2014-01-30 11:53:42 -04:00
Rolando Espinoza
9aab9224cb Updated shell docs with the crawler reference and fixed the actual shell output.
Also updated the shell example with a reproducible code example.
2014-01-23 18:04:57 -04:00
Daniel Graña
677afe7e54 show ubuntu setup instructions as literal code 2014-01-20 16:22:53 -02:00
Daniel Graña
ebfb5b7096 replace warning about updating package lists by a note on package upgrade 2014-01-20 15:18:34 -02:00
Daniel Graña
52c3ff9190 fix apt-get line 2014-01-20 14:39:26 -02:00
Daniel Graña
eb73ddd301 Update Ubuntu installation instructions 2014-01-20 14:15:01 -02:00
stray-leone
7f30a671c3 modify the version of the scrapy ubuntu package
The latest version is 0.22.
With scrapy-0.18, the tutorial project raises an error.
Related issue: https://github.com/scrapy/scrapy/issues/511
2014-01-20 10:24:34 -02:00
Mikhail Korobov
7b7a1d8dfd Make Filesystem storage backend default again. See GH-500. 2014-01-17 04:32:08 +06:00
Mikhail Korobov
b03fe04999 Rename AjaxCrawlableMiddleware to AjaxCrawlMiddleware 2014-01-16 23:09:37 +06:00
Pablo Hoffman
ed6fd4933f Merge pull request #524 from hobsonlane/master
documentation code example corrections per pablohoffman
2014-01-16 06:44:51 -08:00
Pablo Hoffman
71ada5476e Merge pull request #472 from redapple/exslt
Register EXSLT namespaces by default (resolves #470)
2014-01-16 06:32:05 -08:00
Daniel Graña
b9bb9bed6b Merge pull request #343 from kmike/ajax-crawlable
[MRG] AjaxCrawlableMiddleware
2014-01-16 05:07:39 -08:00
Paul Tremberth
827c0cf51f Rename "regexp" prefix to "re" 2014-01-15 15:00:25 +01:00
Paul Tremberth
88c8a523a7 Add warning in docs on performance when using EXSLT regexp functions 2014-01-15 12:52:10 +01:00
Paul Tremberth
a3eba68aca Drop EXSLT strings and math extensions 2014-01-15 12:28:25 +01:00
Pablo Hoffman
ea2f897b81 Merge pull request #502 from scrapy/doc-fixes
DOWNLOAD_DELAY docs clarification
2014-01-14 21:07:42 -08:00
Paul Tremberth
2cc26e6f56 Fix typo error 2014-01-14 13:09:18 +01:00
Paul Tremberth
29fc9f3466 Update selectors documentation and tests 2014-01-14 12:56:37 +01:00
Hobson Lane
6ba0857a5c documentation code example corrections per pablohoffman 2014-01-10 10:37:27 -08:00
malcolm m
962e5ef702 Clarify return value from extract_links 2014-01-05 14:42:48 -08:00
Yuri Prezument
060891c01c Remove unused import from code sample
Item pipeline docs - removed unused import from code sample
2014-01-03 15:44:17 +02:00
Mikhail Korobov
a27d91f0a6 Rename BaseSpider to Spider. See GH-495. 2013-12-30 19:46:41 +06:00
Mikhail Korobov
e713733edf minor fixes to scrapy shell docs
* better IPython links;
* MDC link instead of w3schools;
* small formatting fixes;
* show quoted URL in example
2013-12-30 10:27:39 +06:00
Mikhail Korobov
9a999daa2a DOWNLOAD_DELAY docs clarification:
* delay is enforced per website, not per spider;
* document download_delay attribute (it was previously documented only in FAQ about 999 error codes);
* document how CONCURRENT_REQUESTS_PER_IP affects download delays.
2013-12-28 06:30:34 +06:00
Pablo Hoffman
e42e3743fe quick documentation for #475 2013-12-24 12:19:15 -02:00
Mikhail Korobov
e0cebbfc8f add a remark about 1% 2013-12-20 23:12:37 +06:00
Mikhail Korobov
943a0bd264 AjaxCrawlableMiddleware in Broad Crawl docs 2013-12-19 01:01:26 +06:00
Mikhail Korobov
a87b3bd1c8 AjaxCrawlableMiddleware 2013-12-19 00:06:47 +06:00
Pablo Hoffman
339861367e Merge pull request #425 from audiodude/master
DownloaderMiddleware docs: Update process_request and minor cleanups.
2013-11-25 10:33:35 -08:00
Paul Tremberth
14f5817d6b Modify ItemLoader to support XPath and CSS selectors
Deprecate XPathItemLoader (now an alias to the new ItemLoader)
2013-11-21 18:05:24 +01:00
Pablo Hoffman
f87be371a2 better names for HANDLE_* settings, and added doc 2013-11-21 14:33:17 -02:00
Brian Lange
e4c1d8d37d Elaborate on use of order numbers 2013-11-19 17:51:50 -06:00
Brian Lange
b878f60b5a Add note to item-pipeline documentation explaining order in the ITEM_PIPELINES setting. 2013-11-19 16:12:54 -06:00
tntC4stl3
b51d5d81e4 duplicate 'use' in line 87 2013-11-15 13:56:44 +08:00
Daniel Graña
2df8156431 Drop Python 2.6 support 2013-10-29 13:44:00 -02:00
Pablo Hoffman
911c8082b0 simplified description of crawl command 2013-10-21 14:42:51 -02:00
Pablo Hoffman
e8ee449a2a Merge pull request #432 from darkrho/crawl-url
Removed URL reference in crawl command and .tld suffix in docs for spider names
2013-10-21 09:40:58 -07:00
Rolando Espinoza La fuente
34543c2b2e DOCS removed .tld suffix for spider names for the sake of consistency. 2013-10-19 23:03:20 -04:00
Daniel Graña
875b07aef8 fix references to old selector naming in docs 2013-10-17 09:33:15 -02:00
Travis Briggs
3043a5ba37 DownloaderMiddleware docs: Update process_request, proper explanation of IgnoreRequest.
Also:
* Change terminology to eliminate uses of terms such as "request middleware" to refer to the process_request methods of installed middleware.
* Remove description of "immediate redirection", as it is misleading.

Further changes.
2013-10-17 00:23:21 +00:00
Mikhail Korobov
086b8a20d4 typo fix in TextResponse docs 2013-10-17 04:50:30 +06:00
Pablo Hoffman
951a9f3f4c Merge pull request #226 from scraperdragon/patch-1
Parameters to Request() in wrong order
2013-10-16 13:15:52 -07:00
Daniel Graña
1461363809 Replace contenttype references by type
The type to choose from is the selector type, not the input type. A
content-type doesn't make sense in this context.
2013-10-16 17:37:25 -02:00
Daniel Graña
155ea08ea1 use sel name for Selector's instances in docs, internals and shell 2013-10-15 15:58:42 -02:00