mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 10:43:49 +00:00

831 Commits

Author SHA1 Message Date
Elias Dorneles
57a5ee0097 added example value to set for proxy meta key 2015-03-12 23:20:44 -03:00
Elias Dorneles
f7031c08ff updating list of Request.meta special keys 2015-03-10 22:29:07 -03:00
Sudhanshu Shekhar
839ffba971 Added the first version of SelectJmes
Utilizes jmespath. Also, added tests and documentation for the same.
2015-02-24 22:59:01 +05:30
Nicolás Alejandro Ramírez Quiros
8a3b9b6131 Merge pull request #1011 from SudShekhar/master
Extension example fix to something that makes more sense
2015-01-30 15:45:52 -02:00
Sudhanshu Shekhar
e42a1ac1a1 Reset items_scraped instead of item_count
items_scraped is the counter that needs to be reset each time we have scraped a specific number of items in the code instead of item_count (which represents the specific number of items needed before a message is logged). Updating the source code to reflect this.
Removed some irrelevant words from the log message.
Signed-off-by: Sudhanshu Shekhar <sudshekhar02@gmail.com>
2015-01-30 23:13:06 +05:30
Jonas Tingeborn
bd5d99a2d2 add gzip compression to filesystem http cache backend 2015-01-21 20:18:11 +01:00
Capi Etheriel
4bc14da59e Updates documentation on dynamic item classes.
Fixes #398
2015-01-19 17:21:56 -02:00
Mikhail Korobov
283d6a5344 DOC a couple more references are fixed 2015-01-19 22:07:03 +05:00
Mikhail Korobov
73e6b35622 DOC fix a reference 2015-01-19 22:02:46 +05:00
Artur Gaspar
b0730a1d16 documentation for CSS support in link extractors 2014-12-11 18:22:08 -02:00
Stefan
3602fc4fcb fixed the variable types in mailsender documentation 2014-12-10 22:48:09 +01:00
Lev Berman
e04b0aff74 An attempt to resolve #977, add signal to be sent when request is dropped by the scheduler 2014-11-27 15:10:15 +03:00
tpeng
a69f042d10 add 2 more test cases and minor doc fixes 2014-11-19 15:31:07 +01:00
tpeng
fa84730e70 avoid download large response
introduce DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE in settings and
download_maxsize/download_warnsize in spider/request meta, so
downloader stop downloading as soon as the received data exceed the
limit. also check the twisted response's length in advance to stop
downloading as early as possible.
2014-11-12 12:28:02 +01:00
Lazar-T
13f83f0da0 typo 2014-11-10 06:28:41 +05:00
HalfCrazy
b21a28cc9a Afterwords->Afterwards 2014-11-10 06:28:09 +05:00
Pablo Hoffman
efe589c643 Merge pull request #882 from ahlen/feature/csvfeed-quotechar
[MRG+1] Allow to specify the quotechar in CSVFeedSpider
2014-11-04 11:32:59 -02:00
Lazar-T
38dcf50cd6 comma instead of fullstop 2014-10-25 09:19:50 +06:00
Pablo Hoffman
675fd5ba04 Merge pull request #898 from scrapy/download-timeout
[MRG] DOC document download_timeout
2014-10-24 16:52:42 -02:00
Pablo Hoffman
0dce283459 Merge pull request #893 from kmike/less-ads
[MRG] DOC simplify extension docs
2014-10-21 17:13:59 -02:00
Mikhail Korobov
7d68b084a4 DOC document download_timeout Request.meta key and download_timeout spider attribute. 2014-10-07 04:23:11 +06:00
Mikhail Korobov
ea3b372b4f DOC typo fix in leaks.rst 2014-10-02 15:20:13 +06:00
Pablo Hoffman
e7843d35de Merge pull request #894 from kmike/leaks-docs
Leaks docs
2014-10-02 01:14:54 -03:00
Pablo Hoffman
5835224eee Merge pull request #896 from scrapy/robotstxt-once
[MRG] process robots.txt once
2014-10-02 00:58:55 -03:00
Mikhail Korobov
6fcf9dce50 DOC document from_crawler method for item pipelines; add an example. 2014-09-25 03:13:51 +06:00
Mikhail Korobov
36eec8f413 dont_obey_robotstxt meta key; don't process requests to /robots.txt 2014-09-23 00:10:43 +06:00
Mikhail Korobov
bdbca1e2d7 DOC request queue memory usage 2014-09-21 07:30:44 +06:00
Mikhail Korobov
bc0f481a73 DOC bring back notes about multiple spiders per process because it is now documented how to do that 2014-09-21 07:12:01 +06:00
Mikhail Korobov
a122fdbfea Update leaks.rst: there is now only a single spider in a process. 2014-09-21 06:54:00 +06:00
Mikhail Korobov
e435b3e3a3 DOC simplify extension docs 2014-09-21 00:19:24 +06:00
John-Scott Atlakson
a312ebfb43 Update request-response.rst
Fixed minor typo
2014-09-14 22:06:31 +06:00
Mikael Åhlén
47b6dff9f1 Allow to specify the quotechar in CSVFeedSpider 2014-09-13 02:14:57 +02:00
Julia Medina
16e62e9c9b Per-spider settings documentation 2014-09-01 21:56:57 -03:00
Daniel Graña
ccde3317d7 Merge pull request #816 from Curita/api-cleanup
GSoC API cleanup
2014-09-01 21:55:36 -03:00
Mikhail Korobov
774ab74ad2 Merge pull request #864 from younghz/master
Duplicate comma in request-response.rst
2014-08-28 18:52:51 +06:00
Uyounghz
d49766a6ac Duplicate comma in request-response.rst 2014-08-28 19:58:58 +08:00
Daniel Graña
841dd5f1f5 Update webservice.rst 2014-08-18 17:48:01 -03:00
Daniel Graña
d684ecad7b Merge pull request #846 from rocioar/master
fix dont_merge_cookies bad behaviour when set to false on meta
2014-08-18 13:54:11 -03:00
Daniel Graña
a9292cfab7 jsonrpc webservice moved to https://github.com/scrapy/scrapy-jsonrpc repository 2014-08-15 23:28:13 -03:00
Rocio Aramberri
51b0bd281d fix dont settings on meta behaviour, add docs and tests 2014-08-15 13:47:42 -07:00
Julia Medina
3547ca6e61 Add example on running spiders outside projects 2014-08-14 11:50:33 -03:00
Julia Medina
419026615f Deprecate Crawler.spiders attribute 2014-08-14 09:19:41 -03:00
Julia Medina
c90977ca98 Drop support for scrapy.project.crawler (And scrapy.stats consequently) 2014-08-12 14:02:56 -03:00
Julia Medina
900a487682 Support multiple simultaneous LogObservers listening different crawlers 2014-08-12 14:02:56 -03:00
Julia Medina
d40273561d CrawlerProcess cleanup changes 2014-08-12 14:02:55 -03:00
Julia Medina
980e30a187 Crawler interface cleanup 2014-08-12 14:02:55 -03:00
Julia Medina
d7038b2a13 SpiderManager interface cleanup 2014-08-12 14:02:55 -03:00
Julia Medina
39c6a80f9d Both getdict and getlist return copies of the requested values 2014-08-12 14:02:55 -03:00
Julia Medina
3ae971468f Add Settings.copy, freeze and frozencopy method 2014-08-12 14:02:55 -03:00
Julia Medina
84fa004793 Add from_crawler class method to base Spider 2014-08-11 11:23:57 -03:00