mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 22:43:57 +00:00

65 Commits

Author SHA1 Message Date
Berker Peksag
31e5f164d4 Import unittest.mock if available.
mock is in the stdlib since Python 3.3.
2015-03-08 15:03:04 +02:00
Julia Medina
d68615a5af Test the parse command locally instead of against an external url 2015-01-19 10:28:25 -03:00
immerrr
82b187f283 S3DownloadHandler: fix auth for requests with quoted paths/query params 2014-12-11 18:14:36 +03:00
Lev Berman
fdb6bb07c0 #977 - test dropping requests 2014-11-28 10:53:33 +03:00
Lev Berman
e04b0aff74 An attempt to resolve #977, add signal to be sent when request is dropped by the scheduler 2014-11-27 15:10:15 +03:00
Pablo Hoffman
c31fb87335 Merge pull request #954 from kalessin/int-download-timeout
Force to read DOWNLOAD_TIMEOUT as int (for example to pass using environment variable)
2014-11-26 17:14:18 -02:00
Pablo Hoffman
dedea72774 Merge pull request #946 from tpeng/limit-response-size
avoid download large response
2014-11-25 17:57:55 -02:00
tpeng
cd19382754 attempt to fix travis fails 2014-11-25 14:20:25 +01:00
Daniel Graña
8d8e1b2c0c mitmproxy 0.10.1 needs netlib 0.10.1 too 2014-11-21 12:15:02 -02:00
Daniel Graña
314db3db8b pin mitmproxy 0.10.1 as >0.11 does not work with tests 2014-11-21 10:54:43 -02:00
Martin Olveyra
7910fa0172 Force to read DOWNLOAD_TIMEOUT as int (for example to pass using
environment variable)
2014-11-21 01:09:32 -02:00
tpeng
a69f042d10 add 2 more test cases and minor doc fixes 2014-11-19 15:31:07 +01:00
tpeng
fa84730e70 avoid download large response
introduce DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE in settings and
download_maxsize/download_warnsize in spider/request meta, so
downloader stop downloading as soon as the received data exceed the
limit. also check the twisted response's length in advance to stop
downloading as early as possible.
2014-11-12 12:28:02 +01:00
Pablo Hoffman
efe589c643 Merge pull request #882 from ahlen/feature/csvfeed-quotechar
[MRG+1] Allow to specify the quotechar in CSVFeedSpider
2014-11-04 11:32:59 -02:00
Mikhail Korobov
36eec8f413 dont_obey_robotstxt meta key; don't process requests to /robots.txt 2014-09-23 00:10:43 +06:00
Mikhail Korobov
49645d4bf9 TST small cleanup of a cookie test 2014-09-21 05:31:34 +06:00
Mikhail Korobov
c543fe6e4c Merge pull request #878 from andrewshir/master
Fix bug for ".local" host name
2014-09-21 05:24:54 +06:00
andrewshir
e583c030db Test for local domains (without dots) added 2014-09-14 14:24:16 +06:00
Mikael Åhlén
22da1783bd added a test-case for wrong quotechar 2014-09-13 03:47:40 +02:00
Mikael Åhlén
47b6dff9f1 Allow to specify the quotechar in CSVFeedSpider 2014-09-13 02:14:57 +02:00
Daniel Graña
5bcabfe9c9 SPIDER_MODULES can be set as a csv string 2014-09-10 23:25:57 -03:00
Daniel Graña
ec93c0fdcc Add the tests changes for previous commit 2014-09-10 12:05:18 -03:00
Daniel Graña
99971dc8a8 Do not pop the crawler from the managed list 2014-09-09 20:59:07 +00:00
Julia Medina
c2592b39fd Test verifying that CrawlerRunner populates spider class settings 2014-09-01 21:56:57 -03:00
Julia Medina
9ef3972cfb Per-spider settings tests 2014-09-01 21:56:57 -03:00
Daniel Graña
ccde3317d7 Merge pull request #816 from Curita/api-cleanup
GSoC API cleanup
2014-09-01 21:55:36 -03:00
yakxxx
e4689556f0 SgmlLinkExtractor - fix for parsing <area> tag with Unicode present 2014-08-28 18:55:58 +02:00
Daniel Graña
d684ecad7b Merge pull request #846 from rocioar/master
fix dont_merge_cookies bad behaviour when set to false on meta
2014-08-18 13:54:11 -03:00
Daniel Graña
a9292cfab7 jsonrpc webservice moved to https://github.com/scrapy/scrapy-jsonrpc repository 2014-08-15 23:28:13 -03:00
Rocio Aramberri
51b0bd281d fix dont settings on meta behaviour, add docs and tests 2014-08-15 13:47:42 -07:00
Julia Medina
70f2010db1 Change error type when updating frozen settings 2014-08-14 11:59:25 -03:00
Julia Medina
419026615f Deprecate Crawler.spiders attribute 2014-08-14 09:19:41 -03:00
Julia Medina
c90977ca98 Drop support for scrapy.project.crawler (And scrapy.stats consequently) 2014-08-12 14:02:56 -03:00
Julia Medina
900a487682 Support multiple simultaneous LogObservers listening different crawlers 2014-08-12 14:02:56 -03:00
Julia Medina
870438e5f4 Update tests utils, fixing get_crawler and removing docrawl 2014-08-12 14:02:56 -03:00
Julia Medina
d7038b2a13 SpiderManager interface cleanup 2014-08-12 14:02:55 -03:00
Julia Medina
3ae971468f Add Settings.copy, freeze and frozencopy method 2014-08-12 14:02:55 -03:00
Julia Medina
a995727117 Connect spider_closed signal after a crawler is bound to a Spider 2014-08-12 14:02:55 -03:00
Julia Medina
eb0253e530 Update from_crawler method as well as set_crawler on CrawlSpider 2014-08-11 11:24:01 -03:00
Julia Medina
84fa004793 Add from_crawler class method to base Spider 2014-08-11 11:23:57 -03:00
Pablo Hoffman
69a665eb32 Merge pull request #832 from ivannotes/master
Bugfix for leaking Proxy-Authorization header to targeted host
2014-08-04 15:20:51 -03:00
Daniel Graña
511a26978f Merge pull request #817 from nramirezuy/fix-startproject-project-name
[MRG] Fix startproject project name
2014-08-04 11:55:35 -03:00
Mikhail Korobov
e62bbf0766 PY3 top-level shortcuts work 2014-08-02 00:24:32 +06:00
Daniel Graña
ee21eaa6f3 Merge pull request #835 from scrapy/modernize-asserts
[MRG] modernize some of the asserts
2014-08-01 15:19:36 -03:00
Mikhail Korobov
1f59a69a97 PY3 port scrapy.link 2014-08-02 00:16:01 +06:00
Mikhail Korobov
628a8d8a15 TST modernize some of the asserts 2014-08-01 23:44:23 +06:00
Nuno Maximiano
08224c92f4 add project name validation 2014-08-01 14:36:30 -03:00
Mikhail Korobov
e2c1226838 PY3 port scrapy.contrib.feedexport 2014-08-01 23:28:28 +06:00
Mikhail Korobov
51bd9f7d64 TST enable doctests for Python 3 2014-08-01 23:12:52 +06:00
Mikhail Korobov
444052a2f9 PY3 port scrapy.utils.misc.arg_to_iter; fix scrapy.utils.misc.md5sum doctest 2014-08-01 22:56:03 +06:00