mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 23:24:01 +00:00

2871 Commits

Author SHA1 Message Date
Pablo Hoffman
10ed28b9d0 SitemapSpider: added support for sitemap urls ending in .xml and .xml.gz, even if they have a wrong content type 2012-01-03 12:17:17 -02:00
Pablo Hoffman
fb44f303a9 extras/makedeb.py: no longer obtaining version from git 2012-01-02 13:28:51 -02:00
Pablo Hoffman
db92bc8c40 bumped version to 0.15.1, mainly to avoid package upgrade issues with new versioning based on git describe 0.15.1 2012-01-02 13:07:15 -02:00
Pablo Hoffman
b6220b8e95 use git describe for building version from git, and removed support for building version from hg 2012-01-02 13:05:26 -02:00
Pablo Hoffman
1f87d7ff4b Merge pull request #75 from darkrho/httpcache-stats
tests: fixed httpcache testcase.
2012-01-01 09:30:31 -08:00
Rolando Espinoza La fuente
93eb5b32dd tests: fixed httpcache testcase. 2011-12-30 23:11:16 -04:00
Pablo Hoffman
a36c8691d7 Merge pull request #74 from darkrho/httpcache-stats
httpcache: keep stats of cache hit/miss/store and don't store already cached response
2011-12-30 18:15:43 -08:00
Rolando Espinoza La fuente
503fdf39fe httpcache: don't store already cached response. 2011-12-30 14:21:07 -04:00
Rolando Espinoza La fuente
f2966eebc7 httpcache: keep stats of cache hit/miss/store. 2011-12-30 14:10:18 -04:00
Pablo Hoffman
9064188035 removed unused import 2011-12-28 15:21:10 -02:00
Pablo Hoffman
150f82e600 some changes to scrapyd listjobs.json api:
* the api is now a GET instead of POST (for consistency)
* the api also returns pending and finished jobs, in addition to running
  ones
* only the last 100 finished jobs are kept (can be changed through the
  finished_to_keep setting)
2011-12-28 15:17:52 -02:00
Pablo Hoffman
1dfbe5d7a8 scrapyd.webservice: relocate ListJobs resource for better consistency 2011-12-28 14:36:24 -02:00
Pablo Hoffman
f214c94912 CrawlSpider: don't follow links from non-HTML responses 2011-12-27 21:22:38 -02:00
Pablo Hoffman
bda9c97c78 Merge pull request #48 from simonratner/delete-logs-by-mtime
Delete old logs based on file mtime.
2011-12-23 13:17:00 -08:00
Pablo Hoffman
0be421fbf0 fixed reference to tutorial directory 2011-12-23 18:57:11 -02:00
Pablo Hoffman
41fd3c4f6c doc: removed duplicated callback argument from Request.replace() 2011-12-23 15:55:46 -02:00
Pablo Hoffman
0eeff76227 fixed formatting of scrapyd doc 2011-12-20 03:18:37 -02:00
Daniel Graña
64ba6e7982 Dump stacks for all running threads and fix engine status dumped by StackTraceDump extension 2011-12-15 17:05:45 -02:00
Pablo Hoffman
023232f7d4 added comment about why we disable ssl on boto images upload 2011-12-15 14:23:45 -02:00
Daniel Graña
aea060e144 Merge branch '0.14' 2011-12-14 13:09:00 -02:00
Daniel Graña
63d583d9be SSL handshaking hangs when doing too many parallel connections to S3 2011-12-14 13:06:12 -02:00
Daniel Graña
bcb31988f2 change tutorial to follow changes on dmoz site 2011-12-14 13:03:31 -02:00
Rolando Espinoza La fuente
98f3f87530 Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0 2011-12-12 18:11:43 +00:00
Pablo Hoffman
9b84914736 Merge pull request #62 from darkrho/issue-58
Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0
2011-12-12 10:10:08 -08:00
Rolando Espinoza La fuente
9c04945785 Avoid _disconnectedDeferred AttributeError exception in Twisted>=11.1.0 2011-12-04 23:42:41 -04:00
Martin Olveyra
175a4b5957 allow spider to set autothrottle max concurrency 2011-12-02 15:50:53 -02:00
Pablo Hoffman
4fe42dc6fe Merge pull request #61 from kalessin/master
allow spider to set autothrottle max concurrency
2011-12-01 12:47:14 -08:00
Martin Olveyra
7b0184eb59 allow spider to set autothrottle max concurrency 2011-12-01 18:44:26 -02:00
Pablo Hoffman
e29f9e5b24 bumped version to 0.15 0.15.0 2011-11-17 14:44:34 -02:00
Pablo Hoffman
0f649f0e30 bumped version to 0.14 0.14.0 2011-11-17 14:43:40 -02:00
Pablo Hoffman
6d13de4366 fixed "No free spider slots" bug when calling fetch() from scrapy shell 2011-11-14 20:03:43 -02:00
Pablo Hoffman
d37a788d22 improve handling of KeyError exception when creating spiders in spider manager. closes issue 49 2011-11-14 17:00:25 -02:00
Pablo Hoffman
36df87b4de ignore meta-refresh redirects embedded in <script> tags. related to issue 18 2011-11-14 16:54:13 -02:00
Pablo Hoffman
ec1ef0235f ignore meta-refresh redirect when embedded inside <noscript> tag. closes issue 18 2011-11-14 16:25:22 -02:00
Simon Ratner
7232c31f78 Delete old logs based on file mtime. 2011-11-11 11:53:00 -08:00
Pablo Hoffman
6cc40dc062 fixed bug in MEMUSAGE_NOTIFY_MAIL setting 2011-11-08 11:51:26 -02:00
Pablo Hoffman
37ad4f8791 added support for ajax crawlable urls 2011-10-28 16:33:12 -02:00
Pablo Hoffman
992af8d38f ubuntu repos: added support for oneiric release 2011-10-25 14:26:38 -02:00
Pablo Hoffman
f4821a123d Do not raise PartialDownloadError if Content-Length doesn't match the body size. This fixes the error reported in: https://groups.google.com/d/topic/scrapy-users/FQ25O3KPQuU/discussion 2011-10-25 13:04:58 -02:00
Pablo Hoffman
c085f81641 removed deprecation warning for spider.download_timeout attribute 2011-10-25 04:26:37 -02:00
Pablo Hoffman
c38c49d56a fixed PickleItemExporter bug, added unittest, and added pickle to supported feed export formats 2011-10-25 02:36:51 -02:00
Pablo Hoffman
8bdf288428 made scrapyd doc more version agnostic 2011-10-23 05:29:54 -02:00
Pablo Hoffman
64b8e2648e added support for using '-' in scrapy crawl -o, to dump items to standard output 2011-10-23 03:06:59 -02:00
Pablo Hoffman
028bf3386d feed exports: removed dependency on file.tell() method, so that stdout output works 2011-10-23 03:05:06 -02:00
Pablo Hoffman
10ced29e18 changed feed exports storage api so that file/stdio outputs directly without using a temporary file 2011-10-23 02:49:17 -02:00
Pablo Hoffman
ade5efdc61 added -o option to scrapy crawl, a convenient shortcut for using feed exports 2011-10-22 20:53:49 -02:00
Pablo Hoffman
13cd9a1b0f remove deprecation warning for spider.user_agent attribute 2011-10-22 19:28:12 -02:00
Pablo Hoffman
43b79afc9c remove usage of assertLess() which is only available on python 2.7+ 2011-09-26 12:21:52 -03:00
Pablo Hoffman
431441cb52 updated documentation to remove references to old issue tracker and mercurial repos 2011-09-25 13:06:24 -03:00
Pablo Hoffman
ce03ccd4ec updated documentation about DEPTH_PRIORITY and DFO/BFO crawls 2011-09-23 13:22:25 -03:00