mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 13:44:00 +00:00

2914 Commits

Author SHA1 Message Date
Pablo Hoffman
81abb45000 fixed bug in new cookiejar documentation 2012-02-28 11:08:25 -02:00
Pablo Hoffman
26c8004125 added documentation for the new cookiejar Request.meta key 2012-02-27 19:58:58 -02:00
Pablo Hoffman
44d6da82fd Merge pull request #96 from kalessin/cookiesmultijar
allow to work with multiple cookie jars on the same spider
2012-02-27 13:48:43 -08:00
olveyra
c093ac5ec6 allow to work with multiple cookie jars on the same spider 2012-02-27 18:03:48 +00:00
Pablo Hoffman
4ed1a03521 Merge pull request #95 from scrapy/openmobilealliance-mimetype
Handle as html standard mimetype defined by Open Mobile Alliance
2012-02-24 10:28:32 -08:00
Daniel Graña
049f315ff4 Handle as html standard mimetype defined by Open Mobile Alliance 2012-02-24 16:16:35 -02:00
Pablo Hoffman
b1f011d740 use netloc instead of hostname in url_is_from_any_domain(). closes #50 2012-02-24 02:09:02 -02:00
Daniel Graña
08d2c2b9ee Merge branch 'GH92-image-buf-threading' 2012-02-23 19:25:28 -02:00
Daniel Graña
2dbf2a38a2 move buffer pointing to start of file before computing checksum. refs #92 2012-02-23 19:23:37 -02:00
Pablo Hoffman
e0de5f3eab Merge pull request #93 from dangra/GH92-image-buf-threading
compute image checksum before persisting images
2012-02-23 13:21:10 -08:00
Daniel Graña
3286ce4f42 Compute image checksum before persisting images. closes #92
Avoids threading issue accessing buffer
2012-02-23 19:17:21 -02:00
Pablo Hoffman
52483c55cd Merge pull request #94 from dangra/mediapipeline-cache-failures
remove as much information as possible from cached failure
2012-02-23 13:11:29 -08:00
Daniel Graña
5c73a0b1c1 remove leaking references in cached failures 2012-02-23 19:08:36 -02:00
Pablo Hoffman
e312a88582 MemoryUsage: use resident memory size (instead of virtual) for tracking memory usage 2012-02-23 17:42:02 -02:00
Pablo Hoffman
7fe7c3f3b1 MemoryUsage extension: close the spiders (instead of stopping the engine) when the limit is exceeded, providing a descriptive reason for the close. Also fixed default value of MEMUSAGE_ENABLED setting to match the documentation. 2012-02-23 17:05:06 -02:00
Pablo Hoffman
6769b92493 Merge branch 'master' of github.com:scrapy/scrapy 2012-02-19 05:59:30 -02:00
Pablo Hoffman
0939106872 fixed bug in MemoryUsage extension: get_engine_status() takes exactly 1 argument (0 given) 2012-02-19 05:59:21 -02:00
Daniel Graña
5b0f10e767 Merge pull request #90 from kalessin/master
identify autothrottle debug mode stats with slot key, in order to allow to track concurrency/delay issues with spiders which crawls more than one site.
2012-02-17 11:12:41 -08:00
Martin Olveyra
b094cd4c57 identify autothrottle debug mode stats with slot key, in order to allow to
track concurrency/delay issues with spiders which crawls more than one
site.
2012-02-17 16:53:48 -02:00
Pablo Hoffman
7b8942a648 updated StackTraceDump extension doc 2012-02-16 15:14:17 -02:00
Pablo Hoffman
fe2ce938ee also dump (scrapy.utils.trackref) live references in StackTraceDump extension 2012-02-16 14:58:11 -02:00
Pablo Hoffman
900bf08fb6 fixed struct.error on http compression middleware. closes #87 2012-02-11 20:28:09 -02:00
Daniel Graña
acf69dac44 ajax crawling wasn't expanding for unicode urls 2012-02-07 14:44:06 -02:00
Daniel Graña
2e18f0db33 Catch start_requests iterator errors. refs #83 2012-01-27 17:41:38 -02:00
Daniel Graña
7201d074c1 Merge branch 'issue82' 2012-01-25 19:17:26 -02:00
Daniel Graña
eb8e98461d Add some comments and references to github issues. closes #82 2012-01-25 19:15:59 -02:00
Daniel Graña
2840865746 Allow overriding ClientContextFactory and enable SSL bug workarounds by default. refs #82 2012-01-25 18:29:54 -02:00
Pablo Hoffman
a0f41f100c Merge pull request #80 from kalessin/master
autothrottle code improvements (download delay + style)
2012-01-16 12:24:04 -08:00
Martin Olveyra
1c6a5a9374 some minor improvements in autothrottle code style 2012-01-16 18:19:31 -02:00
Martin Olveyra
59cf9d9b1a allow to set minimal download delay for autothrottle extension. also
limit download delay to a minimal of spider.download_delay if given
2012-01-16 18:16:24 -02:00
Pablo Hoffman
fc52d8d5cf Merge pull request #79 from seriyps/master
~10x speed-up for libxml2 XPathSelector
2012-01-15 19:32:59 -08:00
Сергей Прохоров
a6a2120715 Speed-up libxml2 XPathSelector 2012-01-15 03:09:00 +04:00
Pablo Hoffman
85e2b493b4 make scrapyd debian package dependent on the same (or higher) version of scrapy package 2012-01-13 10:55:20 -02:00
Pablo Hoffman
2ee523b14a scrapyd: added Items link to completed jobs table 2012-01-12 17:43:44 -02:00
Pablo Hoffman
8a45dd121b scrapyd: fixed issue with ubuntu package: /var/lib/scrapyd/items dir not being created by default 2012-01-12 17:17:50 -02:00
Pablo Hoffman
ea77342b55 updated versioning doc according to recent changes 2012-01-05 11:50:28 -02:00
Pablo Hoffman
0b0bce7f3c scrapyd: added cancel.json and listjobs.json api methods to documentation 2012-01-05 11:23:25 -02:00
Pablo Hoffman
8f42633a94 scrapyd: added clarification about how to disable items feeds generation 2012-01-05 11:20:50 -02:00
Pablo Hoffman
531fa95f98 scrapyd: removed redundant .scrapy component from paths when using scrapyd in 'scrapy server' mode 2012-01-03 23:13:56 -02:00
Pablo Hoffman
dbda33efa6 scrapyd: added support for storing items by default
Items are stored the same way as logs, in jsonlines format.

Also renamed logs_to_keep setting to jobs_to_keep.
2012-01-03 23:08:54 -02:00
Pablo Hoffman
0693694bcf scrapyd: fixed documentation link 2012-01-03 23:02:25 -02:00
Pablo Hoffman
485bc180df scrapyd: improved web interface to also show pending and finished jobs 2012-01-03 23:02:25 -02:00
Pablo Hoffman
f07e968a93 scrapyd: added new cancel.json api to cancel pending/running jobs 2012-01-03 23:02:19 -02:00
Pablo Hoffman
10ed28b9d0 SitemapSpider: added support for sitemap urls ending in .xml and .xml.gz, even if they have a wrong content type 2012-01-03 12:17:17 -02:00
Pablo Hoffman
fb44f303a9 extras/makedeb.py: no longer obtaining version from git 2012-01-02 13:28:51 -02:00
Pablo Hoffman
db92bc8c40 bumped version to 0.15.1, mainly to avoid package upgrade issues with new versioning based on git describe 0.15.1 2012-01-02 13:07:15 -02:00
Pablo Hoffman
b6220b8e95 use git describe for building version from git, and removed support for building version from hg 2012-01-02 13:05:26 -02:00
Pablo Hoffman
1f87d7ff4b Merge pull request #75 from darkrho/httpcache-stats
tests: fixed httpcache testcase.
2012-01-01 09:30:31 -08:00
Rolando Espinoza La fuente
93eb5b32dd tests: fixed httpcache testcase. 2011-12-30 23:11:16 -04:00
Pablo Hoffman
a36c8691d7 Merge pull request #74 from darkrho/httpcache-stats
httpcache: keep stats of cache hit/miss/store and don't store already cached response
2011-12-30 18:15:43 -08:00