Pablo Hoffman
6084be3b2e
added iter_all() function to scrapy.util.trackref module and improved memory leaks documentation. also added a new FAQ entry about memory issues
2009-11-28 16:21:59 -02:00
Pablo Hoffman
de35eee3cd
Automated merge after applying ajones patch
2009-11-28 12:31:05 -02:00
Pablo Hoffman
207aae2b04
Fixed logging of "Spider closed" message in engine
2009-11-26 19:12:33 -02:00
Pablo Hoffman
34fcf6ba5b
Added informative message when trying to use trackref and it's not enabled
2009-11-26 18:37:45 -02:00
Pablo Hoffman
fd50891113
Keep track of pending next-request calls in the engine and downloader, to cancel them properly when the spider is closed. This also avoids keeping spider references alive, after they are closed.
2009-11-26 17:00:16 -02:00
Pablo Hoffman
c0f1c8de04
use DelayedCall.active() instead of DelayedCall.called, to support cancelled tasks
2009-11-26 16:10:28 -02:00
Pablo Hoffman
88417a3ed1
Fixed bug in LiveStats webconsole module which was keeping references to spiders alive, after they were closed
2009-11-25 22:51:58 -02:00
Pablo Hoffman
a2854e3948
Added hack to speed up processing of IgnoreRequest errors ( #125 )
2009-11-25 22:28:59 -02:00
Pablo Hoffman
a48516b105
removed obsolete remove_escape_chars function - use replace_escape_chars instead
2009-11-25 22:09:47 -02:00
ajones1@gmail.com
0dcf47a706
fix name errors on robots.txt middleware during spider_close
2009-11-25 12:59:28 -08:00
Pablo Hoffman
a36909a3f9
Added BaseSpider objects to trackref
2009-11-25 16:59:47 -02:00
Pablo Hoffman
9311741236
Fixed memory leak in CachingResolver
2009-11-25 16:54:36 -02:00
Pablo Hoffman
1cfd959828
remove wrong support for returning Responses in scheduler middlewares
2009-11-25 11:20:43 -02:00
Pablo Hoffman
1a81ab6f07
renamed extension: DelayedCloseDomain to SpiderCloseDelay
--HG--
rename : scrapy/contrib/delayedclosedomain.py => scrapy/contrib/spiderclosedelay.py
2009-11-21 15:17:38 -02:00
Pablo Hoffman
326090d4a4
Reordered setting to preserve alphabetical order
2009-11-21 15:16:29 -02:00
Pablo Hoffman
a49aef2beb
Renamed exception: DontCloseDomain to DontCloseSpider ( closes #120 )
2009-11-21 15:06:03 -02:00
Pablo Hoffman
0d75a3a636
Automated merge with http://hg.scrapy.org/scrapy-stable/
2009-11-20 10:38:51 -02:00
Ismael Carnales
f86f62c5d9
Use setuptools for install, if not present fallback to distutils
2009-11-20 09:30:06 -02:00
Pablo Hoffman
dd662e09d8
some minor fixes to scheduler middleware doc
2009-11-19 12:23:54 -02:00
Pablo Hoffman
8c14abadbb
Automated merge with http://hg.scrapy.org/scrapy-stable/
2009-11-19 11:23:50 -02:00
Pablo Hoffman
9ff4dbf636
renamed file missing from previous commit
--HG--
rename : scrapy/xlib/patches.py => scrapy/xlib/twisted_250_monkeypatches.py
2009-11-19 11:23:36 -02:00
Pablo Hoffman
f4e93700bd
Automated merge with http://hg.scrapy.org/scrapy-stable/
2009-11-19 10:44:02 -02:00
Pablo Hoffman
bf55a4708f
prevent 'import scrapy' from failing when twisted module is not available, also moved twisted 2.5.0 monkeypatch into a more specific module name
2009-11-19 10:41:36 -02:00
Pablo Hoffman
c4f77c4da0
minor fixes to images doc (thanks amccloud)
2009-11-16 11:15:25 -02:00
Pablo Hoffman
0d6aee1f12
updated wrong documentation
2009-11-13 20:03:56 -02:00
Daniel Grana
445d8cd9f0
delay next_request check after stopping a spider close to avoid 100% cpu usage loops in some cases
2009-11-06 22:02:12 -02:00
Pablo Hoffman
aeab5370cb
StatsCollector: ported methods to receive spider instances ( closes #113 ), removed list_domains() method, added iter_spider_stats() method
2009-11-14 20:28:59 -02:00
Pablo Hoffman
c4c6e7c8cd
Automated merge with http://hg.scrapy.org/scrapy-stable/
2009-11-13 20:04:39 -02:00
Pablo Hoffman
505dfe6f7e
fixed exception when running scrapy shell on a non-textual response ( fixes #116 )
2009-11-13 17:21:59 -02:00
Pablo Hoffman
f3e861c89e
replaced old reference to domain with spider
2009-11-13 14:44:03 -02:00
Pablo Hoffman
07655d05ea
renamed REQUESTS_PER_SPIDER setting to CONCURRENT_REQUESTS_PER_SPIDER
2009-11-13 14:38:22 -02:00
Pablo Hoffman
564abd10ad
Refactored HttpCache middleware:
* simplified code
* performance improvements
* removed awkward/unused domain sectorization
* it can now receive Settings on constructor
* added unittests
* added documentation about filesystem storage structure
Also made scrapy.conf.Settings objects instantiable with a dict which is used to override default settings.
2009-11-13 14:25:47 -02:00
Pablo Hoffman
db7fec1fef
fixed doc typo
2009-11-12 12:17:39 -02:00
Pablo Hoffman
415dec4e16
made offsite middleware log messages when filtering out requests
2009-11-12 10:17:21 -02:00
Pablo Hoffman
ee08d38ab6
removed deprecated SCRAPYSETTINGS_MODULE environment variable
2009-11-06 16:56:17 -02:00
Pablo Hoffman
49e39bf1ba
fixed typo
2009-11-06 16:49:48 -02:00
Pablo Hoffman
4a2a20489d
removed deprecated ScrapedItem (previously kept for backwards compatibility)
2009-11-06 16:39:15 -02:00
Pablo Hoffman
791f4932dd
added clarification about versioning and api stability
2009-11-06 16:28:51 -02:00
Pablo Hoffman
9bf4e87753
removed deprecated scrapy.xpath module (previously kept for backwards compatibility)
2009-11-06 16:20:38 -02:00
Pablo Hoffman
40646d3cd1
replaced remaining uses of log.msg() 'domain' argument to use 'spider' instead
2009-11-06 16:11:37 -02:00
Pablo Hoffman
fee1b751e5
deprecated 'domain' argument in log.msg()
2009-11-06 16:02:12 -02:00
Pablo Hoffman
74d0e82dbe
renamed CloseDomain extension to CloseSpider, and renamed CLOSEDOMAIN_* settings to CLOSESPIDER_*
--HG--
rename : scrapy/contrib/closedomain.py => scrapy/contrib/closespider.py
2009-11-06 15:54:17 -02:00
Pablo Hoffman
cff4592b64
changed label
2009-11-06 15:48:49 -02:00
Pablo Hoffman
919cd5b789
renamed setting CONCURRENT_DOMAINS to CONCURRENT_SPIDERS
2009-11-06 15:44:11 -02:00
Pablo Hoffman
d604dca96d
renamed setting REQUESTS_PER_DOMAIN to REQUESTS_PER_SPIDER
2009-11-06 15:42:11 -02:00
Pablo Hoffman
580d82468e
fixed images pipeline bug caused by recent api changes
2009-11-06 14:29:37 -02:00
Pablo Hoffman
e5cae1e6c9
renamed setting
2009-11-06 14:19:56 -02:00
Pablo Hoffman
7728a23e99
Changed item pipeline API to pass spider references (instead of domain names) to process_item() method
2009-11-06 13:46:36 -02:00
Pablo Hoffman
a432c1ee40
updated logging doc to include new spider argument in log functions
2009-11-04 14:49:24 -02:00
Pablo Hoffman
69058e3b50
fixed bug in log.msg when receiving new spider argument
2009-11-04 14:45:56 -02:00