Daniel Grana
aba16c20c4
Automated merge with ssh://hg.scrapy.org/scrapy
2009-07-09 14:38:56 -03:00
Daniel Grana
a8de5cef6e
remove xlib hack that appends scrapy/xlib to sys.path
2009-07-09 14:37:59 -03:00
Ismael Carnales
32c25f5a36
complete the newitem tests
2009-07-09 13:03:54 -03:00
Ismael Carnales
25b53df191
merge with trunk
2009-07-09 13:02:49 -03:00
Pablo Hoffman
60e7b80798
removed signal docs from core.signals module, to leave them only in one place (the doc)
2009-07-09 12:57:10 -03:00
Ismael Carnales
f31f75c0e2
remove required attribute from newitem (until we add a validation framework)
2009-07-09 12:54:02 -03:00
Ismael Carnales
9e3e41f946
added more newitem documentation in proposed
2009-07-09 11:29:04 -03:00
Pablo Hoffman
b071681cd4
removed duplicated spiders doc (which used autodoc)
2009-07-09 11:14:33 -03:00
Pablo Hoffman
4f19115a80
removed old setting from default_settings.py, updated doc of CONCURRENT_ITEMS setting
2009-07-09 10:56:15 -03:00
Pablo Hoffman
a4b728f2b2
Scraper: added lower limit for responses sizes, removed redundant line
2009-07-09 10:55:30 -03:00
Pablo Hoffman
8b26e49636
Added new ItemProcessor component to Scraper component
2009-07-08 23:48:06 -03:00
Pablo Hoffman
42b86a385f
removed wtf line
2009-07-08 18:19:54 -03:00
pablo
5cbafaea7f
StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation)
2009-07-08 09:19:35 -03:00
Daniel Grana
b83851dcc3
remove unused lines from shell command
2009-07-07 16:24:59 -03:00
Daniel Grana
8e5ede7179
shell command was broken by recent commits because scrapyengine.crawl does not return a deferred anymore; now we use scrapyengine.schedule, which returns the deferred of the download response
2009-07-07 16:22:23 -03:00
damian
1ba98606c2
test.test_utils_url: update parameter name; utils.url: minor code clean up
2009-07-07 12:35:24 -03:00
damian
460f690c5c
utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added
2009-07-07 11:20:26 -03:00
pablo
c205f7d8e5
added missing comment for non-trivial code
2009-07-06 20:38:39 -03:00
Daniel Grana
a15dc94340
images: images uploaded through amazon s3 special spider must be scheduled
2009-07-06 16:16:49 -03:00
Daniel Grana
2e52005847
rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
...
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
2009-07-06 15:35:36 -03:00
Pablo Hoffman
31b3d7ce1e
Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders)
2009-07-06 15:31:50 -03:00
Daniel Grana
4f1d388733
Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper
2009-07-06 15:31:50 -03:00
Daniel Grana
3cb18dbbbb
Move itempipeline functionality outside of engine as a spidermiddleware
2009-07-06 15:31:50 -03:00
pablo
2ce43ebbec
made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them
2009-07-06 01:07:45 -03:00
Daniel Grana
f467c233b2
downloader: process queue immediately after downloading the response
2009-07-03 01:32:24 -03:00
Pablo Hoffman
0c4c153819
improved Scrapy documentation index for better usability
2009-07-01 09:51:57 -03:00
Pablo Hoffman
af6db1691e
added scrapy.log.logmessage_received signal
2009-06-26 12:27:03 -03:00
Pablo Hoffman
80cd534f92
removed redundant botname from log lines
2009-06-25 16:48:04 -03:00
Pablo Hoffman
18301b7e66
downloader: performance improvement for sites that use download delay (replace datetime by time)
2009-06-25 14:13:45 -03:00
Pablo Hoffman
7933e00ebd
set more proper request priority for robots middleware and media pipeline
2009-06-25 12:10:55 -03:00
Pablo Hoffman
c22d2b1587
engine: added domain_is_open() method, added docstring for domain_is_closed() method
2009-06-25 09:56:38 -03:00
Pablo Hoffman
8de09fe4dd
improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early
2009-06-24 17:08:16 -03:00
Daniel Grana
830cd4f19f
Restore download queue processing after finishing with the recently transferred response
2009-06-24 13:45:50 -03:00
Pablo Hoffman
87df33ce0a
s/_next_request_called/_next_request_pending/
2009-06-24 10:36:36 -03:00
Pablo Hoffman
51029e37a3
engine: removed obsolete docstring and simplified next_request method
2009-06-24 10:34:44 -03:00
Daniel Grana
d7d18d27df
avoid rescheduling next_request calls
2009-06-24 10:28:34 -03:00
Pablo Hoffman
2b65f20c26
engine: removed redundant line and unused import
2009-06-23 21:50:46 -03:00
Daniel Grana
7578ab00a2
Automated merge with ssh://hg.scrapy.org/scrapy
2009-06-23 16:47:58 -03:00
Daniel Grana
73b60788c1
log framework errors at the end of crawling
2009-06-23 16:47:32 -03:00
Pablo Hoffman
93fcf6e314
added web console docstring pointing to documentation, improved telnet console docstring
2009-06-23 16:11:23 -03:00
Pablo Hoffman
834ac9fca0
Some telnet console changes:
...
- added telnet console documentation
- added documentation for debugging memory leaks with guppy
- sorted out shell aliases
- set default port (TELNETCONSOLE_PORT) to 6023
2009-06-23 16:08:58 -03:00
Daniel Grana
32bae8040d
add basic mustbe_deferred tests
2009-06-23 14:59:03 -03:00
daniel
9b78f929de
remove obsolete deferred_imap util, use coiterate+imap instead
2009-06-23 14:45:16 -07:00
dgrana
e9c724e5b7
no need for two callbacks while processing scraping responses
2009-06-23 13:26:24 -07:00
dgrana
6809375815
restore call to next_request inside pipeline output processor
2009-06-23 13:19:35 -07:00
dgrana
21ed9a24a3
fix replace of deferred_imap by coiterate+imap and fix broken engine test
2009-06-23 13:14:22 -07:00
dgrana
49b413ba40
merge
2009-06-23 13:00:01 -07:00
dgrana
38f184e42a
remove calls to chain_deferred and deferred_imap
2009-06-23 12:53:13 -07:00
Pablo Hoffman
a8fd107ad4
engine: some extra simplifications and removed debug mode
2009-06-22 23:01:41 -03:00
Pablo Hoffman
a1cca0da50
removed obsolete file
2009-06-22 22:55:18 -03:00