Pablo Hoffman
|
1a153d47f3
|
improved invalid xpath exception message in xpath selectors, and added unittests
|
2009-07-11 17:19:20 -03:00 |
|
Pablo Hoffman
|
fc64360a34
|
removed unused lines
|
2009-07-11 16:42:32 -03:00 |
|
Pablo Hoffman
|
e3caf00d7a
|
simplified implementation of spider manager by removing knowledge of enabled spiders
|
2009-07-10 16:41:02 -03:00 |
|
dgrana
|
4cd1fa9c32
|
generate dropin.cache for spiders under tests
|
2009-07-10 05:29:27 +01:00 |
|
Pablo Hoffman
|
9270810840
|
improved usage of urljoin_rfc function, adding unittests and encoding where needed
|
2009-07-09 18:45:40 -03:00 |
|
Daniel Grana
|
d5d2c5c924
|
update documentation to recent pydispatcher import path change
|
2009-07-09 17:13:30 -03:00 |
|
Daniel Grana
|
18fbd7c7eb
|
Automated merge with ssh://hg.scrapy.org/scrapy
|
2009-07-09 16:58:07 -03:00 |
|
Daniel Grana
|
eff8ea6173
|
remove response from item_passed and item_dropped signal api
|
2009-07-09 16:57:03 -03:00 |
|
Pablo Hoffman
|
5da32d9f6d
|
fixed Sphinx warning
|
2009-07-09 16:50:13 -03:00 |
|
Pablo Hoffman
|
ae7333d598
|
added simplejson optional dependency to doc
|
2009-07-09 16:49:20 -03:00 |
|
Daniel Grana
|
aba16c20c4
|
Automated merge with ssh://hg.scrapy.org/scrapy
|
2009-07-09 14:38:56 -03:00 |
|
Daniel Grana
|
a8de5cef6e
|
remove xlib hack that appends scrapy/xlib to sys.path
|
2009-07-09 14:37:59 -03:00 |
|
Ismael Carnales
|
32c25f5a36
|
complete the newitem tests
|
2009-07-09 13:03:54 -03:00 |
|
Ismael Carnales
|
25b53df191
|
merge with trunk
|
2009-07-09 13:02:49 -03:00 |
|
Pablo Hoffman
|
60e7b80798
|
removed signal docs from core.signals module, to leave them only in one place (the doc)
|
2009-07-09 12:57:10 -03:00 |
|
Ismael Carnales
|
f31f75c0e2
|
remove required attribute from newitem (until we add a validation framework)
|
2009-07-09 12:54:02 -03:00 |
|
Ismael Carnales
|
9e3e41f946
|
added more newitem documentation in proposed
|
2009-07-09 11:29:04 -03:00 |
|
Pablo Hoffman
|
b071681cd4
|
removed duplicated spiders doc (which used autodoc)
|
2009-07-09 11:14:33 -03:00 |
|
Pablo Hoffman
|
4f19115a80
|
removed old setting from default_settings.py, updated doc of CONCURRENT_ITEMS setting
|
2009-07-09 10:56:15 -03:00 |
|
Pablo Hoffman
|
a4b728f2b2
|
Scraper: added lower limit for responses sizes, removed redundant line
|
2009-07-09 10:55:30 -03:00 |
|
Pablo Hoffman
|
8b26e49636
|
Added new ItemProcessor component to Scraper component
|
2009-07-08 23:48:06 -03:00 |
|
Pablo Hoffman
|
42b86a385f
|
removed wtf line
|
2009-07-08 18:19:54 -03:00 |
|
pablo
|
5cbafaea7f
|
StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation)
|
2009-07-08 09:19:35 -03:00 |
|
Daniel Grana
|
b83851dcc3
|
remove unused lines from shell command
|
2009-07-07 16:24:59 -03:00 |
|
Daniel Grana
|
8e5ede7179
|
shell command was broken by recent commits because scrapyengine.crawl does not return a deferred anymore; now we use scrapyengine.schedule, which returns the deferred of the download response
|
2009-07-07 16:22:23 -03:00 |
|
damian
|
1ba98606c2
|
test.test_utils_url: update parameter name; utils.url: minor code clean up
|
2009-07-07 12:35:24 -03:00 |
|
damian
|
460f690c5c
|
utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added
|
2009-07-07 11:20:26 -03:00 |
|
pablo
|
c205f7d8e5
|
added missing comment for non-trivial code
|
2009-07-06 20:38:39 -03:00 |
|
Daniel Grana
|
a15dc94340
|
images: images uploaded through amazon s3 special spider must be scheduled
|
2009-07-06 16:16:49 -03:00 |
|
Daniel Grana
|
2e52005847
|
rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
|
2009-07-06 15:35:36 -03:00 |
|
Pablo Hoffman
|
31b3d7ce1e
|
Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders)
|
2009-07-06 15:31:50 -03:00 |
|
Daniel Grana
|
4f1d388733
|
Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper
|
2009-07-06 15:31:50 -03:00 |
|
Daniel Grana
|
3cb18dbbbb
|
Move itempipeline functionality outside of engine as a spidermiddleware
|
2009-07-06 15:31:50 -03:00 |
|
pablo
|
2ce43ebbec
|
made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them
|
2009-07-06 01:07:45 -03:00 |
|
Daniel Grana
|
f467c233b2
|
downloader: process queue immediately after downloading the response
|
2009-07-03 01:32:24 -03:00 |
|
Pablo Hoffman
|
0c4c153819
|
improved Scrapy documentation index for better usability
|
2009-07-01 09:51:57 -03:00 |
|
Pablo Hoffman
|
af6db1691e
|
added scrapy.log.logmessage_received signal
|
2009-06-26 12:27:03 -03:00 |
|
Pablo Hoffman
|
80cd534f92
|
removed redundant botname from log lines
|
2009-06-25 16:48:04 -03:00 |
|
Pablo Hoffman
|
18301b7e66
|
downloader: performance improvement for sites that use download delay (replace datetime by time)
|
2009-06-25 14:13:45 -03:00 |
|
Pablo Hoffman
|
7933e00ebd
|
set more appropriate request priority for robots middleware and media pipeline
|
2009-06-25 12:10:55 -03:00 |
|
Pablo Hoffman
|
c22d2b1587
|
engine: added domain_is_open() method, added docstring for domain_is_closed() method
|
2009-06-25 09:56:38 -03:00 |
|
Pablo Hoffman
|
8de09fe4dd
|
improved documentation of Downloader._download() method and fixed bug with process_queue() being called too early
|
2009-06-24 17:08:16 -03:00 |
|
Daniel Grana
|
830cd4f19f
|
Restore download queue processing after finishing with the recently transferred response
|
2009-06-24 13:45:50 -03:00 |
|
Pablo Hoffman
|
87df33ce0a
|
s/_next_request_called/_next_request_pending/
|
2009-06-24 10:36:36 -03:00 |
|
Pablo Hoffman
|
51029e37a3
|
engine: removed obsolete docstring and simplified next_request method
|
2009-06-24 10:34:44 -03:00 |
|
Daniel Grana
|
d7d18d27df
|
avoid rescheduling next_request calls
|
2009-06-24 10:28:34 -03:00 |
|
Pablo Hoffman
|
2b65f20c26
|
engine: removed redundant line and unused import
|
2009-06-23 21:50:46 -03:00 |
|
Daniel Grana
|
7578ab00a2
|
Automated merge with ssh://hg.scrapy.org/scrapy
|
2009-06-23 16:47:58 -03:00 |
|
Daniel Grana
|
73b60788c1
|
log framework errors at the end of crawling
|
2009-06-23 16:47:32 -03:00 |
|
Pablo Hoffman
|
93fcf6e314
|
added web console docstring pointing to documentation, improved telnet console docstring
|
2009-06-23 16:11:23 -03:00 |
|