Ismael Carnales
6d127d7fcf
added some missing middlewares tests
2009-09-04 12:29:43 -03:00
Pablo Hoffman
aefb94063a
more updates to spider middleware doc
2009-09-04 13:46:04 -03:00
Pablo Hoffman
d04640be5c
some improvements to spider middleware doc
2009-09-04 13:29:16 -03:00
Pablo Hoffman
96bb223c13
removed (pretty useless) DebugMiddleware
2009-09-04 12:59:58 -03:00
Pablo Hoffman
86de5180fc
fixed bug in robots middleware reported by fencer in #101
2009-09-04 12:36:29 -03:00
Pablo Hoffman
dad05957f3
added comment on downloader backout policy
2009-09-04 01:16:58 -03:00
Daniel Grana
2ae11e9220
measure downloader backout based on active requests, including those in downloadermw plus queue
2009-09-03 16:58:36 -03:00
Daniel Grana
8b4304d5eb
change csv exporter to check flag immediately instead of calling another function
2009-09-03 16:26:05 -03:00
Daniel Grana
5ff8ed8686
media_pipeline: let failures reach item_completed
--HG--
extra : rebase_source : ec574d95957646498f021c25953ac1405d1c748c
2009-09-03 16:10:45 -03:00
Pablo Hoffman
8a715701ec
fixed another doc typo
2009-09-03 14:31:00 -03:00
Pablo Hoffman
760cf452c5
automatic merge
2009-09-03 14:30:17 -03:00
Pablo Hoffman
0d493615ab
some code rearrangement without functionality changes
2009-09-03 14:29:20 -03:00
Daniel Grana
0e7b2a6da5
write header line by default when using csv exporter
--HG--
extra : rebase_source : 2d2d7153dde5e3f77e682e16d2e4408f732f234e
2009-09-03 13:58:39 -03:00
Pablo Hoffman
c33d6bbdbd
avoid shutting down the reactor from two places, for now
2009-09-03 12:41:04 -03:00
Ismael Carnales
3c1bb7bc40
fixed typo in djangoitems doc (thanks anibal)
2009-09-03 11:23:25 -03:00
Pablo Hoffman
be76490002
avoid stopping reactor if it's already in shutdown stage (where reactor.running is still True)
2009-09-03 10:25:55 -03:00
Pablo Hoffman
0643010f0c
engine: stopping reactor when there's nothing left to do
2009-09-03 09:47:59 -03:00
Pablo Hoffman
a7f5c6f878
Some enhancements to Scrapy core:
- added graceful shutdown (with one ^C) and forced shutdown (two ^C)
- added optional loglevel to IgnoreRequest exception
- calling Spider Manager close_domain() function from engine to avoid race
conditions caused by signal dispatch order
- moved twisted reactor controlling logic outside the engine and into the
scrapy manager
- made log.err function respect the current log level filter
2009-09-03 08:27:48 -03:00
Pablo Hoffman
51eb641fc7
fix bug in scrapy shell which was hiding the objects fetch/view/shelp when started without a url as argument
2009-09-02 20:53:55 -03:00
Pablo Hoffman
37929f42b3
spider manager: added protection to avoid reloading non-spider modules
2009-09-02 20:43:05 -03:00
Pablo Hoffman
11f296b753
removed hack for switching standard descriptors in favor of using LOG_STDOUT=False by default
2009-09-02 16:22:45 -03:00
Daniel Grana
8ed743f265
remove obsolete s3 images pipeline
2009-09-02 15:38:53 -03:00
Pablo Hoffman
45bc12e268
added scrapy.utils.pdb module with set_trace() function
2009-09-02 12:09:36 -03:00
Daniel Grana
19c0093e50
fix images pipeline tests
2009-09-02 01:31:08 -03:00
Daniel Grana
11c820232b
remove MEDIA_NAME from imagespipeline
--HG--
extra : rebase_source : 976f3ca2fccf185399a733904b9763498ebf43f6
2009-09-01 19:18:05 -03:00
Daniel Grana
d6adc141de
imagepipeline: simplify configuration using a single setting to setup different backend stores
--HG--
extra : rebase_source : cd0d5f560dcde0003974e360282a233ba387c5be
2009-09-01 18:00:56 -03:00
Daniel Grana
b05fd80640
imagepipeline: return a dict instead of a path joined with checksum with a hash
--HG--
extra : rebase_source : cd450c66be44381e905264ecff3f709bb9d302d4
2009-09-01 18:00:55 -03:00
Daniel Grana
91ff9c9d50
mediapipeline: simplify and return results of item_media_* to item_completed in order
--HG--
extra : rebase_source : 71228e29b0a969df06ac62d90d6e3cefc03ecdfc
2009-09-01 18:00:55 -03:00
Pablo Hoffman
db5420f8d5
reload spider modules silently
2009-09-02 00:48:29 -03:00
Pablo Hoffman
596d2c4479
moved CoreStats extension to scrapy.contrib.corestats
--HG--
rename : scrapy/stats/corestats.py => scrapy/contrib/corestats.py
2009-09-01 23:00:49 -03:00
Pablo Hoffman
5ce2b66c56
added proper recycling of spider resources to spider manager
2009-09-01 22:53:04 -03:00
Pablo Hoffman
34d559003d
removed empty scrapy.contrib.spider module
2009-09-01 22:50:36 -03:00
Pablo Hoffman
6a50af05d7
removed useless SpiderReloader extension
2009-09-01 22:49:15 -03:00
Pablo Hoffman
79851aefa6
moved SpiderProfiler extension to scrapy.contrib_exp and removed references from documentation
--HG--
rename : scrapy/contrib/spider/profiler.py => scrapy/contrib_exp/spiderprofiler.py
2009-09-01 22:38:37 -03:00
Pablo Hoffman
d3c51fd6f2
improved images pipeline documentation
2009-09-01 21:07:47 -03:00
Pablo Hoffman
18fd635124
another doc typo
2009-09-01 12:52:40 -03:00
Pablo Hoffman
538cc9803a
fixed doc typo
2009-09-01 12:47:53 -03:00
Pablo Hoffman
e1b33ef165
added missing module from previous commit
2009-09-01 09:04:41 -03:00
Pablo Hoffman
df0e1f005f
exporters doc: fixed example and some typos
2009-09-01 08:56:54 -03:00
Pablo Hoffman
f68dedb284
commented out code that raises sporadically
2009-08-31 22:48:01 -03:00
Pablo Hoffman
ac8f46ce9e
added File Export Pipeline reference to Exporters doc
2009-08-31 21:01:35 -03:00
Pablo Hoffman
8d006e9ea1
moved item exporters doc to stable doc
--HG--
rename : docs/experimental/exporters.rst => docs/topics/exporters.rst
2009-08-31 20:47:12 -03:00
Pablo Hoffman
0b152c99b5
added File Export Pipeline, a wrapper to use Item Exporters as Item Pipelines
2009-08-31 20:40:41 -03:00
Pablo Hoffman
3c6deaeffc
Some additional improvements to scrapy.command.cmdline logic:
- calling scrapymanager.configure() for all commands
- finally added some unittests to check cmdline behaviour!
2009-08-31 18:53:55 -03:00
Pablo Hoffman
6c58d06f0f
wrapped some big lines
2009-08-31 18:50:20 -03:00
Pablo Hoffman
b6c1392f9b
added 'settings' command for querying scrapy settings
2009-08-31 18:43:51 -03:00
Pablo Hoffman
60299c49ca
raise NotConfigured in web/telnetconsole when disabled
2009-08-31 13:42:44 -03:00
Pablo Hoffman
8fab524978
moved engine.getstatus() method to scrapy.utils.engine function, to leave reporting logic out of engine code. added est() shortcut to telnet console
2009-08-31 12:44:32 -03:00
Pablo Hoffman
dd80f6acdf
MemoryUsage: changed .virtual property to method. SpiderProfiler: removed dependency on MemoryUsage extension
2009-08-31 12:14:30 -03:00
Pablo Hoffman
e8e760c974
more simplifications to scrapy engine: removed addtasks method
2009-08-31 12:04:01 -03:00