Ismael Carnales
fd41f06056
added doc on how to enable an Item Pipeline component
2009-09-16 14:19:16 -03:00
Ismael Carnales
404e7e09d7
changed spider doc references in BaseSpider class
2009-09-16 14:10:11 -03:00
Daniel Grana
062730cbd8
fix csv exporter documentation
2009-09-16 00:17:50 -03:00
Daniel Grana
08aedcbe76
Automated merge with ssh://hg.scrapy.org/scrapy
2009-09-15 09:35:24 -03:00
Daniel Grana
e8514445dc
bugfix image thumb when no thumb is configured
2009-09-15 09:35:21 -03:00
Pablo Hoffman
02e228ad76
added support for returning deferreds in spider manager close_domain() method, and making sure engine_stopped signal is always sent (even when no spiders have run)
2009-09-15 09:27:30 -03:00
Pablo Hoffman
66ff3b3013
renamed defer_failed to defer_fail
2009-09-15 08:44:31 -03:00
Pablo Hoffman
56b292e057
XmlItemExporter: added built-in support for exporting multi-valued fields (for convenience)
2009-09-14 22:05:52 -03:00
Pablo Hoffman
e8960bf616
added runspider command to run spiders directly, without having to create a project
2009-09-14 22:05:14 -03:00
Pablo Hoffman
fcbbb5001e
ported spiderctl web console extension to work with new core based on spider references
2009-09-14 20:35:47 -03:00
Pablo Hoffman
47aa716630
adapted web console to use unix timestamps for uptime instead of datetime
2009-09-14 20:35:07 -03:00
Pablo Hoffman
bc463bc9e8
using time.time() instead of datetime.utcnow() in engine.start_time attribute
2009-09-14 20:28:46 -03:00
Pablo Hoffman
7e07f76edd
made pending_spiders attribute protected in spider scheduler
2009-09-14 20:27:54 -03:00
Pablo Hoffman
9b68432624
commented out line that was preventing errors from propagating to the request errback
2009-09-14 12:27:29 -03:00
Pablo Hoffman
2322312c63
Logging requests instead of responses in 'Crawled ...' messages
2009-09-14 10:31:29 -03:00
Pablo Hoffman
6f64dfe579
renamed spider manager close_domain() method to close_spider()
2009-09-14 10:06:54 -03:00
Pablo Hoffman
99467d4e6e
Changed (unstable) scheduler middleware API to receive spider (instead of domain) in enqueue_request method
2009-09-13 20:51:43 -03:00
Pablo Hoffman
00873cd16c
Another Spider Manager simplification: removed add_spider() method
2009-09-12 21:27:58 -03:00
Pablo Hoffman
dc82550058
Do not impose an arbitrary encoding in spider templates, because we don't know beforehand what encoding our users will use in their editors.
2009-09-12 21:04:21 -03:00
Pablo Hoffman
58bdd7303c
fixed deprecated import in 'crawl' spider template (thanks Anibal)
2009-09-12 20:59:04 -03:00
Pablo Hoffman
90407d2789
added missing colon
2009-09-12 19:33:16 -03:00
Pablo Hoffman
3f30fee6ea
added first (not yet stable) revision of Crawler class, which allows using the Scrapy crawler from stand-alone scripts
2009-09-12 19:32:23 -03:00
Pablo Hoffman
1381c1e50a
removed (no longer needed) hack in setup.py
2009-09-12 14:50:05 -03:00
Pablo Hoffman
921fc4f3bf
Big Scrapy core refactoring to pass around spider references instead of domains.
This is to avoid accessing the scrapy.spider.spiders singleton for "resolving"
spiders, which is considered an "evil" practice because it ties us to the
singleton model for the spider resolver, which is a bad thing.
This change will also work as the foundation for the API cleaning that we'll
perform for 0.8. We decided to introduce this change now to have a more common
codebase between 0.7 and 0.8, which will allow us to better support 0.7 until
0.8 is released.
However, this change doesn't modify the stable/documented API, nor does it
change the core logic. Those changes will land on the 0.8 branch, after 0.7 is
released.
--HG--
rename : scrapy/contrib/domainsch.py => scrapy/contrib/spiderscheduler.py
2009-09-12 14:34:18 -03:00
Pablo Hoffman
655cfe138d
removed unused imports
2009-09-11 19:38:31 -03:00
Pablo Hoffman
0c292e3350
removed hacky --callback option to crawl command
2009-09-11 19:36:00 -03:00
Pablo Hoffman
e854d0d6ef
removed redundant --nopipelines function. same behaviour can be obtained by clearing the ITEM_PIPELINES setting
2009-09-11 19:32:05 -03:00
Pablo Hoffman
8d49dc2fb5
changed IMAGES_THUMBS setting to a dict instead of a list of tuples, and more improvements to images pipeline doc
2009-09-11 17:36:00 -03:00
Pablo Hoffman
e20f766792
fixed some typos
2009-09-11 16:55:37 -03:00
Pablo Hoffman
c2fe350f72
more changes to images pipeline doc
2009-09-11 16:53:36 -03:00
Ismael Carnales
ada46a2dbb
styled imagesp doc
2009-09-11 15:30:46 -03:00
Pablo Hoffman
be0f2beef0
more cleanup to scheduler middleware doc, and permanently moved to experimental doc
2009-09-11 13:27:31 -03:00
Pablo Hoffman
0af052b68f
removed confusing title
2009-09-11 12:19:18 -03:00
Pablo Hoffman
f3240748cb
changed link to scheduler middleware doc, now in experimental
2009-09-11 12:03:23 -03:00
Ismael Carnales
3998a0cb58
added more scheduler middleware documentation, and moved it to experimental
--HG--
rename : docs/topics/scheduler-middleware.rst => docs/experimental/scheduler-middleware.rst
2009-09-11 11:58:53 -03:00
Pablo Hoffman
d242a20573
updated images pipeline doc
2009-09-11 11:47:12 -03:00
Daniel Grana
40d38b18d8
Automated merge with ssh://hg.scrapy.org/scrapy
2009-09-11 11:04:47 -03:00
Daniel Grana
96bc6780d3
imagespipeline: change scraped_url to url
2009-09-11 11:03:53 -03:00
Pablo Hoffman
0174bee4bc
simplified implementation of scrapy.fetcher
2009-09-10 19:27:47 -03:00
Pablo Hoffman
f1bb8dc2a3
first cleanup of spider manager api
- removed asdict() and reload() methods
- added list() method
- removed default spider
2009-09-10 19:06:46 -03:00
Pablo Hoffman
f85813cd94
added FAQ entry about scrapy recipes and community spiders
2009-09-10 18:32:50 -03:00
Daniel Grana
734464825b
allow to override httpclientfactory
2009-09-10 16:28:08 -03:00
Pablo Hoffman
269724a2b7
added Debugger extension, removed StackTraceDump from extensions available by default
2009-09-08 22:32:17 -03:00
Pablo Hoffman
2974c2c4b5
some additional checks on using unicode url/body in Request/Response objects
2009-09-07 15:20:41 -03:00
Pablo Hoffman
7a88c0d8e5
shell: fixed bug when typing exit() in python console - fixes #103
2009-09-07 14:57:50 -03:00
Ismael Carnales
4ddfa9a2a3
styled downloader middleware doc
2009-09-07 12:18:57 -03:00
Ismael Carnales
e3df11e5bb
added module directive to spidermw documentation
2009-09-07 12:03:24 -03:00
Ismael Carnales
30c2ad3f0c
added urllength spider middleware test
2009-09-07 11:14:47 -03:00
Ismael Carnales
c4ad2bea5d
added urlfilter spidermw test
2009-09-07 11:14:47 -03:00
Ismael Carnales
43bd00dea2
added referer spider middleware test
2009-09-07 11:14:46 -03:00