Pablo Hoffman
58bdd7303c
fixed deprecated import in 'crawl' spider template (thanks Anibal)
2009-09-12 20:59:04 -03:00
Pablo Hoffman
90407d2789
added missing colon
2009-09-12 19:33:16 -03:00
Pablo Hoffman
3f30fee6ea
added first (not yet stable) revision of Crawler class, which allows using the Scrapy crawler from stand-alone scripts
2009-09-12 19:32:23 -03:00
Pablo Hoffman
1381c1e50a
removed (no longer needed) hack in setup.py
2009-09-12 14:50:05 -03:00
Pablo Hoffman
921fc4f3bf
Big Scrapy core refactoring to pass around spider references instead of domains.
This is to avoid accessing the scrapy.spider.spiders singleton for "resolving"
spiders, which is considered an "evil" practice because it ties us to the
singleton model for the spider resolver, which is a bad thing.
This change will also work as the foundation for the API cleaning that we'll
perform for 0.8. We decided to introduce this change now to have a more common
codebase between 0.7 and 0.8, which will allow us to better support 0.7 until
0.8 is released.
However, this change doesn't modify the stable/documented API, nor does it
change the core logic. Those changes will land on the 0.8 branch, after 0.7 is
released.
--HG--
rename : scrapy/contrib/domainsch.py => scrapy/contrib/spiderscheduler.py
2009-09-12 14:34:18 -03:00
Pablo Hoffman
655cfe138d
removed unused imports
2009-09-11 19:38:31 -03:00
Pablo Hoffman
0c292e3350
removed hacky --callback option to crawl command
2009-09-11 19:36:00 -03:00
Pablo Hoffman
e854d0d6ef
removed redundant --nopipelines option. The same behaviour can be obtained by clearing the ITEM_PIPELINES setting
2009-09-11 19:32:05 -03:00
Pablo Hoffman
8d49dc2fb5
changed IMAGES_THUMBS setting to a dict instead of a list of tuples, and more improvements to images pipeline doc
2009-09-11 17:36:00 -03:00
Pablo Hoffman
e20f766792
fixed some typos
2009-09-11 16:55:37 -03:00
Pablo Hoffman
c2fe350f72
more changes to images pipeline doc
2009-09-11 16:53:36 -03:00
Ismael Carnales
ada46a2dbb
styled imagesp doc
2009-09-11 15:30:46 -03:00
Pablo Hoffman
be0f2beef0
more cleanup of scheduler middleware doc, and permanently moved it to experimental doc
2009-09-11 13:27:31 -03:00
Pablo Hoffman
0af052b68f
removed confusing title
2009-09-11 12:19:18 -03:00
Pablo Hoffman
f3240748cb
changed link to scheduler middleware doc, now in experimental
2009-09-11 12:03:23 -03:00
Ismael Carnales
3998a0cb58
added more scheduler middleware documentation, and moved it to experimental
--HG--
rename : docs/topics/scheduler-middleware.rst => docs/experimental/scheduler-middleware.rst
2009-09-11 11:58:53 -03:00
Pablo Hoffman
d242a20573
updated images pipeline doc
2009-09-11 11:47:12 -03:00
Daniel Grana
40d38b18d8
Automated merge with ssh://hg.scrapy.org/scrapy
2009-09-11 11:04:47 -03:00
Daniel Grana
96bc6780d3
imagespipeline: change scraped_url to url
2009-09-11 11:03:53 -03:00
Pablo Hoffman
0174bee4bc
simplified implementation of scrapy.fetcher
2009-09-10 19:27:47 -03:00
Pablo Hoffman
f1bb8dc2a3
first cleanup of spider manager api
- removed asdict() and reload() methods
- added list() method
- removed default spider
2009-09-10 19:06:46 -03:00
Pablo Hoffman
f85813cd94
added FAQ entry about scrapy recipes and community spiders
2009-09-10 18:32:50 -03:00
Daniel Grana
734464825b
allow overriding httpclientfactory
2009-09-10 16:28:08 -03:00
Pablo Hoffman
269724a2b7
added Debugger extension, removed StackTraceDump from extensions available by default
2009-09-08 22:32:17 -03:00
Pablo Hoffman
2974c2c4b5
some additional checks on using unicode url/body in Request/Response objects
2009-09-07 15:20:41 -03:00
Pablo Hoffman
7a88c0d8e5
shell: fixed bug when typing exit() in python console - fixes #103
2009-09-07 14:57:50 -03:00
Ismael Carnales
4ddfa9a2a3
styled downloader middleware doc
2009-09-07 12:18:57 -03:00
Ismael Carnales
e3df11e5bb
added module directive to spidermw documentation
2009-09-07 12:03:24 -03:00
Ismael Carnales
30c2ad3f0c
added urllength spider middleware test
2009-09-07 11:14:47 -03:00
Ismael Carnales
c4ad2bea5d
added urlfilter spidermw test
2009-09-07 11:14:47 -03:00
Ismael Carnales
43bd00dea2
added referer spider middleware test
2009-09-07 11:14:46 -03:00
Ismael Carnales
1c700749ae
added offside spider middleware test
2009-09-07 11:14:45 -03:00
Ismael Carnales
083635ebaf
added depth spider middleware test
2009-09-07 11:14:43 -03:00
Ismael Carnales
f0b5892aa5
fixed stats downloadermw test
2009-09-07 11:14:38 -03:00
Pablo Hoffman
82ca5e26f5
renamed test_xpath.py to test_selector.py
--HG--
rename : scrapy/tests/test_xpath.py => scrapy/tests/test_selector.py
2009-09-07 09:38:24 -03:00
Pablo Hoffman
4023914b10
added support for instantiating TextResponse (or any subclass) with unicode urls, improved organization of request/response unittests
2009-09-07 09:37:46 -03:00
Pablo Hoffman
e2bd1be995
better aws code arrangement
--HG--
rename : scrapy/tests/test_aws.py => scrapy/tests/test_utils_aws.py
2009-09-04 18:07:51 -03:00
Pablo Hoffman
827aa19c6e
removed obsolete scrapy.utils.db module
2009-09-04 17:38:14 -03:00
Pablo Hoffman
7466150286
removed some more obsolete middlewares
2009-09-04 17:32:34 -03:00
Pablo Hoffman
861a803cc3
removed obsolete RestrictMiddleware
2009-09-04 17:22:56 -03:00
Pablo Hoffman
0631102153
removed backwards compatibility for old errorpages downloader middleware
2009-09-04 17:19:03 -03:00
Ismael Carnales
043e7355f7
added some missing spidermw tests
2009-09-04 14:11:56 -03:00
Ismael Carnales
7e2587169b
added missing middleware docs
2009-09-04 12:39:02 -03:00
Ismael Carnales
6d127d7fcf
added some missing middlewares tests
2009-09-04 12:29:43 -03:00
Pablo Hoffman
aefb94063a
more updates to spider middleware doc
2009-09-04 13:46:04 -03:00
Pablo Hoffman
d04640be5c
some improvements to spider middleware doc
2009-09-04 13:29:16 -03:00
Pablo Hoffman
96bb223c13
removed (pretty useless) DebugMiddleware
2009-09-04 12:59:58 -03:00
Pablo Hoffman
86de5180fc
fixed bug in robots middleware reported by fencer in #101
2009-09-04 12:36:29 -03:00
Pablo Hoffman
dad05957f3
added comment on downloader backout policy
2009-09-04 01:16:58 -03:00
Daniel Grana
2ae11e9220
measure downloader backout based on active requests, which include those in downloadermw plus the queue
2009-09-03 16:58:36 -03:00