mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 17:43:51 +00:00

1718 Commits

Author SHA1 Message Date
Pablo Hoffman
58bdd7303c fixed deprecated import in 'crawl' spider template (thanks Anibal) 2009-09-12 20:59:04 -03:00
Pablo Hoffman
90407d2789 added missing colon 2009-09-12 19:33:16 -03:00
Pablo Hoffman
3f30fee6ea added first (not yet stable) revision of Crawler class, which allows to use the Scrapy crawler from stand-alone scripts 2009-09-12 19:32:23 -03:00
Pablo Hoffman
1381c1e50a removed (no longer needed) hack in setup.py 2009-09-12 14:50:05 -03:00
Pablo Hoffman
921fc4f3bf Big Scrapy core refactoring to pass around spider references instead of domains.
This is to avoid accessing the scrapy.spider.spiders singleton for "resolving"
spiders, which is considered an "evil" practice because it ties us to the
singleton model for the spider resolver, which is a bad thing.

This change will also work as the foundation for the API cleaning that we'll
perform for 0.8. We decided to introduce this change now to have a more common
basecode between 0.7 and 0.8, which will allow us to better support 0.7 until
0.8 is released.

However, this change doesn't modify the stable/documented API, nor does it
change the core logic. Those changes will land on the 0.8 branch, after 0.7 is
released.

--HG--
rename : scrapy/contrib/domainsch.py => scrapy/contrib/spiderscheduler.py
2009-09-12 14:34:18 -03:00
Pablo Hoffman
655cfe138d removed unused imports 2009-09-11 19:38:31 -03:00
Pablo Hoffman
0c292e3350 removed hacky --callback option to crawl command 2009-09-11 19:36:00 -03:00
Pablo Hoffman
e854d0d6ef removed redundant --nopipelines function. same behaviour can be obtained by clearing the ITEM_PIPELINES setting 2009-09-11 19:32:05 -03:00
Pablo Hoffman
8d49dc2fb5 changed IMAGES_THUMBS setting to a dict instead of a list of tuples, and more improvements to images pipeline doc 2009-09-11 17:36:00 -03:00
Pablo Hoffman
e20f766792 fixed some typos 2009-09-11 16:55:37 -03:00
Pablo Hoffman
c2fe350f72 more changes to images pipeline doc 2009-09-11 16:53:36 -03:00
Ismael Carnales
ada46a2dbb styled imagesp doc 2009-09-11 15:30:46 -03:00
Pablo Hoffman
be0f2beef0 more cleanup to scheduler middleware doc, and permanently moved to experimental doc 2009-09-11 13:27:31 -03:00
Pablo Hoffman
0af052b68f removed confusing title 2009-09-11 12:19:18 -03:00
Pablo Hoffman
f3240748cb changed link to scheduler middleware doc, now in experimental 2009-09-11 12:03:23 -03:00
Ismael Carnales
3998a0cb58 added more scheduler middleware documentation, and moved it to experimental
--HG--
rename : docs/topics/scheduler-middleware.rst => docs/experimental/scheduler-middleware.rst
2009-09-11 11:58:53 -03:00
Pablo Hoffman
d242a20573 updated images pipeline doc 2009-09-11 11:47:12 -03:00
Daniel Grana
40d38b18d8 Automated merge with ssh://hg.scrapy.org/scrapy 2009-09-11 11:04:47 -03:00
Daniel Grana
96bc6780d3 imagespipeline: change scraped_url to url 2009-09-11 11:03:53 -03:00
Pablo Hoffman
0174bee4bc simplified implementation of scrapy.fetcher 2009-09-10 19:27:47 -03:00
Pablo Hoffman
f1bb8dc2a3 first cleanup of spider manager api
- removed asdict() and reload() methods
- added list() method
- removed default spider
2009-09-10 19:06:46 -03:00
Pablo Hoffman
f85813cd94 added FAQ entry about scrapy recipes and community spiders 2009-09-10 18:32:50 -03:00
Daniel Grana
734464825b allow to override httpclientfactory 2009-09-10 16:28:08 -03:00
Pablo Hoffman
269724a2b7 added Debugger extension, removed StackTraceDump from extensions available by default 2009-09-08 22:32:17 -03:00
Pablo Hoffman
2974c2c4b5 some additional checks on using unicode url/body in Request/Response objects 2009-09-07 15:20:41 -03:00
Pablo Hoffman
7a88c0d8e5 shell: fixed bug when typing exit() in python console - fixes #103 2009-09-07 14:57:50 -03:00
Ismael Carnales
4ddfa9a2a3 styled downloader middleware doc 2009-09-07 12:18:57 -03:00
Ismael Carnales
e3df11e5bb added module directive to spidermw documentation 2009-09-07 12:03:24 -03:00
Ismael Carnales
30c2ad3f0c added urllength spider middleware test 2009-09-07 11:14:47 -03:00
Ismael Carnales
c4ad2bea5d added urlfilter spidermw test 2009-09-07 11:14:47 -03:00
Ismael Carnales
43bd00dea2 added referer spider middleware test 2009-09-07 11:14:46 -03:00
Ismael Carnales
1c700749ae added offside spider middleware test 2009-09-07 11:14:45 -03:00
Ismael Carnales
083635ebaf added depth spider middleware test 2009-09-07 11:14:43 -03:00
Ismael Carnales
f0b5892aa5 fixed stats downloadermw test 2009-09-07 11:14:38 -03:00
Pablo Hoffman
82ca5e26f5 renamed test_xpath.py to test_selector.py
--HG--
rename : scrapy/tests/test_xpath.py => scrapy/tests/test_selector.py
2009-09-07 09:38:24 -03:00
Pablo Hoffman
4023914b10 added support for instantiating TextResponse (or any subclass) with unicode urls, improved organization of request/response unittests 2009-09-07 09:37:46 -03:00
Pablo Hoffman
e2bd1be995 better aws code arrangement
--HG--
rename : scrapy/tests/test_aws.py => scrapy/tests/test_utils_aws.py
2009-09-04 18:07:51 -03:00
Pablo Hoffman
827aa19c6e removed obsolete scrapy.utils.db module 2009-09-04 17:38:14 -03:00
Pablo Hoffman
7466150286 removed some more obsolete middlewares 2009-09-04 17:32:34 -03:00
Pablo Hoffman
861a803cc3 removed obsolete RestrictMiddleware 2009-09-04 17:22:56 -03:00
Pablo Hoffman
0631102153 removed backwards compatibility for old errorpages downloader middleware 2009-09-04 17:19:03 -03:00
Ismael Carnales
043e7355f7 added some missing spidermw tests 2009-09-04 14:11:56 -03:00
Ismael Carnales
7e2587169b added missing middleware docs 2009-09-04 12:39:02 -03:00
Ismael Carnales
6d127d7fcf added some missing middlewares tests 2009-09-04 12:29:43 -03:00
Pablo Hoffman
aefb94063a more updates to spider middleware doc 2009-09-04 13:46:04 -03:00
Pablo Hoffman
d04640be5c some improvements to spider middleware doc 2009-09-04 13:29:16 -03:00
Pablo Hoffman
96bb223c13 removed (pretty useless) DebugMiddleware 2009-09-04 12:59:58 -03:00
Pablo Hoffman
86de5180fc fixed bug in robots middleware reported by fencer in #101 2009-09-04 12:36:29 -03:00
Pablo Hoffman
dad05957f3 added comment downloader backout policy 2009-09-04 01:16:58 -03:00
Daniel Grana
2ae11e9220 measure downloader backout based on active requests, which include those in downloadermw plus queue 2009-09-03 16:58:36 -03:00