Rolando Espinoza La fuente
8db67b17a3
scrapy manager refactor
* ExecutionManager
* deprecated runonce(*args)
* changed start() to start(keep_alive=Bool)
* changed crawl(*args) to crawl(requests, spider=None)
  * if no spider given, tries to resolve spider
    for each request
* added crawl_url(url, spider=None)
* added crawl_request(request, spider=None)
* added crawl_domain(domain)
* added crawl_spider(spider)
* updated commands: crawl, runspider, start
* updated webconsole
* updated crawler
* updated tests.test_engine
* updated utils.fetch
2010-04-01 17:16:38 -03:00
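Note on the commit above: the "resolve spider for each request" behaviour of crawl(requests, spider=None) can be pictured roughly as follows. This is only a sketch under stated assumptions, not the ExecutionManager code: the Spider class, resolve_spider() and group_requests() are hypothetical names, plain URLs stand in for request objects, and matching by domain suffix is an assumed resolution rule.

```python
from urllib.parse import urlparse


class Spider:
    """Hypothetical stand-in for a spider: a name plus the domains it handles."""

    def __init__(self, name, domains):
        self.name = name
        self.domains = set(domains)


def resolve_spider(url, spiders):
    """Return the first spider whose domains cover the URL's host, else None."""
    host = urlparse(url).hostname or ""
    for spider in spiders:
        if any(host == d or host.endswith("." + d) for d in spider.domains):
            return spider
    return None


def group_requests(urls, spiders, spider=None):
    """Group URLs by spider name.

    If a spider is passed explicitly it is used for every URL; otherwise a
    spider is resolved per URL, and URLs with no matching spider are skipped.
    """
    grouped = {}
    for url in urls:
        target = spider or resolve_spider(url, spiders)
        if target is None:
            continue
        grouped.setdefault(target.name, []).append(url)
    return grouped


spiders = [Spider("example", ["example.com"]), Spider("other", ["other.org"])]
urls = ["http://www.example.com/a", "http://other.org/b", "http://unknown.net/"]
print(group_requests(urls, spiders))
```

Passing spider explicitly skips resolution entirely, which mirrors the optional spider argument shared by the new crawl_url() and crawl_request() methods listed in the commit.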
Pablo Hoffman
32f9c5fe68
removed old untested (and probably broken) code
2010-04-01 04:05:53 -03:00
Pablo Hoffman
4dc886e319
Improved comment
2010-03-31 18:26:35 -03:00
Pablo Hoffman
83d5eff0b7
More refactoring to encoding handling in TextResponse and subclasses
2010-03-31 18:21:41 -03:00
Pablo Hoffman
de896fa62d
Refactored implementation of Request.replace() and Response.replace()
2010-03-31 16:29:53 -03:00
Pablo Hoffman
2ed8a5bfb5
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-27 13:25:06 -03:00
Pablo Hoffman
2299deda66
updated wrong link in doc
2010-03-26 14:02:33 -03:00
Pablo Hoffman
7cf2f87e27
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-26 08:29:34 -03:00
Daniel Grana
996a1b3574
fix handling of relative base urls in get_base_url util
--HG--
extra : rebase_source : eb552219e6bf40bc0d2e35968c367105233b6ecc
2010-03-25 15:50:34 -03:00
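Note on the commit above: a relative URL in a base tag has to be resolved against the page's own URL before it can serve as the base for other links. The sketch below shows that idea with the standard library only. resolve_base_url() is a hypothetical helper, not the get_base_url implementation, and it uses Python 3's urllib.parse.

```python
from urllib.parse import urljoin


def resolve_base_url(page_url, base_href=None):
    """Return the URL that relative links on the page should be joined to.

    Without a base href the page URL itself is the base. A relative base
    href is first made absolute by joining it to the page URL.
    """
    if not base_href:
        return page_url
    return urljoin(page_url, base_href)


base = resolve_base_url("http://example.com/a/b/index.html", "../static/")
print(base)                           # http://example.com/a/static/
print(urljoin(base, "img/logo.png"))  # http://example.com/a/static/img/logo.png
```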
Pablo Hoffman
1330697c3d
Some improvements to Response encoding support:
* added encoding aliases, configurable through a new ENCODING_ALIASES setting
* Response.encoding now returns the real encoding detected for the body
* simplified TextResponse API by removing body_encoding() and
headers_encoding() methods
* Response.encoding now tries to infer the encoding from the body always (it
was done before only on HtmlResponse and TextResponse)
* removed scrapy.utils.encoding.add_encoding_alias() function
* updated implementation of scrapy.utils.response function to reflect these API
changes
* updated documentation to reflect API changes
2010-03-25 15:47:10 -03:00
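Note on the commit above: encoding aliases map labels seen in the wild onto codec names that Python knows. A minimal sketch of that lookup follows. The alias entries and the resolve_encoding() helper are assumptions made for illustration; the commit does not list the contents of ENCODING_ALIASES or the function that consumes it.

```python
import codecs

# Hypothetical alias table; the real setting's contents are not given in the log.
ENCODING_ALIASES = {
    "zh-cn": "gb2312",
    "unicode-1-1-utf-8": "utf-8",
}


def resolve_encoding(name, aliases=ENCODING_ALIASES, default=None):
    """Map a declared encoding label to a codec name Python can use.

    The label is normalised, translated through the alias table, and then
    checked with codecs.lookup(). Unknown labels fall back to `default`.
    """
    if not name:
        return default
    label = name.strip().lower()
    candidate = aliases.get(label, label)
    try:
        return codecs.lookup(candidate).name
    except LookupError:
        return default


print(resolve_encoding("ZH-CN"))
print(resolve_encoding("no-such-codec", default="utf-8"))
```

Returning a default for unknown labels keeps a bad declaration from raising while decoding, which leaves room for inferring the encoding from the body instead.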
Daniel Grana
173e94386b
Support relative url used in base tag. closes #148
--HG--
extra : rebase_source : 1bff87c127a7e9d8d12c772b3068feb11eb5d97f
2010-03-25 12:38:37 -03:00
Pablo Hoffman
9ddcd1095d
sort setting alphabetically
2010-03-25 11:45:06 -03:00
Pablo Hoffman
cb49567ca6
Removed wrong line added in previous commit
2010-03-24 12:15:18 -03:00
Pablo Hoffman
45411926b5
Improved encoding support by explicitly passing encoding to all str_to_unicode() and unicode_to_str() calls
2010-03-24 12:14:07 -03:00
Pablo Hoffman
4fa833c849
Added LOG_ENCODING setting
2010-03-24 12:13:38 -03:00
Pablo Hoffman
87e68e7438
Made MailSender non IO-blocking, and improved MailSender documentation
2010-03-22 13:37:37 -03:00
Pablo Hoffman
1dfc79b5d0
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-20 20:48:11 -03:00
Pablo Hoffman
264cd2e035
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-19 10:32:42 -03:00
Pablo Hoffman
403a21ec74
removed obsolete scrapy.crawler module
2010-03-12 17:28:33 -02:00
Pablo Hoffman
54ae2c36d0
better implementation of open_in_browser() tests
2010-03-12 10:19:50 -02:00
Pablo Hoffman
38a296aa2c
Added tests to open_in_browser() function
2010-03-12 09:52:39 -02:00
Pablo Hoffman
2ab94d75e2
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-03-12 09:32:35 -02:00
Pablo Hoffman
39e4df0cff
removed unmaintained (and untested) contrib_exp ShoveItemPipeline
2010-03-10 00:10:36 -02:00
Pablo Hoffman
a505a9d490
minor code refactoring on scrapy.command.cmdline module
2010-03-04 11:09:16 -02:00
Pablo Hoffman
4c1ec0c97e
replaced hacky command_executed dict by standard signal
2010-03-04 10:58:18 -02:00
Pablo Hoffman
861f9691c7
removed partly-obsolete module scrapy.contrib.groupsettings
2010-03-04 10:40:41 -02:00
Pablo Hoffman
d12cd22d5e
switched default scheduler order to DFO, which consumes less memory by default
2010-03-04 10:15:58 -02:00
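Note on the commit above: the memory claim can be illustrated on a synthetic crawl in which every page links to a fixed number of new pages. The sketch below is not the scheduler code. peak_pending() is a hypothetical helper that counts how many requests are waiting at the worst moment, reading DFO as depth-first (last in, first out) and contrasting it with breadth-first (first in, first out).

```python
from collections import deque


def peak_pending(branching, depth, depth_first):
    """Return the largest number of pending requests seen during a crawl.

    Each pending entry stores only the depth of the page it points to.
    Pages above `depth` produce `branching` new requests; pages at `depth`
    produce none.
    """
    pending = deque([0])
    peak = 1
    while pending:
        level = pending.pop() if depth_first else pending.popleft()
        if level < depth:
            pending.extend([level + 1] * branching)
        peak = max(peak, len(pending))
    return peak


print(peak_pending(3, 4, depth_first=True))   # 9
print(peak_pending(3, 4, depth_first=False))  # 81
```

On this tree the depth-first backlog grows linearly with depth, while the breadth-first backlog grows with the size of the widest level.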
Daniel Grana
700be3202b
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-02-24 15:44:41 -02:00
Daniel Grana
2322322ee6
Add missing priority and errback arguments to Request.replace method signature
2010-02-24 15:43:09 -02:00
Pablo Hoffman
180c091fb2
Fixed encoding issue (reported in #135) when the encoding declared in the HTTP header is unknown. This is the patch proposed by Rolando, with an update to the Request/Response documentation.
2010-02-24 14:01:29 -02:00
Pablo Hoffman
bbef0fe870
Automated merge with http://hg.scrapy.org/users/rolando/scrapy/
2010-02-20 11:12:37 -02:00
Rolando Espinoza La fuente
7b1ad321e3
examples/experimental: added imdb top movies spider
2010-02-19 21:31:17 -04:00
Pablo Hoffman
cb99edd153
simplified and improved AUTHORS file
2010-02-19 23:16:55 -02:00
Pablo Hoffman
a3d22c7240
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-02-19 23:11:24 -02:00
Pablo Hoffman
60961e5499
minor documentation fix (refs #135)
2010-02-19 23:09:48 -02:00
Pablo Hoffman
c1f8198639
Added RANDOMIZE_DOWNLOAD_DELAY setting
2010-02-19 21:53:18 -02:00
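Note on the commit above: randomizing the download delay spreads requests out so they do not arrive at a fixed interval. The sketch below assumes the range described in the Scrapy settings documentation, a random value between 0.5 and 1.5 times DOWNLOAD_DELAY; the commit message itself does not state the range. effective_delay() is a hypothetical helper, not the downloader code.

```python
import random


def effective_delay(download_delay, randomize=True, rng=random):
    """Return the number of seconds to wait before the next request.

    With randomization on, the wait is drawn uniformly from
    [0.5 * download_delay, 1.5 * download_delay] (assumed range).
    A zero delay is returned unchanged.
    """
    if not randomize or download_delay <= 0:
        return download_delay
    return rng.uniform(0.5 * download_delay, 1.5 * download_delay)


rng = random.Random(0)  # seeded generator so runs are repeatable
print([round(effective_delay(2.0, rng=rng), 2) for _ in range(3)])
print(effective_delay(2.0, randomize=False))
```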
Rolando Espinoza La fuente
4a053a762f
examples/experimental: added googledir crawler
2010-02-19 18:28:16 -04:00
Rolando Espinoza La fuente
a6a3f085a7
docs: added crawlspider v2 outline documentation
Sign-Off: Rolando Espinoza La fuente
2010-02-19 18:22:38 -04:00
Rolando Espinoza La fuente
17d1543929
contrib_exp: added crawlspider v2 package + tests
Sign-Off: Rolando Espinoza La fuente
2010-02-19 18:19:01 -04:00
Rolando Espinoza La fuente
7ddd4441e3
utils.python: added equal_attributes() to compare arbitrary attributes of two objects
Sign-Off: Rolando Espinoza La fuente
2010-02-19 17:57:48 -04:00
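Note on the commit above: comparing two objects on a chosen set of attributes might look like the sketch below. Only the function name comes from the commit message. The signature and the edge-case behaviour (an empty attribute list or a missing attribute counts as not equal) are assumptions for illustration, not the utils.python implementation.

```python
from types import SimpleNamespace

_MISSING = object()  # sentinel for attributes that do not exist


def equal_attributes(obj1, obj2, attributes):
    """Return True if both objects have equal values for every named attribute.

    Assumed behaviour: an empty attribute list, or an attribute missing on
    either object, makes the comparison fail.
    """
    if not attributes:
        return False
    for name in attributes:
        left = getattr(obj1, name, _MISSING)
        right = getattr(obj2, name, _MISSING)
        if left is _MISSING or right is _MISSING or left != right:
            return False
    return True


a = SimpleNamespace(url="http://example.com/", tags=("a",))
b = SimpleNamespace(url="http://example.com/", tags=("b",))
print(equal_attributes(a, b, ["url"]))
print(equal_attributes(a, b, ["url", "tags"]))
```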
Rolando Espinoza La fuente
7235040936
merged upstream
2010-02-19 17:41:45 -04:00
Pablo Hoffman
23fcf48a89
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-02-19 16:34:01 -02:00
Pablo Hoffman
53dfc4d3dd
fixed bug which was causing the DOWNLOAD_DELAY setting to be ignored (the spider download_delay attribute was working though)
2010-02-19 16:32:30 -02:00
Pablo Hoffman
a67c389728
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-02-19 15:44:23 -02:00
Pablo Hoffman
51faec5dcd
fixed bug which was considering DOWNLOAD_DELAY as an int setting, where it should be a float
2010-02-19 15:42:54 -02:00
Daniel Grana
8dc95bf105
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-02-18 16:52:45 -02:00
Daniel Grana
91f4d6dc51
docs: adds another spider example that yields multiple requests/items from a single callback
2010-02-18 16:51:05 -02:00
Pablo Hoffman
d337aeb7e7
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-01-31 18:11:43 -02:00
Pablo Hoffman
57d60eae39
sort settings doc alphabetically by setting name
2010-01-31 18:11:13 -02:00
Pablo Hoffman
843b371968
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-01-28 10:56:49 -02:00