Pablo Hoffman
7826869cb2
Added missing colon
2010-09-28 16:44:53 -03:00
Martin Santos
0bf9e4627c
Added support to the CloseSpider extension for closing the spider after N pages have been crawled, using the CLOSESPIDER_PAGECOUNT setting. Closes #253
2010-09-28 16:29:37 -03:00
Pablo Hoffman
279dcc245f
Fixed role name in Sphinx doc
2010-09-26 01:01:06 -03:00
Pablo Hoffman
9599bde3e9
Removed RequestLimitMiddleware
2010-09-22 16:09:13 -03:00
Pablo Hoffman
ed4aec187f
Ported code to use new unified access to spider settings, keeping backwards compatibility for old spider attributes. Refs #245
2010-09-22 16:09:13 -03:00
Pablo Hoffman
b6c2b55e5b
Split settings classes from settings singleton. Closes #244
--HG--
rename : scrapy/conf/__init__.py => scrapy/conf.py
rename : scrapy/conf/default_settings.py => scrapy/settings/default_settings.py
rename : scrapy/tests/test_conf.py => scrapy/tests/test_settings.py
2010-09-22 15:47:33 -03:00
Shuaib
9288f622f9
Added formname parameter for FormRequest.from_response
2010-09-20 08:33:24 -03:00
Pablo Hoffman
bf467fc37a
Check 'dont_merge_cookies' membership in request.meta, instead of getting its value
2010-09-10 15:29:15 -03:00
Pablo Hoffman
7d14a52234
Reference dont_merge_cookies in list of special Request.meta keys
2010-09-09 21:54:26 -03:00
Pablo Hoffman
7f21a6384f
Documented handle_httpstatus_list request.meta key
2010-09-09 21:50:40 -03:00
Pablo Hoffman
f1c943543a
Added dont_retry request.meta key to make RetryMiddleware ignore requests. Closes #234
2010-09-09 21:43:44 -03:00
Pablo Hoffman
9f01e3e79e
Added dont_redirect request.meta key to make RedirectMiddleware ignore requests. Closes #233
2010-09-09 21:37:35 -03:00
Pablo Hoffman
7da79b90fe
Make url/body attributes of Request/Response objects read-only - use replace() to change them. Deprecation warning left for backwards compatibility.
2010-09-08 00:15:11 -03:00
Pablo Hoffman
c1aab2f58e
Copy callback/errback attributes when copying Requests
2010-09-08 00:15:09 -03:00
Pablo Hoffman
e9ebebb230
Removed UrlFilterMiddleware from scrapy.contrib - see this snippet for an alternative: http://snippets.scrapy.org/snippets/12/
2010-09-07 17:51:02 -03:00
Daniel Grana
12b04b068f
make download_timeout configurable by request. closes #229
--HG--
extra : rebase_source : e57dfd4aeb98d48b04fc4d0c6469e9a85e4b33a8
2010-09-07 13:01:40 -03:00
Pablo Hoffman
9158e9d682
Some changes to Scrapyd to support multiple configuration files, to make it easier to deploy Scrapyd applications. Also documented 'egg_runner' and 'application' options
--HG--
rename : debian/scrapyd.cfg => debian/000-default
rename : scrapyd/default_scrapyd.cfg => scrapyd/default_scrapyd.conf
2010-09-07 09:17:25 -03:00
Daniel Grana
3414bf13ee
remove request_uploaded signal and move response_received and response_downloaded to downloader manager. closes #228
--HG--
extra : rebase_source : 4af0d2a01b34de8a21048bb7f4a66bfc484b3b8f
2010-09-06 23:23:14 -03:00
Pablo Hoffman
766f2d910d
Renamed Request Handlers to Download Handlers
2010-09-05 19:35:53 -03:00
Pablo Hoffman
a5cf71cb06
Updated Ubuntu package signing key location
2010-09-05 19:04:15 -03:00
Pablo Hoffman
6bf52fb50e
Make telnet console and web service try a range of ports for binding, instead of just one. Closes #226
2010-09-05 06:48:08 -03:00
Pablo Hoffman
14e985b076
Updated Command line tool documentation
2010-09-05 05:29:58 -03:00
Pablo Hoffman
1190f97944
Updated settings documentation
2010-09-05 04:58:14 -03:00
Pablo Hoffman
ebdb733e95
Updated some old messages in Scrapy shell doc
2010-09-05 04:45:43 -03:00
Pablo Hoffman
bf34094e5a
Added versionadded:: notice to new documentation topics
2010-09-04 03:30:45 -03:00
Daniel Grana
9f4b1e47a4
damn, really fix httpcache docs
2010-09-04 03:26:41 -03:00
Daniel Grana
7ad901640b
fix httpcache docs
2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469
Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
7b9fa7fbaa
Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225
2010-09-04 02:23:04 -03:00
Pablo Hoffman
37e9c5d78e
Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API
Documentation is added along with the code.
Closes #218.
--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
758d21b2f9
Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217
2010-08-31 16:03:08 -03:00
Pablo Hoffman
e7b3247a18
Updated some missing references to scrapy-ws script
2010-08-27 01:05:59 -03:00
Pablo Hoffman
e2ed27e4fd
Added documentation for Ubuntu packages. Refs #211
2010-08-23 21:28:32 -03:00
Pablo Hoffman
6585c1a28f
removed (somewhat hacky) MAIL_DEBUG setting
2010-08-22 22:42:00 -03:00
Pablo Hoffman
cbfec4bb0e
Renamed webservice ManagerResource to CrawlerResource
--HG--
rename : scrapy/contrib/webservice/manager.py => scrapy/contrib/webservice/crawler.py
2010-08-22 05:48:03 -03:00
Pablo Hoffman
7546a0805c
Removed webservice Spiders and Extensions resources since they can now be accessed through the Execution Manager (aka Crawler) resource
2010-08-22 05:38:46 -03:00
Pablo Hoffman
c1225e0f45
"parse" command refactoring. This fixes #173 and renders #106 invalid.
2010-08-22 05:04:17 -03:00
Pablo Hoffman
9fccc11363
Moved scrapy.extension.extensions singleton to an "extensions" attribute of the scrapy.project.crawler singleton. Refs #189
2010-08-22 02:15:11 -03:00
Pablo Hoffman
faf7a7da83
Moved scrapymanager singleton to scrapy.project module. Refs #189
Detail of changes:
* Moved scrapy.core.manager.ExecutionManager class to scrapy.crawler.Crawler
* Added scrapy.project.crawler singleton to reference a singleton instance of
Crawler class (previously known as scrapymanager)
* Left an alias scrapy.core.manager.scrapymanager to scrapy.project.crawler for
backwards compatibility (to be removed in Scrapy 0.11)
2010-08-22 02:10:53 -03:00
Pablo Hoffman
053d45e79f
Split stats collector classes from stats collection facility (#204)
* moved scrapy.stats.collector.__init__ module to scrapy.statscol
* moved scrapy.stats.collector.simpledb module to scrapy.contrib.statscol
* moved signals from scrapy.stats.signals to scrapy.signals
* moved scrapy/stats/__init__.py to scrapy/stats.py
* updated documentation and tests accordingly
--HG--
rename : scrapy/stats/collector/simpledb.py => scrapy/contrib/statscol.py
rename : scrapy/stats/__init__.py => scrapy/stats.py
rename : scrapy/stats/collector/__init__.py => scrapy/statscol.py
2010-08-22 01:24:07 -03:00
Pablo Hoffman
c276c48c91
Added settings to Scrapy shell variables
2010-08-21 05:10:06 -03:00
Pablo Hoffman
68f9fcffe8
genspider command refactoring. Also updated tests and doc
2010-08-21 04:46:48 -03:00
Pablo Hoffman
0da6132136
Made command-line tool output more concise
2010-08-21 03:37:59 -03:00
Pablo Hoffman
50621b7ef3
Renamed command "start" to "runserver". Closes #209
--HG--
rename : scrapy/commands/start.py => scrapy/commands/runserver.py
2010-08-21 01:42:55 -03:00
Pablo Hoffman
9aefa242d5
Applied documentation patch provided by Lucian Ursu (closes #207)
2010-08-21 01:26:35 -03:00
Pablo Hoffman
f782245c5a
Removed obsolete files
2010-08-21 01:24:39 -03:00
Pablo Hoffman
1d3b9e2ca8
Scrapy shell refactoring
2010-08-20 11:26:14 -03:00
Pablo Hoffman
7858244dca
Scrapy shell: moved Python console starting code to scrapy.utils.console and got rid of noisy console banners
2010-08-20 01:33:02 -03:00
Pablo Hoffman
2ff5a83b7a
Added persistent execution queue (based on SQLite), and a new 'queue' command to control it. Closes #198
2010-08-19 02:55:52 -03:00
Pablo Hoffman
94ead94bf6
Improved documentation of Scrapy command-line tool
--HG--
rename : docs/topics/cmdline.rst => docs/topics/commands.rst
2010-08-19 00:04:52 -03:00