Daniel Grana
7ad901640b
fix httpcache docs
2010-09-04 03:23:08 -03:00
Daniel Grana
1abaa79469
Make ignored schemes configurable in HttpCacheMiddleware. closes #224
--HG--
extra : rebase_source : 2e6e8b93c642290f9bd6eb634eb4c8cd6da07c75
2010-09-04 02:58:43 -03:00
Pablo Hoffman
7b9fa7fbaa
Don't filter out requests coming from spiders that don't define allowed_domains. Closes #225
2010-09-04 02:23:04 -03:00
Pablo Hoffman
37e9c5d78e
Added new Scrapy service with support for:
* multiple projects
* uploading scrapy projects as Python eggs
* scheduling spiders using a JSON API
Documentation is added along with the code.
Closes #218 .
--HG--
rename : debian/scrapy-service.default => debian/scrapyd.default
rename : debian/scrapy-service.dirs => debian/scrapyd.dirs
rename : debian/scrapy-service.install => debian/scrapyd.install
rename : debian/scrapy-service.lintian-overrides => debian/scrapyd.lintian-overrides
rename : debian/scrapy-service.postinst => debian/scrapyd.postinst
rename : debian/scrapy-service.postrm => debian/scrapyd.postrm
rename : debian/scrapy-service.upstart => debian/scrapyd.upstart
rename : extras/scrapy.tac => extras/scrapyd.tac
2010-09-03 15:54:42 -03:00
Pablo Hoffman
758d21b2f9
Simplified images pipeline by allowing it to be used without having to override it in your project. Closes #217
2010-08-31 16:03:08 -03:00
Pablo Hoffman
e7b3247a18
Updated some missing references to scrapy-ws script
2010-08-27 01:05:59 -03:00
Pablo Hoffman
e2ed27e4fd
Added documentation for Ubuntu packages. Refs #211
2010-08-23 21:28:32 -03:00
Pablo Hoffman
6585c1a28f
removed (somewhat hacky) MAIL_DEBUG setting
2010-08-22 22:42:00 -03:00
Pablo Hoffman
cbfec4bb0e
Renamed webservice ManagerResource to CrawlerResource
--HG--
rename : scrapy/contrib/webservice/manager.py => scrapy/contrib/webservice/crawler.py
2010-08-22 05:48:03 -03:00
Pablo Hoffman
7546a0805c
Removed webservice Spiders and Extensions resources since they can now be accessed through the Execution Manager (aka Crawler) resource
2010-08-22 05:38:46 -03:00
Pablo Hoffman
c1225e0f45
"parse" command refactoring. This fixes #173 and renders #106 invalid.
2010-08-22 05:04:17 -03:00
Pablo Hoffman
9fccc11363
Moved scrapy.extension.extensions singleton to an "extensions" attribute of the scrapy.project.crawler singleton. Refs #189
2010-08-22 02:15:11 -03:00
Pablo Hoffman
faf7a7da83
Moved scrapymanager singleton to scrapy.project module. Refs #189
Detail of changes:
* Moved scrapy.core.manager.ExecutionManager class to scrapy.crawler.Crawler
* Added scrapy.project.crawler singleton to reference a singleton instance of
Crawler class (previously known as scrapymanager)
* Left an alias scrapy.core.manager.scrapymanager to scrapy.project.crawler for
backwards compatibility (to be removed in Scrapy 0.11)
2010-08-22 02:10:53 -03:00
Pablo Hoffman
053d45e79f
Split stats collector classes from stats collection facility ( #204 )
* moved scrapy.stats.collector.__init__ module to scrapy.statscol
* moved scrapy.stats.collector.simpledb module to scrapy.contrib.statscol
* moved signals from scrapy.stats.signals to scrapy.signals
* moved scrapy/stats/__init__.py to scrapy/stats.py
* updated documentation and tests accordingly
--HG--
rename : scrapy/stats/collector/simpledb.py => scrapy/contrib/statscol.py
rename : scrapy/stats/__init__.py => scrapy/stats.py
rename : scrapy/stats/collector/__init__.py => scrapy/statscol.py
2010-08-22 01:24:07 -03:00
Pablo Hoffman
c276c48c91
Added settings to Scrapy shell variables
2010-08-21 05:10:06 -03:00
Pablo Hoffman
68f9fcffe8
genspider command refactoring. Also updated tests and doc
2010-08-21 04:46:48 -03:00
Pablo Hoffman
0da6132136
Made command-line tool output more concise
2010-08-21 03:37:59 -03:00
Pablo Hoffman
50621b7ef3
Renamed command "start" to "runserver". Closes #209
--HG--
rename : scrapy/commands/start.py => scrapy/commands/runserver.py
2010-08-21 01:42:55 -03:00
Pablo Hoffman
9aefa242d5
Applied documentation patch provided by Lucian Ursu ( closes #207 )
2010-08-21 01:26:35 -03:00
Pablo Hoffman
f782245c5a
Removed obsolete files
2010-08-21 01:24:39 -03:00
Pablo Hoffman
1d3b9e2ca8
Scrapy shell refactoring
2010-08-20 11:26:14 -03:00
Pablo Hoffman
7858244dca
Scrapy shell: moved Python console starting code to scrapy.utils.console and got rid of noisy console banners
2010-08-20 01:33:02 -03:00
Pablo Hoffman
2ff5a83b7a
Added persistent execution queue (based on SQLite), and a new 'queue' command to control it. Closes #198
2010-08-19 02:55:52 -03:00
Pablo Hoffman
94ead94bf6
Improved documentation of Scrapy command-line tool
--HG--
rename : docs/topics/cmdline.rst => docs/topics/commands.rst
2010-08-19 00:04:52 -03:00
Pablo Hoffman
34554da201
Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added a convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
a71521bfba
Default per-command settings are now specified in the default_settings attribute of the command object. Closes #201
2010-08-17 18:30:13 -03:00
Pablo Hoffman
ad3fd0afe8
fixed minor formatting issue with new feed exports doc
2010-08-17 14:37:59 -03:00
Pablo Hoffman
e741a807d2
Added new Feed exports extension with documentation and storage tests. Closes #197 .
Also deprecated File export pipeline (to be removed in Scrapy 0.11).
Still need to add tests for FeedExport main extension code.
2010-08-17 14:27:48 -03:00
Pablo Hoffman
3e3a66620b
Added support for returning deferreds from (some) signal handlers. Closes #193
2010-08-14 21:10:37 -03:00
Pablo Hoffman
1df2c17b78
updated old documentation references
2010-08-12 20:45:11 -03:00
Pablo Hoffman
43d47e5d9b
Some improvements to Item Pipeline ( closes #195 ):
* Made Item Pipeline Manager a subclass of scrapy.middleware.MiddlewareManager
* Added open_spider/close_spider methods with support for returning deferreds from them
* Inverted the process_item() arguments to be more friendly with deferred
callbacks (backwards compatibility kept through arguments introspection)
* Updated documentation with new methods and process_item() arguments change
2010-08-12 10:48:37 -03:00
Pablo Hoffman
9d38a99aa8
updated missing doc reference from previous commit
2010-08-10 17:47:04 -03:00
Pablo Hoffman
784722774b
moved scrapy.core.signals to scrapy.signals, keeping backwards compatibility
2010-08-10 17:40:53 -03:00
Pablo Hoffman
c359a34d7d
moved scrapy.core.exceptions to scrapy.exceptions, keeping backwards compatibility
--HG--
rename : scrapy/core/exceptions.py => scrapy/exceptions.py
2010-08-10 17:36:48 -03:00
Pablo Hoffman
c7d9f6e270
Added JSON item exporter with doc and unittests ( closes #192 ), and also:
* put all json exporters in scrapy.contrib.exporters and deprecated
scrapy.contrib.exporters.jsonlines to reduce module nesting
* use JSON exporter with EXPORT_FORMAT=json in file export pipeline
2010-08-07 15:52:59 -03:00
Pablo Hoffman
49851d7f55
Automated merge with http://hg.scrapy.org/scrapy-0.9
2010-08-02 17:20:55 -03:00
Pablo Hoffman
6c68e4ce15
fixed documentation typo
2010-08-02 17:20:13 -03:00
Pablo Hoffman
453e7bf38c
Scrapy logging refactoring ( closes #188 ):
* added Twisted log observer for Scrapy, with unittests
* use numeric values from Python logging module for log levels
* removed scrapy.log.exc() function - use scrapy.log.err() instead
* removed logmessage_received signal - write a (twisted) log observer instead
* dropped support for obsolete `domain` argument
* dropped support for old setting names: LOGLEVEL, LOGFILE (replaced by LOG_LEVEL, LOG_FILE)
* deprecated `component` argument
2010-08-02 08:49:14 -03:00
Ismael Carnales
e145ec686c
Replaced default spider manager (TwistedPluginSpiderManager) with a simpler one that doesn't depend on Twisted Plugins infrastructure.
2010-07-30 17:30:32 -03:00
Pablo Hoffman
e2290a5359
Some changes to Crawl spider:
* added process_request attribute to rules
* removed docstrings, since they duplicate the documentation
2010-07-22 18:40:35 -03:00
Daniel Grana
3e013f564b
update docs for defaultheaders middleware and change spider attribute to match global setting name
2010-07-16 16:17:08 -03:00
Daniel Grana
6883a99c1e
Automated merge with ssh://hg.scrapy.org/scrapy-0.9
2010-07-16 14:56:00 -03:00
Pablo Hoffman
b91d40ba78
Fixed grammar error in doc (patch by stav) - closes #176
2010-07-16 11:34:18 -03:00
Ping Yin
b3a65d3313
HTTPCACHE: Don't cache responses with codes listed in HTTPCACHE_IGNORE_HTTP_CODES
2010-07-09 13:14:25 -03:00
Pablo Hoffman
bd16d1cd48
Added SMTP-AUTH support to scrapy.mail ( closes #149 )
2010-06-13 17:14:46 -03:00
Pablo Hoffman
6a33d6c4d0
* Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Pablo Hoffman
73305b1eb3
Added support for Requests without callbacks ( #166 ) - the Spider.parse() method is used in those cases.
Also removed Request.deferred attribute.
2010-06-08 18:18:02 -03:00
Pablo Hoffman
38b5793152
Some changes to telnet console:
* moved module from scrapy.management.telnet to scrapy.telnet (to minimize
nested modules)
* added signal for updating telnet console variables (fixes #165 )
--HG--
rename : scrapy/management/telnet.py => scrapy/telnet.py
2010-06-02 17:49:18 -03:00
Pablo Hoffman
031eb1e5ed
removed no longer used SpiderScheduler (obsoleted by ExecutionQueue)
2010-05-28 17:27:15 -03:00
Ismael Carnales
a71dc295af
Some mail improvements and tests.
* Add mail_sent signal and use it in MailSender
* Add MAIL_DEBUG setting to not send mails when testing
* Add MailSender tests
2010-05-28 16:51:47 -03:00