mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 20:04:28 +00:00

2532 Commits

Author SHA1 Message Date
Pablo Hoffman
ffad8e08e7 Support passing all keyword arguments to ExecutionQueue append_spider_name and append_url 2010-08-27 13:45:14 -03:00
Pablo Hoffman
4eb0383dd2 Added Scrapy to scrapy --version 2010-08-27 11:15:37 -03:00
Pablo Hoffman
e7b3247a18 Updated some missing references to scrapy-ws script 2010-08-27 01:05:59 -03:00
Pablo Hoffman
aab38be498 Print Scrapy on first log line 2010-08-27 00:53:26 -03:00
Pablo Hoffman
e14cc2c12a Moved scrapy-ws script to extras/ and fixed broken methods due to changes in web service API
--HG--
rename : bin/scrapy-ws.py => extras/scrapy-ws.py
2010-08-27 00:33:08 -03:00
Pablo Hoffman
648f700ed1 Fixed log formatter tests
--HG--
rename : scrapy/tests/test_contrib_logformatter.py => scrapy/tests/test_logformatter.py
2010-08-26 23:23:58 -03:00
Pablo Hoffman
ad18d4a70e Added pluggable log formatter 2010-08-26 23:19:35 -03:00
Pablo Hoffman
d1e260a8d4 Simplified engine by removing the configure() and kill() methods. Also simplified the Spider Manager by removing the close_spider() method 2010-08-26 22:20:04 -03:00
Pablo Hoffman
f6c11af4c2 Moved module: scrapy.core.queue to scrapy.queue
--HG--
rename : scrapy/core/queue.py => scrapy/queue.py
2010-08-26 21:15:32 -03:00
Pablo Hoffman
747f090f94 Improved Twisted version detection (wasn't working for Twisted 10.0.0) 2010-08-26 20:32:26 -03:00
Pablo Hoffman
e95f7f63f9 Fixed typo 2010-08-25 21:06:10 -03:00
Pablo Hoffman
a82a4be3ab Added docstring to test_engine.py 2010-08-25 21:04:47 -03:00
Pablo Hoffman
59d18cf99e Fixed crawler reference 2010-08-25 19:59:51 -03:00
Pablo Hoffman
40b590cad3 Moved scrapy.cfg auto-discovery to scrapy.conf.EnvironmentSettings class 2010-08-25 19:59:30 -03:00
Pablo Hoffman
ef7a097272 Replaced old manager references with crawler 2010-08-25 19:31:04 -03:00
Pablo Hoffman
8fc78c4d0a Refactoring of Crawler, Commands, Execution Queue and Spider Manager:
Commands changes:

* removed (somewhat hacky) --init argument from settings command
* added set_crawler method to Commands, and a ``crawler`` property that returns
  a configured crawler. This way, commands that don't require a crawler (such
  as startproject) won't need to configure one.

Execution Queue changes:

* changed SERVICE_QUEUE_FILE setting to SQLITE_DB
* removed SERVICE_QUEUE setting
* added QUEUE_CLASS setting for defining the class to use for the execution queue
* added SERVER_QUEUE_CLASS setting for defining the class to use for the
  execution queue in server mode (runserver command)

Spider Manager changes:

* simplified SpiderManager API by removing the load() method
* added from_settings classmethod to SpiderManager
* added spider_modules constructor argument to SpiderManager

Crawler changes:

* added install() method to Crawler (to install it in scrapy.project) and
  uninstall() to remove it
* use CrawlerProcess.install() in scrapy.cmdline
* use crawler.install() and crawler.uninstall() in tests that need a crawler in
  scrapy.project
* make telnet console and webservice play nicer with twisted by stopping
  listening when the engine goes down
* refactored Scrapy engine tests - it no longer uses the crawler singleton.
  Closes #215.
2010-08-25 19:24:36 -03:00
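The refactoring above names two patterns: a `from_settings` classmethod on SpiderManager, and settings such as QUEUE_CLASS that pick the class to use. The following is a minimal sketch of how such patterns are commonly written, not the code from this commit. The `load_object` helper, the plain-dict settings, the `SPIDER_MODULES` key, and the use of `collections.deque` as a queue class are all assumptions made for illustration.

```python
from importlib import import_module


def load_object(path):
    """Import and return the object named by a dotted path like 'pkg.mod.Class'.

    Hypothetical helper for this sketch.
    """
    module_path, _, name = path.rpartition(".")
    return getattr(import_module(module_path), name)


class SpiderManager:
    """Hypothetical stand-in, not Scrapy's real SpiderManager."""

    def __init__(self, spider_modules):
        # The commit mentions a spider_modules constructor argument.
        self.spider_modules = list(spider_modules)

    @classmethod
    def from_settings(cls, settings):
        # Build an instance from a settings mapping instead of a separate load() step.
        return cls(settings.get("SPIDER_MODULES", []))


# Assumed settings; a plain dict stands in for the real settings object.
settings = {
    "SPIDER_MODULES": ["myproject.spiders"],
    "QUEUE_CLASS": "collections.deque",  # placeholder class for the example
}

manager = SpiderManager.from_settings(settings)
queue_cls = load_object(settings["QUEUE_CLASS"])
queue = queue_cls()
```

Resolving the class from a dotted path keeps the choice of queue implementation in configuration, which is presumably why the commit adds separate QUEUE_CLASS and SERVER_QUEUE_CLASS settings.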
Pablo Hoffman
eb51b9f785 Removed obsolete setting 2010-08-25 06:41:19 -03:00
Pablo Hoffman
bea2f94357 Instantiate SpiderManager in Crawler constructor 2010-08-25 05:33:08 -03:00
Pablo Hoffman
8d24175c81 Added CrawlerProcess class, isolating all (Twisted) reactor-controlling code
into this class and leaving the Crawler class free of any reactor control.

This allows embedding the Scrapy crawler in other Twisted applications, or any
Twisted asynchronous code.

The "scrapy" command will use the new CrawlerProcess class, which resembles the
behaviour of the old (all-in-one) Crawler class.

Closes #214.

Also moved log.start() call from Crawler class to scrapy.cmdline module.
2010-08-25 05:24:02 -03:00
Pablo Hoffman
54024d1d3f Removed unneeded line 2010-08-25 05:06:45 -03:00
Pablo Hoffman
8e83f527b3 Removed scrapy-sqs script, as it has been superseded by the new scrapy 'queue' command 2010-08-23 22:04:49 -03:00
Pablo Hoffman
e2ed27e4fd Added documentation for Ubuntu packages. Refs #211 2010-08-23 21:28:32 -03:00
Pablo Hoffman
9ee92686c7 Moved spidermanager tests module according to policies
--HG--
rename : scrapy/tests/test_contrib_spidermanager/__init__.py => scrapy/tests/test_spidermanager/__init__.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/__init__.py => scrapy/tests/test_spidermanager/test_spiders/__init__.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider0.py => scrapy/tests/test_spidermanager/test_spiders/spider0.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider1.py => scrapy/tests/test_spidermanager/test_spiders/spider1.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider2.py => scrapy/tests/test_spidermanager/test_spiders/spider2.py
rename : scrapy/tests/test_contrib_spidermanager/test_spiders/spider3.py => scrapy/tests/test_spidermanager/test_spiders/spider3.py
2010-08-23 00:25:03 -03:00
Pablo Hoffman
58b4cc2c32 Some minor fixes to Contributing documentation 2010-08-23 00:23:14 -03:00
Pablo Hoffman
6585c1a28f removed (somewhat hacky) MAIL_DEBUG setting 2010-08-22 22:42:00 -03:00
Pablo Hoffman
e189861b46 Fixed Item Loader bug that was preventing values that evaluate to False from being loaded. Patch contributed by Anibal Pacheco. Closes #174 2010-08-22 22:07:44 -03:00
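The Item Loader fix above concerns values that evaluate to False being dropped. The actual patch is not shown in this log, so the following is only a sketch of the general class of bug, with hypothetical function names: a truthiness check discards legitimate values such as `0` or an empty string, while an explicit `None` check keeps them.

```python
def add_value_buggy(values, value):
    """Truthiness check: silently drops 0, "", False and other falsy values."""
    if value:
        values.append(value)


def add_value_fixed(values, value):
    """Explicit None check: only a missing value is skipped."""
    if value is not None:
        values.append(value)


candidates = [0, "", False, None, "x"]

buggy, fixed = [], []
for candidate in candidates:
    add_value_buggy(buggy, candidate)
    add_value_fixed(fixed, candidate)

print(buggy)  # ['x']
print(fixed)  # [0, '', False, 'x']
```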
Pablo Hoffman
254d517da1 Moved module: scrapy.contrib.spidermanager to scrapy.spidermanager
--HG--
rename : scrapy/contrib/spidermanager.py => scrapy/spidermanager.py
2010-08-22 21:53:47 -03:00
Pablo Hoffman
cf8a085f44 Minor improvement to bash autocompletion 2010-08-22 20:10:11 -03:00
Pablo Hoffman
fd784cd131 Fixed tests on Windows 2010-08-22 19:37:20 -03:00
Pablo Hoffman
33686fa563 better skip test message 2010-08-22 19:08:52 -03:00
Pablo Hoffman
6da1162839 minor fixes to FAQ 2010-08-22 19:08:45 -03:00
Pablo Hoffman
b3753d34eb Added FAQ entry about feed exports 2010-08-22 05:59:30 -03:00
Pablo Hoffman
cbfec4bb0e Renamed webservice ManagerResource to CrawlerResource
--HG--
rename : scrapy/contrib/webservice/manager.py => scrapy/contrib/webservice/crawler.py
2010-08-22 05:48:03 -03:00
Pablo Hoffman
7546a0805c Removed webservice Spiders and Extensions resources since they can now be accessed through the Execution Manager (aka. Crawler) resource 2010-08-22 05:38:46 -03:00
Pablo Hoffman
e5474d8cc6 Made ExtensionManager a subclass of MiddlewareManager 2010-08-22 05:33:08 -03:00
Pablo Hoffman
c1225e0f45 "parse" command refactoring. This fixes #173 and renders #106 invalid. 2010-08-22 05:04:17 -03:00
Pablo Hoffman
9fccc11363 Moved scrapy.extension.extensions singleton to an "extensions" attribute of the scrapy.project.crawler singleton. Refs #189 2010-08-22 02:15:11 -03:00
Pablo Hoffman
52c1e137e5 Moved scrapy.spider.spiders singleton to a "spiders" attribute of the scrapy.project.crawler singleton. Refs #189
Warning: this is a backwards incompatible change.

--HG--
rename : scrapy/spider/models.py => scrapy/spider.py
2010-08-22 02:15:10 -03:00
Pablo Hoffman
faf7a7da83 Moved scrapymanager singleton to scrapy.project module. Refs #189
Detail of changes:

* Moved scrapy.core.manager.ExecutionManager class to scrapy.crawler.Crawler
* Added scrapy.project.crawler singleton to reference a singleton instance of
  Crawler class (previously known as scrapymanager)
* Left an alias scrapy.core.manager.scrapymanager to scrapy.project.crawler for
  backwards compatibility (to be removed in Scrapy 0.11)
2010-08-22 02:10:53 -03:00
Pablo Hoffman
053d45e79f Split stats collector classes from stats collection facility (#204)
* moved scrapy.stats.collector.__init__ module to scrapy.statscol
* moved scrapy.stats.collector.simpledb module to scrapy.contrib.statscol
* moved signals from scrapy.stats.signals to scrapy.signals
* moved scrapy/stats/__init__.py to scrapy/stats.py
* updated documentation and tests accordingly

--HG--
rename : scrapy/stats/collector/simpledb.py => scrapy/contrib/statscol.py
rename : scrapy/stats/__init__.py => scrapy/stats.py
rename : scrapy/stats/collector/__init__.py => scrapy/statscol.py
2010-08-22 01:24:07 -03:00
Pablo Hoffman
c276c48c91 Added settings to Scrapy shell variables 2010-08-21 05:10:06 -03:00
Pablo Hoffman
b8b7b5ad74 Improved some commands descriptions 2010-08-21 05:03:38 -03:00
Pablo Hoffman
e6d2a3087e example projects: added scrapy.cfg and removed scrapy-ctl.py 2010-08-21 04:56:19 -03:00
Pablo Hoffman
68f9fcffe8 genspider command refactoring. Also updated tests and doc 2010-08-21 04:46:48 -03:00
Pablo Hoffman
0da6132136 Made command-line tool output more concise 2010-08-21 03:37:59 -03:00
Pablo Hoffman
0d9e75c684 Added bash completion for the Scrapy command-line tool. Closes #210 2010-08-21 03:23:45 -03:00
Pablo Hoffman
f80ae9af66 Fixed missing reference to old 'start' command. Refs #209 2010-08-21 01:44:08 -03:00
Pablo Hoffman
50621b7ef3 Renamed command "start" to "runserver". Closes #209
--HG--
rename : scrapy/commands/start.py => scrapy/commands/runserver.py
2010-08-21 01:42:55 -03:00
Pablo Hoffman
9aefa242d5 Applied documentation patch provided by Lucian Ursu (closes #207) 2010-08-21 01:26:35 -03:00
Pablo Hoffman
f782245c5a Removed obsolete files 2010-08-21 01:24:39 -03:00