mirror of https://github.com/scrapy/scrapy.git synced 2025-02-28 04:44:21 +00:00

2822 Commits

Author SHA1 Message Date
Pablo Hoffman
a034d078c8 Changed Debian packaging to use the scrapy version in the package name, so we can have multiple Scrapy versions in the same apt repo
--HG--
rename : debian/scrapy.1 => debian/scrapy-files/scrapy.1
rename : debian/000-default => debian/scrapyd-files/000-default
rename : debian/scrapyd.upstart => debian/scrapyd.scrapyd.upstart
rename : debian/scrapy.1 => extras/scrapy.1
2010-11-17 17:03:00 -02:00
Pablo Hoffman
67adb2a05f Always use micro versions in Scrapy from now on 2010-11-17 00:09:14 -02:00
Pablo Hoffman
5a5364d0c1 Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6 2010-11-16 03:31:04 -02:00
Pablo Hoffman
7c712eeda1 Removed scrapy.xlib.simplejson module. Scrapy now requires simplejson if running on Python 2.5. Closes #289 2010-11-16 03:11:12 -02:00
Pablo Hoffman
28cf8625b6 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-11-16 02:37:19 -02:00
Pablo Hoffman
5ded8251d2 Fixed bug with deferred spider queues. Closes #288 2010-11-16 02:36:18 -02:00
Pablo Hoffman
b3c96c698d Fixed bug with deploy command if ~/.netrc doesn't exist. Closes #286 2010-11-13 16:38:30 -02:00
Pablo Hoffman
e1f419e9e9 canonicalize_url(): ignore case in domain names 2010-11-12 16:47:36 -02:00
Martin Olveyra
b4cc2d91f4 Allow reapplying a labelled region, so that ignored regions can be used inside repeated variants 2010-11-12 13:30:39 -02:00
Pablo Hoffman
08bbbc2f82 shell: properly refresh all vars when fetching a new request 2010-11-11 18:05:36 -02:00
Pablo Hoffman
5c18f02ade Only instantiate XPath selectors if the response is of the proper type. Closes #285 2010-11-11 18:00:02 -02:00
Pablo Hoffman
4e800e90d3 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-11-10 13:36:05 -02:00
Pablo Hoffman
7698ff2a7b Disabled help() in telnet console. Closes #284 2010-11-10 13:35:36 -02:00
Pablo Hoffman
d988ca1ec2 Some changes to scrapy deploy command:
* changed deploy section names to [deploy:target]
* project is now passed through a -p|--project option
* version can now be set in the target configuration
* switched meaning of -l and -L options

* updated documentation accordingly
2010-11-08 17:01:06 -02:00
Pablo Hoffman
37c9d5feff minor update to queue command doc 2010-11-08 02:19:54 -02:00
Pablo Hoffman
5bdffadbe3 Simplified get_spider_list_from_eggfile() function now that it doesn't need to chdir to a custom directory (Scrapy now works when it's unable to create the SQLite database) 2010-11-05 11:48:12 -02:00
Pablo Hoffman
31bbcc9476 Raise error when egg is corrupt in activate_egg(). Use a more descriptive name for temporary dirs in get_spider_list_from_eggfile(). Make scrapyd webservice pass egg_runner to get_spider_list_from_eggfile() 2010-11-05 11:24:33 -02:00
Pablo Hoffman
de4909faca get_spider_list_from_eggfile(): more improvements to error messages, and support passing the egg runner module as argument 2010-11-04 18:56:11 -02:00
Pablo Hoffman
7ba972d8cf get_spider_list_from_eggfile(): fail if unable to extract spider list 2010-11-04 16:27:47 -02:00
Pablo Hoffman
369a149d60 fixed bug with deploy command in Windows environments: WindowsError: [Error 32] The process cannot access the file because it is being used by another process 2010-11-03 17:16:08 -02:00
Pablo Hoffman
a6c6dfe9ba Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-11-01 16:10:40 -02:00
Pablo Hoffman
1a2146d11a Fixed bug detecting response types for some urls. Closes #281 2010-11-01 16:09:33 -02:00
Pablo Hoffman
0f69e7a191 Some changes to HTTP Cache middleware:
* made it use the project data storage by default (closes #279)
* added HTTPCACHE_ENABLED setting (False by default) to enable it
* made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage)
* simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire,
  dropped support for negative numbers
* other minor doc improvements
2010-11-01 02:38:15 -02:00
Pablo Hoffman
3c94c6cb9b fixed sphinx doc id 2010-11-01 02:31:20 -02:00
Pablo Hoffman
0b7e815888 Added Didier to AUTHORS 2010-11-01 01:25:30 -02:00
dfdeshom
130276605b Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST. 2010-11-01 00:59:04 -02:00
Pablo Hoffman
b76c5c597f * Added support for project data storage (closes #276)
* Documented project file structure
* Moved default location of SQLite database to project data storage dir (closes #277)
2010-10-31 03:25:37 -02:00
Pablo Hoffman
dfa6745e91 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-30 16:05:53 -02:00
Pablo Hoffman
a0d9b43031 fixed typo in scrapyd doc 2010-10-30 16:05:32 -02:00
Pablo Hoffman
3d96016da1 runtests.sh: switched to 'text' reporter in trial 2010-10-30 16:03:00 -02:00
Pablo Hoffman
f73449fed6 removed LxmlItemLoader, as it has been obsoleted by the new lxml selector backend 2010-10-30 16:02:14 -02:00
Pablo Hoffman
d67152ab0f Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-30 01:56:12 -02:00
Pablo Hoffman
75451cbe84 scrapyd doc: fixed delversion.json example 2010-10-30 01:56:00 -02:00
Pablo Hoffman
20efdc0273 added --egg argument to scrapy deploy command, and log message when building the egg 2010-10-29 16:21:36 -02:00
Pablo Hoffman
85dec82688 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-29 03:42:54 -02:00
Pablo Hoffman
1cc5cba69b Fixed bug logging Passed items. Closes #274 2010-10-29 03:42:21 -02:00
Pablo Hoffman
836e40896a minor fixes for python 2.5 compatibility 2010-10-29 02:23:10 -02:00
Pablo Hoffman
22283854d4 avoid stripping trailing spaces on lxml-based selectors. closes #270 2010-10-27 21:39:28 -02:00
Pablo Hoffman
7f646541c3 added trackref stats to memory debugger report. closes #272 2010-10-27 21:18:58 -02:00
Pablo Hoffman
1d5c56089c Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-27 14:42:47 -02:00
Pablo Hoffman
c3e5b4bb03 changed pid file name to scrapyd 2010-10-27 14:42:22 -02:00
Pablo Hoffman
2bba87f69f Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-27 14:20:21 -02:00
Pablo Hoffman
f47b9f608c simplified lockfile used by scrapyd (/var/run/scrapyd.pid instead of /var/run/scrapyd/scrapyd.pid). closes #271 2010-10-27 14:18:40 -02:00
Daniel Grana
bc2d78406c MediaPipeline fails to assign crawler.engine.download as download function because crawler is configured after pipelines are loaded 2010-10-27 12:36:24 -02:00
Pablo Hoffman
158f75450b Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-27 09:18:48 -02:00
Pablo Hoffman
6f4be21d4c changed robots.txt forbidden log level to DEBUG. closes #268 2010-10-27 09:17:58 -02:00
Pablo Hoffman
e625a8d56e moved all similar selector tests to common selector tests, to reuse them among all backends 2010-10-27 08:54:32 -02:00
Pablo Hoffman
d1f63237ad refactored selectors tests, by splitting tests in: common tests, lxml-specific tests and libxml2-specific tests. refs #147 2010-10-27 08:37:02 -02:00
Pablo Hoffman
665578bfe8 fixed imports in scrapy.xlib.simplejson 2010-10-27 08:05:08 -02:00
Pablo Hoffman
9b9ab37804 fixed bug with boolean results in lxml-based selectors 2010-10-27 08:03:37 -02:00