mirror of https://github.com/scrapy/scrapy.git synced 2025-02-23 14:24:19 +00:00

2486 Commits

Author SHA1 Message Date
Pablo Hoffman
46e5d694e6 Scrapyd: return project and version in addversion.json 2010-11-29 17:22:28 -02:00
Pablo Hoffman
bbffa59497 Some changes to Scrapyd:
* Always start one process per spider
* Added max_proc_per_cpu option (defaults to 4)
* Return the number of spiders (instead of a list of them) in schedule.json
2010-11-29 17:19:05 -02:00
Pablo Hoffman
42e8346d06 fixed failing test on win32 2010-11-29 10:27:12 -02:00
Pablo Hoffman
3cda681755 utils.testproc: make spawned process use the original CWD, instead of the temporary one created by twisted trial 2010-11-29 09:56:14 -02:00
Pablo Hoffman
1d726063d6 * Added tests for shell/fetch/version commands (closes #255)
* Fixed bug causing Scrapy shell to fail if started without any argument (closes #294)
2010-11-28 18:14:45 -02:00
Pablo Hoffman
6f82ea19de Fixed bug in addversion.json with old Twisted versions. Closes #293 2010-11-25 12:12:42 -02:00
Pablo Hoffman
2557777c39 Updated doc referring to HTTP cache middleware 2010-11-24 13:27:44 -02:00
Pablo Hoffman
d59ef48231 Fixed SgmlLinkExtractor bug which failed to recognize <base> tags when using restrict_xpaths 2010-11-23 17:28:29 -02:00
Pablo Hoffman
426b6fa100 docs/intro/install.rst: added -U flag to easy_install command 2010-11-22 13:50:19 -02:00
Pablo Hoffman
91e6753035 scrapy.bat: minor fix to support spaces in python installation dir (windows) 2010-11-22 00:39:45 -02:00
Pablo Hoffman
91a7c25797 * Made Response.meta attribute map to Request.meta attribute. Closes #290
* Record redirected URLs in redirect middleware. Closes #291
2010-11-18 12:51:54 -02:00
Pablo Hoffman
ac007802d6 Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280 2010-11-17 21:32:23 -02:00
Pablo Hoffman
2897061b98 Make scrapy conflict with previous versions of the debian package 2010-11-17 17:30:22 -02:00
Pablo Hoffman
3926ca45b3 debian/control: Added scrapy/scrapyd to Provides 2010-11-17 17:08:21 -02:00
Pablo Hoffman
a034d078c8 Changed Debian packaging to use the scrapy version in the package name, so we can have multiple Scrapy versions in the same apt repo
--HG--
rename : debian/scrapy.1 => debian/scrapy-files/scrapy.1
rename : debian/000-default => debian/scrapyd-files/000-default
rename : debian/scrapyd.upstart => debian/scrapyd.scrapyd.upstart
rename : debian/scrapy.1 => extras/scrapy.1
2010-11-17 17:03:00 -02:00
Pablo Hoffman
67adb2a05f Always use micro versions in Scrapy from now on 2010-11-17 00:09:14 -02:00
Pablo Hoffman
5a5364d0c1 Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6 2010-11-16 03:31:04 -02:00
Pablo Hoffman
7c712eeda1 Removed scrapy.xlib.simplejson module. Scrapy now requires simplejson if running on Python 2.5. Closes #289 2010-11-16 03:11:12 -02:00
Pablo Hoffman
28cf8625b6 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-11-16 02:37:19 -02:00
Pablo Hoffman
5ded8251d2 Fixed bug with deferred spider queues. Closes #288 2010-11-16 02:36:18 -02:00
Pablo Hoffman
b3c96c698d Fixed bug with deploy command if ~/.netrc doesn't exist. Closes #286 2010-11-13 16:38:30 -02:00
Pablo Hoffman
e1f419e9e9 canonicalize_url(): ignore case in domain names 2010-11-12 16:47:36 -02:00
Martin Olveyra
b4cc2d91f4 Allow reapplying a labelled region, so that ignored regions can be used inside repeated variants 2010-11-12 13:30:39 -02:00
Pablo Hoffman
08bbbc2f82 shell: properly refresh all vars when fetching a new request 2010-11-11 18:05:36 -02:00
Pablo Hoffman
5c18f02ade Only instantiate XPath selectors if the response is of the proper type. Closes #285 2010-11-11 18:00:02 -02:00
Pablo Hoffman
4e800e90d3 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-11-10 13:36:05 -02:00
Pablo Hoffman
7698ff2a7b Disabled help() in telnet console. Closes #284 2010-11-10 13:35:36 -02:00
Pablo Hoffman
d988ca1ec2 Some changes to scrapy deploy command:
* changed deploy section names to [deploy:target]
* project is now passed through a -p|--project option
* version can now be set in the target configuration
* switched meaning of -l and -L options

* updated documentation accordingly
2010-11-08 17:01:06 -02:00
Pablo Hoffman
37c9d5feff minor update to queue command doc 2010-11-08 02:19:54 -02:00
Pablo Hoffman
5bdffadbe3 Simplified get_spider_list_from_eggfile() function now that it doesn't need to chdir to a custom directory (Scrapy now works when it's unable to create the SQLite database) 2010-11-05 11:48:12 -02:00
Pablo Hoffman
31bbcc9476 Raise error when egg is corrupt in activate_egg(). Use a more descriptive name for temporary dirs in get_spider_list_from_eggfile(). Make scrapyd webservice pass egg_runner to get_spider_list_from_eggfile() 2010-11-05 11:24:33 -02:00
Pablo Hoffman
de4909faca get_spider_list_from_eggfile(): more improvements to error messages, and support passing eggruner module as argument 2010-11-04 18:56:11 -02:00
Pablo Hoffman
7ba972d8cf get_spider_list_from_eggfile(): fail if unable to extract spider list 2010-11-04 16:27:47 -02:00
Pablo Hoffman
369a149d60 fixed bug with deploy command in windows environments: WindowsError: [Error 32] The process cannot access the file because it is being used by another process 2010-11-03 17:16:08 -02:00
Pablo Hoffman
a6c6dfe9ba Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-11-01 16:10:40 -02:00
Pablo Hoffman
1a2146d11a Fixed bug detecting response types for some urls. Closes #281 2010-11-01 16:09:33 -02:00
Pablo Hoffman
0f69e7a191 Some changes to HTTP Cache middleware:
* made it use the project data storage by default (closes #279)
* added HTTPCACHE_ENABLED setting (False by default) to enable it
* made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage)
* simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire,
  dropped support for negative numbers
* other minor doc improvements
2010-11-01 02:38:15 -02:00
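The settings named in the commit above could be combined in a project settings module roughly as follows. This is a minimal sketch that uses only the names and defaults stated in the commit message; the comments describe the behavior as of that commit, which may differ in other Scrapy versions.

```python
# Hypothetical excerpt of a project's settings.py, based on the commit message above.

# The HTTP cache middleware is disabled by default; enable it explicitly.
HTTPCACHE_ENABLED = True

# Default value per the commit: a directory inside the project data storage.
HTTPCACHE_DIR = 'httpcache'

# Zero means cached responses do not expire; negative numbers are no longer supported.
HTTPCACHE_EXPIRATION_SECS = 0
```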
Pablo Hoffman
3c94c6cb9b fixed sphinx doc id 2010-11-01 02:31:20 -02:00
Pablo Hoffman
0b7e815888 Added Didier to AUTHORS 2010-11-01 01:25:30 -02:00
dfdeshom
130276605b Bind the web server and telnet server to a configurable interface (WEBSERVICE_HOST). The default is to bind to all interfaces. Also add documentation for WEBSERVICE_HOST and TELNETCONSOLE_HOST. 2010-11-01 00:59:04 -02:00
Pablo Hoffman
b76c5c597f * Added support for project data storage (closes #276)
* Documented project file structure
* Moved default location of SQLite database to project data storage dir (closes #277)
2010-10-31 03:25:37 -02:00
Pablo Hoffman
dfa6745e91 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-30 16:05:53 -02:00
Pablo Hoffman
a0d9b43031 fixed typo in scrapyd doc 2010-10-30 16:05:32 -02:00
Pablo Hoffman
3d96016da1 runtests.sh: switched to 'text' reporter in trial 2010-10-30 16:03:00 -02:00
Pablo Hoffman
f73449fed6 removed LxmlItemLoader, as it has been obsoleted by the new lxml selector backend 2010-10-30 16:02:14 -02:00
Pablo Hoffman
d67152ab0f Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-30 01:56:12 -02:00
Pablo Hoffman
75451cbe84 scrapyd doc: fixed delversion.json example 2010-10-30 01:56:00 -02:00
Pablo Hoffman
20efdc0273 added --egg argument to scrapy deploy command, and log message when building the egg 2010-10-29 16:21:36 -02:00
Pablo Hoffman
85dec82688 Automated merge with http://hg.scrapy.org/scrapy-0.10 2010-10-29 03:42:54 -02:00
Pablo Hoffman
1cc5cba69b Fixed bug logging Passed items. Closes #274 2010-10-29 03:42:21 -02:00