scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 22:23:52 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	6f2cea4775	added deprecation messages to queue and runserver commands	2010-12-12 20:04:35 -02:00
Pablo Hoffman	b19ff21acd	scrapyd: added support for deferred spider queues	2010-12-10 15:55:40 -02:00
Martin Olveyra	02ccca01eb	use safe_url_string in canonicalize_url to avoid converting safe characters into their percent-encoded representation, which led to errors with many sites (RFC3986). closes #297	2010-12-08 16:28:38 -02:00
Pablo Hoffman	6a1b69c93f	renamed command 'scrapyd' to 'server', and deprecated 'runserver' and 'queue' commands --HG-- rename : scrapy/commands/scrapyd.py => scrapy/commands/server.py	2010-11-30 20:23:27 -02:00
Pablo Hoffman	831dc818d6	scrapyd: added more information to webui homepage	2010-11-30 18:43:59 -02:00
Pablo Hoffman	a3d30c35fe	scrapyd: log url where web console can be accessed	2010-11-30 17:58:34 -02:00
Pablo Hoffman	823fd9822c	scrapyd: fixed bug discovering the current project scrapy.cfg file	2010-11-30 16:19:17 -02:00
Pablo Hoffman	7b84591ea9	added command for starting a scrapyd server for the current project	2010-11-30 15:52:15 -02:00
Pablo Hoffman	5a46ce47ee	scrapyd: add extra_sources constructor argument, and also read scrapyd configuration from current project's scrapy.cfg file	2010-11-30 15:47:05 -02:00
Pablo Hoffman	c02d6db6a3	scrapyd: force application to receive config as argument	2010-11-30 15:46:24 -02:00
Pablo Hoffman	85890a5092	scrapyd: log process logfile when process starts/finishes	2010-11-30 15:45:42 -02:00
Pablo Hoffman	5c4f562ec4	scrapyd: changed keys used in poller message to _project, _spider, _job, and added link to log file in web ui	2010-11-30 13:03:20 -02:00
Pablo Hoffman	df54ed0041	Some Scrapyd enhancements: * added minimal web ui * return unique id per job (spider scheduled) * store one log per spider run (job) and rotate them, keeping the last N logs (where N is configurable through settings)	2010-11-30 02:26:31 -02:00
Pablo Hoffman	46e5d694e6	Scrapyd: return project and version in addversion.json	2010-11-29 17:22:28 -02:00
Pablo Hoffman	bbffa59497	Some changes to Scrapyd: * Always start one process per spider * Added max_proc_per_cpu option (defaults to 4) * Return the number of spiders (instead of a list of them) in schedule.json	2010-11-29 17:19:05 -02:00
Pablo Hoffman	42e8346d06	fixed failing test on win32	2010-11-29 10:27:12 -02:00
Pablo Hoffman	3cda681755	utils.testproc: make spawned process use the original CWD, instead of the temporary one created by twisted trial	2010-11-29 09:56:14 -02:00
Pablo Hoffman	1d726063d6	* Added tests for shell/fetch/version commands (closes #255) * Fixed bug causing Scrapy shell to fail if started without any argument (closes #294)	2010-11-28 18:14:45 -02:00
Pablo Hoffman	6f82ea19de	Fixed bug in addversion.json with old Twisted versions. Closes #293	2010-11-25 12:12:42 -02:00
Pablo Hoffman	2557777c39	Updated doc referring to HTTP cache middleware	2010-11-24 13:27:44 -02:00
Pablo Hoffman	d59ef48231	Fixed SgmlLinkExtractor bug which failed to recognize <base> tags when using restrict_xpaths	2010-11-23 17:28:29 -02:00
Pablo Hoffman	426b6fa100	docs/intro/install.rst: added -U flag to easy_install command	2010-11-22 13:50:19 -02:00
Pablo Hoffman	91e6753035	scrapy.bat: minor fix to support spaces in python installation dir (windows)	2010-11-22 00:39:45 -02:00
Pablo Hoffman	91a7c25797	* Made Response.meta attribute map to Request.meta attribute. Closes #290 * Record redirected URLs in redirect middleware. Closes #291	2010-11-18 12:51:54 -02:00
Pablo Hoffman	ac007802d6	Simplified installation guide, including lxml as alternative dependency to libxml2. Closes #280	2010-11-17 21:32:23 -02:00
Pablo Hoffman	2897061b98	Make scrapy conflict with previous versions of the debian package	2010-11-17 17:30:22 -02:00
Pablo Hoffman	3926ca45b3	debian/control: Added scrapy/scrapyd to Provides	2010-11-17 17:08:21 -02:00
Pablo Hoffman	a034d078c8	Changed Debian packaging to use the scrapy version in the package name, so we can have multiple Scrapy versions in the same apt repo --HG-- rename : debian/scrapy.1 => debian/scrapy-files/scrapy.1 rename : debian/000-default => debian/scrapyd-files/000-default rename : debian/scrapyd.upstart => debian/scrapyd.scrapyd.upstart rename : debian/scrapy.1 => extras/scrapy.1	2010-11-17 17:03:00 -02:00
Pablo Hoffman	67adb2a05f	Always use micro versions in Scrapy from now on	2010-11-17 00:09:14 -02:00
Pablo Hoffman	5a5364d0c1	Updated documentation to point out that simplejson is now required if using Python 2.5, and to recommend switching to Python 2.6	2010-11-16 03:31:04 -02:00
Pablo Hoffman	7c712eeda1	Removed scrapy.xlib.simplejson module. Scrapy now requires simplejson if running on Python 2.5. Closes #289	2010-11-16 03:11:12 -02:00
Pablo Hoffman	28cf8625b6	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-11-16 02:37:19 -02:00
Pablo Hoffman	5ded8251d2	Fixed bug with deferred spider queues. Closes #288	2010-11-16 02:36:18 -02:00
Pablo Hoffman	b3c96c698d	Fixed bug with deploy command if ~/.netrc doesn't exist. Closes #286	2010-11-13 16:38:30 -02:00
Pablo Hoffman	e1f419e9e9	canonicalize_url(): ignore case in domain names	2010-11-12 16:47:36 -02:00
Martin Olveyra	b4cc2d91f4	Allow reapplying a labelled region, so that ignored regions can be used inside repeated variants	2010-11-12 13:30:39 -02:00
Pablo Hoffman	08bbbc2f82	shell: properly refresh all vars when fetching a new request	2010-11-11 18:05:36 -02:00
Pablo Hoffman	5c18f02ade	Only instantiate XPath selectors if the response is of the proper type. Closes #285	2010-11-11 18:00:02 -02:00
Pablo Hoffman	4e800e90d3	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-11-10 13:36:05 -02:00
Pablo Hoffman	7698ff2a7b	Disabled help() in telnet console. Closes #284	2010-11-10 13:35:36 -02:00
Pablo Hoffman	d988ca1ec2	Some changes to scrapy deploy command: * changed deploy section names to [deploy:target] * project is now passed through a -p|--project option * version can now be set in the target configuration * switched meaning of -l and -L options * updated documentation accordingly	2010-11-08 17:01:06 -02:00
Pablo Hoffman	37c9d5feff	minor update to queue command doc	2010-11-08 02:19:54 -02:00
Pablo Hoffman	5bdffadbe3	Simplified get_spider_list_from_eggfile() function now that it doesn't need to chdir to a custom directory (Scrapy now works when it's unable to create the SQLite database)	2010-11-05 11:48:12 -02:00
Pablo Hoffman	31bbcc9476	Raise error when egg is corrupt in activate_egg(). Use a more descriptive name for temporary dirs in get_spider_list_from_eggfile(). Make scrapyd webservice pass egg_runner to get_spider_list_from_eggfile()	2010-11-05 11:24:33 -02:00
Pablo Hoffman	de4909faca	get_spider_list_from_eggfile(): more improvements to error messages, and support passing eggruner module as argument	2010-11-04 18:56:11 -02:00
Pablo Hoffman	7ba972d8cf	get_spider_list_from_eggfile(): fail if unable to extract spider list	2010-11-04 16:27:47 -02:00
Pablo Hoffman	369a149d60	fixed bug with deploy command in windows environments: WindowsError: [Error 32] The process cannot access the file because it is being used by another process	2010-11-03 17:16:08 -02:00
Pablo Hoffman	a6c6dfe9ba	Automated merge with http://hg.scrapy.org/scrapy-0.10	2010-11-01 16:10:40 -02:00
Pablo Hoffman	1a2146d11a	Fixed bug detecting response types for some urls. Closes #281	2010-11-01 16:09:33 -02:00
Pablo Hoffman	0f69e7a191	Some changes to HTTP Cache middleware: * made it use the project data storage by default (closes #279) * added HTTPCACHE_ENABLED setting (False by default) to enable it * made HTTPCACHE_DIR = 'httpcache' by default (inside the project data storage) * simplified HTTPCACHE_EXPIRATION_SECS semantics: zero means don't expire, dropped support for negative numbers * other minor doc improvements	2010-11-01 02:38:15 -02:00

2799 Commits