scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 13:23:59 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	38b5793152	Some changes to telnet console: * moved module from scrapy.management.telnet to scrapy.telnet (to minimize nested modules) * added signal for updating telnet console variables (fixes #165) --HG-- rename : scrapy/management/telnet.py => scrapy/telnet.py	2010-06-02 17:49:18 -03:00
Pablo Hoffman	4595c92cc2	Core logic improvement: wait for Downloader and Scraper to close the spiders before going on and finish closing them	2010-06-01 13:49:01 -03:00
Pablo Hoffman	9523cab25c	Fixed bug that was causing the engine to notify the manager of spider closes too early	2010-06-01 11:07:04 -03:00
Ping Yin	fcdc4ee7d9	downloadermiddleware/redirect: always do "HEAD" if origin request method is HEAD Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-05-04 16:11:45 +08:00
Pablo Hoffman	031eb1e5ed	removed no longer used SpiderScheduler (obsoleted by ExecutionQueue)	2010-05-28 17:27:15 -03:00
Rolando Espinoza La fuente	e995c5c7ff	Skipped IBL tests if nltk/numpy are not available.	2010-05-28 16:53:17 -03:00
Ismael Carnales	a71dc295af	Some mail improvements and tests. * Add mail_sent signal and use it in MailSender * Add MAIL_DEBUG setting to not send mails when testing * Add MailSender tests	2010-05-28 16:51:47 -03:00
Pablo Hoffman	dfa7b23959	Fixed SpiderManager tests that failed with dropin.cache write permissions errors in some cases --HG-- rename : scrapy/tests/test_contrib_spidermanager/spider1.py => scrapy/tests/test_contrib_spidermanager/test_spiders/spider1.py rename : scrapy/tests/test_contrib_spidermanager/spider2.py => scrapy/tests/test_contrib_spidermanager/test_spiders/spider2.py	2010-05-26 11:58:31 -03:00
Pablo Hoffman	dff763c683	Removed Scrapy engine singleton from scrapy.core.engine.scrapyengine. Now engine can only be accessed through Scrapy Manager 'engine' attribute - i.e. scrapy.core.manager.engine.	2010-05-26 10:29:32 -03:00
Pablo Hoffman	2d3135603e	added scrapy-ctl view command	2010-05-26 10:29:32 -03:00
Pablo Hoffman	2905a2083b	moved scrapy.command.models module to scrapy.command	2010-05-26 10:29:32 -03:00
Pablo Hoffman	14bfeabede	moved scrapy.command.cmdline module to scrapy.cmdline (keeping backwards compatibility until 0.10) --HG-- rename : scrapy/command/cmdline.py => scrapy/cmdline.py	2010-05-26 10:29:32 -03:00
Pablo Hoffman	56abafec61	moved scrapy.command.commands module to scrapy.commands --HG-- rename : scrapy/command/commands/__init__.py => scrapy/commands/__init__.py rename : scrapy/command/commands/crawl.py => scrapy/commands/crawl.py rename : scrapy/command/commands/fetch.py => scrapy/commands/fetch.py rename : scrapy/command/commands/genspider.py => scrapy/commands/genspider.py rename : scrapy/command/commands/list.py => scrapy/commands/list.py rename : scrapy/command/commands/parse.py => scrapy/commands/parse.py rename : scrapy/command/commands/runspider.py => scrapy/commands/runspider.py rename : scrapy/command/commands/settings.py => scrapy/commands/settings.py rename : scrapy/command/commands/shell.py => scrapy/commands/shell.py rename : scrapy/command/commands/start.py => scrapy/commands/start.py rename : scrapy/command/commands/startproject.py => scrapy/commands/startproject.py	2010-05-26 10:29:32 -03:00
Pablo Hoffman	cae22930c8	Added ExecutionQueue class for feeding spiders and requests to scrape. This class can (and is meant to) be subclassed by projects that want to use a custom mechanism for feeding spiders to crawl. For example, a queue that pulls spiders to scrape from Amazon SQS (an example will be added soon). Also introduced a rather big core refactoring of Scrapy manager and Scrapy engine.	2010-05-26 10:29:32 -03:00
Pablo Hoffman	8c1feb7ae4	Ported S3ImagesStore to use boto threads. This simplifies the code and makes the following things no longer needed: 1. custom spider for S3 requests (ex. _S3AmazonAWSSpider) 2. scrapy.contrib.aws.AWSMiddleware 3. scrapy.utils.aws	2010-05-26 10:29:32 -03:00
Daniel Grana	c8c19a8e53	Automated merge with ssh://hg.scrapy.org/scrapy	2010-05-21 17:54:41 -03:00
Daniel Grana	cce9c4da49	silence HttpError exceptions raised by httperror spidermiddleware if not handled by spider	2010-05-21 17:54:32 -03:00
Ping Yin	f2363afe6f	LinkExtractor: split _process_links from _extract_links Separate the extraction and process logic, so we can override in subclass easier. Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-27 14:58:11 +08:00
Ping Yin	6059221716	Compose: stop process on None value by default By doing this, we can use str.lower as a processor safely without checking whether the given value is None. By passing stop_on_none=False as keyword argument, this behaviour can be changed. Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-08 10:59:47 +08:00
Ping Yin	15b879f845	ItemLoader: Update docs for {add,replace,get}_{value,xpath} Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-05-18 17:54:25 +08:00
Ping Yin	8f53a72306	ItemLoader: add test for adding a dict value After arg_to_iter is changed to return [arg] if arg is a dict, the added test will pass. Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-24 21:21:12 +08:00
Ping Yin	8497301784	arg_to_iter: return [arg] if arg is a dict Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-24 21:20:23 +08:00
Ping Yin	bd844f690b	{add,replace}_xpath: add processors, kw args and allow field_name to be None Also add method get_xpath. Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-23 01:34:55 +08:00
Ping Yin	a6c315552c	ItemLoader: Update tests for {add,replace,get}_value Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-23 01:49:25 +08:00
Ping Yin	913b5db242	{add,replace,get}_value: accept keyword args, now only 're' if re given, extract data from the given value by this regex Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-23 01:45:01 +08:00
Ping Yin	ddfaf6049f	{add,replace}_value: add processors args and allow field_name to be None * value is first processed by processors before passing to input processor * if field_name is None, values for multiple fields may be added/replaced. The keys of the processed value are used as the field names * add get_value function for the processor logic Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-23 01:42:55 +08:00
Ping Yin	cf35e09d35	ItemLoader: don't limit item to Item object Now, for example, item can be a dict Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-23 01:28:57 +08:00
Pablo Hoffman	bfd9cb42e5	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-05-17 20:11:27 -03:00
Pablo Hoffman	076cdfd585	Added documentation about contributing to Scrapy	2010-05-17 20:10:46 -03:00
Pablo Hoffman	7a55158fed	fixed documentation bug (thanks rhill for reporting)	2010-05-11 11:25:03 -03:00
Steven Almeroth	5d03405cac	FormRequest.from_response doc fix. closes #155 --HG-- extra : rebase_source : d54979f6a15e5e997072dcbbc6d43b426189312b	2010-04-26 22:28:07 -03:00
Pablo Hoffman	2121a30c74	added note about installing Zope.Interface in windows platforms	2010-04-24 18:19:52 -03:00
Daniel Grana	6c12106803	Remove Sphinx warning introduced by shorter title overline	2010-04-18 23:42:56 -03:00
Lucian Ursu	2f8c052484	#154 : Language fixes to the documentation	2010-04-18 23:39:54 -03:00
Ping Yin	d42e5fdbac	linkextractor: unique after urljoin_rfc Now, '/foo.html' and 'http://example.org/foo.html' are considered as the same and only one is kept. Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-02 19:45:30 +08:00
Pablo Hoffman	1868ede549	bumped embedded pydispatch to 2.0.1	2010-05-14 16:38:04 -03:00
Pablo Hoffman	02b7ca7e8c	bumped embedded BeautifulSoup to 3.0.8.1	2010-05-14 16:30:50 -03:00
Daniel Grana	e528a77fa3	Automated merge with ssh://hg.scrapy.org/scrapy	2010-05-14 20:09:29 +01:00
Daniel Grana	b2f58207a4	avoid different behaviour in urljoin between python2.5 and python2.6+. see http://bugs.python.org/issue1432	2010-05-14 20:09:07 +01:00
Pablo Hoffman	c87a29eb9e	improved docstring	2010-05-14 14:48:34 -03:00
Pablo Hoffman	31843316bc	Added new instance based learning extraction library in scrapy.contrib.ibl. Documentation and tools will be added later.	2010-05-14 14:33:26 -03:00
Ping Yin	0b3bf5c6f6	downloader_handler: test HEAD method	2010-05-04 15:50:26 +08:00
Ping Yin	0aaa74d2bd	extract_regex: encoding arg defaults to 'utf-8' Sometimes it is not necessary to pass the encoding argument. For example, when the text argument is unicode. So set a default encoding. Signed-off-by: Ping Yin <pkufranky@gmail.com>	2010-04-22 23:43:34 +08:00
Pablo Hoffman	dfdac356af	added missing default values to file exporter doc	2010-04-02 02:49:18 -03:00
Pablo Hoffman	2f75839e7a	Ignore noisy Twisted deprecation warnings	2010-03-27 13:23:13 -03:00
Pablo Hoffman	f19c939925	fixed doc typo	2010-03-26 08:28:32 -03:00
Pablo Hoffman	99a876754c	Improved "What else?" section of "Scrapy at a glance" overview	2010-03-20 20:24:18 -03:00
Pablo Hoffman	234fd709ad	fixed doc typo (thanks Victor)	2010-03-19 10:32:17 -03:00
Daniel Grana	184cf6684f	Remove HttpException references from docs. Since 0.7, scrapy returns non-200 as Response objects and does not raise HttpException anymore	2010-03-18 10:05:33 -03:00
Daniel Grana	17091902f3	Explicitly say where to save item class in "Defining our item" section of tutorial	2010-03-12 14:12:49 -02:00
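
Commit 6059221716 above describes Compose stopping on a None value by default, with stop_on_none=False to turn that off. The snippet below is a minimal, self-contained sketch of that behaviour as the commit message words it. It is not Scrapy's actual implementation: the class body is an assumption written for illustration, and only the names Compose and stop_on_none come from the log.

```python
class Compose:
    """Illustrative stand-in for a processor chain (hypothetical body).

    Each function receives the previous function's return value.
    """

    def __init__(self, *functions, stop_on_none=True):
        self.functions = functions
        self.stop_on_none = stop_on_none

    def __call__(self, value):
        for func in self.functions:
            # Stop early on None so that processors such as str.lower
            # are never called with a None argument.
            if value is None and self.stop_on_none:
                break
            value = func(value)
        return value


lowercase = Compose(str.strip, str.lower)
cleaned = lowercase("  ABC ")       # normal value flows through every function
skipped = lowercase(None)           # None short-circuits the chain

# With stop_on_none=False the functions must handle None themselves.
fallback = Compose(lambda v: "n/a" if v is None else v, stop_on_none=False)
replaced = fallback(None)
```

With the default setting, a None input is returned unchanged instead of raising a TypeError inside str.strip, which is the convenience the commit message points to.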

2048 Commits