scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 01:23:56 +00:00

Author	SHA1	Message	Date
Daniel Graña	1461363809	Replace `contenttype` references by `type` The type to choose from is the selector type, not the input type. A content-type doesn't make sense in this context.	2013-10-16 17:37:25 -02:00
Daniel Graña	155ea08ea1	use `sel` name for Selector's instances in docs, internals and shell	2013-10-15 15:58:42 -02:00
Daniel Graña	ab9462a251	remove more references to libxml2	2013-10-14 16:37:14 -02:00
Daniel Graña	4645f9e03c	Updates docs to reflect unified selectors api	2013-10-14 16:31:20 -02:00
Daniel Graña	bf37f78572	Drop libxml2 selectors backend	2013-10-11 18:02:35 -02:00
Daniel Graña	6d598f0d94	Update selectors docs	2013-10-10 18:24:00 -02:00
Capi Etheriel	bc17e9d412	Adds HtmlCSSSelector and XmlCSSSelector classes, cssselect as optional dependency. Ported .get() from _Element and .text_content() from HTMLMixin Add CSS selectors to scrapy shell Documenting CSS Selectors: Constructing selectors Documenting CSS Selectors: Using Selectors Make CSS Selectors a default feature. Adds XPath powers to CSS Selectors and some syntactic sugar. Removes methods copied over from lxml.html.HtmlMixin. Updating docs to use new CSS Selector super powers. Documenting CSS Selectors: Regular Expressions Moving section after Nesting section, since it mentions it. Documenting CSS Selectors: Nesting Selectors Fix XPath specificity in lxml.selector.CSSSelectorMixin.text Cleaning up unused stuff from cssel.py Changing the behavior of lxml.selector.CSSSelectorMixin.text. Concatenating all of the descendant text nodes is more useful than returning it in pieces (there's xpath() if you need that). Documenting CSS Selectors: CSS Selector objects Documenting CSS Selectors: CSSSelectorList objects Documenting CSS Selectors: HtmlCSSSelector objects Documenting CSS Selectors: XmlCSSSelector objects Fixing some documentations typos and errors Enforcing the 80-char width lines Tidying up CSS selectors and CSSSelectorMixin objects Adding some missing references in documentation. Fixing lxml.selector.CSSSelectorList.text	2013-10-10 18:23:15 -02:00
Pablo Hoffman	8b9526a8f6	Merge pull request #400 from irgmedeiros/patch-2 Update the second code example	2013-10-07 07:57:18 -07:00
Pablo Hoffman	86c6e9433f	remove minor reference to 'scrapy server' command	2013-10-04 14:37:55 -03:00
Pablo Hoffman	37c24e01d7	document bindaddress request meta	2013-10-02 17:13:17 -03:00
Pablo Hoffman	a9c3519897	updated required twisted version to 10.0	2013-10-01 14:07:38 -03:00
Rolando Espinoza	d6e3eae527	docs: added section regarding setting up django's settings.	2013-09-30 09:58:10 -04:00
Rolando Espinoza	0cc1d870db	docs: minor tidy up sample code and missing shell prompts.	2013-09-30 09:58:10 -04:00
Loren Davie	8af0e89e85	Corrected typo.	2013-09-29 17:06:46 -04:00
Loren Davie	f49f5724d5	Added dynamic creation of item classes to practices.rst.	2013-09-28 09:00:48 -04:00
irgmedeiros	9b50409986	Update the second code example Update the second code example to reflect the last change in the first example.	2013-09-27 18:22:33 -03:00
irgmedeiros	d9e0fdc9aa	Update practices.rst With this modification scrapy runs the spider with project settings. The previous example ran only with default settings resulting in ignoring all user settings as pipelines for example.	2013-09-27 17:56:30 -03:00
Daniel Graña	265910aae6	Merge pull request #363 from taikano/sitemap_alternate also fetch alternate URLs from sitemaps, see #360	2013-09-26 09:15:02 -07:00
Pablo Hoffman	12280c2a95	fix sphinx references in doc	2013-09-25 15:13:17 -03:00
Pablo Hoffman	fc388f4636	Make ITEM_PIPELINES setting a dict This is for consistency with how spider and downloader middlewares are defined. ITEM_PIPELINES_BASE was also added and both remain empty. Backwards compatibility is kept (with a warning) with list-based ITEM_PIPELINES.	2013-09-23 17:50:43 -03:00
cacovsky	71b320914a	Update request-response.rst Fix small doc typo (too many backticks)	2013-09-18 11:45:25 -03:00
Stefan	6994959181	renamed to sitemap_alternate_links and added default value, see #360	2013-09-08 10:38:28 +02:00
Stefan	8ed2d0cda1	improved changes to allow retrieval of alternate links in sitemaps, see #360	2013-09-07 12:56:30 +02:00
Pablo Hoffman	86230c0ab8	added quantal & raring to support ubuntu releases	2013-08-22 21:49:55 -03:00
Mikhail Korobov	034ffae60f	Recommend Pillow instead of PIL. Closes GH-317.	2013-08-18 00:44:01 +06:00
Berend Iwema	32b6364bcd	#327 - Support STARTTLS / SSL option in email sender	2013-08-14 12:59:01 +02:00
Rocio Aramberri	d227d530f6	Added COMPRESSION_ENABLED setting to enable or disable the HttpCompressionMiddleware Added COMPRESSION_ENABLED setting to docs Added COMPRESSION_ENABLED setting to default settings	2013-08-01 11:31:28 -03:00
Dan	1ca31244b0	Fixed ordering of super argument call.	2013-07-16 14:50:10 -04:00
Dan	e12b689c4f	Updated documentation of spider arguments to include required super call.	2013-07-16 14:26:53 -04:00
Mikhail Korobov	1a1c93fafe	tiny FormRequest doc fix	2013-07-15 15:47:34 +06:00
Mikhail Korobov	ac2fadf3ab	DownloaderMiddleware.process_response docs fix "returns an exception" -> "raises an exception"	2013-07-08 19:41:58 +06:00
Mikhail Korobov	39e5da5f66	improve docs for DownloaderMiddleware.process_response	2013-07-08 19:17:29 +06:00
Pablo Hoffman	0f4b70f582	remove no deprecated request_scheduled signal It will be replaced by more accurate scheduler signals (proposal will come soon)	2013-06-27 11:23:24 -03:00
nramirezuy	bef8ade956	removed request_received and added request_scheduled	2013-06-26 16:45:46 -03:00
Pablo Hoffman	819b2776dd	Merge pull request #326 from berendiwema/master Include example of how to stop the reactor from script	2013-06-25 13:30:07 -07:00
nramirezuy	83b2774354	remove wrong default httpcache	2013-06-25 17:01:29 -03:00
Berend Iwema	aec314db09	added a bit more documentation on how to close the reactor when running scrapy from a script	2013-06-25 16:08:22 +02:00
Pablo Hoffman	bbde1d0e0b	Merge pull request #275 from stav/doc doc: Response.replace() cannot take meta argument	2013-06-24 11:09:28 -07:00
Capi Etheriel	50fa46d183	Document CrawlSpider.parse_start_urls method	2013-06-09 04:03:20 -03:00
Pablo Hoffman	8e49fed918	minor improvements to benchmarking doc	2013-05-16 13:23:13 -03:00
Pablo Hoffman	76087e336a	add scrapy bench command for benchmarking, with documentation	2013-05-16 13:15:25 -03:00
Pablo Hoffman	66311db23e	mention crawlera in best practices, as a way to deal with bans	2013-05-04 18:20:23 -03:00
Pablo Hoffman	9361c89573	remove scrapyd doc, as it was moved to its own repo	2013-04-27 04:15:42 -03:00
Nicolás Ramírez	6df274bba5	added copy method to item	2013-04-19 13:23:53 -03:00
Pablo Hoffman	96c2332e0e	fix inaccurate downloader middleware documentation. refs #280	2013-04-02 11:35:32 -03:00
Steven Almeroth	70179c7c0c	doc: remove trailing spaces	2013-03-21 13:57:39 -06:00
Steven Almeroth	0d7747d353	doc: Response.replace() cannot take meta argument >>> response.replace(meta={'foo':1}) Traceback (most recent call last): File "<input>", line 1, in <module> File "/srv/scrapy/scrapy-fork/scrapy/scrapy/http/response/text.py", line 45, in replace return Response.replace(self, *args, **kwargs) File "/srv/scrapy/scrapy-fork/scrapy/scrapy/http/response/__init__.py", line 77, in replace return cls(*args, **kwargs) File "/srv/scrapy/scrapy-fork/scrapy/scrapy/http/response/text.py", line 22, in __init__ super(TextResponse, self).__init__(*args, **kwargs) TypeError: __init__() got an unexpected keyword argument 'meta'	2013-03-21 13:49:55 -06:00
Pablo Hoffman	2a5c7ed4da	make Crawler.start() return a deferred that is fired when the crawl is finished	2013-03-20 14:48:59 -03:00
Pablo Hoffman	b347c14b5f	update engine status output on telnet console documentation	2013-03-18 19:12:12 -03:00
Shane Evans	5c2a82f1f7	fix typo	2013-03-17 19:34:55 +00:00


469 Commits