Daniel Graña
265910aae6
Merge pull request #363 from taikano/sitemap_alternate
...
also fetch alternate URLs from sitemaps, see #360
2013-09-26 09:15:02 -07:00
Pablo Hoffman
12280c2a95
fix sphinx references in doc
2013-09-25 15:13:17 -03:00
Pablo Hoffman
fc388f4636
Make ITEM_PIPELINES setting a dict
...
This is for consistency with how spider and downloader middlewares are
defined. ITEM_PIPELINES_BASE was also added and both default to empty.
Backwards compatibility is kept (with a warning) with list-based
ITEM_PIPELINES.
2013-09-23 17:50:43 -03:00
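For reference, a minimal sketch of the dict-based form this commit introduces; the project and pipeline names below are made up, and the integer is the ordering value (lower runs first):

    # settings.py (hypothetical project) -- dict form of the setting
    ITEM_PIPELINES = {
        'myproject.pipelines.PricePipeline': 300,
        'myproject.pipelines.JsonWriterPipeline': 800,
    }
    # The old list form still works but triggers a deprecation warning.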
Pablo Hoffman
b1d1a36a1e
add note about enclosing urls with quotes when running from command-line. closes GH-384
2013-09-18 18:01:28 -03:00
cacovsky
71b320914a
Update request-response.rst
...
Fix small doc typo (too many backticks)
2013-09-18 11:45:25 -03:00
Kumara Tharmalingam
bbb0603091
Fixed directory location for dmoz_spider.py file
...
It should be under 'tutorial/spiders' not 'dmoz/spiders'
2013-09-15 21:55:52 -07:00
Daniel Graña
0400b18efa
docs: list lxml as installation prerequisite
2013-09-09 12:44:26 -03:00
Stefan
6994959181
renamed to sitemap_alternate_links and added default value, see #360
2013-09-08 10:38:28 +02:00
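A short sketch of how the renamed attribute is meant to be used; the import path assumes the 0.18-era location of SitemapSpider, and the spider name and URL are placeholders:

    from scrapy.contrib.spiders import SitemapSpider

    class ExampleSpider(SitemapSpider):
        name = 'example'
        sitemap_urls = ['http://www.example.com/sitemap.xml']
        # When enabled, URLs listed as <xhtml:link rel="alternate"> entries in
        # the sitemap are fetched as well (assumed to default to disabled).
        sitemap_alternate_links = True

        def parse(self, response):
            pass  # handle each fetched page here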
Stefan
8ed2d0cda1
improved changes to allow retrieval of alternate links in sitemaps, see #360
2013-09-07 12:56:30 +02:00
Daniel Graña
e6b3ca0180
Add 0.18.2 release notes
2013-09-03 14:29:40 -03:00
Daniel Graña
0f00b1602a
merge 0.18 release notes
2013-08-27 18:47:46 -03:00
Pablo Hoffman
19ff9ac4f9
url/body attributes of Request/Response objects are now immutable
2013-08-23 12:43:22 -03:00
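Since url and body can no longer be reassigned in place, replace() is the way to obtain a modified copy; a minimal sketch with placeholder values:

    # Assigning request.url = ... or response.body = ... now fails; instead:
    modified_request = request.replace(url='http://www.example.com/other-page')
    trimmed_response = response.replace(body=response.body[:1024])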
Pablo Hoffman
86230c0ab8
added quantal & raring to supported ubuntu releases
2013-08-22 21:49:55 -03:00
Mikhail Korobov
034ffae60f
Recommend Pillow instead of PIL. Closes GH-317.
2013-08-18 00:44:01 +06:00
Berend Iwema
32b6364bcd
#327 - Support STARTTLS / SSL option in email sender
2013-08-14 12:59:01 +02:00
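A rough sketch of using the new option; the smtptls/smtpssl keyword names are an assumption based on later MailSender versions, and the host and credentials are placeholders:

    from scrapy.mail import MailSender

    # smtptls=True requests STARTTLS; smtpssl=True would use SMTP over SSL
    # (argument names assumed -- check scrapy.mail.MailSender for the exact API).
    mailer = MailSender(smtphost='smtp.example.com', smtpport=587,
                        smtpuser='user', smtppass='secret', smtptls=True)
    mailer.send(to=['someone@example.com'], subject='test', body='hello')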
Pablo Hoffman
c0b26e3d49
minor updates to 0.18 release notes
2013-08-14 01:39:44 -03:00
Daniel Graña
ed5b9068d2
fix contributors list format
2013-08-12 11:26:16 -03:00
Daniel Graña
a6693c9a5c
updated release notes and bumped version to 0.18.0
2013-08-09 19:02:28 -03:00
Hart
c00c4d7148
correction to description of example XPath retrieval in overview doc
2013-08-03 17:08:58 -07:00
Hart
0ad01c34d4
fixed typo, paralleling the fix on the 0.16 branch
2013-08-03 17:06:10 -07:00
Rocio Aramberri
d227d530f6
Added COMPRESSION_ENABLED setting to enable or disable the HttpCompressionMiddleware
...
Added COMPRESSION_ENABLED setting to docs
Added COMPRESSION_ENABLED setting to default settings
2013-08-01 11:31:28 -03:00
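In a project's settings this amounts to a single toggle, sketched below; compression stays enabled unless it is switched off:

    # settings.py -- disable HttpCompressionMiddleware for this project
    COMPRESSION_ENABLED = False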
Dan
1ca31244b0
Fixed ordering of arguments in super() call.
2013-07-16 14:50:10 -04:00
Dan
e12b689c4f
Updated documentation of spider arguments to include required super call.
2013-07-16 14:26:53 -04:00
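The documented pattern is roughly the following, using the 0.18-era BaseSpider and a made-up category argument:

    from scrapy.spider import BaseSpider

    class MySpider(BaseSpider):
        name = 'myspider'

        def __init__(self, category=None, *args, **kwargs):
            # The super() call is required so the base class is initialized;
            # note the (MySpider, self) argument order fixed in the commit above.
            super(MySpider, self).__init__(*args, **kwargs)
            self.start_urls = ['http://www.example.com/categories/%s' % category]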
Mikhail Korobov
1a1c93fafe
tiny FormRequest doc fix
2013-07-15 15:47:34 +06:00
Mikhail Korobov
ac2fadf3ab
DownloaderMiddleware.process_response docs fix
...
"returns an exception" -> "raises an exception"
2013-07-08 19:41:58 +06:00
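For context, the contract being documented: process_response() must return a Response or a Request, or raise IgnoreRequest; a hedged sketch with a made-up middleware:

    from scrapy.exceptions import IgnoreRequest

    class ExampleDownloaderMiddleware(object):
        def process_response(self, request, response, spider):
            if response.status == 403:
                # The middleware raises the exception -- it does not return it.
                raise IgnoreRequest('got blocked on %s' % request.url)
            return response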
Mikhail Korobov
39e5da5f66
improve docs for DownloaderMiddleware.process_response
2013-07-08 19:17:29 +06:00
Pablo Hoffman
0f4b70f582
remove request_scheduled signal without deprecation
...
It will be replaced by more accurate scheduler signals (proposal will
come soon)
2013-06-27 11:23:24 -03:00
nramirezuy
bef8ade956
removed request_received and added request_scheduled
2013-06-26 16:45:46 -03:00
Pablo Hoffman
819b2776dd
Merge pull request #326 from berendiwema/master
...
Include example of how to stop the reactor from script
2013-06-25 13:30:07 -07:00
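The pattern being documented is roughly the following, assuming the 0.18-era Crawler API; spider definition and logging setup are omitted, and MySpider is a placeholder:

    from twisted.internet import reactor
    from scrapy import signals
    from scrapy.crawler import Crawler
    from scrapy.utils.project import get_project_settings

    crawler = Crawler(get_project_settings())
    # Stop the reactor once the spider closes, so the script can exit cleanly.
    crawler.signals.connect(reactor.stop, signal=signals.spider_closed)
    crawler.configure()
    crawler.crawl(MySpider())
    crawler.start()
    reactor.run()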
nramirezuy
83b2774354
remove wrong default httpcache
2013-06-25 17:01:29 -03:00
Berend Iwema
aec314db09
added a bit more documentation on how to close the reactor when running scrapy from a script
2013-06-25 16:08:22 +02:00
Pablo Hoffman
bbde1d0e0b
Merge pull request #275 from stav/doc
...
doc: Response.replace() cannot take meta argument
2013-06-24 11:09:28 -07:00
Capi Etheriel
50fa46d183
Document CrawlSpider.parse_start_urls method
2013-06-09 04:03:20 -03:00
Daniel Graña
b4fca90bba
merge 0.16.5 release notes
2013-05-30 18:49:00 -03:00
cacovsky
8007762890
Add FAQ entry referencing Request.meta usage
2013-05-27 13:02:17 -03:00
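The usage that FAQ entry points to is passing data between callbacks through Request.meta, roughly as sketched here (item class and field names are made up):

    def parse_page1(self, response):
        item = MyItem()
        request = Request('http://www.example.com/page2', callback=self.parse_page2)
        request.meta['item'] = item      # carry the partially filled item along
        return request

    def parse_page2(self, response):
        item = response.meta['item']     # same meta dict, exposed on the response
        item['other_url'] = response.url
        return item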
Pablo Hoffman
845c64b89d
add benchmarking to 0.18 release notes
2013-05-17 10:38:42 -03:00
Pablo Hoffman
ca12886acb
update copyright notices
2013-05-16 15:05:52 -03:00
Pablo Hoffman
8e49fed918
minor improvements to benchmarking doc
2013-05-16 13:23:13 -03:00
Pablo Hoffman
76087e336a
add scrapy bench command for benchmarking, with documentation
2013-05-16 13:15:25 -03:00
Pablo Hoffman
66311db23e
mention crawlera in best practices, as a way to deal with bans
2013-05-04 18:20:23 -03:00
Pablo Hoffman
9361c89573
remove scrapyd doc, as it was moved to its own repo
2013-04-27 04:15:42 -03:00
Pablo Hoffman
d02da2f31f
ported code to use queuelib
2013-04-23 17:48:09 -03:00
Pablo Hoffman
7a1536f76e
Merge pull request #290 from nramirezuy/item-copy
...
added copy method to item
2013-04-19 09:27:44 -07:00
Nicolás Ramírez
6df274bba5
added copy method to item
2013-04-19 13:23:53 -03:00
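A tiny sketch of the new method; the item class and fields are made up:

    item = ProductItem(name='sofa', price=120)
    backup = item.copy()      # independent copy: changing one no longer affects the other
    backup['price'] = 99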
Mikhail Korobov
b245d592aa
Update faq.rst
...
spider.DOWNLOAD_DELAY is deprecated
2013-04-18 02:42:15 +06:00
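With the uppercase spider attribute deprecated, the per-spider delay is set with the lowercase attribute (or the project-wide DOWNLOAD_DELAY setting), roughly:

    class SlowSpider(BaseSpider):
        name = 'slow'
        download_delay = 2    # seconds between requests; replaces spider.DOWNLOAD_DELAY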
Juan M Uys
4de3aa4932
Update overview.rst
2013-04-08 14:13:15 +02:00
Pablo Hoffman
96c2332e0e
fix inaccurate downloader middleware documentation. refs #280
2013-04-02 11:35:32 -03:00
Steven Almeroth
70179c7c0c
doc: remove trailing spaces
2013-03-21 13:57:39 -06:00
Steven Almeroth
0d7747d353
doc: Response.replace() cannot take meta argument
...
>>> response.replace(meta={'foo':1})
Traceback (most recent call last):
File "<input>", line 1, in <module>
File "/srv/scrapy/scrapy-fork/scrapy/scrapy/http/response/text.py", line 45, in replace
return Response.replace(self, *args, **kwargs)
File "/srv/scrapy/scrapy-fork/scrapy/scrapy/http/response/__init__.py", line 77, in replace
return cls(*args, **kwargs)
File "/srv/scrapy/scrapy-fork/scrapy/scrapy/http/response/text.py", line 22, in __init__
super(TextResponse, self).__init__(*args, **kwargs)
TypeError: __init__() got an unexpected keyword argument 'meta'
2013-03-21 13:49:55 -06:00
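In other words, replace() only accepts Response constructor arguments (url, status, headers, body, flags, ...); meta belongs to the originating Request. A sketch of a call that does work, with placeholder values:

    # meta is not a Response constructor argument, so pass only supported ones:
    new_response = response.replace(body='<html>stub</html>', flags=['cached'])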
Pablo Hoffman
2a5c7ed4da
make Crawler.start() return a deferred that is fired when the crawl is finished
2013-03-20 14:48:59 -03:00
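With this change a script can chain on the returned Deferred instead of wiring up the spider_closed signal; a hedged sketch with crawler setup omitted:

    d = crawler.start()                      # now returns a Deferred
    d.addBoth(lambda _: reactor.stop())      # stop the reactor when the crawl finishes
    reactor.run()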