mirror of https://github.com/scrapy/scrapy.git synced 2025-02-28 17:44:37 +00:00

4157 Commits

Author SHA1 Message Date
Max Arnold
d58a4639cd switch existing pipeline tests to use new file_path/image_path methods 2013-12-17 23:57:22 +07:00
Max Arnold
220ced163c simplify porting to Python 3 2013-12-17 23:19:40 +07:00
Max Arnold
86a6d6898b better marker name to detect overridden methods 2013-12-17 22:44:12 +07:00
Max Arnold
c29939862a add deprecation warnings for old file_key/image_key/thumb_key methods 2013-12-17 22:35:31 +07:00
Max Arnold
270e9190ae add new pipeline methods to get file/image/thumbnail paths
This change allows passing request, response and spider context to filename
construction methods.
2013-12-17 21:50:40 +07:00
Pablo Hoffman
462e40acd0 Merge pull request #489 from RasPat1/patch-2
Note about selector class import
2013-12-16 06:40:59 -08:00
RasPat1
ff21281b95 Note about selector class import
This is the salient point of this code compared to the last example.  We have a selector now and this is how we use it.  Especially since the user has just come from the shell where the pre-instantiated selector is taken for granted.
2013-12-15 13:46:42 -05:00
Daniel Graña
8a7c5b5d81 Add 0.20.2 release notes
Conflicts:
	docs/news.rst
2013-12-09 18:33:46 -02:00
Daniel Graña
11359188a5 Merge pull request #484 from nyov/nyov/crawl-tmpl-selector
Update CrawlSpider Template with Selector changes
2013-12-09 12:21:27 -08:00
Benoit Blanchon
022538d3b0 BaseSgmlLinkExtractor: Fixed the missing space when the link has an inner tag 2013-12-07 20:43:22 +01:00
Benoit Blanchon
79179f59b2 BaseSgmlLinkExtractor: Added unit test of a link with an inner tag 2013-12-07 20:42:21 +01:00
Benoit Blanchon
62f7171942 BaseSgmlLinkExtractor: Fixed unknown_endtag() so that it only sets current_link=None when the end tag matches the opening tag 2013-12-07 19:47:50 +01:00
nyov
b6a200d02a Update CrawlSpider Template with Selector changes 2013-12-06 20:48:25 +00:00
Daniel Graña
72543c9ef0 Merge pull request #397 from duendex/duendex/proxyTunnel
Adds the functionality to do HTTPS downloads behind proxies using an
2013-12-03 14:48:42 -08:00
duendex
8ada8f5f36 Added a test case to ensure that passing the noconnect parameter avoids triggering the creation of a connect tunnel when downloading from a site with https scheme. 2013-12-03 12:55:44 -02:00
duendex
6427d60fd8 Fixed the location of the certificate required by libmproxy. 2013-12-03 11:45:02 -02:00
duendex
500490ee73 Corrected a test that used a dummy URL that unintentionally had an https scheme and failed with PR 397. 2013-12-03 03:10:16 -02:00
duendex
f8dea74948 Added a delay to wait for the proxy to start. 2013-12-03 03:09:06 -02:00
duendex
247b330f08 Corrected typo in tox.ini 2013-12-02 21:34:35 -02:00
duendex
02bab270e8 Added mitmproxy as a requirement. 2013-12-02 20:25:12 -02:00
duendex
d69ba7c1ae Changed the proxy tests to use libmproxy instead of starting mitmdump as a separate process. 2013-12-02 20:21:43 -02:00
duendex
88bec496f2 The response matching re is now compiled once at module load time. 2013-12-02 20:19:42 -02:00
duendex
23c3288a6d Adds the option to omit the usage of a CONNECT tunnel by adding the noconnect
parameter to the URL of the proxy.
2013-12-02 20:19:42 -02:00
duendex
7f053cc1d2 Adds support for proxy authentication when opening a CONNECT tunnel. 2013-12-02 20:19:42 -02:00
duendex
628bfbcc3e Raises a custom TunnelError when the tunnel cannot be opened. Removed unnecessary comments. 2013-12-02 20:19:42 -02:00
duendex
58a98b0c04 Improved error handling. 2013-12-02 20:19:42 -02:00
duendex
36e4fc3785 Removed some trailing spaces that I left. 2013-12-02 20:19:42 -02:00
duendex
ae28c7d698 Adds the functionality to do HTTPS downloads behind proxies using an
HTTP CONNECT.
2013-12-02 20:19:41 -02:00
Pablo Hoffman
f2741c413e fix method name in tutorial. closes GH-480 2013-12-02 13:24:12 -02:00
Nikolay Golub
a651a75248 fix logging error with unicode spider name
Log message fails if spider name is in unicode, because "system" key in eventDict isn't encoded.
2013-11-30 21:08:56 +04:00
Daniel Graña
e34ffc0f42 Add 0.20.1 release notes
Conflicts:
	docs/news.rst
2013-11-28 16:25:57 -02:00
Daniel Graña
cfe588103c include_package_data is required to build wheels from published sources 2013-11-28 16:25:57 -02:00
Paul Tremberth
13002564b9 Add tests for OffsiteMiddleware() + use re.escape() in domains regexp 2013-11-28 15:40:22 +01:00
Pablo Hoffman
339861367e Merge pull request #425 from audiodude/master
DownloaderMiddleware docs: Update process_request and minor cleanups.
2013-11-25 10:33:35 -08:00
Daniel Graña
aeeba2147b travis-ci updated to pypy 2.2 2013-11-25 13:57:40 -02:00
Daniel Graña
36c8da2ad6 Merge pull request #461 from redapple/selectorloader
Add "unified" SelectorItemLoader (supports .add_css() and .add_xpath())
2013-11-22 12:10:39 -08:00
Pablo Hoffman
545f2601b0 Merge pull request #469 from redapple/xgzip
Remove "x-gzip" from Requests' "Accept-Encoding" header
2013-11-21 10:55:07 -08:00
Paul Tremberth
0d99babe40 Remove "x-gzip" from Requests' "Accept-Encoding" header 2013-11-21 18:58:24 +01:00
Paul Tremberth
14f5817d6b Modify ItemLoader to support XPath and CSS selectors
Deprecate XPathItemLoader (now an alias to the new ItemLoader)
2013-11-21 18:05:24 +01:00
Pablo Hoffman
f87be371a2 better names for HANDLE_* settings, and added doc 2013-11-21 14:33:17 -02:00
Daniel Graña
ab01e9e9e4 Merge pull request #466 from kalessin/httperror
allow to use settings for defining http error handling defaults
2013-11-21 03:55:44 -08:00
Martin Olveyra
55bee912a2 allow to use settings for defining http error handling defaults 2013-11-20 20:12:49 -02:00
Mikhail Korobov
8416cc7515 Merge pull request #465 from bjlange/master
Add note to item-pipeline documentation explaining order
2013-11-20 09:40:19 -08:00
Brian Lange
e4c1d8d37d Elaborate on use of order numbers 2013-11-19 17:51:50 -06:00
Daniel Graña
2564c21d4c add a tox env for Python 3.3 2013-11-19 20:15:15 -02:00
Daniel Graña
526a944eda lxml is required, no need to skip tests. 2013-11-19 20:14:48 -02:00
Daniel Graña
3f156ad845 Do not call body_as_unicode on non text responses. closes #462 2013-11-19 20:13:34 -02:00
Brian Lange
b878f60b5a Add note to item-pipeline documentation explaining order in the ITEM_PIPELINES setting. 2013-11-19 16:12:54 -06:00
Daniel Graña
ec7833a910 Deprecate body_or_str helper function only used by xml iterators 2013-11-19 19:21:54 -02:00
Pablo Hoffman
2d91c7136d Merge pull request #464 from kalessin/telnet
telnet client: fix nonexistent reference to engine.slots
2013-11-19 05:22:58 -08:00