mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 06:43:43 +00:00

538 Commits

Author SHA1 Message Date
Paul Tremberth
61efacdd1f Add testcase for catching exception from open_spider() from pipeline 2016-11-08 11:35:42 +01:00
Paul Tremberth
db40852892 Do not interpret non-ASCII bytes in "Location" and percent-encode them (#2322)
* Do not interpret non-ASCII bytes in "Location" and percent-encode them

Fixes GH-2321

The idea is to not guess the encoding of "Location" header value
and simply percent-encode non-ASCII bytes,
which should then be re-interpreted correctly by the remote website
in whatever encoding was used originally.

See https://tools.ietf.org/html/rfc3987#section-3.2

This is similar to the changes to safe_url_string in
https://github.com/scrapy/w3lib/pull/45

* Remove unused import
2016-10-19 23:26:12 -03:00
Elias Dorneles
2d932c173c test abs path outside project as well 2016-09-30 15:07:58 -03:00
Elias Dorneles
25bd3b3fea add .scrapy when outside spider too, add tests 2016-09-29 18:30:42 -03:00
Paul Tremberth
41cd9f401f Merge pull request #2243 from pawelmhm/image-pipeline-2198
[MRG+1] [image & file pipeline] loading setting for user classes
2016-09-19 18:43:52 +02:00
Mikhail Korobov
5657f6b8ef Merge pull request #2258 from redapple/feed-export-started
[MRG+1] Feed exporter: start exporting only on first item
2016-09-19 14:40:30 +06:00
Mikhail Korobov
552368727a Merge pull request #2225 from Tethik/parse_command_rules_fix
[MRG+1] Two small fixes for when using the parse command and the '-r' flag (rules).
2016-09-19 14:39:09 +06:00
Joakim Uddholm
8c38dde4e8 Moved parse command tests to its own file. Added some checks to check for logged errors. 2016-09-19 05:33:05 +02:00
Paul Tremberth
03ab077249 Feed exporter: start exporting only on first item
Fixes GH-872
2016-09-17 01:36:56 +02:00
Paul Tremberth
b828facff4 Add shell test for using scrapy.Request() directly without importing scrapy 2016-09-15 19:25:20 +02:00
pawelmhm
7d88209543 [image & file pipeline] loading setting for user classes
if user has some custom subclass of Image pipeline and no setting for
this pipeline, he should get default settings defined for Image Pipeline.

Fixes #2198
2016-09-15 09:39:16 +02:00
Elias Dorneles
129421c7e3 Merge pull request #1503 from demelziraptor/amazon-json-response
[MRG+1] interpreting json-amazonui-streaming as TextResponse
2016-09-12 13:21:16 -03:00
Paul Tremberth
fbb5559299 Add tests for crawl command non-default cases 2016-09-12 13:35:14 +02:00
Paul Tremberth
9de6f1ca75 Merge pull request #1905 from rootAvish/duplication-fix
[MRG+1] Modified read failure recovery in utils/gz.py to read only the last f.extrasize bytes of f.extrabuf[ ]
2016-08-17 14:51:30 +02:00
Ashish Kulkarni
bb3b806467 Use w3lib.url.canonicalize_url() from w3lib 1.15.0
Also remove code/imports which are now unused due to this change.

fixes #2157
2016-08-16 17:42:16 +05:30
Paul Tremberth
9a734e6759 Merge pull request #2058 from dalleng/serialize_set
[MRG+1] Add set serialization to ScrapyJSONEncoder
2016-08-12 18:28:34 +02:00
rootavish
d9437fd3d9 Modifying existing gzip read failure recovery mechanism to patch read for broken archives 2016-08-11 18:21:42 +05:30
Mikhail Korobov
414857a593 Merge pull request #2140 from jesuslosada/images-expires
[MRG+1] Fix IMAGES_EXPIRES default value
2016-08-05 21:52:27 -04:00
Mikhail Korobov
2c9a38d1f5 Merge pull request #2153 from Digenis/Selector_bad_args
[MRG+1] Selector should not receive both response and text
2016-07-31 21:28:38 -04:00
Νικόλαος-Διγενής Καραγιάννης
643dbeffcf Selector should not receive both response and text 2016-07-30 10:35:16 +03:00
Diego Allen
e17fdd7276 Add set serialization to ScrapyJSONEncoder 2016-07-22 17:20:03 -04:00
Jesús Losada
7c3e3b484e Fix ImagesPipeline test settings 2016-07-22 20:03:49 +02:00
Paul Tremberth
ec1c61504a Merge pull request #2005 from feliperuhland/master
[MRG+1] Included new optional parameter in startproject command line
2016-07-19 12:31:06 +02:00
Mikhail Korobov
79639d0fec Merge pull request #1989 from pawelmhm/fix-images-pipeline-uppercase-other
[MRG+1] [image_pipeline] bring back uppercase class attributes
2016-07-13 14:44:00 +00:00
Mikhail Korobov
2dd1a9e3bc Merge pull request #2094 from redapple/dns-invalid-id
Catch and ignore certification verification exception for IP-address hosts
2016-07-13 10:48:08 +00:00
Paul Tremberth
c3109daa72 Merge pull request #2034 from dracony/master
[MRG+1] Added option to turn off ensure_ascii for JSON exporters
2016-07-12 17:01:09 +02:00
Dracony
33a39b368f added FEED_EXPORT_ENCODING setting to allow encoding specification 2016-07-12 16:20:17 +02:00
Paul Tremberth
778f1cf84c Merge remote-tracking branch 'origin/master' into octet-stream-no-decompress 2016-07-08 18:13:20 +02:00
Elias Dorneles
d43a35735a Merge pull request #2050 from Tethik/is_gzipped_fix
[MRG+1] Is_gzipped for application/x-gzip;charset=utf-8
2016-07-08 08:47:56 -03:00
Mikhail Korobov
b7553d921a Merge pull request #2038 from redapple/canonicalize-idna-failures
[MRG] Do not fail on canonicalizing URLs with wrong netlocs
2016-07-08 10:47:54 +06:00
Mikhail Korobov
52a52e2388 Merge pull request #2001 from matveinazaruk/issue-2000
[MRG+1] Fixed choosing of response class.
2016-07-08 10:44:24 +06:00
Valdir Stumm Junior
1779f5feca enable genspider command outside projects 2016-07-06 15:10:48 -03:00
Mikhail Korobov
759a555d28 Merge pull request #2069 from redapple/https-connect-host
[MRG] Add "Host" header in CONNECT requests to HTTPS proxies
2016-07-06 21:43:41 +06:00
Mikhail Korobov
4273734744 TST pin pytest-cov to 2.2.1; upgrade pytest 2016-07-06 18:29:49 +05:00
Paul Tremberth
37efdde3e3 Catch and ignore TLS verification exception for IP-address hosts
Fixes GH-2092
2016-07-06 14:20:13 +02:00
Paul Tremberth
6539277f99 Fix CONNECT request timeout (with an ugly hack) 2016-06-21 17:14:41 +02:00
Paul Tremberth
10a2c46e12 [HttpCompressionMiddleware] Do not decompress binary/octet-stream responses 2016-06-20 16:37:00 +02:00
Pawel Miech
fa4d0cdfe5 [FilesPipeline, ImagesPipeline] fix for cls attrs with DEFAULT prefix
some class attributes for ImagePipeline and FilesPipeline had DEFAULT prefix. These
attributes should be preserved as well, if users subclasses define values for
DEFAULT_<CLS_ATTRIBUTE_NAME> attribute this value should be preserved.
2016-06-20 12:53:20 +02:00
Pawel Miech
539d34bce0 [media-pipeline, file-pipeline] allow setting custom settings for subclasses
* move key_for_pipe function to media pipeline so that file pipeline can use it
* use key_for_pipe in file pipeline so that users can define custom settings for subclasses easily
* add tests for file pipelines attributes and settings
2016-06-15 15:39:11 +02:00
Pawel Miech
acbfdc6184 [files_pipeline] ensure class attributes are preserved
dont override class attributes with default settings (same as in image pipeline).
2016-06-15 15:14:28 +02:00
Pawel Miech
c6d1686d98 [files_pipeline] unify tests for files pipeline
if test tests same thing but for different field it can be unified into one.
2016-06-15 14:48:25 +02:00
Pawel Miech
72e4d5f33e [image_pipeline] another test for subclass inheritance
test case when subclass inherits from base class and has no attributes nor
settings defined.
2016-06-15 14:07:17 +02:00
Pawel Miech
ee39d11e45 [image_pipeline] refactor and simplify tests for image settings
unify tests that test same thing for different attribute values into one. Add
better docstrings for tests.
2016-06-15 11:25:38 +02:00
Joakim Uddholm
23f99e98c4 is_gzipped: Separated tests again. 2016-06-14 21:33:51 +02:00
Pawel Miech
d715172528 [image_pipeline] unify and simplify tests for setting loading
there was identical test for different setting keys. I unified it into
one unit test.

Fixes comments for tests, adds comments about intention of uppercase attrs.

Adds another test for user defined setting keys and uppercase attrs.
2016-06-14 19:09:56 +02:00
Joakim Uddholm
124e218a3b Added new testcases suggested by @redapple. 2016-06-14 14:22:18 +02:00
Matvei Nazaruk
b76b374648 Added test for http11 choosing response type without content-type header. 2016-06-13 23:21:38 +03:00
Joakim Uddholm
2c98a88a0e Separated tests based on case 2016-06-12 10:49:34 +02:00
Joakim Uddholm
989f6b8843 Test to show bug with is_gzipped and Content-Type: application/gzip;charset. 2016-06-12 01:38:27 +02:00
Pawel Miech
a62d4b081c [image-pipeline] image settings with class name
allow to have image settings with class name, so that settings for user defined ImagePipeline
subclasses can be defined easily.
2016-06-10 12:48:02 +02:00
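
Commit db40852892 above describes percent-encoding the non-ASCII bytes of a "Location" header value instead of guessing their encoding. A minimal sketch of that idea is below. The function name `percent_encode_non_ascii` is hypothetical, and this is an illustration under that assumption, not the code Scrapy or w3lib actually ships.

```python
def percent_encode_non_ascii(location: bytes) -> str:
    """Return the header value with every byte >= 0x80 percent-encoded.

    Hypothetical helper for illustration only. ASCII bytes are copied
    through unchanged, so no character encoding is ever assumed for the
    non-ASCII part; the remote server can reinterpret the raw bytes in
    whatever encoding it originally used.
    """
    parts = []
    for byte in location:
        if byte < 0x80:
            parts.append(chr(byte))            # plain ASCII: keep as-is
        else:
            parts.append("%{:02X}".format(byte))  # raw byte: percent-encode
    return "".join(parts)


if __name__ == "__main__":
    # The same path sent as UTF-8 and as Latin-1 yields different escapes,
    # each of which round-trips to the original bytes.
    print(percent_encode_non_ascii(b"/caf\xc3\xa9"))
    print(percent_encode_non_ascii(b"/caf\xe9"))
```

Because the bytes are escaped one by one, the result is always a pure-ASCII string, and decoding the escapes gives back exactly the bytes the server sent.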