mirror of https://github.com/scrapy/scrapy.git synced 2025-02-23 14:24:19 +00:00

571 Commits

Author SHA1 Message Date
Paul Tremberth
07f9985a94 TST: Randomize FILES_EXPIRES above 90 days 2016-12-21 17:03:11 +01:00
Elias Dorneles
d09ec3db68 Merge pull request #2410 from redapple/fetch-transparent-redirect
[MRG+1] Transparently handle redirections in fetch and shell
2016-12-21 09:49:15 -02:00
Mikhail Korobov
d19c4c1f80 Merge pull request #2433 from redapple/wrong-spidermodules-warning
[MRG+1] Warn user instead of failing for wrong SPIDER_MODULES setting
2016-12-19 21:46:56 +05:00
Paul Tremberth
ed1e4d8df3 Merge pull request #1731 from scrapy/disable-toplevel-2
[MRG+1] LOG_SHORT_NAMES option to disable TopLevelFormatter
2016-12-19 16:02:55 +01:00
Elias Dorneles
97e82107b1 Merge pull request #2270 from redapple/httperror-log-info
[MRG+1] Raise log level for HttpErrorMiddleware to INFO (from DEBUG)
2016-12-14 12:18:04 -02:00
Paul Tremberth
f7e4081414 Add tests for SequenceExclude container 2016-12-12 22:37:53 +01:00
Mikhail Korobov
05b4555f39 TST tests for LOG_SHORT_NAMES 2016-12-09 02:19:51 +05:00
Mikhail Korobov
e46572d6f2 TST end-to-end test for LOG_LEVEL option
there were no end-to-end tests for this option
2016-12-09 02:19:33 +05:00
Mikhail Korobov
6eab59cbac TST cleanup runspider tests 2016-12-09 02:14:12 +05:00
Paul Tremberth
948e3cd003 Warn user instead of failing for wrong SPIDER_MODULES setting 2016-12-08 12:50:26 +01:00
Paul Tremberth
2cd579a774 Add test for fetch(url) within shell with and without redirect 2016-12-07 19:07:32 +01:00
Paul Tremberth
7e54de2455 Add tests for shell command with and without --no-redirect 2016-12-07 18:41:24 +01:00
Elias Dorneles
cce631abec Merge pull request #1887 from nyov/twisted11
[MRG+2] Bump Twisted dependency to 13.1.0
2016-12-07 15:00:01 -02:00
Paul Tremberth
778bed07bf Let framework handle only HTTP redirects by default for fetch and shell commands 2016-12-07 17:56:13 +01:00
Mikhail Korobov
ff3aec6613 Merge pull request #2331 from moisesguimaraes/fixes-2272
[MRG+1] Fixes issue #2272 using arg_to_iter() to wrap single values and list() to…
2016-12-07 20:08:18 +05:00
Elias Dorneles
a9c69458ff Merge pull request #2422 from rolando-contrib/nested-spiders-modules
[MRG+1] DOC State explicitly that spiders are loaded recursively.
2016-12-07 11:21:15 -02:00
Paul Tremberth
5efd65255c TST: Randomize IMAGES_EXPIRES above 90 days 2016-12-06 18:49:53 +01:00
nyov
534772f6ea Import xlib.tx code from twisted proper 2016-12-02 21:21:51 +00:00
Elias Dorneles
c4e67c0696 Merge pull request #2421 from rolando-contrib/tests-bug
[MRG+1] TST: Fix duplicated test name.
2016-12-01 18:43:48 -02:00
Elias Dorneles
f3a4420750 Merge pull request #2388 from redapple/robotparser-native-str
[MRG+1] Parse robots.txt content as native str
2016-12-01 18:43:23 -02:00
Rolando Espinoza
923b974f0a TST Include a nested spider in spider loader test. 2016-12-01 13:26:19 -03:00
Rolando Espinoza
d9f43e21ba TST: Fix duplicated test name. 2016-12-01 11:56:33 -03:00
Eugenio Lacuesta
5ff64ad015 handle relative sitemap urls in robots.txt 2016-12-01 09:53:40 -03:00
Paul Tremberth
9aefc0a886 Add test for fetch command with redirections disabled 2016-11-24 13:41:51 +01:00
Paul Tremberth
01142e2ae5 Print more dependencies versions in "scrapy version" verbose output 2016-11-22 14:48:33 +01:00
Paul Tremberth
6cd35c77da Pass user-agent as native str when checking URLs against robots.txt 2016-11-15 17:38:32 +01:00
Paul Tremberth
de89b1b562 Merge pull request #2275 from scrapy/response-css-xpath-message
[MRG+1] Add better messages for when response content isn't text (closes #2264)
2016-11-10 11:38:22 +01:00
Paul Tremberth
28155dfccc Parse robots.txt content as native str
Fixes #2373
2016-11-09 12:20:06 +01:00
Paul Tremberth
af2280e695 Update docstring 2016-11-08 13:30:51 +01:00
Paul Tremberth
27456996a9 Add assertion on crawler not running 2016-11-08 11:46:16 +01:00
Paul Tremberth
61efacdd1f Add testcase for catching exception from open_spider() from pipeline 2016-11-08 11:35:42 +01:00
Paul Tremberth
db40852892 Do not interpret non-ASCII bytes in "Location" and percent-encode them (#2322)
* Do not interpret non-ASCII bytes in "Location" and percent-encode them

Fixes GH-2321

The idea is to not guess the encoding of "Location" header value
and simply percent-encode non-ASCII bytes,
which should then be re-interpreted correctly by the remote website
in whatever encoding was used originally.

See https://tools.ietf.org/html/rfc3987#section-3.2

This is similar to the changes to safe_url_string in
https://github.com/scrapy/w3lib/pull/45

* Remove unused import
2016-10-19 23:26:12 -03:00
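
The idea described in the commit message above can be sketched as follows. This is a minimal illustration only, not the code from the commit: the function name `percent_encode_non_ascii` is hypothetical, and the sketch assumes the raw "Location" value is available as bytes. ASCII bytes, including any existing percent-escapes, pass through unchanged; every byte at or above 0x80 is percent-encoded without guessing the original encoding.

```python
def percent_encode_non_ascii(location: bytes) -> str:
    """Return `location` as text with each non-ASCII byte percent-encoded.

    Hypothetical helper for illustration; not part of Scrapy or w3lib.
    No attempt is made to decode the bytes, so the remote site can
    reinterpret them in whatever encoding it originally used.
    """
    return "".join(
        chr(byte) if byte < 0x80 else "%{:02X}".format(byte)
        for byte in location
    )


# The same path sent as UTF-8 and as Latin-1 yields different escapes,
# each of which round-trips to the exact bytes the server emitted.
utf8_redirect = percent_encode_non_ascii(b"/caf\xc3\xa9?x=1")
latin1_redirect = percent_encode_non_ascii(b"/caf\xe9?x=1")
```

Because the bytes are never decoded, the result does not depend on knowing the server's charset, which is the point the commit message makes.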
Moisés Guimarães
45e95b79ce (fixes #2272) using arg_to_iter() to wrap single values and list() to avoid consuming from generators. 2016-10-18 11:06:55 -03:00
Elias Dorneles
2d932c173c test abs path outside project as well 2016-09-30 15:07:58 -03:00
Elias Dorneles
25bd3b3fea add .scrapy when outside spider too, add tests 2016-09-29 18:30:42 -03:00
Elias Dorneles
9c9690c76c add better messages for when response content isn't text (closes #2264) 2016-09-21 10:30:35 -03:00
Paul Tremberth
81a0e3cd93 Raise log level for HttpErrorMiddleware to INFO (from DEBUG)
Fixes GH-910
2016-09-20 13:44:21 +02:00
Paul Tremberth
41cd9f401f Merge pull request #2243 from pawelmhm/image-pipeline-2198
[MRG+1] [image & file pipeline] loading setting for user classes
2016-09-19 18:43:52 +02:00
Mikhail Korobov
5657f6b8ef Merge pull request #2258 from redapple/feed-export-started
[MRG+1] Feed exporter: start exporting only on first item
2016-09-19 14:40:30 +06:00
Mikhail Korobov
552368727a Merge pull request #2225 from Tethik/parse_command_rules_fix
[MRG+1] Two small fixes for when using the parse command and the '-r' flag (rules).
2016-09-19 14:39:09 +06:00
Joakim Uddholm
8c38dde4e8 Moved parse command tests to its own file. Added some checks to check for logged errors. 2016-09-19 05:33:05 +02:00
Paul Tremberth
03ab077249 Feed exporter: start exporting only on first item
Fixes GH-872
2016-09-17 01:36:56 +02:00
Paul Tremberth
b828facff4 Add shell test for using scrapy.Request() directly without importing scrapy 2016-09-15 19:25:20 +02:00
pawelmhm
7d88209543 [image & file pipeline] loading setting for user classes
If a user has a custom subclass of the image pipeline and no settings for
that subclass, it should get the default settings defined for the image pipeline.

Fixes #2198
2016-09-15 09:39:16 +02:00
Elias Dorneles
129421c7e3 Merge pull request #1503 from demelziraptor/amazon-json-response
[MRG+1] interpreting json-amazonui-streaming as TextResponse
2016-09-12 13:21:16 -03:00
Paul Tremberth
fbb5559299 Add tests for crawl command non-default cases 2016-09-12 13:35:14 +02:00
Paul Tremberth
9de6f1ca75 Merge pull request #1905 from rootAvish/duplication-fix
[MRG+1] Modified read failure recovery in utils/gz.py to read only the last f.extrasize bytes of f.extrabuf[ ]
2016-08-17 14:51:30 +02:00
Ashish Kulkarni
bb3b806467 Use w3lib.url.canonicalize_url() from w3lib 1.15.0
Also remove code/imports which are now unused due to this change.

fixes #2157
2016-08-16 17:42:16 +05:30
Paul Tremberth
9a734e6759 Merge pull request #2058 from dalleng/serialize_set
[MRG+1] Add set serialization to ScrapyJSONEncoder
2016-08-12 18:28:34 +02:00
rootavish
d9437fd3d9 Modifying existing gzip read failure recovery mechanism to patch read for broken archives 2016-08-11 18:21:42 +05:30