mirror of https://github.com/scrapy/scrapy.git synced 2025-02-23 04:44:04 +00:00

5931 Commits

Author SHA1 Message Date
Paul Tremberth
778bed07bf Let framework handle only HTTP redirects by default for fetch and shell commands 2016-12-07 17:56:13 +01:00
Paul Tremberth
9aefc0a886 Add test for fetch command with redirections disabled 2016-11-24 13:41:51 +01:00
Paul Tremberth
35b655d2f8 Handle redirects transparently by default in shell and fetch
Adds --no-status-aware command line option to have previous behaviour
2016-11-24 12:23:22 +01:00
Mikhail Korobov
a07400ce0a Merge pull request #2396 from redapple/ubuntu-packages-toc
[MRG] DOC Remove "Ubuntu" section from sidebar/ToC
2016-11-22 18:21:38 +05:00
Pablo Hoffman
d62776a858 mention scrapoxy in best practices doc 2016-11-16 12:19:32 -03:00
Paul Tremberth
8db8545393 Add "orphan" metadata for Ubuntu packages page
As described in http://www.sphinx-doc.org/en/latest/markup/misc.html#metadata
2016-11-16 13:56:58 +01:00
Paul Tremberth
11fe3751cf DOC Remove "Ubuntu" section from sidebar/ToC 2016-11-16 11:55:09 +01:00
Elias Dorneles
1cbc57faa5 Merge pull request #2337 from muriloviana/add-middleware-template
[MRG+1] Add middleware to template project (closes #2335)
2016-11-15 12:58:40 -02:00
Mikhail Korobov
570e12b5db Merge pull request #2328 from scrapy/download-latency-meta-docs
[MRG+1] Document `download_latency` meta key
2016-11-14 21:24:14 +05:00
Valdir Stumm Junior
7025d6656a document download_latency meta key 2016-11-14 13:06:18 -02:00
Paul Tremberth
201a16f6d5 Merge pull request #2394 from nyov/fix-le
[MRG+1] LinkExtractor PY3 'unicode' type fix
2016-11-14 12:10:14 +01:00
nyov
e8205f6733 LinkExtractor PY3 'unicode' type fix 2016-11-10 17:50:06 +00:00
Paul Tremberth
de89b1b562 Merge pull request #2275 from scrapy/response-css-xpath-message
[MRG+1] Add better messages for when response content isn't text (closes #2264)
2016-11-10 11:38:22 +01:00
Mikhail Korobov
6d6dc7138b Merge pull request #2387 from rolando-contribute/conda-forge-updates
[MRG+1] Update conda channel to conda-forge and show conda version badge.
2016-11-10 00:35:43 +06:00
Rolando Espinoza
76459e1969 Update conda channel to conda-forge and show conda version badge. 2016-11-08 21:30:27 -03:00
Mikhail Korobov
ea83e67796 Merge pull request #2382 from redapple/slot-heartbeat-state-test
Test Slot's heartbeat state before stopping it
2016-11-08 18:32:07 +06:00
Paul Tremberth
af2280e695 Update docstring 2016-11-08 13:30:51 +01:00
Paul Tremberth
27456996a9 Add assertion on crawler not running 2016-11-08 11:46:16 +01:00
Paul Tremberth
61efacdd1f Add testcase for catching exception from open_spider() from pipeline 2016-11-08 11:35:42 +01:00
Paul Tremberth
7727d87f64 Test Slot's heartbeat state before stopping it
Also add a test on state of looping task in LogStats extension

Fixes #2011 and #2362
2016-11-07 16:44:57 +01:00
Mikhail Korobov
0f5eb4cf88 Merge pull request #2380 from rahulkant13may/master
Document update in "Using Firefox for scraping"
2016-11-07 19:02:53 +06:00
Rahul Kant
f56aef99c2 Add closing tag in <tbody> 2016-11-07 17:49:22 +05:30
Mikhail Korobov
9755ef933a Merge pull request #2369 from jc0n/media-pipeline-docs-typo
Fix typo in media pipeline docs
2016-10-29 12:46:48 +06:00
John O'Connor
d85da273be Fix typo in media pipeline docs 2016-10-28 19:44:46 -07:00
Elias Dorneles
a9c74dbe42 Merge pull request #2339 from visued/master
[MRG+1] Update documentation to explain the use of double quotes on Windows (fixes #2325)
2016-10-25 09:53:24 -02:00
Paul Tremberth
bd4f156d16 Merge pull request #2354 from stav/doc-spider-arguments
[MRG+1] doc: wording
2016-10-24 12:42:59 +02:00
Steven Almeroth
99daea495b Doc: wording 2016-10-21 16:16:42 -07:00
Steven Almeroth
a958e54954 Doc: remove trailing spaces 2016-10-21 16:16:37 -07:00
Mikhail Korobov
1be2447af9 Merge pull request #2346 from redapple/master-pr2313
Fix tutorial AuthorSpider
2016-10-21 11:08:17 +02:00
Randy Pen
c7d245b90b update
Thx for your advice.
2016-10-21 10:52:50 +02:00
Randy Pen
eacc5937e4 fix example code
In the AuthorSpider, original css selector failed to get links of author pages
2016-10-21 10:52:50 +02:00
Paul Tremberth
6df48d57e0 Bump version: 1.2.0 → 1.2.1 (tag: 1.2.1) 2016-10-21 10:37:45 +02:00
Paul Tremberth
f0e935735f Merge pull request #2341 from redapple/release-notes-1.2.1
Update release notes for upcoming 1.2.1
2016-10-21 10:34:47 +02:00
Paul Tremberth
c559a64c20 Set date for 1.2.1 release 2016-10-21 10:17:46 +02:00
Paul Tremberth
22772a61c8 Update release notes for upcoming 1.2.1 2016-10-21 10:17:04 +02:00
Paul Tremberth
eb5d396527 Merge pull request #2344 from jaympatel/master
Typo (through was misspelled)
2016-10-20 22:24:14 +02:00
Jay Patel
f357ccd0d7 Typo (through was misspelled) 2016-10-20 15:52:28 -04:00
Mikhail Korobov
871eec9827 Merge pull request #2327 from bopace/http-header-docs
[MRG+1] Added documentation about accessing header values
2016-10-20 09:05:14 +02:00
Mikhail Korobov
71c8278f57 Merge pull request #2329 from josericardo/scrapy-2262
[MRG+1] Better explain middleware orders and processing directions
2016-10-20 09:04:19 +02:00
Paul Tremberth
db40852892 Do not interpret non-ASCII bytes in "Location" and percent-encode them (#2322)
* Do not interpret non-ASCII bytes in "Location" and percent-encode them

Fixes GH-2321

The idea is to not guess the encoding of "Location" header value
and simply percent-encode non-ASCII bytes,
which should then be re-interpreted correctly by the remote website
in whatever encoding was used originally.

See https://tools.ietf.org/html/rfc3987#section-3.2

This is similar to the changes to safe_url_string in
https://github.com/scrapy/w3lib/pull/45

* Remove unused import
2016-10-19 23:26:12 -03:00
muriloviana
09c401bf8e use crawler only to register signals in from_crawler() 2016-10-19 14:09:01 -02:00
muriloviana
32fd692810 define method spider_opened and update class instructions 2016-10-19 13:50:09 -02:00
muriloviana
38e292a132 add from_crawler method to template and turn the returns methods explicit 2016-10-19 12:47:23 -02:00
muriloviana
34f2014c55 change settings middleware name and updating middleware template 2016-10-19 00:37:31 -02:00
muriloviana
d5bd44a5b9 add middleware to template project 2016-10-18 17:29:16 -02:00
Victor Sued
f74051e69e update documentation to explain the use of double quotes on Windows. 2016-10-18 16:36:43 -02:00
bopace
fd016ee71b Fixed wording of documentation 2016-10-18 09:37:45 -06:00
Paul Tremberth
6cc83c041e Merge pull request #2330 from lfmattossch/note-python2-name
[MRG+1] Added a note about invalid spider names in python 2 (fixes #2251)
2016-10-18 16:35:19 +02:00
Jose Ricardo
e12e364a40 Add details to the spider middlewares docs
Document the effects of the middleware order in a more detailed way.
2016-10-18 12:29:30 -02:00
Mikhail Korobov
a5f4450313 Merge pull request #2302 from Granitosaurus/pipeline_doc_fix
[MRG+1] Fix JsonWriterPipeline example in docs
2016-10-18 16:20:59 +02:00
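
Note on commit db40852892 above: its message describes leaving the encoding of the "Location" header value unguessed and simply percent-encoding the non-ASCII bytes. A minimal sketch of that idea follows. It is an illustration only, not the code from the commit; the function name `percent_encode_non_ascii` is hypothetical and does not come from Scrapy or w3lib.

```python
def percent_encode_non_ascii(raw: bytes) -> str:
    """Hypothetical helper illustrating the idea in the commit message.

    ASCII bytes are kept as they are (including any existing %XX escapes);
    every byte >= 0x80 is written as %XX, so no character encoding is guessed.
    """
    return "".join(
        chr(byte) if byte < 0x80 else "%{:02X}".format(byte)
        for byte in raw
    )


# The same path sent as UTF-8 and as Latin-1 yields different escapes,
# each preserving exactly the bytes the remote site originally sent.
utf8_location = percent_encode_non_ascii(b"/caf\xc3\xa9")
latin1_location = percent_encode_non_ascii(b"/caf\xe9")
```

Because the bytes are preserved one for one, the remote website can reinterpret the escaped URL in whatever encoding it used originally, which is the behaviour the commit message aims for.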