mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 15:24:15 +00:00

5595 Commits

Author SHA1 Message Date
Paul Tremberth
d6760dbaac Set SNI properly when using CONNECT 2016-04-18 18:30:01 +02:00
Mikhail Korobov
ba6dbad1e0 Merge pull request #1926 from redapple/faq-mcve
Reference StackOverflow's "minimal, complete, and verifiable example" guide
2016-04-12 18:35:34 +06:00
Paul Tremberth
2849ebf4c6 Reference StackOverflow's "minimal, complete, and verifiable example" guide 2016-04-12 14:07:33 +02:00
Paul Tremberth
47bfac1669 Merge pull request #1924 from lopuhin/faq-fix-py3
[MRG+1] Fix FAQ entry about python versions support (add Python 3.3+)
2016-04-12 12:15:17 +02:00
Konstantin Lopuhin
1ec49c2ada Fix FAQ entry about python versions support 2016-04-12 11:48:57 +03:00
Paul Tremberth
10d03ee419 Merge pull request #1916 from nblock/patch-1
Fix spelling mistake
2016-04-11 15:12:24 +02:00
nblock
a3557dd34d Fix spelling mistake 2016-04-11 14:06:57 +02:00
Mikhail Korobov
ff80e1c381 Merge pull request #1913 from redapple/link-extractor-new-w3lib
Fix link extractor tests for non-ASCII characters from latin1 document
2016-04-09 22:05:59 +06:00
Paul Tremberth
7b5243a263 Add link extractor test for non-ASCII characters in query part of URL 2016-04-09 15:15:01 +02:00
Paul Tremberth
1656fbcffa Fix link extractor tests for non-ASCII characters from latin1 document
URL path component should use UTF-8 before percent-encoding (that's what
browsers do when you open scrapy/tests/sample_data/link_extractor/linkextractor_latin1.html
and follow the links)
This matches current w3lib v1.14.1
2016-04-08 23:25:50 +02:00
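The commit above relies on browsers encoding a URL's path component as UTF-8 before percent-encoding it, even when the page itself is latin1. The sketch below illustrates the difference using only the standard library's `urllib.parse.quote`. It is not w3lib's implementation, and the path value is hypothetical.

```python
from urllib.parse import quote

# Hypothetical link path, as decoded from a latin1 document.
path = "/sample_data/é.html"

# What browsers do for the path component: UTF-8 bytes, then percent-encoding.
utf8_path = quote(path.encode("utf-8"))

# Encoding with the page's own charset instead yields a different URL.
latin1_path = quote(path.encode("latin-1"))

print(utf8_path)    # /sample_data/%C3%A9.html
print(latin1_path)  # /sample_data/%E9.html
```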
Paul Tremberth
0ede017d2a Merge pull request #1891 from djunzu/update_files_images_pipelines
[MRG+1] Change Files/ImagesPipelines class attributes to instance attributes
2016-04-08 12:55:09 +02:00
Paul Tremberth
cbb695d08c Merge pull request #1881 from nyov/dedupe
[MRG+1] Remove duplicate code now handled by newer w3lib
2016-04-06 15:47:06 +02:00
Paul Tremberth
642fedb3d6 Merge pull request #1902 from starrify/case-insensitive-robots-txt-for-sitemap
[MRG+1] Added: Making it case-insensitive when extracting sitemap URLs from a robots.txt
2016-04-04 15:23:03 +02:00
djunzu
6988e9cd4b Update docs.
modified:   docs/topics/media-pipeline.rst
2016-04-01 21:51:15 -03:00
Pengyu CHEN
103f6eaa88 Added: Making it case-insensitive when extracting sitemap URLs from a robots.txt 2016-04-02 02:04:50 +08:00
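A minimal sketch of case-insensitive `Sitemap:` extraction from robots.txt text follows. The helper name `extract_sitemap_urls` is hypothetical, and this is not necessarily the exact code from the commit.

```python
def extract_sitemap_urls(robots_text):
    """Yield sitemap URLs from robots.txt content (hypothetical helper)."""
    for line in robots_text.splitlines():
        # Match the directive regardless of case: "Sitemap:", "sitemap:", "SITEMAP:".
        if line.lstrip().lower().startswith("sitemap:"):
            # Split on the first colon only, so "http://" stays intact.
            yield line.split(":", 1)[1].strip()


robots = (
    "User-agent: *\n"
    "SITEMAP: http://example.com/sitemap.xml\n"
    "  sitemap: http://example.com/news.xml\n"
    "Disallow: /private\n"
)
urls = list(extract_sitemap_urls(robots))
print(urls)  # ['http://example.com/sitemap.xml', 'http://example.com/news.xml']
```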
Paul Tremberth
bf7f675493 Merge pull request #1847 from aron-bordin/add_blocking_storage_path_setting
[MRG+2] added BLOCKING_FEED_STORAGE_PATH to settings
2016-04-01 15:47:06 +02:00
Aron Bordin
9250a5bffa added FEED_TEMPDIR to settings 2016-04-01 00:05:21 -03:00
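The `FEED_TEMPDIR` setting controls where blocking feed storages write their temporary file. The sketch below shows how such a storage might honour it. The function name and the directory check are assumptions for illustration, not Scrapy's code.

```python
import os
import tempfile


def open_feed_tempfile(feed_tempdir=None):
    """Open a temporary file for a blocking feed storage (hypothetical helper).

    feed_tempdir stands in for the FEED_TEMPDIR setting; None falls back
    to the system's default temporary directory.
    """
    if feed_tempdir and not os.path.isdir(feed_tempdir):
        raise OSError("Not a directory: %s" % feed_tempdir)
    return tempfile.NamedTemporaryFile(prefix="feed-", dir=feed_tempdir)


with tempfile.TemporaryDirectory() as tmp:
    f = open_feed_tempfile(tmp)
    # The temporary file is created inside the configured directory.
    in_chosen_dir = os.path.dirname(f.name) == tmp
    f.close()
```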
djunzu
537083524e Change ImagesPipeline class attributes to instance attributes.
modified:   scrapy/pipelines/images.py
2016-03-31 19:20:43 -03:00
djunzu
8228a0c491 Change FilesPipeline class attributes to instance attributes.
modified:   scrapy/pipelines/files.py
modified:   tests/test_pipeline_files.py
2016-03-31 19:20:39 -03:00
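The two commits above move pipeline configuration from class attributes to instance attributes. The sketch below shows why that matters. When `from_settings` writes onto the class, every instance shares the last value written. The classes and the `EXPIRES` key are hypothetical, and a plain dict stands in for Scrapy's `Settings`.

```python
class ClassAttrPipeline:
    expires = 90  # class attribute: shared by all instances

    @classmethod
    def from_settings(cls, settings):
        # Old pattern (sketch): configuration is stored on the class itself.
        cls.expires = settings.get("EXPIRES", 90)
        return cls()


class InstanceAttrPipeline:
    def __init__(self, expires=90):
        # New pattern (sketch): each instance keeps its own configuration.
        self.expires = expires

    @classmethod
    def from_settings(cls, settings):
        return cls(expires=settings.get("EXPIRES", 90))


a = ClassAttrPipeline.from_settings({"EXPIRES": 30})
b = ClassAttrPipeline.from_settings({"EXPIRES": 7})
print(a.expires, b.expires)  # 7 7 (the second call overwrote the shared value)

c = InstanceAttrPipeline.from_settings({"EXPIRES": 30})
d = InstanceAttrPipeline.from_settings({"EXPIRES": 7})
print(c.expires, d.expires)  # 30 7
```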
djunzu
c7fc17866f Move default settings to settings/default_settings.py.
modified:   scrapy/pipelines/files.py
modified:   scrapy/pipelines/images.py
modified:   scrapy/settings/default_settings.py
2016-03-31 19:20:33 -03:00
djunzu
e9d48f8a8e Add tests.
modified:   tests/test_pipeline_files.py
modified:   tests/test_pipeline_images.py
2016-03-31 19:19:49 -03:00
Paul Tremberth
9d8c368ce8 Merge pull request #1879 from scrapy/scrapy-arch-docs
DOC improved Architecture overview
2016-03-31 12:09:24 +02:00
Paul Tremberth
9ae4e46f32 Merge pull request #1883 from lopuhin/botocore-files-store-fix
[MRG+1] Make FilesPipeline work with S3FilesStore using botocore
2016-03-31 11:57:39 +02:00
Paul Tremberth
3ba5671fbc Merge pull request #1851 from nyov/binary_or_text
[MRG+1] Rename isbinarytext function to binary_is_text for clarity
2016-03-31 11:55:09 +02:00
Paul Tremberth
3a763f7ba7 Merge pull request #1857 from pawelmhm/fix_response_status_msg
[MRG+1] response_status_message should not fail on non-standard HTTP codes
2016-03-31 11:44:44 +02:00
nyov
e8ca467572 Rename isbinarytext function to binary_is_text for clarity
Closes #1389
2016-03-30 15:44:15 +00:00
nyov
3787fec460 Remove duplicate code now handled by newer w3lib
see f3029a6a10
2016-03-30 14:58:03 +00:00
Mikhail Korobov
a38a99e0e2 Merge pull request #1893 from redapple/sphinx-1.4
Add support for Sphinx 1.4
2016-03-30 19:55:48 +06:00
Paul Tremberth
1075587dbd Add support for Sphinx 1.4
See http://www.sphinx-doc.org/en/stable/changes.html#release-1-4-released-mar-28-2016

sphinx_rtd_theme has become optional, so it needs to be added to the requirements

https://github.com/sphinx-doc/sphinx/pull/2320 changes node entries tuples
to 5 values instead of 4

`sh` syntax highlighting added very locally in selectors.rst
because of this warning/error with Sphinx 1.4:

```
Warning, treated as error:
/home/paul/src/scrapy/docs/topics/selectors.rst:743:
WARNING: Could not lex literal_block as "python". Highlighting skipped.
```
2016-03-30 14:40:52 +02:00
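Sphinx 1.4 changed index node `entries` tuples from 4 to 5 values. Code that supports both versions can branch on the version. In the sketch below, the helper name is hypothetical, and the values used for the 4th and 5th elements (`""` and `None`) are assumptions. The version is passed in as a tuple so the snippet runs without Sphinx installed.

```python
def index_entry(entrytype, entryname, target, sphinx_version):
    """Build one index node entry for the given Sphinx version (hypothetical helper)."""
    entry = (entrytype, entryname, target, "")
    if sphinx_version >= (1, 4):
        # Sphinx 1.4+ expects a fifth element in each entry tuple.
        entry += (None,)
    return entry


old_style = index_entry("pair", "setting; FOO", "std:setting-FOO", (1, 3, 6))
new_style = index_entry("pair", "setting; FOO", "std:setting-FOO", (1, 4, 0))
print(len(old_style), len(new_style))  # 4 5
```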
nanolab
a583e4d531 Update httpcache.py
It checks the cache directory's modification time, but it should check the cached file's modification time.
2016-03-30 10:57:48 +02:00
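A sketch of an expiration check based on the cached file's own mtime follows. On POSIX filesystems, a directory's mtime reflects changes to its entries, not rewrites of the files inside it. The function name is hypothetical. Treating `0` as "never expire" mirrors the documented meaning of `HTTPCACHE_EXPIRATION_SECS`, but this is not Scrapy's code.

```python
import os
import tempfile
import time


def is_expired(cache_file, expiration_secs):
    """Return True if cache_file is older than expiration_secs (hypothetical helper)."""
    if expiration_secs <= 0:
        return False  # 0 means cached entries never expire
    # Use the file's own modification time, not its parent directory's.
    age = time.time() - os.stat(cache_file).st_mtime
    return age > expiration_secs


with tempfile.NamedTemporaryFile(delete=False) as f:
    cache_path = f.name
# Backdate the cached file by one hour.
one_hour_ago = time.time() - 3600
os.utime(cache_path, (one_hour_ago, one_hour_ago))
expired_after_minute = is_expired(cache_path, 60)
expired_after_two_hours = is_expired(cache_path, 7200)
never_expires = is_expired(cache_path, 0)
os.remove(cache_path)
```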
Lele
7082454f2a Changed sel. to response. for clarity
Changed sel. to response. to match the other examples in the same section and avoid confusion.
2016-03-28 05:27:15 +05:00
Konstantin Lopuhin
fc8cd45a48 Fix a race condition in the FilesPipeline
Checksum calculation could happen simultaneously with
persisting the file in the store (which is done in a thread):
both operated on the same buf object.
Concretely, this led to a bug with S3FilesStore
when using botocore: the signature did not match because
the position in buf was already at the end.
The fix is to calculate the checksum before passing buf
to the store.
2016-03-27 21:56:47 +02:00
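The fix orders the operations so that the checksum is computed, and the buffer rewound, before the store ever sees it. The sketch below shows that ordering. `checksum_then_persist` and the `persist` callable are hypothetical stand-ins for the pipeline and the store's (threaded) upload, and MD5 is used as an example hash.

```python
import hashlib
from io import BytesIO


def checksum_then_persist(buf, persist):
    """Hash buf first, then hand it to the store (hypothetical helper)."""
    buf.seek(0)
    checksum = hashlib.md5(buf.read()).hexdigest()
    # Rewind so the store reads (and signs) the full content from the start.
    buf.seek(0)
    persist(buf)
    return checksum


stored = []
body = b"file body"
# A synchronous stand-in for the store; the real upload runs in a thread.
checksum = checksum_then_persist(BytesIO(body), lambda b: stored.append(b.read()))
```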
Konstantin Lopuhin
5045a4f168 Fix handling of meta=None in S3FilesStore.persist_file 2016-03-25 18:35:55 +03:00
Mikhail Korobov
4f335b5a01 DOC clarify Architecture docs 2016-03-25 17:03:41 +05:00
Mikhail Korobov
3ca977a8cb DOC improved Architecture overview
* spiders don't have to work on specific domains;
* explain what to use Downloader middleware for
  and what to use Spider middleware for;
* Engine no longer locates spiders based on domains;
* "Spider middleware output direction" step was missing.

See also: GH-1569.
2016-03-25 07:11:33 +05:00
pawelmhm
65c7c05060 response_status_message should not fail on non-standard HTTP codes
The utility is used in the retry middleware, and it failed to handle non-standard HTTP codes.
Instead of raising an exception when the status passes through to_native_str, it should return
an "Unknown status" message.
2016-03-12 14:16:40 +01:00
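A sketch of the fallback behaviour follows. It looks up the reason phrase in the standard library's `http.client.responses` table and falls back to "Unknown status" instead of raising. The helper name and the choice of lookup table are assumptions; this is not Scrapy's `response_status_message` itself.

```python
from http.client import responses


def status_message(status):
    """Return e.g. '404 Not Found'; unknown codes get a fallback (hypothetical helper)."""
    code = int(status)
    return "%d %s" % (code, responses.get(code, "Unknown status"))


print(status_message(404))  # 404 Not Found
print(status_message(573))  # 573 Unknown status
```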
Mikhail Korobov
ebef6d7c6d Merge pull request #1848 from aron-bordin/small_doc_style_fixes
small doc style fixes
2016-03-07 08:49:25 +05:00
Aron Bordin
2cfe9e424d small doc style fixes 2016-03-05 19:54:06 -03:00
Paul Tremberth
e122c569fe Merge pull request #1842 from nyov/nyov/docs
[MRG+1] Update documentation links
2016-03-04 11:50:26 +01:00
nyov
5876b9aa30 Update documentation links 2016-03-03 16:28:33 +00:00
Paul Tremberth
9f4fe5dc4a Merge pull request #1822 from nyov/nyov/scheduler
[MRG+1] Allow core Scheduler priority queue customization
2016-03-02 14:20:40 +01:00
Mikhail Korobov
6b2871dadd Merge pull request #1835 from djunzu/add_pps_to_IGNORED_EXTENSIONS
[MRG+1] Add pps extension to IGNORED_EXTENSIONS
2016-03-02 16:15:51 +05:00
djunzu
0e288d4a71 Add pps extension to IGNORED_EXTENSIONS
modified:   scrapy/linkextractors/__init__.py
2016-03-01 21:02:13 -03:00
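Link extractors skip URLs whose path ends in one of the `IGNORED_EXTENSIONS`. The sketch below shows that kind of check. Both the extension subset and the helper name are hypothetical (Scrapy's real list is much longer).

```python
import posixpath
from urllib.parse import urlparse

# Illustrative subset only, including the newly added "pps".
SAMPLE_IGNORED_EXTENSIONS = ["pdf", "ppt", "pps", "zip"]


def has_ignored_extension(url, extensions=SAMPLE_IGNORED_EXTENSIONS):
    """Return True if the URL path ends with an ignored extension (hypothetical helper)."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower().lstrip(".")
    return ext in extensions


print(has_ignored_extension("http://example.com/slides.PPS"))  # True
print(has_ignored_extension("http://example.com/page.html"))   # False
```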
nyov
2a6524ee3a Allow core Scheduler priority queue customization 2016-03-01 13:58:40 +00:00
Daniel Graña
b8fcb46e67 Merge pull request #1804 from redapple/enable-test-dwnld-timeout
Re-enable HTTPS tests for download timeouts
2016-03-01 10:44:24 -03:00
Daniel Graña
21da493109 Merge pull request #1828 from scrapy/py3-classifiers
[MRG+1] declare Python 3 support in setup.py
2016-03-01 10:34:36 -03:00
Daniel Graña
cf535fe840 Merge pull request #1827 from scrapy/proxy-auth-test
[MRG+1] Extract a function to build CONNECT request; add tests for it.
2016-03-01 10:34:24 -03:00
Mikhail Korobov
17d3bec699 declare Python 3 support in setup.py 2016-03-01 16:34:13 +05:00
Mikhail Korobov
94e28adfb7 Extract a function to build CONNECT request; add tests for it. See GH-1701 and GH-1808. 2016-03-01 16:29:12 +05:00
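Factoring the CONNECT request into its own function makes the exact bytes sent to the proxy easy to unit-test. The sketch below shows what such a function might produce. The name `build_connect_request` and its signature are hypothetical, not Scrapy's.

```python
import base64


def build_connect_request(host, port, proxy_auth_header=None):
    """Return the raw bytes of an HTTP CONNECT request (hypothetical helper)."""
    host_value = ("%s:%d" % (host, port)).encode("ascii")
    lines = [b"CONNECT " + host_value + b" HTTP/1.1", b"Host: " + host_value]
    if proxy_auth_header:
        # Value for Proxy-Authorization, e.g. b"Basic <base64 credentials>".
        lines.append(b"Proxy-Authorization: " + proxy_auth_header)
    # Headers end with an empty line.
    return b"\r\n".join(lines) + b"\r\n\r\n"


plain = build_connect_request("example.com", 8080)
auth = b"Basic " + base64.b64encode(b"user:pass")
with_auth = build_connect_request("example.com", 443, auth)
```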
Mikhail Korobov
e8635cd03c Merge pull request #1826 from redapple/universal-wheels
Build universal wheels
2016-03-01 15:23:57 +05:00