mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 09:44:28 +00:00

3407 Commits

Author SHA1 Message Date
Daniel Graña
0a26170086 move ssl context factory to its own module and implement a non-ssl version that warns about pyopenssl support 2013-05-06 12:52:58 -03:00
Daniel Graña
ab3407289c enable persistent connections 2013-05-06 12:50:14 -03:00
Daniel Graña
e4fe7c63b0 add http connection pool and custom ssl context factory 2013-05-06 12:50:13 -03:00
Daniel Graña
a7a354f982 http11 cleanup 2013-05-06 12:45:27 -03:00
paul
ef03603869 Restore handling of HTTPS 2013-05-06 12:45:27 -03:00
paul
46341d5275 Renamed downloader to Http11DownloadHandler and some refactoring
Only for HTTP, not HTTPS
Test on expected body length instead of request method (HEAD case)
2013-05-06 12:45:27 -03:00
paul
4018d25a9b Use twisted.web.client.Agent for download requests (use of HTTP/1.1)
Adds http11.HttpDownloadHandler in scrapy.core.downloader.handlers
2013-05-06 12:45:27 -03:00
Daniel Graña
2a1a4477d3 Merge pull request #297 from scrapy/downloader-gc
Add garbage collector to downloader
2013-04-29 07:38:27 -07:00
Pablo Hoffman
af5c13fa14 Add garbage collector to downloader
This fixes a couple of issues:
- reactor callLater leaks when using download delay (test was
  re-enabled)
- downloader slot leaking on broad crawls (slots were created but never
  removed)
2013-04-29 10:16:30 -03:00
Pablo Hoffman
36ee36000e bind mockserver to 0.0.0.0, to listen on the whole 127.* range (useful for testing broad crawls) 2013-04-27 04:17:19 -03:00
Pablo Hoffman
9361c89573 remove scrapyd doc, as it was moved to its own repo 2013-04-27 04:15:42 -03:00
Daniel Graña
5ba2b60a4b fix broken doctests 2013-04-25 11:56:56 -03:00
Daniel Graña
32b781a1b8 cookies: increase candidate list with dot prefixed domains 2013-04-24 16:22:40 -03:00
Daniel Graña
31bec08a39 rfc2965 is dead
- It is not enabled by default in python cookielib
- Mozilla rejected implementing it https://bugzilla.mozilla.org/show_bug.cgi?id=208985
- Netscape cookies still rules
- It was superseded by RFC6265, which formalizes the de facto protocol
2013-04-24 15:07:40 -03:00
Daniel Graña
973906153c Merge pull request #77 from shane42/master
Cookie handling performance improvement
2013-04-24 10:59:36 -07:00
Pablo Hoffman
d02da2f31f ported code to use queuelib 2013-04-23 17:48:09 -03:00
Daniel Graña
5531290d53 Merge pull request #292 from nramirezuy/item-unusedimport
deleted unused import
2013-04-19 10:53:20 -07:00
Nicolás Ramírez
bc592e9958 deleted unused import 2013-04-19 14:51:59 -03:00
Pablo Hoffman
7a1536f76e Merge pull request #290 from nramirezuy/item-copy
added copy method to item
2013-04-19 09:27:44 -07:00
Nicolás Ramírez
6df274bba5 added copy method to item 2013-04-19 13:23:53 -03:00
Pablo Hoffman
98f89d1c3f Merge pull request #291 from nramirezuy/cmd-parse-pipelines
parse pipelines test fixed
2013-04-19 09:22:38 -07:00
Pablo Hoffman
626331b865 test_crawl: make mock server print a line when ready, and wait for that line to start tests, instead of waiting for an arbitrary time 2013-04-19 13:06:30 -03:00
Nicolás Ramírez
e0c88f2d93 test fixed 2013-04-19 12:38:20 -03:00
Daniel Graña
ce177fa5bc mock server is slow to come up on busy builders 2013-04-19 14:31:27 +00:00
Pablo Hoffman
74e9aecc8d Merge pull request #288 from kmike/patch-1
Update faq.rst
2013-04-17 17:55:45 -07:00
Mikhail Korobov
b245d592aa Update faq.rst
spider.DOWNLOAD_DELAY is deprecated
2013-04-18 02:42:15 +06:00
Pablo Hoffman
9feb65865c Merge pull request #284 from nramirezuy/cmd-parse-pipelines
Command parse, --pipelines argument added
2013-04-09 06:26:24 -07:00
Pablo Hoffman
adf38a65e9 Merge pull request #283 from opyate/patch-1
Update overview.rst, Torrent referenced as TorrentItem in spider
2013-04-08 13:48:05 -07:00
Nicolás Ramírez
2b39527f72 pipelines argument added 2013-04-08 14:55:28 -03:00
Juan M Uys
4de3aa4932 Update overview.rst 2013-04-08 14:13:15 +02:00
Pablo Hoffman
96c2332e0e fix inaccurate downloader middleware documentation. refs #280 2013-04-02 11:35:32 -03:00
Pablo Hoffman
b0ea457c7c Merge pull request #277 from nramirezuy/cmd-parse-args
Spider Arguments support for parse command
2013-03-28 15:38:35 -07:00
Nicolás Ramírez
df19693ed2 Spider Arguments support for parse command and test 2013-03-28 16:49:06 -03:00
Pablo Hoffman
21c8b89422 Revert "replaced use of deprecated module scrapy.settings with the method get_project_settings()"
This reverts commit 1b4d14c8f635b28d5f72d37f924c4e71d71520ca.

Calling `get_project_settings()` generates a new independent Settings
object that doesn't contain the overrides passed by command line
arguments, for example.

Proper port would require implementing the from_crawler() class method
and making sure the settings object is passed to all internal objects
(probably breaking some minor backwards compatibility).
2013-03-21 12:27:07 -03:00
Pablo Hoffman
8c181d87f6 Merge pull request #274 from brunsgaard/master
Updated settings import in contrib/feedexport.py class S3FeedStorage
2013-03-21 08:25:29 -07:00
Jonas Brunsgaard
1b4d14c8f6 replaced use of deprecated module scrapy.settings with the method get_project_settings() 2013-03-21 16:09:14 +01:00
Pablo Hoffman
d0a81d369f initial version of crawl tests using a mock HTTP server (in separate process). This can also be used to benchmark scrapy performance, although a script (specially suited for that task) would be more convenient 2013-03-20 14:48:59 -03:00
Pablo Hoffman
2a5c7ed4da make Crawler.start() return a deferred that is fired when the crawl is finished 2013-03-20 14:48:59 -03:00
Daniel Graña
c43931ea8c Use latest pyOpenSSL for all Travis test environments
SSLv2 was removed from OpenSSL 1.0 and above but it is still referenced
by pyOpenSSL < 0.13. Travis workers are Ubuntu Precise hosts with OpenSSL 1.0
and pyOpenSSL 0.12 (!) with a Debian patch to work around this problem
that is not present in the pyOpenSSL 0.12 shipped by PyPI.

Trying to install pyOpenSSL 0.10 or 0.12 from packages at PyPI under a
system with OpenSSL >= 1.0 will succeed but fail at import time with a
message similar to:

    ImportError: .../lib/python2.7/site-packages/OpenSSL/SSL.so: undefined symbol: SSLv2_method
2013-03-20 11:44:48 -03:00
Pablo Hoffman
9968f99e06 remove ssl from optional_features to simplify code, as it is now required. also deprecate optional_features set 2013-03-20 09:52:40 -03:00
Pablo Hoffman
320bdfe391 Merge pull request #269 from kalessin/settingdict
added support for explicitly interpreting a setting value as a dict
2013-03-19 11:10:54 -07:00
Martin Olveyra
bf480015f4 added support for generic Python literals in settings, and for explicitly
interpreting a setting value as a dict
2013-03-19 16:02:53 -02:00
Pablo Hoffman
d246b926bf Merge pull request #273 from plainas/master
Accept ajax requests from other hosts (CORS support)
2013-03-19 07:27:56 -07:00
Pedro
a80ed769d9 allow remote ajax requests to the webservice 2013-03-19 12:00:49 +01:00
Pablo Hoffman
b347c14b5f update engine status output on telnet console documentation 2013-03-18 19:12:12 -03:00
Pablo Hoffman
6f9f6f1f16 Merge pull request #271 from nramirezuy/scraper-6017
Slots removed, now Scraper can handle just one spider
2013-03-18 15:05:34 -07:00
Shane Evans
5c2a82f1f7 fix typo 2013-03-17 19:34:55 +00:00
Nicolás Ramírez
58975abeab Slots removed, now Scraper can handle just one spider 2013-03-15 11:11:51 -03:00
Pablo Hoffman
e630126b82 scrapy deploy: return non-zero exit code if deploy fails 2013-03-14 16:47:35 -03:00
Pablo Hoffman
bb20907254 minor update to faq 2013-03-14 16:43:00 -03:00