mirror of https://github.com/scrapy/scrapy.git synced 2025-03-05 18:39:09 +00:00

4677 Commits

Author SHA1 Message Date
Daniel Graña
9e10abcc43 Merge branch 'master' into lxml-formrequest 2012-04-13 14:50:49 -03:00
Daniel Graña
1789a55f28 Merge pull request #117 from dangra/lxml-document
Lxml document
2012-04-13 10:49:35 -07:00
Daniel Graña
4c7d29b7f7 Cache response's element trees using LxmlDocument similar to Libxml2Document 2012-04-13 14:43:30 -03:00
Daniel Graña
ac4f6cc17c unify libxml2 document and factories 2012-04-13 13:54:48 -03:00
Daniel Graña
2904dc2dc0 no need for MultipleElementsFound exception. #111 2012-04-13 13:28:16 -03:00
Daniel Graña
ee1f7847a4 more lxml form fixes and test cases. #111
* Do not treat "coord" attribute specially, just pass "NN,NN" as clickdata value
* Raise explicit ValueError if no clickable is found
* Fix bug looking for clickables through xpath when there is more than one form
* Test from_response with multiple clickdata
2012-04-13 13:09:21 -03:00
Daniel Graña
32b9f788be lxml form request cleanup. #111
* remove unused _nons function copied from lxml.html
* compute clickables only if dont_click is False
* less _get_clickables function branch nesting
2012-04-13 12:50:47 -03:00
Daniel Graña
e4d22cb16a reuse form_values() method from lxml to avoid copying code. #111 2012-04-13 10:31:59 -03:00
Daniel Graña
18a35a9fd6 Merge branch 'lxml-selectors' 2012-04-13 09:32:51 -03:00
Daniel Graña
3dbe211d29 lxml boolean results fix. oops #116 2012-04-13 09:29:56 -03:00
Daniel Graña
a338c29287 Merge pull request #116 from dangra/lxml-selectors
more fixes to lxml selector incompatibilities
2012-04-13 04:38:45 -07:00
Daniel Graña
b9efa5ee73 more fixes to lxml selector incompatibilities
* Do not fail parsing empty bodies
* Do not fail parsing bodies with null bytes
* Recode to utf8 using response.body_as_unicode() to avoid decoding bugs
* Return empty results with unevaluable nodes like text or attribute nodes
* Return u'1' and u'0' for boolean xpaths
2012-04-13 00:58:31 -03:00
Daniel Graña
d8ebf16fe5 Merge pull request #114 from stav/master
Scrapy DOC changes
2012-04-11 12:08:45 -07:00
Pablo Hoffman
7cca916ed5 added release notes to official documentation, including all release notes since Scrapy 0.7 2012-04-11 15:53:23 -03:00
Lucian Ursu
c760cc5cd8 Copied lxml.html._nons to avoid relying on that module's private interface, and moved the check out of the for loop because it only needs to be done once 2012-04-11 21:40:00 +03:00
Lucian Ursu
f13a547203 Removed unnecessary iteration of formdata items 2012-04-11 21:08:43 +03:00
stav
f1802289cd small doc typo change to get the fork rolling 2012-04-11 12:05:39 -05:00
Daniel Graña
02833e3265 fix typo in module description. closes #112 2012-04-11 10:44:31 -03:00
Daniel Graña
a0a1a5026b do formdata encoding and serialization in one place. refs #111 2012-04-11 10:07:56 -03:00
Pablo Hoffman
4f28ffcb2c removed no longer needed dependency on simplejson 2012-04-10 16:01:36 -03:00
Pablo Hoffman
6e8edbd72e switched default selectors backend to lxml 2012-04-10 15:52:14 -03:00
Daniel Graña
af0e1c40f5 Avoid logging useless error messages about ignored requests in robots.txt 2012-04-10 13:37:32 -03:00
Lucian Ursu
4be6c22c4d Removed ClientForm with its patch and tests, and BeautifulSoup 2012-04-10 10:24:30 +03:00
Lucian Ursu
df2e795278 Added test case to make sure that ambiguous clickdata is not allowed 2012-04-10 10:19:59 +03:00
Lucian Ursu
eb47849c05 Replaced ClientForm-based FormRequest with a lxml-based implementation 2012-04-10 10:18:54 +03:00
Daniel Graña
97e4003a56 do not fail handling unicode xpaths in libxml2 backed selectors 2012-04-04 17:18:31 -03:00
Pablo Hoffman
ab4dd928ee Merge pull request #108 from kalessin/throttleslot
Fix autothrottle in order to modify also inactive downloader slots, so c...
2012-04-03 17:02:31 -07:00
olveyra
e6d7afa13b Fix autothrottle so that it also modifies inactive downloader slots, so cases fixed by the inactive slots patch also work when using autothrottle 2012-04-03 23:14:00 +00:00
Pablo Hoffman
c27f7eb7e9 Merge pull request #106 from kalessin/downloader2
don't discard slot when empty, just save it in another dict in order to recycle it if needed again
2012-04-02 14:39:38 -07:00
olveyra
b39cb22d83 don't discard slot when empty, just save it in another dict in order to recycle it if needed again.
This fix avoids continuously creating new slots in certain cases, a bug that prevents download_delay and max_concurrent_requests from working properly.

The problem arises when the slot for a given domain becomes empty but the spider has not yet created further requests for that domain. This is typical when the spider creates requests one by one, or when it makes requests to multiple domains and requests for one or more of them are created slowly enough that the slot is empty each time a response is fetched.

The effect is that a new slot is created for each request under such conditions, so download_delay and max_concurrent_requests do not take effect (because applying them depends on an already existing slot for that domain).
2012-04-02 20:34:57 +00:00
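The slot recycling described in the commit message above can be illustrated with a minimal sketch. This is not Scrapy's downloader code: the names `Slot`, `SlotPool`, `start`, `finish` and the `inactive` dict are hypothetical, and the delay bookkeeping is deliberately simplified to show only why keeping an empty slot preserves the per-domain delay.

```python
class Slot:
    """Per-domain download state (hypothetical, simplified)."""

    def __init__(self, delay):
        self.delay = delay                 # seconds between requests to one domain
        self.active = set()                # requests currently in progress
        self.last_seen = float("-inf")     # start time of the latest request


class SlotPool:
    """Keeps empty slots in a second dict instead of discarding them."""

    def __init__(self, delay):
        self.delay = delay
        self.slots = {}      # domains with requests in progress
        self.inactive = {}   # empty slots saved for reuse

    def get(self, domain):
        if domain in self.slots:
            return self.slots[domain]
        # Recycle a saved slot if there is one, so its timing state survives.
        slot = self.inactive.pop(domain, None)
        if slot is None:
            slot = Slot(self.delay)
        self.slots[domain] = slot
        return slot

    def start(self, domain, request, now):
        """Register a request and return how long it has to wait."""
        slot = self.get(domain)
        wait = max(0.0, slot.last_seen + slot.delay - now)
        slot.active.add(request)
        slot.last_seen = now + wait
        return wait

    def finish(self, domain, request):
        slot = self.slots[domain]
        slot.active.discard(request)
        if not slot.active:
            # Move the empty slot aside rather than deleting it.
            self.inactive[domain] = self.slots.pop(domain)
```

With a pool created as `SlotPool(delay=2.0)`, a request that starts at `now=100.0` and finishes leaves its slot in `inactive`. A second request to the same domain at `now=100.5` reuses that slot and is told to wait 1.5 seconds. If the empty slot had been discarded, the second request would get a fresh slot and no delay, which is the behavior the commit describes as a bug.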
Pablo Hoffman
e9184def35 make selector re() method use re.UNICODE flag to compile regexes 2012-04-01 00:41:03 -03:00
Pablo Hoffman
27018fced7 changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting 2012-03-23 13:45:21 -03:00
Pablo Hoffman
731c569b5c fixed test-scrapyd.sh script after changes on insophia website 2012-03-22 16:38:28 -03:00
Pablo Hoffman
8933e2f2be added REFERER_ENABLED setting, to control referer middleware 2012-03-22 16:35:14 -03:00
Pablo Hoffman
eed34e88cd Merge pull request #103 from jsyeo/patch-1
fixed minor mistake in Request objects documentation
2012-03-20 19:49:31 -07:00
Jason Yeo
da826aa13d fixed minor mistake in Request objects documentation 2012-03-21 10:25:41 +08:00
Pablo Hoffman
175c70ad44 fixed minor defect in link extractors documentation 2012-03-20 22:56:45 -03:00
Pablo Hoffman
056a7c53d0 added artwork files properly now 2012-03-20 10:46:45 -03:00
Pablo Hoffman
aef70e8394 removed wrongly added artwork files 2012-03-20 10:45:48 -03:00
Pablo Hoffman
bcd8520f8d added sep directory with Scrapy Enhancement Proposal imported from old Trac site 2012-03-20 10:15:00 -03:00
Pablo Hoffman
c0141d154e added artwork directory (data taken from old Trac) 2012-03-20 10:14:11 -03:00
Pablo Hoffman
35fb01156e removed some obsolete remaining code related to sqlite support in scrapy 2012-03-16 11:55:55 -03:00
Pablo Hoffman
838e1dcce9 updated FormRequest tests to use HtmlResponse instead of Response, as it makes more sense 2012-03-15 11:47:02 -03:00
Pablo Hoffman
b6ae266546 Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5 2012-03-15 00:28:24 -03:00
Pablo Hoffman
9fddc73ed8 removed backwards compatibility code for old scrapy versions 2012-03-06 05:42:09 -02:00
Pablo Hoffman
9a508d4638 Removed deprecated setting: CLOSESPIDER_ITEMPASSED 2012-03-06 05:26:57 -02:00
Pablo Hoffman
8b83177655 Added CLOSESPIDER_ERRORCOUNT to scrapy/default_settings.py 2012-03-06 05:26:57 -02:00
Pablo Hoffman
9006227358 bumped required python-w3lib version in debian/control 2012-03-05 20:25:38 -02:00
Daniel Graña
2909a60e95 test that default start_request return value type is a generator. refs #98 2012-03-05 17:53:20 -02:00
Pablo Hoffman
45685ea6cd Restored scrapy.utils.py26 module for backwards compatibility, with a deprecation message. This is needed because the module was used a lot by users and the change causes too much trouble 2012-03-05 17:15:49 -02:00