Daniel Graña
9e10abcc43
Merge branch 'master' into lxml-formrequest
2012-04-13 14:50:49 -03:00
Daniel Graña
1789a55f28
Merge pull request #117 from dangra/lxml-document
...
Lxml document
2012-04-13 10:49:35 -07:00
Daniel Graña
4c7d29b7f7
Cache response's element trees using LxmlDocument similar to Libxml2Document
2012-04-13 14:43:30 -03:00
Daniel Graña
ac4f6cc17c
unify libxml2 document and factories
2012-04-13 13:54:48 -03:00
Daniel Graña
2904dc2dc0
no need for MultipleElementsFound exception. #111
2012-04-13 13:28:16 -03:00
Daniel Graña
ee1f7847a4
more lxml form fixes and test cases. #111
...
* Do not treat "coord" attribute specially, just pass "NN,NN" as clickdata value
* Raise explicit ValueError if no clickable is found
* Fix bug looking for clickables through xpath when there is more than one form
* Test from_response with multiple clickdata
2012-04-13 13:09:21 -03:00
Daniel Graña
32b9f788be
lxml form request cleanup. #111
...
* remove unused _nons function copied from lxml.html
* compute clickables only if dont_click is False
* less _get_clickables function branch nesting
2012-04-13 12:50:47 -03:00
Daniel Graña
e4d22cb16a
reuse form_values() method from lxml to avoid copying code. #111
2012-04-13 10:31:59 -03:00
Daniel Graña
18a35a9fd6
Merge branch 'lxml-selectors'
2012-04-13 09:32:51 -03:00
Daniel Graña
3dbe211d29
lxml boolean results fix. oops #116
2012-04-13 09:29:56 -03:00
Daniel Graña
a338c29287
Merge pull request #116 from dangra/lxml-selectors
...
more fixes to lxml selector incompatibilities
2012-04-13 04:38:45 -07:00
Daniel Graña
b9efa5ee73
more fixes to lxml selector incompatibilities
...
* Do not fail parsing empty bodies
* Do not fail parsing bodies with null bytes
* Recode to utf8 using response.body_as_unicode() to avoid decoding bugs
* Return empty results with unevaluable nodes like text or attribute nodes
* Return u'1' and u'0' for boolean xpaths
2012-04-13 00:58:31 -03:00
Daniel Graña
d8ebf16fe5
Merge pull request #114 from stav/master
...
Scrapy DOC changes
2012-04-11 12:08:45 -07:00
Pablo Hoffman
7cca916ed5
added release notes to official documentation, including all release notes since Scrapy 0.7
2012-04-11 15:53:23 -03:00
Lucian Ursu
c760cc5cd8
Copied lxml.html._nons to avoid relying on that module's private interface, and moved the check out of the for loop because it only needs to be done once
2012-04-11 21:40:00 +03:00
Lucian Ursu
f13a547203
Removed unnecessary iteration of formdata items
2012-04-11 21:08:43 +03:00
stav
f1802289cd
small doc typo change to get the fork rolling
2012-04-11 12:05:39 -05:00
Daniel Graña
02833e3265
fix typo in module description. closes #112
2012-04-11 10:44:31 -03:00
Daniel Graña
a0a1a5026b
do formdata encoding and serialization in one place. refs #111
2012-04-11 10:07:56 -03:00
Pablo Hoffman
4f28ffcb2c
removed no longer needed dependency on simplejson
2012-04-10 16:01:36 -03:00
Pablo Hoffman
6e8edbd72e
switched default selectors backend to lxml
2012-04-10 15:52:14 -03:00
Daniel Graña
af0e1c40f5
Avoid logging useless error messages about ignored requests in robots.txt
2012-04-10 13:37:32 -03:00
Lucian Ursu
4be6c22c4d
Removed ClientForm with its patch and tests, and BeautifulSoup
2012-04-10 10:24:30 +03:00
Lucian Ursu
df2e795278
Added test case to make sure that ambiguous clickdata is not allowed
2012-04-10 10:19:59 +03:00
Lucian Ursu
eb47849c05
Replaced ClientForm-based FormRequest with a lxml-based implementation
2012-04-10 10:18:54 +03:00
Daniel Graña
97e4003a56
do not fail handling unicode xpaths in libxml2 backed selectors
2012-04-04 17:18:31 -03:00
Pablo Hoffman
ab4dd928ee
Merge pull request #108 from kalessin/throttleslot
...
Fix autothrottle in order to modify also inactive downloader slots, so c...
2012-04-03 17:02:31 -07:00
olveyra
e6d7afa13b
Fix autothrottle so that it also modifies inactive downloader slots, so that cases fixed by the inactive slots patch also work when using autothrottle
2012-04-03 23:14:00 +00:00
Pablo Hoffman
c27f7eb7e9
Merge pull request #106 from kalessin/downloader2
...
don't discard slot when empty, just save it in another dict in order to recycle it if needed again
2012-04-02 14:39:38 -07:00
olveyra
b39cb22d83
don't discard slot when empty, just save it in another dict in order to recycle it if needed again.
...
This fix avoids continuously creating new slots in certain cases, a bug that prevents download_delay and max_concurrent_requests from working properly.
The problem arises when the slot for a given domain becomes empty but the spider has not yet created further requests for that domain. This is typical when the spider creates requests one by one, or when it makes requests to multiple domains and creates requests for one or more of them slowly enough that the slot is empty each time a response is fetched.
The effect is that a new slot is created for each request under such conditions, so download_delay and max_concurrent_requests do not take effect, because applying them depends on an already existing slot for that domain.
2012-04-02 20:34:57 +00:00
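The slot-recycling behaviour described in the commit above can be illustrated with a minimal sketch. This is not Scrapy's downloader code: the `Slot` and `Downloader` classes, the `recycle` flag, the `last_seen` attribute, and the `fetch`/`finish` methods are hypothetical names chosen for illustration, and time is passed in explicitly instead of being read from a clock.

```python
class Slot:
    """Hypothetical per-domain slot that remembers its last download time."""

    def __init__(self, delay):
        self.delay = delay
        self.active = set()
        self.last_seen = None  # None until the slot has been used once

    def wait_needed(self, now):
        # Seconds to wait so that downloads are at least `delay` apart.
        if self.last_seen is None:
            return 0.0
        return max(0.0, self.last_seen + self.delay - now)


class Downloader:
    """Hypothetical downloader; `recycle` toggles the behaviour of the fix."""

    def __init__(self, delay, recycle=True):
        self.delay = delay
        self.recycle = recycle
        self.slots = {}     # domains with requests in progress
        self.inactive = {}  # emptied slots kept for reuse

    def get_slot(self, domain):
        if domain in self.slots:
            return self.slots[domain]
        # Reuse a saved slot if there is one, otherwise start from scratch.
        slot = self.inactive.pop(domain, None) if self.recycle else None
        if slot is None:
            slot = Slot(self.delay)
        self.slots[domain] = slot
        return slot

    def fetch(self, domain, request, now):
        slot = self.get_slot(domain)
        wait = slot.wait_needed(now)
        slot.active.add(request)
        slot.last_seen = now + wait
        return wait

    def finish(self, domain, request):
        slot = self.slots[domain]
        slot.active.discard(request)
        if not slot.active:
            del self.slots[domain]
            if self.recycle:
                self.inactive[domain] = slot  # save instead of discarding


def second_request_wait(recycle):
    # One request at t=0 finishes, then a second one arrives at t=1.
    d = Downloader(delay=5.0, recycle=recycle)
    d.fetch("example.com", "r1", now=0.0)
    d.finish("example.com", "r1")
    return d.fetch("example.com", "r2", now=1.0)
```

In this sketch, `second_request_wait(False)` models the old behaviour: the emptied slot is thrown away, the second request gets a fresh slot, and the delay is never applied. `second_request_wait(True)` reuses the saved slot, so the second request is held back for the remainder of the delay.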
Pablo Hoffman
e9184def35
make selector re() method use re.UNICODE flag to compile regexes
2012-04-01 00:41:03 -03:00
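As a rough illustration of what the re.UNICODE flag changes, the sketch below compares Unicode-aware and ASCII-only matching of `\w`. It assumes Python 3, where Unicode matching is already the default for str patterns; `re.ASCII` is used here only to approximate the pre-flag behaviour, and the sample text is made up.

```python
import re

text = "caf\u00e9 na\u00efve"  # "café naïve"

# With re.UNICODE, \w matches accented letters as word characters.
unicode_words = re.compile(r"\w+", re.UNICODE).findall(text)

# With re.ASCII, \w only matches [a-zA-Z0-9_], so words are split
# at every non-ASCII letter.
ascii_words = re.compile(r"\w+", re.ASCII).findall(text)

print(unicode_words)
print(ascii_words)
```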
Pablo Hoffman
27018fced7
changed default user agent to Scrapy/0.15 (+http://scrapy.org) and removed no longer needed BOT_VERSION setting
2012-03-23 13:45:21 -03:00
Pablo Hoffman
731c569b5c
fixed test-scrapyd.sh script after changes on insophia website
2012-03-22 16:38:28 -03:00
Pablo Hoffman
8933e2f2be
added REFERER_ENABLED setting, to control referer middleware
2012-03-22 16:35:14 -03:00
Pablo Hoffman
eed34e88cd
Merge pull request #103 from jsyeo/patch-1
...
fixed minor mistake in Request objects documentation
2012-03-20 19:49:31 -07:00
Jason Yeo
da826aa13d
fixed minor mistake in Request objects documentation
2012-03-21 10:25:41 +08:00
Pablo Hoffman
175c70ad44
fixed minor defect in link extractors documentation
2012-03-20 22:56:45 -03:00
Pablo Hoffman
056a7c53d0
added artwork files properly now
2012-03-20 10:46:45 -03:00
Pablo Hoffman
aef70e8394
removed wrongly added artwork files
2012-03-20 10:45:48 -03:00
Pablo Hoffman
bcd8520f8d
added sep directory with Scrapy Enhancement Proposal imported from old Trac site
2012-03-20 10:15:00 -03:00
Pablo Hoffman
c0141d154e
added artwork directory (data taken from old Trac)
2012-03-20 10:14:11 -03:00
Pablo Hoffman
35fb01156e
removed some obsolete remaining code related to sqlite support in scrapy
2012-03-16 11:55:55 -03:00
Pablo Hoffman
838e1dcce9
updated FormRequest tests to use HtmlResponse instead of Response, as it makes more sense
2012-03-15 11:47:02 -03:00
Pablo Hoffman
b6ae266546
Removed (very old and possibly broken) backwards compatibility support for Twisted 2.5
2012-03-15 00:28:24 -03:00
Pablo Hoffman
9fddc73ed8
removed backwards compatibility code for old scrapy versions
2012-03-06 05:42:09 -02:00
Pablo Hoffman
9a508d4638
Removed deprecated setting: CLOSESPIDER_ITEMPASSED
2012-03-06 05:26:57 -02:00
Pablo Hoffman
8b83177655
Added CLOSESPIDER_ERRORCOUNT to scrapy/default_settings.py
2012-03-06 05:26:57 -02:00
Pablo Hoffman
9006227358
bumped required python-w3lib version in debian/control
2012-03-05 20:25:38 -02:00
Daniel Graña
2909a60e95
test that the default start_requests return value type is a generator. refs #98
2012-03-05 17:53:20 -02:00
Pablo Hoffman
45685ea6cd
Restored scrapy.utils.py26 module for backwards compatibility, with a deprecation message. This is needed because the module was used a lot by users and the change causes too much trouble
2012-03-05 17:15:49 -02:00