Pablo Hoffman
2fb5e62c39
doc: update overview page to point to the genspider command. refs #107
2012-04-19 02:37:22 -03:00
Pablo Hoffman
1c5294bee1
update docstring in project template to avoid confusion with genspider command, which may be considered as an advanced feature. refs #107
2012-04-19 02:35:48 -03:00
Pablo Hoffman
d567d8efbe
added note to docs/topics/firebug.rst about google directory being shut down
2012-04-19 01:34:20 -03:00
Daniel Graña
21e03729a3
lxml is the new default selector backend. closes #120
2012-04-19 00:28:27 -03:00
Daniel Graña
6bb40fe5a8
use xpath to match img tags in one shot. #119
2012-04-19 00:03:04 -03:00
Andrés Moreira
e24107feb8
fix HTMLImageLinkExtractor to work with libxml2 and lxml selectors
2012-04-18 18:05:44 -03:00
Pablo Hoffman
30ddbf624e
mention the removal of some scrapy.xlib modules in the release notes
2012-04-17 12:31:18 -03:00
Pablo Hoffman
d99ee6deb9
added some missing entries to release notes
2012-04-17 12:29:48 -03:00
Daniel Graña
60ee5d6213
Merge branch 'lxml-formrequest'
2012-04-15 00:47:39 -03:00
Daniel Graña
5b0df465e7
SELECT is matched as a form input but has no type attribute. #111
2012-04-15 00:47:30 -03:00
Daniel Graña
150e5734d3
Merge pull request #111 from LucianU/lxml-formrequest
...
Lxml formrequest
2012-04-13 12:29:04 -07:00
Daniel Graña
39395eb4f7
iteritems returns tuple elements duh!. #111
2012-04-13 16:22:25 -03:00
Daniel Graña
b88bdd05c4
do not add clickable if it is in formdata. #111
2012-04-13 16:20:35 -03:00
Daniel Graña
51e5aadd1d
simplify formdata type inferring. #111
2012-04-13 16:07:27 -03:00
Daniel Graña
a11ef7fba7
reuse LxmlDocument in FormRequest. #111
2012-04-13 15:41:58 -03:00
Daniel Graña
9e10abcc43
Merge branch 'master' into lxml-formrequest
2012-04-13 14:50:49 -03:00
Daniel Graña
1789a55f28
Merge pull request #117 from dangra/lxml-document
...
Lxml document
2012-04-13 10:49:35 -07:00
Daniel Graña
4c7d29b7f7
Cache response's element trees using LxmlDocument similar to Libxml2Document
2012-04-13 14:43:30 -03:00
Daniel Graña
ac4f6cc17c
unify libxml2 document and factories
2012-04-13 13:54:48 -03:00
Daniel Graña
2904dc2dc0
no need for MultipleElementsFound exception. #111
2012-04-13 13:28:16 -03:00
Daniel Graña
ee1f7847a4
more lxml form fixes and test cases. #111
...
* Do not treat "coord" attribute specially, just pass "NN,NN" as clickdata value
* Raise explicit ValueError if no clickable is found
* Fix bug looking for clickables through xpath when there is more than one form
* Test from_response with multiple clickdata
2012-04-13 13:09:21 -03:00
Daniel Graña
32b9f788be
lxml form request cleanup. #111
...
* remove unused _nons function copied from lxml.html
* compute clickables only if dont_click is False
* less _get_clickables function branch nesting
2012-04-13 12:50:47 -03:00
Daniel Graña
e4d22cb16a
reuse form_values() method from lxml to avoid copying code. #111
2012-04-13 10:31:59 -03:00
Daniel Graña
18a35a9fd6
Merge branch 'lxml-selectors'
2012-04-13 09:32:51 -03:00
Daniel Graña
3dbe211d29
lxml boolean results fix. oops #116
2012-04-13 09:29:56 -03:00
Daniel Graña
a338c29287
Merge pull request #116 from dangra/lxml-selectors
...
more fixes to lxml selector incompatibilities
2012-04-13 04:38:45 -07:00
Daniel Graña
b9efa5ee73
more fixes to lxml selector incompatibilities
...
* Do not fail parsing empty bodies
* Do not fail parsing bodies with null bytes
* Recode to utf8 using response.body_as_unicode() to avoid decoding bugs
* Return empty results with unevaluable nodes like text or attribute nodes
* Return u'1' and u'0' for boolean xpaths
2012-04-13 00:58:31 -03:00
Daniel Graña
d8ebf16fe5
Merge pull request #114 from stav/master
...
Scrapy DOC changes
2012-04-11 12:08:45 -07:00
Pablo Hoffman
7cca916ed5
added release notes to official documentation, including all release notes since Scrapy 0.7
2012-04-11 15:53:23 -03:00
Lucian Ursu
c760cc5cd8
Copied lxml.html._nons to not rely on that module's private interface, and moved the check out of the for loop because it only needs to run once
2012-04-11 21:40:00 +03:00
Lucian Ursu
f13a547203
Removed unnecessary iteration of formdata items
2012-04-11 21:08:43 +03:00
stav
f1802289cd
small doc typo change to get the fork rolling
2012-04-11 12:05:39 -05:00
Daniel Graña
02833e3265
fix typo in module description. closes #112
2012-04-11 10:44:31 -03:00
Daniel Graña
a0a1a5026b
do formdata encoding and serialization in one place. refs #111
2012-04-11 10:07:56 -03:00
Pablo Hoffman
4f28ffcb2c
removed no longer needed dependency on simplejson
2012-04-10 16:01:36 -03:00
Pablo Hoffman
6e8edbd72e
switched default selectors backend to lxml
2012-04-10 15:52:14 -03:00
Daniel Graña
af0e1c40f5
Avoid logging useless error messages about ignored requests in robots.txt
2012-04-10 13:37:32 -03:00
Lucian Ursu
4be6c22c4d
Removed ClientForm with its patch and tests, and BeautifulSoup
2012-04-10 10:24:30 +03:00
Lucian Ursu
df2e795278
Added test case to make sure that ambiguous clickdata is not allowed
2012-04-10 10:19:59 +03:00
Lucian Ursu
eb47849c05
Replaced ClientForm-based FormRequest with a lxml-based implementation
2012-04-10 10:18:54 +03:00
Daniel Graña
97e4003a56
do not fail handling unicode xpaths in libxml2 backed selectors
2012-04-04 17:18:31 -03:00
Pablo Hoffman
ab4dd928ee
Merge pull request #108 from kalessin/throttleslot
...
Fix autothrottle in order to modify also inactive downloader slots, so c...
2012-04-03 17:02:31 -07:00
olveyra
e6d7afa13b
Fix autothrottle to also modify inactive downloader slots, so cases fixed by the inactive slots patch also work when using autothrottle
2012-04-03 23:14:00 +00:00
Pablo Hoffman
c27f7eb7e9
Merge pull request #106 from kalessin/downloader2
...
don't discard slot when empty, just save it in another dict in order to recycle it if needed again
2012-04-02 14:39:38 -07:00
olveyra
b39cb22d83
don't discard slot when empty, just save it in another dict in order to recycle it if needed again.
...
This fix avoids continuously creating new slots in certain cases, a bug that prevents download_delay and max_concurrent_requests from working properly.
The problem arises when the slot for a given domain becomes empty but the spider has not yet created further requests for that domain. This is typical when the spider creates requests one by one, or when it makes requests to multiple domains and one or more of them are created slowly enough that the slot is empty each time a response is fetched.
The effect is that a new slot is created for each request under such conditions, so download_delay and max_concurrent_requests do not take effect (because applying them depends on an already existing slot for that domain).
2012-04-02 20:34:57 +00:00
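The slot-recycling idea described in commit b39cb22d83 can be illustrated with a small, self-contained model. This is only a sketch under stated assumptions: `Slot`, `SlotPool`, `total_wait` and the `recycle` switch are hypothetical names invented for illustration, not Scrapy's downloader code, and time is simulated with plain numbers instead of a real clock.

```python
class Slot:
    """Hypothetical per-domain state: remembers when the last request went out."""

    def __init__(self, delay):
        self.delay = delay
        self.last_sent = None  # simulated time of the most recent request

    def wait_needed(self, now):
        """Seconds the next request must wait to honour the delay."""
        if self.last_sent is None:
            return 0.0
        return max(0.0, self.last_sent + self.delay - now)


class SlotPool:
    """Hypothetical pool; with recycle=True, empty slots are parked, not discarded."""

    def __init__(self, delay, recycle=True):
        self.delay = delay
        self.recycle = recycle
        self.active = {}
        self.inactive = {}  # the "other dict" that keeps empty slots around

    def acquire(self, domain):
        slot = self.active.get(domain)
        if slot is None:
            slot = self.inactive.pop(domain, None)  # reuse a parked slot if any
            if slot is None:
                slot = Slot(self.delay)  # otherwise start with no history
            self.active[domain] = slot
        return slot

    def release(self, domain):
        slot = self.active.pop(domain)
        if self.recycle:
            self.inactive[domain] = slot


def total_wait(pool, domain, request_times):
    """Simulate a spider issuing requests one by one; return total delay applied."""
    waited = 0.0
    for now in request_times:
        slot = pool.acquire(domain)
        wait = slot.wait_needed(now)
        waited += wait
        slot.last_sent = now + wait
        pool.release(domain)  # the slot becomes empty after every response
    return waited


with_recycling = total_wait(SlotPool(2.0, recycle=True), "example.com", [0, 1, 2])
without_recycling = total_wait(SlotPool(2.0, recycle=False), "example.com", [0, 1, 2])
```

In this model the discarded-slot variant never applies any delay, because each request sees a brand-new slot with no history, while the recycling variant accumulates the expected waits.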
Pablo Hoffman
e9184def35
make selector re() method use re.UNICODE flag to compile regexes
2012-04-01 00:41:03 -03:00
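As a rough illustration of what the re.UNICODE flag in commit e9184def35 changes, the sketch below compares Unicode-aware and ASCII-only matching of `\w`. It is written for Python 3, where `str` patterns are already Unicode-aware by default, so `re.ASCII` stands in for the flagless Python 2 behaviour of that era; the sample text is an assumption chosen for illustration.

```python
import re

text = "Daniel Graña"

# With re.UNICODE, \w matches non-ASCII letters such as "ñ".
unicode_words = re.findall(r"\w+", text, re.UNICODE)

# With ASCII-only matching, "ñ" is not a word character and splits the match.
ascii_words = re.findall(r"\w+", text, re.ASCII)
```

Here `unicode_words` keeps the surname whole, while `ascii_words` breaks it at the non-ASCII letter.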
Pablo Hoffman
27018fced7
changed default user agent to Scrapy/0.15 (+ http://scrapy.org ) and removed no longer needed BOT_VERSION setting
2012-03-23 13:45:21 -03:00
Pablo Hoffman
731c569b5c
fixed test-scrapyd.sh script after changes on insophia website
2012-03-22 16:38:28 -03:00
Pablo Hoffman
8933e2f2be
added REFERER_ENABLED setting, to control referer middleware
2012-03-22 16:35:14 -03:00
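A minimal sketch of how the REFERER_ENABLED setting from commit 8933e2f2be might be used in a project's settings module, assuming the referer middleware is on by default and the goal is to turn it off:

```python
# settings.py of a hypothetical Scrapy project
# Disable the referer middleware so requests are sent without a Referer header.
REFERER_ENABLED = False
```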
Pablo Hoffman
eed34e88cd
Merge pull request #103 from jsyeo/patch-1
...
fixed minor mistake in Request objects documentation
2012-03-20 19:49:31 -07:00