Ping Yin
a6c315552c
ItemLoader: Update tests for {add,replace,get}_value
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:49:25 +08:00
Ping Yin
913b5db242
{add,replace,get}_value: accept keyword args (currently only 're')
...
If re is given, data is extracted from the given value using this regex
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:45:01 +08:00
Ping Yin
ddfaf6049f
{add,replace}_value: add processors args and allow field_name to be None
...
* value is first processed by processors before being passed to the
input processor
* if field_name is None, values for multiple fields may be
added/replaced. The keys of the processed value are used as the field names
* add get_value function for the processor logic
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:42:55 +08:00
Ping Yin
cf35e09d35
ItemLoader: don't limit item to Item object
...
Now, for example, item can be a dict
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:28:57 +08:00
Pablo Hoffman
bfd9cb42e5
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-05-17 20:11:27 -03:00
Pablo Hoffman
076cdfd585
Added documentation about contributing to Scrapy
2010-05-17 20:10:46 -03:00
Pablo Hoffman
7a55158fed
fixed documentation bug (thanks rhill for reporting)
2010-05-11 11:25:03 -03:00
Steven Almeroth
5d03405cac
FormRequest.from_response doc fix. closes #155
...
--HG--
extra : rebase_source : d54979f6a15e5e997072dcbbc6d43b426189312b
2010-04-26 22:28:07 -03:00
Pablo Hoffman
2121a30c74
added note about installing Zope.Interface in windows platforms
2010-04-24 18:19:52 -03:00
Daniel Grana
6c12106803
Remove Sphinx warning introduced by shorter title overline
2010-04-18 23:42:56 -03:00
Lucian Ursu
2f8c052484
#154 : Language fixes to the documentation
2010-04-18 23:39:54 -03:00
Ping Yin
d42e5fdbac
linkextractor: unique after urljoin_rfc
...
Now, '/foo.html' and 'http://example.org/foo.html' are considered
the same and only one is kept.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-02 19:45:30 +08:00
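
A minimal sketch of the idea in the linkextractor commit above: resolve each link against the page URL first, then de-duplicate. This is an illustration only, not the actual Scrapy code; `unique_links` is a hypothetical helper, and the standard library's `urljoin` stands in for `urljoin_rfc`.

```python
from urllib.parse import urljoin


def unique_links(base_url, hrefs):
    """Resolve hrefs against base_url, then drop duplicates (order preserved).

    Hypothetical helper for illustration; not part of Scrapy.
    """
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)  # make the link absolute first
        if absolute not in seen:            # then de-duplicate on the absolute form
            seen.add(absolute)
            result.append(absolute)
    return result


links = unique_links(
    "http://example.org/index.html",
    ["/foo.html", "http://example.org/foo.html"],
)
```

De-duplicating before the join would keep both entries, because the relative and absolute forms differ as strings.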
Pablo Hoffman
1868ede549
bumped embedded pydispatch to 2.0.1
2010-05-14 16:38:04 -03:00
Pablo Hoffman
02b7ca7e8c
bumped embedded BeautifulSoup to 3.0.8.1
2010-05-14 16:30:50 -03:00
Daniel Grana
e528a77fa3
Automated merge with ssh://hg.scrapy.org/scrapy
2010-05-14 20:09:29 +01:00
Daniel Grana
b2f58207a4
avoid different behaviour in urljoin between python2.5 and python2.6+. see http://bugs.python.org/issue1432
2010-05-14 20:09:07 +01:00
Pablo Hoffman
c87a29eb9e
improved docstring
2010-05-14 14:48:34 -03:00
Pablo Hoffman
31843316bc
Added new instance based learning extraction library in scrapy.contrib.ibl. Documentation and tools will be added later.
2010-05-14 14:33:26 -03:00
Ping Yin
0b3bf5c6f6
downloader_handler: test HEAD method
2010-05-04 15:50:26 +08:00
Ping Yin
0aaa74d2bd
extract_regex: encoding arg defaults to 'utf-8'
...
Sometimes it is not necessary to pass the encoding argument, for
example when the text argument is unicode. So set a default encoding.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-22 23:43:34 +08:00
Pablo Hoffman
dfdac356af
added missing default values to file exporter doc
2010-04-02 02:49:18 -03:00
Pablo Hoffman
2f75839e7a
Ignore noisy Twisted deprecation warnings
2010-03-27 13:23:13 -03:00
Pablo Hoffman
f19c939925
fixed doc typo
2010-03-26 08:28:32 -03:00
Pablo Hoffman
99a876754c
Improved "What else?" section of "Scrapy at a glance" overview
2010-03-20 20:24:18 -03:00
Pablo Hoffman
234fd709ad
fixed doc typo (thanks Victor)
2010-03-19 10:32:17 -03:00
Daniel Grana
184cf6684f
Remove HttpException references from docs. Since 0.7, scrapy returns non-200 as Response objects and does not raise HttpException anymore
2010-03-18 10:05:33 -03:00
Daniel Grana
17091902f3
Explicitly say where to save item class in "Defining our item" section of tutorial
2010-03-12 14:12:49 -02:00
Pablo Hoffman
c5cd8b9d3d
Fixed bug in open_in_browser() function with Python 2.5 (closes #145).
2010-03-12 09:31:05 -02:00
Ping Yin
90fef3cbcd
ImagePipeline: show http code when failing to download
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-02-27 18:09:50 +08:00
Ping Yin
5c60ef69ab
remove_tags: add keep argument
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 19:08:01 +08:00
Ping Yin
94e6acebab
Fix remove_tags-like functions failing to remove empty tags such as <br/>
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 18:18:43 +08:00
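
The two remove_tags commits above (the keep argument and the empty-tag fix) can be sketched roughly as follows. This is an assumption-laden illustration, not the Scrapy implementation: `strip_tags` and its regex are hypothetical, and `keep` is assumed to be a collection of lowercase tag names to leave in place.

```python
import re

# Matches opening, closing and self-closing (empty) tags such as <br/>.
# Hypothetical pattern for illustration only.
_TAG_RE = re.compile(r"</?([a-zA-Z][\w-]*)[^>]*?/?>")


def strip_tags(text, keep=()):
    """Remove all tags from text except those whose name is in keep."""
    def _replace(match):
        # Keep the tag untouched if its name is whitelisted, else drop it.
        return match.group(0) if match.group(1).lower() in keep else ""
    return _TAG_RE.sub(_replace, text)


plain = strip_tags("a<br/>b<p>c</p>")
kept = strip_tags("a<br/>b<p>c</p>", keep=("p",))
```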
Daniel Grana
c925c9e9a0
Notify spider when requests are ignored by HttpErrorMiddleware, and generally when any call to process_spider_input raises an exception
2010-05-12 16:41:06 -03:00
Daniel Grana
d3ab3cf85c
url_query_cleaner: cleanup and avoid rejoining key-sep-value to build the query again
...
--HG--
extra : rebase_source : 7c2648b6dd1c2253f1ec0f11d5e1f2ee25bd1273
2010-05-12 14:09:37 -03:00
Pablo Hoffman
3fb8058016
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-05-11 11:25:24 -03:00
Pablo Hoffman
1750e233f7
moved import to top
2010-05-11 11:23:56 -03:00
Daniel Grana
ac646a3b47
url_query_cleaner: do not append ? if query is empty
2010-04-30 16:19:59 -03:00
Daniel Grana
3d731ba641
url_query_cleaner: add exclude and non-unique parameters support, also remove untested exception catching code and add missing tests
2010-04-30 09:41:11 -03:00
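
As a rough illustration of the url_query_cleaner commits above (filter query parameters, and do not append "?" when the query ends up empty), here is a sketch built on the standard library. `clean_query` and its `keep` parameter are hypothetical names; the real function's signature is not shown in this log.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_query(url, keep=()):
    """Keep only the query parameters named in keep (hypothetical helper)."""
    parts = urlsplit(url)
    pairs = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key in keep
    ]
    # urlunsplit omits the "?" separator when the query string is empty.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(pairs), parts.fragment)
    )


filtered = clean_query("http://example.com/p?a=1&b=2", keep=("a",))
emptied = clean_query("http://example.com/p?b=2", keep=("a",))
```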
Daniel Grana
c0d45846b8
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-04-26 22:29:45 -03:00
Pablo Hoffman
81f6502e37
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-04-24 18:22:13 -03:00
Daniel Grana
658e6f15e9
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-04-18 23:44:59 -03:00
Pablo Hoffman
b94abf36a3
Added scrapy.utils.py26.json to use python2.6 json module when available, otherwise fall back to simplejson module or scrapy.xlib.simplejson. This way we can always assume json and avoid conditional code.
2010-04-12 10:44:07 -03:00
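
The fallback order described in the scrapy.utils.py26.json commit above can be sketched as a chain of guarded imports. This is an assumed shape based only on the commit message, not the actual module contents; on any modern Python the first import succeeds and the fallbacks are never reached.

```python
# Prefer the standard library json module, then an installed simplejson,
# then the copy bundled with Scrapy (module path taken from the commit message).
try:
    import json
except ImportError:
    try:
        import simplejson as json
    except ImportError:
        from scrapy.xlib import simplejson as json

# Callers can use the json name without further conditional code.
round_trip = json.loads(json.dumps({"a": 1}))
```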
Pablo Hoffman
cd6aa72d7f
fixed import
2010-04-12 10:42:07 -03:00
Pablo Hoffman
025b34e122
bugfix for python < 2.6
2010-04-11 07:07:38 -03:00
Pablo Hoffman
650d1c4fbe
moved copytree() function from utils.python to utils.py26
2010-04-11 03:47:48 -03:00
Pablo Hoffman
be45acd457
added scrapy.service and scrapy.tac for running from twistd
2010-04-11 03:37:08 -03:00
Daniel Grana
0dbb5d44ae
images: avoid signing images based on spider name or request hostname, use request.meta instead
2010-04-09 14:16:00 -03:00
Daniel Grana
68a875edb0
update ENCODING_ALIASES setting default value in settings documentation topic
2010-04-07 10:54:54 -03:00
Daniel Grana
8b86e1d008
Minimize effect of http://bugs.python.org/issue8271 on TextResponses by changing the str.decode errors policy to a custom replace-alike error handler
2010-04-07 00:29:53 -03:00
Pablo Hoffman
3fcd69c347
added a couple additional TwistedPluginSpiderManager tests
2010-04-06 10:55:21 -03:00
daniel
2cd591e8a7
add missing dropin.cache file required by default spidermanager tests
2010-04-06 07:22:50 +01:00