Daniel Grana
c8c19a8e53
Automated merge with ssh://hg.scrapy.org/scrapy
2010-05-21 17:54:41 -03:00
Daniel Grana
cce9c4da49
silence HttpError exceptions raised by httperror spidermiddleware if not handled by spider
2010-05-21 17:54:32 -03:00
Ping Yin
f2363afe6f
LinkExtractor: split _process_links from _extract_links
...
Separate the extraction and process logic, so we can override in subclass easier.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-27 14:58:11 +08:00
Ping Yin
6059221716
Compose: stop process on None value by default
...
By doing this, we can use str.lower as a processor safely without
checking whether the given value is None.
By passing stop_on_none=False as keyword argument, this behaviour can be changed.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-08 10:59:47 +08:00
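The stop-on-None behaviour described in this commit can be illustrated with a minimal, self-contained sketch. This is not Scrapy's actual `Compose` implementation; the class body below is an assumption written only to show the idea, and only the `stop_on_none` keyword is taken from the commit message.

```python
class Compose:
    """Illustrative stand-in: chain functions, feeding each result to the next."""

    def __init__(self, *functions, stop_on_none=True):
        self.functions = functions
        self.stop_on_none = stop_on_none

    def __call__(self, value):
        for func in self.functions:
            # With stop_on_none=True, a None value short-circuits the chain,
            # so functions such as str.lower never receive None.
            if value is None and self.stop_on_none:
                break
            value = func(value)
        return value


clean = Compose(str.strip, str.lower)
print(clean("  ABC "))
print(clean(None))
```

With `stop_on_none=False`, the same call with `None` would reach `str.strip` and raise `TypeError`, which is the case the default is meant to avoid.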
Ping Yin
15b879f845
ItemLoader: Update docs for {add,replace,get}_{value,xpath}
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-05-18 17:54:25 +08:00
Ping Yin
8f53a72306
ItemLoader: add test for adding a dict value
...
After arg_to_iter is changed to return [arg] if arg is a dict,
the added test will pass.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 21:21:12 +08:00
Ping Yin
8497301784
arg_to_iter: return [arg] if arg is a dict
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 21:20:23 +08:00
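A rough sketch of the behaviour this commit describes: a dict is wrapped in a list rather than iterated key by key. Only the dict rule comes from the commit message; the handling of `None`, strings and other non-iterables below is an assumption added to make the example complete.

```python
def arg_to_iter(arg):
    """Illustrative version: normalise an argument to something iterable."""
    if arg is None:
        return []  # assumption: None means "no values"
    # A dict is treated as a single value, as the commit describes.
    # Strings and bytes are also wrapped (assumption) so they are not
    # split into characters.
    if isinstance(arg, (dict, str, bytes)) or not hasattr(arg, "__iter__"):
        return [arg]
    return arg


print(arg_to_iter({"name": "foo"}))
print(arg_to_iter(["a", "b"]))
```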
Ping Yin
bd844f690b
{add,replace}_xpath: add processors, kw args and allow field_name to be None
...
Also add method get_xpath.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:34:55 +08:00
Ping Yin
a6c315552c
ItemLoader: Update tests for {add,replace,get}_value
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:49:25 +08:00
Ping Yin
913b5db242
{add,replace,get}_value: accept keyword args, currently only 're'
...
If re is given, extract data from the given value using this regex
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:45:01 +08:00
Ping Yin
ddfaf6049f
{add,replace}_value: add processors args and allow field_name to be None
...
* value is first processed by processors before being passed to the input
processor
* if field_name is None, values for multiple fields may be
added/replaced. The keys of the processed value are used as the field names
* add get_value function for the processor logic
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:42:55 +08:00
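The three points in this commit can be sketched with a small stand-in class. `MiniLoader` and `_add` are hypothetical names invented for this illustration, not Scrapy's `ItemLoader`, and the input-processor step is deliberately left out. Only `get_value`, `add_value`, the extra processors and the `field_name=None` rule come from the commit message.

```python
class MiniLoader:
    """Hypothetical stand-in for an item loader; input processors omitted."""

    def __init__(self):
        self.values = {}

    def get_value(self, value, *processors):
        # Run the value through each processor in order.
        for proc in processors:
            if value is None:
                break
            value = proc(value)
        return value

    def add_value(self, field_name, value, *processors):
        value = self.get_value(value, *processors)
        if value is None:
            return
        if field_name is None:
            # The processed value is expected to be a dict whose keys
            # are used as the field names.
            for name, field_value in value.items():
                self._add(name, field_value)
        else:
            self._add(field_name, value)

    def _add(self, field_name, value):
        self.values.setdefault(field_name, []).append(value)


loader = MiniLoader()
loader.add_value("name", "  Foo ", str.strip)
loader.add_value(None, "price=10", lambda s: dict([s.split("=")]))
print(loader.values)
```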
Ping Yin
cf35e09d35
ItemLoader: don't limit item to Item object
...
Now, for example, item can be a dict
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:28:57 +08:00
Pablo Hoffman
bfd9cb42e5
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-05-17 20:11:27 -03:00
Pablo Hoffman
076cdfd585
Added documentation about contributing to Scrapy
2010-05-17 20:10:46 -03:00
Pablo Hoffman
7a55158fed
fixed documentation bug (thanks rhill for reporting)
2010-05-11 11:25:03 -03:00
Steven Almeroth
5d03405cac
FormRequest.from_response doc fix. closes #155
...
--HG--
extra : rebase_source : d54979f6a15e5e997072dcbbc6d43b426189312b
2010-04-26 22:28:07 -03:00
Pablo Hoffman
2121a30c74
added note about installing Zope.Interface in windows platforms
2010-04-24 18:19:52 -03:00
Daniel Grana
6c12106803
Remove Sphinx warning introduced by shorter title overline
2010-04-18 23:42:56 -03:00
Lucian Ursu
2f8c052484
#154 : Language fixes to the documentation
2010-04-18 23:39:54 -03:00
Ping Yin
d42e5fdbac
linkextractor: unique after urljoin_rfc
...
Now, '/foo.html' and 'http://example.org/foo.html' are considered
as the same and only one is kept.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-02 19:45:30 +08:00
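The ordering this commit describes (resolve each link against the base URL first, then drop duplicates) can be sketched as follows. The commit refers to `urljoin_rfc`; this sketch substitutes the standard library's `urllib.parse.urljoin`, and `unique_links` is a hypothetical helper name, not part of the link extractor.

```python
from urllib.parse import urljoin


def unique_links(base_url, hrefs):
    """Resolve hrefs against base_url, then keep the first of each URL."""
    seen = set()
    result = []
    for href in hrefs:
        absolute = urljoin(base_url, href)
        # De-duplicating after the join means a relative and an absolute
        # form of the same URL collapse into one entry.
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result


print(unique_links(
    "http://example.org/index.html",
    ["/foo.html", "http://example.org/foo.html"],
))
```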
Pablo Hoffman
1868ede549
bumped embedded pydispatch to 2.0.1
2010-05-14 16:38:04 -03:00
Pablo Hoffman
02b7ca7e8c
bumped embedded BeautifulSoup to 3.0.8.1
2010-05-14 16:30:50 -03:00
Daniel Grana
e528a77fa3
Automated merge with ssh://hg.scrapy.org/scrapy
2010-05-14 20:09:29 +01:00
Daniel Grana
b2f58207a4
avoid different behaviour in urljoin between python2.5 and python2.6+. see http://bugs.python.org/issue1432
2010-05-14 20:09:07 +01:00
Pablo Hoffman
c87a29eb9e
improved docstring
2010-05-14 14:48:34 -03:00
Pablo Hoffman
31843316bc
Added new instance-based learning extraction library in scrapy.contrib.ibl. Documentation and tools will be added later.
2010-05-14 14:33:26 -03:00
Ping Yin
0b3bf5c6f6
downloader_handler: test HEAD method
2010-05-04 15:50:26 +08:00
Ping Yin
0aaa74d2bd
extract_regex: encoding arg defaults to 'utf-8'
...
Sometimes it is not necessary to pass the encoding argument, for
example when the text argument is unicode. So set a default encoding.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-22 23:43:34 +08:00
Pablo Hoffman
dfdac356af
added missing default values to file exporter doc
2010-04-02 02:49:18 -03:00
Pablo Hoffman
2f75839e7a
Ignore noisy Twisted deprecation warnings
2010-03-27 13:23:13 -03:00
Pablo Hoffman
f19c939925
fixed doc typo
2010-03-26 08:28:32 -03:00
Pablo Hoffman
99a876754c
Improved "What else?" section of "Scrapy at a glance" overview
2010-03-20 20:24:18 -03:00
Pablo Hoffman
234fd709ad
fixed doc typo (thanks Victor)
2010-03-19 10:32:17 -03:00
Daniel Grana
184cf6684f
Remove HttpException references from docs. Since 0.7, scrapy returns non-200 responses as Response objects and does not raise HttpException anymore
2010-03-18 10:05:33 -03:00
Daniel Grana
17091902f3
Explicitly say where to save item class in "Defining our item" section of tutorial
2010-03-12 14:12:49 -02:00
Pablo Hoffman
c5cd8b9d3d
Fixed bug in open_in_browser() function with Python 2.5 ( closes #145 ).
2010-03-12 09:31:05 -02:00
Ping Yin
90fef3cbcd
ImagePipeline: show http code when failing to download
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-02-27 18:09:50 +08:00
Ping Yin
5c60ef69ab
remove_tags: add keep argument
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 19:08:01 +08:00
Ping Yin
94e6acebab
Fix remove_tags-like functions being unable to remove empty tags such as <br/>
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 18:18:43 +08:00
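The two remove_tags commits above (the `keep` argument and the handling of empty tags such as `<br/>`) can be illustrated with a regex-based sketch. This is not the library's implementation: the `which_ones` parameter, the regex and the helper structure are assumptions made for illustration, and only the `keep` name comes from the commit message.

```python
import re

# Matches opening, closing and self-closing tags such as <br/>.
TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)(?:\s[^>]*)?/?>")


def remove_tags(text, which_ones=(), keep=()):
    """Strip tags: only those in which_ones, or all except those in keep."""
    if which_ones and keep:
        raise ValueError("pass which_ones or keep, not both")
    which_ones = {tag.lower() for tag in which_ones}
    keep = {tag.lower() for tag in keep}

    def replace(match):
        tag = match.group(1).lower()
        remove = tag in which_ones if which_ones else tag not in keep
        return "" if remove else match.group(0)

    return TAG_RE.sub(replace, text)


print(remove_tags("<p>Hi<br/>there <b>you</b></p>", keep=("b",)))
```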
Daniel Grana
c925c9e9a0
Notify spider when requests are ignored by HttpErrorMiddleware, and generally when any call to process_spider_input raises an exception
2010-05-12 16:41:06 -03:00
Daniel Grana
d3ab3cf85c
url_query_cleaner: cleanup and avoid rejoining key-sep-value to build the query again
...
--HG--
extra : rebase_source : 7c2648b6dd1c2253f1ec0f11d5e1f2ee25bd1273
2010-05-12 14:09:37 -03:00
Pablo Hoffman
3fb8058016
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-05-11 11:25:24 -03:00
Pablo Hoffman
1750e233f7
moved import to top
2010-05-11 11:23:56 -03:00
Daniel Grana
ac646a3b47
url_query_cleaner: do not append ? if query is empty
2010-04-30 16:19:59 -03:00
Daniel Grana
3d731ba641
url_query_cleaner: add exclude and non-unique parameters support, also remove untested exception catching code and add missing tests
2010-04-30 09:41:11 -03:00
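The url_query_cleaner behaviour named in the two commits above (an exclude mode, optional non-unique parameters, and no trailing `?` when the query ends up empty) can be sketched with the standard library. `clean_query` and its parameter names `names`, `exclude` and `unique` are hypothetical choices for this illustration, not the function's real signature, and the sketch re-encodes the query with `urlencode`, which may differ from how the actual code rebuilds it.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def clean_query(url, names=(), exclude=False, unique=True):
    """Keep (or, with exclude=True, drop) the listed query parameters."""
    parts = urlsplit(url)
    names = set(names)
    seen = set()
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        # Keep listed keys normally; keep unlisted keys when excluding.
        if (key in names) == exclude:
            continue
        if unique and key in seen:
            continue
        seen.add(key)
        kept.append((key, value))
    # urlunsplit omits the "?" when the query string is empty.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


print(clean_query("http://example.com/p?id=1&ref=x&id=2", ["id"]))
print(clean_query("http://example.com/p?ref=x", ["id"]))
```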
Daniel Grana
c0d45846b8
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-04-26 22:29:45 -03:00
Pablo Hoffman
81f6502e37
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-04-24 18:22:13 -03:00
Daniel Grana
658e6f15e9
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-04-18 23:44:59 -03:00
Pablo Hoffman
b94abf36a3
Added scrapy.utils.py26.json to use the Python 2.6 json module when available, otherwise fall back to the simplejson module or scrapy.xlib.simplejson. This way we can always assume json and avoid conditional code.
2010-04-12 10:44:07 -03:00
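The fallback chain this commit describes is commonly written as nested import attempts. The sketch below is an assumption about the shape of such a module, not the actual contents of `scrapy.utils.py26`. The module names `simplejson` and `scrapy.xlib.simplejson` are taken from the commit message, and on any modern Python the first import succeeds, so the fallbacks are never reached.

```python
try:
    import json  # standard library since Python 2.6
except ImportError:
    try:
        import simplejson as json  # separately installed package
    except ImportError:
        # Copy bundled with the project, as named in the commit message.
        from scrapy.xlib import simplejson as json

# Callers can then rely on a single name regardless of which import won.
print(json.dumps({"ok": True}))
```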
Pablo Hoffman
cd6aa72d7f
fixed import
2010-04-12 10:42:07 -03:00