Pablo Hoffman
7a55158fed
fixed documentation bug (thanks rhill for reporting)
2010-05-11 11:25:03 -03:00
Steven Almeroth
5d03405cac
FormRequest.from_response doc fix. closes #155
...
--HG--
extra : rebase_source : d54979f6a15e5e997072dcbbc6d43b426189312b
2010-04-26 22:28:07 -03:00
Pablo Hoffman
2121a30c74
added note about installing Zope.Interface on Windows platforms
2010-04-24 18:19:52 -03:00
Daniel Grana
6c12106803
Remove Sphinx warning introduced by shorter title overline
2010-04-18 23:42:56 -03:00
Lucian Ursu
2f8c052484
#154 : Language fixes to the documentation
2010-04-18 23:39:54 -03:00
Ping Yin
d42e5fdbac
linkextractor: unique after urljoin_rfc
...
Now, '/foo.html' and 'http://example.org/foo.html' are considered
as the same and only one is kept.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-02 19:45:30 +08:00
Pablo Hoffman
1868ede549
bumped embedded pydispatch to 2.0.1
2010-05-14 16:38:04 -03:00
Pablo Hoffman
02b7ca7e8c
bumped embedded BeautifulSoup to 3.0.8.1
2010-05-14 16:30:50 -03:00
Daniel Grana
e528a77fa3
Automated merge with ssh://hg.scrapy.org/scrapy
2010-05-14 20:09:29 +01:00
Daniel Grana
b2f58207a4
avoid different behaviour in urljoin between python2.5 and python2.6+. see http://bugs.python.org/issue1432
2010-05-14 20:09:07 +01:00
Pablo Hoffman
c87a29eb9e
improved docstring
2010-05-14 14:48:34 -03:00
Pablo Hoffman
31843316bc
Added new instance based learning extraction library in scrapy.contrib.ibl. Documentation and tools will be added later.
2010-05-14 14:33:26 -03:00
Ping Yin
0b3bf5c6f6
downloader_handler: test HEAD method
2010-05-04 15:50:26 +08:00
Ping Yin
0aaa74d2bd
extract_regex: encoding arg defaults to 'utf-8'
...
Sometimes it is not necessary to pass the encoding argument. For
example, when the text argument is unicode. So set a default encoding.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-22 23:43:34 +08:00
Pablo Hoffman
dfdac356af
added missing default values to file exporter doc
2010-04-02 02:49:18 -03:00
Pablo Hoffman
2f75839e7a
Ignore noisy Twisted deprecation warnings
2010-03-27 13:23:13 -03:00
Pablo Hoffman
f19c939925
fixed doc typo
2010-03-26 08:28:32 -03:00
Pablo Hoffman
99a876754c
Improved "What else?" section of "Scrapy at a glance" overview
2010-03-20 20:24:18 -03:00
Pablo Hoffman
234fd709ad
fixed doc typo (thanks Victor)
2010-03-19 10:32:17 -03:00
Daniel Grana
184cf6684f
Remove HttpException references from docs. Since 0.7, scrapy returns non-200 as Response objects and does not raise HttpException anymore
2010-03-18 10:05:33 -03:00
Daniel Grana
17091902f3
Explicitly say where to save item class in "Defining our item" section of tutorial
2010-03-12 14:12:49 -02:00
Pablo Hoffman
c5cd8b9d3d
Fixed bug in open_in_browser() function with Python 2.5 (closes #145).
2010-03-12 09:31:05 -02:00
Ping Yin
90fef3cbcd
ImagePipeline: show http code when failing to download
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-02-27 18:09:50 +08:00
Ping Yin
5c60ef69ab
remove_tags: add keep argument
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 19:08:01 +08:00
Ping Yin
94e6acebab
Fix remove_tags-like functions failing to remove empty tags such as <br/>
...
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 18:18:43 +08:00
Daniel Grana
c925c9e9a0
Notify spider when requests are ignored by HttpErrorMiddleware, and generally when any call to process_spider_input raises an exception
2010-05-12 16:41:06 -03:00
Daniel Grana
d3ab3cf85c
url_query_cleaner: cleanup and avoid rejoining key-sep-value to build the query again
...
--HG--
extra : rebase_source : 7c2648b6dd1c2253f1ec0f11d5e1f2ee25bd1273
2010-05-12 14:09:37 -03:00
Pablo Hoffman
3fb8058016
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-05-11 11:25:24 -03:00
Pablo Hoffman
1750e233f7
moved import to top
2010-05-11 11:23:56 -03:00
Daniel Grana
ac646a3b47
url_query_cleaner: do not append ? if query is empty
2010-04-30 16:19:59 -03:00
Daniel Grana
3d731ba641
url_query_cleaner: add exclude and non-unique parameters support, also remove untested exception catching code and add missing tests
2010-04-30 09:41:11 -03:00
Daniel Grana
c0d45846b8
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-04-26 22:29:45 -03:00
Pablo Hoffman
81f6502e37
Automated merge with http://hg.scrapy.org/scrapy-0.8/
2010-04-24 18:22:13 -03:00
Daniel Grana
658e6f15e9
Automated merge with ssh://hg.scrapy.org/scrapy-0.8
2010-04-18 23:44:59 -03:00
Pablo Hoffman
b94abf36a3
Added scrapy.utils.py26.json to use python2.6 json module when available, otherwise fall back to simplejson module or scrapy.xlib.simplejson. This way we can always assume json and avoid conditional code.
2010-04-12 10:44:07 -03:00
Pablo Hoffman
cd6aa72d7f
fixed import
2010-04-12 10:42:07 -03:00
Pablo Hoffman
025b34e122
bugfix for python < 2.6
2010-04-11 07:07:38 -03:00
Pablo Hoffman
650d1c4fbe
moved copytree() function from utils.python to utils.py26
2010-04-11 03:47:48 -03:00
Pablo Hoffman
be45acd457
added scrapy.service and scrapy.tac for running from twistd
2010-04-11 03:37:08 -03:00
Daniel Grana
0dbb5d44ae
images: avoid signing images based on spider name or request hostname, use request.meta instead
2010-04-09 14:16:00 -03:00
Daniel Grana
68a875edb0
update ENCODING_ALIASES setting default value in settings documentation topic
2010-04-07 10:54:54 -03:00
Daniel Grana
8b86e1d008
Minimize effect of http://bugs.python.org/issue8271 on TextResponses by changing the str.decode errors policy to a custom replace-like error handler
2010-04-07 00:29:53 -03:00
Pablo Hoffman
3fcd69c347
added a couple additional TwistedPluginSpiderManager tests
2010-04-06 10:55:21 -03:00
daniel
2cd591e8a7
add missing dropin.cache file required by default spidermanager tests
2010-04-06 07:22:50 +01:00
Daniel Grana
0b07742adb
gb2312 and gbk encodings were superseded by gb18030
2010-04-05 15:07:43 -03:00
Pablo Hoffman
0dfec04439
made Spider name required again (do not default)
2010-04-05 12:34:29 -03:00
Daniel Grana
70ac6642d5
SEP-012: bugfix backward compatibility of Spider.domain_name and Spider.extra_domain_names
...
--HG--
extra : rebase_source : 66f779cddc6854092951078d443dbf9113f7576a
2010-04-05 12:09:43 -03:00
Pablo Hoffman
77a4d9aba9
use a default name for spiders constructed without names
2010-04-05 11:53:22 -03:00
Pablo Hoffman
c99e1af766
Added support for passing generic arguments to spider constructors (refs #152), extended Spider tests, added unittests for TwistedPluginSpiderManager
2010-04-05 11:27:19 -03:00
Pablo Hoffman
de32612c99
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-04-02 02:49:51 -03:00