mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 04:23:45 +00:00

2145 Commits

Author SHA1 Message Date
Pablo Hoffman
c359a34d7d moved scrapy.core.exceptions to scrapy.exceptions, keeping backwards compatibility
--HG--
rename : scrapy/core/exceptions.py => scrapy/exceptions.py
2010-08-10 17:36:48 -03:00
Pablo Hoffman
355c615c2c removed old unsupported SpiderProfiler extension 2010-08-10 17:23:22 -03:00
Pablo Hoffman
b1c0280616 removed scheduler middleware doc, as scheduler middleware will be removed soon 2010-08-10 16:59:49 -03:00
Pablo Hoffman
5aa8e63957 Removed 'sender' argument when sending signals, as we're not sending it consistently, and it's not being used by receivers either 2010-08-09 14:41:54 -03:00
Pablo Hoffman
3d121898e2 add signal handler name when logging errors 2010-08-09 13:32:44 -03:00
Pablo Hoffman
a08e62a25d silence irrelevant (and confusing) errors generated in tests by signals left active after engine tests run - we should really rewrite engine tests asap 2010-08-09 13:24:22 -03:00
Pablo Hoffman
bb2e0de7da fixed utils.signal tests broken in previous commit 2010-08-09 13:22:57 -03:00
Pablo Hoffman
8de0dc3647 Log full traceback of signal handler errors in send_catch_log() - closes #194. Also made engine use send_catch_log for spider_idle signal 2010-08-09 12:05:03 -03:00
Pablo Hoffman
2aa84073ab moved scrapy log observer logic into a separate function 2010-08-09 11:09:07 -03:00
Pablo Hoffman
5139843c5c avoid noisy KeyError in enqueue_scrape, when closing spiders manually 2010-08-09 11:07:45 -03:00
Pablo Hoffman
67e42d1bfd Moved scrapy/command/__init__.py to scrapy/command.py
--HG--
rename : scrapy/command/__init__.py => scrapy/command.py
2010-08-08 07:30:03 -03:00
Pablo Hoffman
c7d9f6e270 Added JSON item exporter with doc and unittests (closes #192), and also:
* put all json exporters in scrapy.contrib.exporters and deprecated
  scrapy.contrib.exporters.jsonlines to reduce module nesting
* use JSON exporter with EXPORT_FORMAT=json in file export pipeline
2010-08-07 15:52:59 -03:00
Pablo Hoffman
ba2369bfa1 changed variable names for clarity 2010-08-06 15:05:58 -03:00
Pablo Hoffman
35e6c8725b Added handles_request() class method to BaseSpider - closes #191 2010-08-06 14:59:18 -03:00
Pablo Hoffman
4d66f4c6f8 Added support for logging twisted errors generated outside of Scrapy - refs #188 2010-08-05 20:46:54 -03:00
Pablo Hoffman
c82260f5ec make runtests.sh more virtualenv-friendly 2010-08-05 13:31:19 -03:00
Pablo Hoffman
daa9506b34 Fixed bug with non-keepalive execution queues (closes #190) 2010-08-04 14:20:45 -03:00
Pablo Hoffman
49851d7f55 Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-08-02 17:20:55 -03:00
Pablo Hoffman
6c68e4ce15 fixed documentation typo 2010-08-02 17:20:13 -03:00
Pablo Hoffman
b2878fbbef scrapy.utils.url_is_from_spider() - Consider spider name as possible domain for matching in scrapy.utils.url_is_from_spider() 2010-08-02 12:16:36 -03:00
Pablo Hoffman
453e7bf38c Scrapy logging refactoring (closes #188):
* added Twisted log observer for Scrapy, with unittests
* use numeric values from Python logging module for log levels
* removed scrapy.log.exc() function - use scrapy.log.err() instead
* removed logmessage_received signal - write a (twisted) log observer instead
* dropped support for obsolete `domain` argument
* dropped support for old setting names: LOGLEVEL, LOGFILE (replaced by LOG_LEVEL, LOG_FILE)
* deprecated `component` argument
2010-08-02 08:49:14 -03:00
Pablo Hoffman
c5fd113c09 minor fix to path name 2010-07-31 16:02:59 -03:00
Pablo Hoffman
f6eac8e348 Split single Debian package into two packages (closes #187):
- scrapy: which provides only the library and scrapy-ctl command
- scrapy-service: which provides the service, upstart script, system user, etc

This allows a clean install of just the library for those who are not
interested in the Scrapy service.

--HG--
rename : debian/scrapy.dirs => debian/scrapy-service.dirs
rename : debian/scrapy.install => debian/scrapy-service.install
rename : debian/scrapy.postinst => debian/scrapy-service.postinst
rename : debian/scrapy.postrm => debian/scrapy-service.postrm
rename : debian/scrapy.upstart => debian/scrapy-service.upstart
rename : debian/conf/service_conf.py => debian/service_conf.py
2010-07-31 15:50:12 -03:00
Ismael Carnales
e145ec686c Replaced default spider manager (TwistedPluginSpiderManger) with a simpler one that doesn't depend on Twisted Plugins infrastructure. 2010-07-30 17:30:32 -03:00
Pablo Hoffman
65c3de8e6a Rewritten walk_modules function to support eggs, and added tests 2010-07-30 17:05:55 -03:00
Ismael Carnales
6d06824488 utils: Add walk_packages utility function
--HG--
rename : scrapy/tests/test_utils_misc.py => scrapy/tests/test_utils_misc/__init__.py
2010-07-30 15:53:24 -03:00
Ismael Carnales
9511a3154e Fix error message when spider not found in parse command 2010-07-30 14:49:07 -03:00
Pablo Hoffman
e112def754 removed old untested module: scrapy.utils.mysql 2010-07-29 12:11:07 -03:00
Pablo Hoffman
6312826220 removed (no longer supported) webconsole code 2010-07-29 12:03:02 -03:00
molveyra
936ffe5e26 Automated merge with ssh://hg@hg.scrapy.org:2222/scrapy 2010-07-28 11:28:52 -03:00
molveyra
8781ef3914 Remove restriction of marking ignore-beneath only for img unpaired tags 2010-07-28 11:16:42 -03:00
Pablo Hoffman
2349d241e0 removed custom Makefile and version based on mercurial revision 2010-07-24 17:02:08 -03:00
Pablo Hoffman
e2290a5359 Some changes to Crawl spider:
* added process_request attribute to rules
* removed docstrings, since they duplicated the documentation
2010-07-22 18:40:35 -03:00
Daniel Grana
4e2859e5d5 Automated merge with ssh://hg.scrapy.org/scrapy-0.9 2010-07-20 15:47:46 -03:00
Daniel Grana
68c7ef7d98 fix scraper leak closing spider. closes #182 2010-07-20 15:47:07 -03:00
Daniel Grana
3e013f564b update docs for defaultheaders middleware and change spider attribute to match global setting name 2010-07-16 16:17:08 -03:00
Daniel Grana
6883a99c1e Automated merge with ssh://hg.scrapy.org/scrapy-0.9 2010-07-16 14:56:00 -03:00
Daniel Grana
b799e5ee37 Support default headers per spider. closes #181
--HG--
extra : rebase_source : 60162dffa4fbab525501e46b479dc272b8998942
2010-07-16 14:51:14 -03:00
Pablo Hoffman
b91d40ba78 Fixed grammar error in doc (patch by stav) - closes #176 2010-07-16 11:34:18 -03:00
Pablo Hoffman
b8aa74ee9e bugfix in request_httprepr() function 2010-07-15 12:04:55 -03:00
Martin Olveyra
ec850b9fd1 Fix memusage report concatenation 2010-07-14 18:47:09 -03:00
Pablo Hoffman
90a04f0530 Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-07-13 19:47:55 -03:00
Pablo Hoffman
cc32f6ec66 Applied patch to ClientForm to fix bug with wrong entities. Also added tests and left patch in repo in case we upgrade ClientForm in the future and need to re-apply it 2010-07-13 19:46:53 -03:00
Pablo Hoffman
9e37ec4230 fixed documentation typo (closes #151) 2010-07-13 19:03:02 -03:00
Ping Yin
b3a65d3313 HTTPCACHE: Don't cache response with codes in HTTPCACHE_IGNORE_HTTP_CODES 2010-07-09 13:14:25 -03:00
Pablo Hoffman
2067bcd8d0 Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-07-08 14:03:58 -03:00
Juan Picca
2ddbbc8152 allow passing custom headers in FormRequest.from_response() 2010-07-08 14:02:28 -03:00
Pablo Hoffman
a6a86d9b4a Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-07-01 11:48:36 -03:00
Martin Olveyra
b258fc3305 Fixed bug with float values in meta refresh 2010-07-01 11:46:06 -03:00
Pablo Hoffman
3e976e5005 Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-06-28 00:55:35 -03:00