mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 11:44:11 +00:00

2735 Commits

Author SHA1 Message Date
Pablo Hoffman
b10f4fae35 Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib. 2011-04-18 22:37:19 -03:00
Pablo Hoffman
ad496eb3b6 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc Added clarification on how to work with local settings and scrapy deploy 2011-04-14 12:36:09 -03:00
Pablo Hoffman
6f262a198c Added IOError to the list of exceptions to retry in the RetryMiddleware 2011-04-12 18:12:36 -03:00
Pablo Hoffman
7c49e8679c fixed typo 2011-04-07 02:04:42 -03:00
Pablo Hoffman
3ee2c94e93 Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it 2011-04-06 14:54:48 -03:00
Pablo Hoffman
8a5c08a6bc added join_multivalued parameter to CsvItemExporter 2011-03-24 13:15:52 -03:00
Pablo Hoffman
84dee1f77f removed unused function 2011-03-24 09:03:57 -03:00
Pablo Hoffman
3954e600ca added DBM storage backend for HTTP cache 2011-03-23 21:32:02 -03:00
Pablo Hoffman
60f6a9b054 moved scrapyd python module to scrapy debian package. left scrapyd package only for installing service (upstart script) and scrapy user 2011-03-23 15:45:40 -03:00
Shane Evans
407f7f2d65 fix minor error in IBL tests and post-processing nested annotations 2011-03-11 21:58:47 +00:00
Shane Evans
bc6eee71e3 refactor IBL extraction to allow processing parsed data
--HG--
extra : rebase_source : 1a0ec4322702288f6e996d384d45a36deede3868
2011-03-11 20:02:29 +00:00
Pablo Hoffman
9591413d9d added Jochen Maes to AUTHORS 2011-03-09 14:24:17 -02:00
Jochen Maes
47a7f154ab Add listjobs.json to Scrapyd API
You can use listjobs.json with project=<projectname> to get a list of the jobs currently running for that project.
It returns a list of jobs with spidername and job-id.

Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
---
 scrapyd/webservice.py |    9 +++++++++
 scrapyd/website.py    |    1 +
 2 files changed, 10 insertions(+), 0 deletions(-)
2011-03-09 14:22:10 -02:00
Pablo Hoffman
99033d91c3 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-03-09 12:41:26 -02:00
Pablo Hoffman
36431a1439 Silenced confusing sqlite3.ProgrammingError exception. For more info see: http://twistedmatrix.com/trac/ticket/4040 2011-03-09 12:39:24 -02:00
Shane Evans
5cae22b665 add nofollow to Link object 2011-02-25 12:55:51 -02:00
Pablo Hoffman
cfd11df539 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-24 15:28:57 -02:00
Pablo Hoffman
8f7e163b04 Fixed wrong method name in downloader middleware documentation 2011-02-24 15:26:32 -02:00
Shane Evans
32fa2add75 style fix to ibl contrib 2011-02-24 14:21:23 -02:00
Pablo Hoffman
fe9febe2b1 added --build-only option to deploy command, to build the egg without deploying it 2011-02-23 18:10:16 -02:00
Shane Evans
af4db2767c Automated merge with ssh://hg.scrapy.org/scrapy-0.12 2011-02-16 18:31:49 -02:00
Shane Evans
74413ff989 prevent incorrect assertion error possible when spiders are closing 2011-02-16 18:30:50 -02:00
Daniel Grana
c55355642c fix FAQ typos reported by marlun_ at #scrapy IRC channel 2011-02-16 08:57:42 -02:00
Shane Evans
a1c3fa5dd8 small refactor of image extraction 2011-02-15 15:42:10 -02:00
Pablo Hoffman
1fb55bdaf0 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-15 07:25:12 -02:00
Pablo Hoffman
16d9a33951 added FAQ entry about working with big data feeds 2011-02-15 07:24:52 -02:00
Ismael Carnales
9b07b0ab0a Fix xmliter_lxml 2011-02-11 11:41:44 -02:00
Pablo Hoffman
874bfa0284 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-10 17:41:13 -02:00
Pablo Hoffman
3dc677725e Fixed scrapy.utils.python unittests 2011-02-10 17:27:40 -02:00
Pablo Hoffman
e9f3724f1c Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-10 17:12:37 -02:00
Pablo Hoffman
ce8137b738 Replace unknown characters in sgml link extractor, to deal more gracefully with encoding errors in the page. Closes #309 2011-02-10 17:12:03 -02:00
Ismael Carnales
bfc6c3809b Add namespace support to xmliter_lxml 2011-02-09 16:20:48 -02:00
Pablo Hoffman
936353d5f1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-09 11:20:46 -02:00
Pablo Hoffman
181d1c09ae Fixed typo and code indentation in the doc. Closes #307 and #308 2011-02-09 11:19:46 -02:00
Pablo Hoffman
c91f0d9ea1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-04 13:39:54 -02:00
Pablo Hoffman
c5499ead73 Clarified behaviour when multiple rules match the same link in CrawlSpider 2011-02-04 13:39:12 -02:00
Pablo Hoffman
dde4ccf665 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-04 13:30:37 -02:00
Pablo Hoffman
65fc2fbd1f Set CONCURRENT_SPIDERS=1 in Scrapyd to force one spider per process 2011-02-04 13:30:01 -02:00
Pablo Hoffman
c4fd7174a6 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-01-28 16:24:18 -02:00
Pablo Hoffman
b1c89508f5 fixed wrong changes committed in previous changeset 2011-01-28 16:22:39 -02:00
Pablo Hoffman
4361150494 added missing scrapyd/default_scrapyd.conf file to MANIFEST.in 2011-01-28 16:21:00 -02:00
Pablo Hoffman
632bc27deb added tests for Link object 2011-01-25 19:51:17 -02:00
Shane Evans
c5351d2f48 add __hash__ method to Link object to be compatible with the __eq__ method 2011-01-25 19:23:50 -02:00
Martin Olveyra
32adbea545 handle case when attributes are not separated by space (still recognizable because of quotes) 2011-01-24 18:40:42 -02:00
Pablo Hoffman
18e097d496 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-01-13 13:14:57 -02:00
Pablo Hoffman
09f084c220 simplified scrapy shell code after recent changes. refs #306 2011-01-13 13:11:39 -02:00
Pablo Hoffman
0aac226b42 Fixed bug in Scrapy shell's fetch() which wasn't updating local variables properly. Closes #306 2011-01-13 13:08:11 -02:00
Pablo Hoffman
c87fef9c77 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-01-11 17:24:33 -02:00
LucianU
0c5f605b0e The xmlfeed.tmpl file didn't use the naming convention specific to the XMLFeedSpider. Namely, it used parse_item (which has been deprecated) instead of parse_node, and it didn't show the iterator and itertag attributes. 2011-01-11 17:23:46 -02:00