mirror of https://github.com/scrapy/scrapy.git synced 2025-03-04 21:37:11 +00:00

3042 Commits

Author SHA1 Message Date
Pablo Hoffman
bb2b67c862 updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project 2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428 removed googledir example, replaced by dirbot project on github. updated docs accordingly 2011-04-28 02:28:39 -03:00
Pablo Hoffman
b12dd76bb8 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-25 09:31:18 -03:00
Pablo Hoffman
678f08bc1b added warning about using 'parse' as callback in crawl spider rules 2011-04-25 09:30:42 -03:00
Pablo Hoffman
18d303b5f1 ported internal scrapy.utils imports to w3lib 2011-04-19 01:33:52 -03:00
Pablo Hoffman
fcc8d73840 Removed scrapy.contrib.ibl module (and submodules). They have been moved to a new library "scrapely". See https://github.com/scrapy/scrapely 2011-04-19 01:04:22 -03:00
Pablo Hoffman
ebcbb9f453 debian: added python-w3lib package to dependencies 2011-04-19 00:55:08 -03:00
Pablo Hoffman
b10f4fae35 Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib. 2011-04-18 22:37:19 -03:00
Pablo Hoffman
ad496eb3b6 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-04-14 12:36:27 -03:00
Pablo Hoffman
ecb4f44cbc Added clarification on how to work with local settings and scrapy deploy 2011-04-14 12:36:09 -03:00
Pablo Hoffman
6f262a198c Added IOError to the list of exceptions to retry in the RetryMiddleware 2011-04-12 18:12:36 -03:00
Pablo Hoffman
7c49e8679c fixed typo 2011-04-07 02:04:42 -03:00
Pablo Hoffman
3ee2c94e93 Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it 2011-04-06 14:54:48 -03:00
Pablo Hoffman
8a5c08a6bc added join_multivalued parameter to CsvItemExporter 2011-03-24 13:15:52 -03:00
Pablo Hoffman
84dee1f77f removed unused function 2011-03-24 09:03:57 -03:00
Pablo Hoffman
3954e600ca added DBM storage backend for HTTP cache 2011-03-23 21:32:02 -03:00
Pablo Hoffman
60f6a9b054 moved scrapyd python module to scrapy debian package. left scrapyd package only for installing service (upstart script) and scrapy user 2011-03-23 15:45:40 -03:00
Shane Evans
407f7f2d65 fix minor error in IBL tests and post-processing nested annotations 2011-03-11 21:58:47 +00:00
Shane Evans
bc6eee71e3 refactor IBL extraction to allow processing parsed data
--HG--
extra : rebase_source : 1a0ec4322702288f6e996d384d45a36deede3868
2011-03-11 20:02:29 +00:00
Pablo Hoffman
9591413d9d added Jochen Maes to AUTHORS 2011-03-09 14:24:17 -02:00
Jochen Maes
47a7f154ab Add listjobs.json to Scrapyd API
You can use listjobs.json with project=<projectname> to get a list of the jobs currently running in that project.
It returns a list of jobs with spidername and job-id.

Signed-off-by: Jochen Maes <jochen.maes@sejo.be>
---
 scrapyd/webservice.py |    9 +++++++++
 scrapyd/website.py    |    1 +
 2 files changed, 10 insertions(+), 0 deletions(-)
2011-03-09 14:22:10 -02:00
Pablo Hoffman
99033d91c3 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-03-09 12:41:26 -02:00
Pablo Hoffman
36431a1439 Silenced confusing sqlite3.ProgrammingError exception. For more info see: http://twistedmatrix.com/trac/ticket/4040 2011-03-09 12:39:24 -02:00
Shane Evans
5cae22b665 add nofollow to Link object 2011-02-25 12:55:51 -02:00
Pablo Hoffman
cfd11df539 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-24 15:28:57 -02:00
Pablo Hoffman
8f7e163b04 Fixed wrong method name in downloader middleware documentation 2011-02-24 15:26:32 -02:00
Shane Evans
32fa2add75 style fix to ibl contrib 2011-02-24 14:21:23 -02:00
Pablo Hoffman
fe9febe2b1 added --build-only option to deploy command, to build the egg without deploying it 2011-02-23 18:10:16 -02:00
Shane Evans
af4db2767c Automated merge with ssh://hg.scrapy.org/scrapy-0.12 2011-02-16 18:31:49 -02:00
Shane Evans
74413ff989 prevent incorrect assertion error possible when spiders are closing 2011-02-16 18:30:50 -02:00
Daniel Grana
c55355642c fix FAQ typos reported by marlun_ at #scrapy IRC channel 2011-02-16 08:57:42 -02:00
Shane Evans
a1c3fa5dd8 small refactor of image extraction 2011-02-15 15:42:10 -02:00
Pablo Hoffman
1fb55bdaf0 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-15 07:25:12 -02:00
Pablo Hoffman
16d9a33951 added FAQ entry about working with big data feeds 2011-02-15 07:24:52 -02:00
Ismael Carnales
9b07b0ab0a Fix xmliter_lxml 2011-02-11 11:41:44 -02:00
Pablo Hoffman
874bfa0284 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-10 17:41:13 -02:00
Pablo Hoffman
3dc677725e Fixed scrapy.utils.python unittests 2011-02-10 17:27:40 -02:00
Pablo Hoffman
e9f3724f1c Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-10 17:12:37 -02:00
Pablo Hoffman
ce8137b738 Replace unknown characters in sgml link extractor, to deal more gracefully with encoding errors in the page. Closes #309 2011-02-10 17:12:03 -02:00
Ismael Carnales
bfc6c3809b Add namespace support to xmliter_lxml 2011-02-09 16:20:48 -02:00
Pablo Hoffman
936353d5f1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-09 11:20:46 -02:00
Pablo Hoffman
181d1c09ae Fixed typo and code indentation in the doc. Closes #307 and #308 2011-02-09 11:19:46 -02:00
Pablo Hoffman
c91f0d9ea1 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-04 13:39:54 -02:00
Pablo Hoffman
c5499ead73 Clarified behaviour when multiple rules match the same link in CrawlSpider 2011-02-04 13:39:12 -02:00
Pablo Hoffman
dde4ccf665 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-02-04 13:30:37 -02:00
Pablo Hoffman
65fc2fbd1f Set CONCURRENT_SPIDERS=1 in Scrapyd to force one spider per process 2011-02-04 13:30:01 -02:00
Pablo Hoffman
c4fd7174a6 Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-01-28 16:24:18 -02:00
Pablo Hoffman
b1c89508f5 fixed wrong changes committed in previous changeset 2011-01-28 16:22:39 -02:00
Pablo Hoffman
4361150494 added missing scrapyd/default_scrapyd.conf file to MANIFEST.in 2011-01-28 16:21:00 -02:00
Pablo Hoffman
632bc27deb added tests for Link object 2011-01-25 19:51:17 -02:00