scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 11:44:11 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	b10f4fae35	Moved several functions from scrapy.utils.{http,markup,multipart,response,url} (and their tests) to a new library called 'w3lib'. Scrapy will now depend on w3lib.	2011-04-18 22:37:19 -03:00
Pablo Hoffman	ad496eb3b6	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-04-14 12:36:27 -03:00
Pablo Hoffman	ecb4f44cbc	Added clarification on how to work with local settings and scrapy deploy	2011-04-14 12:36:09 -03:00
Pablo Hoffman	6f262a198c	Added IOError to the list of exceptions to retry in the RetryMiddleware	2011-04-12 18:12:36 -03:00
Pablo Hoffman	7c49e8679c	fixed typo	2011-04-07 02:04:42 -03:00
Pablo Hoffman	3ee2c94e93	Improved cookies middleware by making COOKIES_DEBUG nicer and documenting it	2011-04-06 14:54:48 -03:00
Pablo Hoffman	8a5c08a6bc	added join_multivalued parameter to CsvItemExporter	2011-03-24 13:15:52 -03:00
Pablo Hoffman	84dee1f77f	removed unused function	2011-03-24 09:03:57 -03:00
Pablo Hoffman	3954e600ca	added DBM storage backend for HTTP cache	2011-03-23 21:32:02 -03:00
Pablo Hoffman	60f6a9b054	moved scrapyd python module to scrapy debian package. left scrapyd package only for installing service (upstart script) and scrapy user	2011-03-23 15:45:40 -03:00
Shane Evans	407f7f2d65	fix minor error in IBL tests and post-processing nested annotations	2011-03-11 21:58:47 +00:00
Shane Evans	bc6eee71e3	refactor IBL extraction to allow processing parsed data --HG-- extra : rebase_source : 1a0ec4322702288f6e996d384d45a36deede3868	2011-03-11 20:02:29 +00:00
Pablo Hoffman	9591413d9d	added Jochen Maes to AUTHORS	2011-03-09 14:24:17 -02:00
Jochen Maes	47a7f154ab	Add listjobs.json to Scrapyd API You can use listjobs.json with project=<projectname> to get a list of projects that are running currently. It returns a list of jobs with spidername and job-id. Signed-off-by: Jochen Maes <jochen.maes@sejo.be> --- scrapyd/webservice.py \| 9 +++++++++ scrapyd/website.py \| 1 + 2 files changed, 10 insertions(+), 0 deletions(-)	2011-03-09 14:22:10 -02:00
Pablo Hoffman	99033d91c3	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-03-09 12:41:26 -02:00
Pablo Hoffman	36431a1439	Silenced confusing sqlite3.ProgrammingError exception. For more info see: http://twistedmatrix.com/trac/ticket/4040	2011-03-09 12:39:24 -02:00
Shane Evans	5cae22b665	add nofollow to Link object	2011-02-25 12:55:51 -02:00
Pablo Hoffman	cfd11df539	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-24 15:28:57 -02:00
Pablo Hoffman	8f7e163b04	Fixed wrong method name in downloader middleware documentation	2011-02-24 15:26:32 -02:00
Shane Evans	32fa2add75	style fix to ibl contrib	2011-02-24 14:21:23 -02:00
Pablo Hoffman	fe9febe2b1	added --build-only option to deploy command, to build the egg without deploying it	2011-02-23 18:10:16 -02:00
Shane Evans	af4db2767c	Automated merge with ssh://hg.scrapy.org/scrapy-0.12	2011-02-16 18:31:49 -02:00
Shane Evans	74413ff989	prevent incorrect assertion error possible when spiders are closing	2011-02-16 18:30:50 -02:00
Daniel Grana	c55355642c	fix FAQ typos reported by marlun_ at #scrapy IRC channel	2011-02-16 08:57:42 -02:00
Shane Evans	a1c3fa5dd8	small refactor of image extraction	2011-02-15 15:42:10 -02:00
Pablo Hoffman	1fb55bdaf0	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-15 07:25:12 -02:00
Pablo Hoffman	16d9a33951	added FAQ entry about working with big data feeds	2011-02-15 07:24:52 -02:00
Ismael Carnales	9b07b0ab0a	Fix xmliter_lxml	2011-02-11 11:41:44 -02:00
Pablo Hoffman	874bfa0284	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-10 17:41:13 -02:00
Pablo Hoffman	3dc677725e	Fixed scrapy.utils.python unittests	2011-02-10 17:27:40 -02:00
Pablo Hoffman	e9f3724f1c	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-10 17:12:37 -02:00
Pablo Hoffman	ce8137b738	Replace unknown characters in sgml link extractor, to deal more gracefully with encoding errors in the page. Closes #309	2011-02-10 17:12:03 -02:00
Ismael Carnales	bfc6c3809b	Add namespace support to xmliter_lxml	2011-02-09 16:20:48 -02:00
Pablo Hoffman	936353d5f1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-09 11:20:46 -02:00
Pablo Hoffman	181d1c09ae	Fixed typo and code indentation in the doc. Closes #307 and #308	2011-02-09 11:19:46 -02:00
Pablo Hoffman	c91f0d9ea1	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-04 13:39:54 -02:00
Pablo Hoffman	c5499ead73	Clarified behaviour when multiple rules match the same link in CrawlSpider	2011-02-04 13:39:12 -02:00
Pablo Hoffman	dde4ccf665	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-02-04 13:30:37 -02:00
Pablo Hoffman	65fc2fbd1f	Set CONCURRENT_SPIDERS=1 in Scrapyd to force one spider per process	2011-02-04 13:30:01 -02:00
Pablo Hoffman	c4fd7174a6	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-01-28 16:24:18 -02:00
Pablo Hoffman	b1c89508f5	fixed wrong changes committed in previous changeset	2011-01-28 16:22:39 -02:00
Pablo Hoffman	4361150494	added missing scrapyd/default_scrapyd.conf file to MANIFEST.in	2011-01-28 16:21:00 -02:00
Pablo Hoffman	632bc27deb	added tests for Link object	2011-01-25 19:51:17 -02:00
Shane Evans	c5351d2f48	add __hash__ method to Link object to be compatible with the __eq__ method	2011-01-25 19:23:50 -02:00
Martin Olveyra	32adbea545	handle case when attributes are not separated by space (still recognizable because of quotes)	2011-01-24 18:40:42 -02:00
Pablo Hoffman	18e097d496	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-01-13 13:14:57 -02:00
Pablo Hoffman	09f084c220	simplified scrapy shell code after recent changes. refs #306	2011-01-13 13:11:39 -02:00
Pablo Hoffman	0aac226b42	Fixed bug in Scrapy shell's fetch() which wasn't updating local variables properly. Closes #306	2011-01-13 13:08:11 -02:00
Pablo Hoffman	c87fef9c77	Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12	2011-01-11 17:24:33 -02:00
LucianU	0c5f605b0e	The xmlfeed.tmpl file didn't use the naming convention specific to the XMLFeedSpider. Namely, it used parse_item (which has been deprecated) instead of parse_node and it didn't show the iterator and itertag attributes.	2011-01-11 17:23:46 -02:00

2735 Commits