scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 09:24:20 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	5c39d173a5	merged proposed and experimental documentation, as it didn't make sense to keep two separate sections --HG-- rename : docs/proposed/_images/scrapy_architecture.odg => docs/experimental/_images/scrapy_architecture.odg rename : docs/proposed/_images/scrapy_architecture.png => docs/experimental/_images/scrapy_architecture.png rename : docs/proposed/index.rst => docs/experimental/index.rst rename : docs/proposed/newitem-fields.rst => docs/experimental/newitem-fields.rst rename : docs/proposed/newitem.rst => docs/experimental/newitem.rst	2009-07-13 22:15:54 -03:00
Pablo Hoffman	26bb8ef608	doc: improved newitem fields reference	2009-07-13 22:05:48 -03:00
Pablo Hoffman	2bf39b7cdb	minor layout cleanups to newitem doc	2009-07-13 22:05:18 -03:00
Pablo Hoffman	74fcfc2cfc	deprecated old adaptors documentation	2009-07-13 22:03:56 -03:00
Pablo Hoffman	c73ff8198b	newitem fields: dropped support in to_python() for converting from None for default value, improved raising of TypeError instead of ValueError when appropriate, added and adapted unittests	2009-07-13 21:10:29 -03:00
Ismael Carnales	72457c3e4e	better handling of default value in newitem	2009-07-13 17:03:38 -03:00
Ismael Carnales	47d937f36b	only accept unicode strings in text fields	2009-07-13 15:54:48 -03:00
Ismael Carnales	d75afaa161	renamed StringField to TextField	2009-07-13 15:54:46 -03:00
Pablo Hoffman	8634a0d181	more efficient Item implementation and added support for using custom methods (unittests included)	2009-07-13 14:00:41 -03:00
Pablo Hoffman	dff510384b	doc: updated SCHEDULER_MIDDLEWARES_BASE setting	2009-07-13 13:33:47 -03:00
Ismael Carnales	b44409a203	added TimeField to newitem	2009-07-13 10:31:32 -03:00
Pablo Hoffman	5eebe1f405	fixed bug in fetcher caused by recent spider manager changes (thanks andres)	2009-07-13 00:04:00 -03:00
Pablo Hoffman	e3fe0ef297	Some changes to newitem API and implementation: - Dropped support for wildcard importing from newitem package (must now import from newitem.fields and don't use wildcard) - Removed assign() method from Fields as it was apparently redundant (with to_python() method) and I couldn't find any reason for keeping it (neither in the docs nor in the tests) - Moved deiter() method of Field to StringField, as both its purpose and implementation were specific to strings. If it's really needed as a general purpose method, it could be restored. Also, no unittest was broken because of this change, which sort-of reinforces my point. - Renamed (previously mentioned) StringField.deiter() method to StringField.to_single(), for better consistency with to_python() method - Removed Field class as it was useless without the deiter() functionality (now belonging to StringField class) - Moved ansi_date_re module variable to DateField class attribute - Simplified implementation of DecimalField, FloatField and IntegerField to one line of code (using tests to make sure not to break any functionality) - Renamed ItemMeta class (in models.py) to _ItemMeta to highlight its protected state (should not be externally imported) - Added support for instantiating new items with dicts, to support deserializing items with their repr() string - Added unittests for new functionality introduced	2009-07-11 22:19:56 -03:00
Pablo Hoffman	5054b67a02	improved newitems doc and marked robust scraped items as deprecated	2009-07-11 21:26:52 -03:00
Pablo Hoffman	1a153d47f3	improved invalid xpath exception message in xpath selectors, and added unittests	2009-07-11 17:19:20 -03:00
Pablo Hoffman	fc64360a34	removed unused lines	2009-07-11 16:42:32 -03:00
Pablo Hoffman	e3caf00d7a	simplified implementation of spider manager by removing knowledge of enabled spiders	2009-07-10 16:41:02 -03:00
dgrana	4cd1fa9c32	generate dropin.cache for spiders under tests	2009-07-10 05:29:27 +01:00
Pablo Hoffman	9270810840	improved usage of urljoin_rfc function, adding unittests and encoding where needed	2009-07-09 18:45:40 -03:00
Daniel Grana	d5d2c5c924	update documentation to recent pydispatcher import path change	2009-07-09 17:13:30 -03:00
Daniel Grana	18fbd7c7eb	Automated merge with ssh://hg.scrapy.org/scrapy	2009-07-09 16:58:07 -03:00
Daniel Grana	eff8ea6173	remove response from item_passed and item_dropped signal api	2009-07-09 16:57:03 -03:00
Pablo Hoffman	5da32d9f6d	fixed Sphinx warning	2009-07-09 16:50:13 -03:00
Pablo Hoffman	ae7333d598	added simplejson optional dependency to doc	2009-07-09 16:49:20 -03:00
Daniel Grana	aba16c20c4	Automated merge with ssh://hg.scrapy.org/scrapy	2009-07-09 14:38:56 -03:00
Daniel Grana	a8de5cef6e	remove xlib hack that appends scrapy/xlib to sys.path	2009-07-09 14:37:59 -03:00
Ismael Carnales	32c25f5a36	complete the newitem tests	2009-07-09 13:03:54 -03:00
Ismael Carnales	25b53df191	merge with trunk	2009-07-09 13:02:49 -03:00
Pablo Hoffman	60e7b80798	removed signal docs from core.signals module, to leave them only in one place (the doc)	2009-07-09 12:57:10 -03:00
Ismael Carnales	f31f75c0e2	remove required attribute from newitem (until we add a validation framework)	2009-07-09 12:54:02 -03:00
Ismael Carnales	9e3e41f946	added more newitem documentation in proposed	2009-07-09 11:29:04 -03:00
Pablo Hoffman	b071681cd4	removed duplicated spiders doc (which used autodoc)	2009-07-09 11:14:33 -03:00
Pablo Hoffman	4f19115a80	removed old setting from default_settings.py, updated doc of CONCURRENT_ITEMS setting	2009-07-09 10:56:15 -03:00
Pablo Hoffman	a4b728f2b2	Scraper: added lower limit for responses sizes, removed redundant line	2009-07-09 10:55:30 -03:00
Pablo Hoffman	8b26e49636	Added new ItemProcessor component to Scraper component	2009-07-08 23:48:06 -03:00
Pablo Hoffman	42b86a385f	removed wtf line	2009-07-08 18:19:54 -03:00
pablo	5cbafaea7f	StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation)	2009-07-08 09:19:35 -03:00
Daniel Grana	b83851dcc3	remove unused lines from shell command	2009-07-07 16:24:59 -03:00
Daniel Grana	8e5ede7179	shell command was broken by recent commits because scrapyengine.crawl does not return a deferred anymore, now we use scrapyengine.schedule that returns the deferred of the download response	2009-07-07 16:22:23 -03:00
damian	1ba98606c2	test.test_utils_url: update parameter name; utils.url: minor code clean up	2009-07-07 12:35:24 -03:00
damian	460f690c5c	utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added	2009-07-07 11:20:26 -03:00
pablo	c205f7d8e5	added missing comment for non-trivial code	2009-07-06 20:38:39 -03:00
Daniel Grana	a15dc94340	images: images uploaded through amazon s3 special spider must be scheduled	2009-07-06 16:16:49 -03:00
Daniel Grana	2e52005847	rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once --HG-- rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py	2009-07-06 15:35:36 -03:00
Pablo Hoffman	31b3d7ce1e	Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders)	2009-07-06 15:31:50 -03:00
Daniel Grana	4f1d388733	Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper	2009-07-06 15:31:50 -03:00
Daniel Grana	3cb18dbbbb	Move itempipeline functionality outside of engine as a spidermiddleware	2009-07-06 15:31:50 -03:00
pablo	2ce43ebbec	made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them	2009-07-06 01:07:45 -03:00
Daniel Grana	f467c233b2	downloader: process queue immediately after downloading the response	2009-07-03 01:32:24 -03:00
Pablo Hoffman	0c4c153819	improved Scrapy documentation index for better usability	2009-07-01 09:51:57 -03:00


1283 Commits