mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 09:24:20 +00:00

1283 Commits

Author SHA1 Message Date
Pablo Hoffman
5c39d173a5 merged proposed and experimental documentation, as it didn't make sense to keep two separate sections
--HG--
rename : docs/proposed/_images/scrapy_architecture.odg => docs/experimental/_images/scrapy_architecture.odg
rename : docs/proposed/_images/scrapy_architecture.png => docs/experimental/_images/scrapy_architecture.png
rename : docs/proposed/index.rst => docs/experimental/index.rst
rename : docs/proposed/newitem-fields.rst => docs/experimental/newitem-fields.rst
rename : docs/proposed/newitem.rst => docs/experimental/newitem.rst
2009-07-13 22:15:54 -03:00
Pablo Hoffman
26bb8ef608 doc: improved newitem fields reference 2009-07-13 22:05:48 -03:00
Pablo Hoffman
2bf39b7cdb minor layout cleanups to newitem doc 2009-07-13 22:05:18 -03:00
Pablo Hoffman
74fcfc2cfc deprecated old adaptors documentation 2009-07-13 22:03:56 -03:00
Pablo Hoffman
c73ff8198b newitem fields: dropped support in to_python() for converting from None for default value, improved raising of TypeError instead of ValueError when appropriate, added and adapted unittests 2009-07-13 21:10:29 -03:00
Ismael Carnales
72457c3e4e better handling of default value in newitem 2009-07-13 17:03:38 -03:00
Ismael Carnales
47d937f36b only accept unicode strings in text fields 2009-07-13 15:54:48 -03:00
Ismael Carnales
d75afaa161 renamed StringField to TextField 2009-07-13 15:54:46 -03:00
Pablo Hoffman
8634a0d181 more efficient Item implementation and added support for using custom methods (unittests included) 2009-07-13 14:00:41 -03:00
Pablo Hoffman
dff510384b doc: updated SCHEDULER_MIDDLEWARES_BASE setting 2009-07-13 13:33:47 -03:00
Ismael Carnales
b44409a203 added TimeField to newitem 2009-07-13 10:31:32 -03:00
Pablo Hoffman
5eebe1f405 fixed bug in fetcher caused by recent spider manager changes (thanks andres) 2009-07-13 00:04:00 -03:00
Pablo Hoffman
e3fe0ef297 Some changes to newitem API and implementation:
- Dropped support for wildcard importing from newitem package (must now import
  from newitem.fields and don't use wildcard)
- Removed assign() method from Fields as it was apparently redundant (with
  to_python() method) and I couldn't find any reason for keeping it (neither in
  the docs nor in the tests)
- Moved deiter() method of Field to StringField, as both its purpose and
  implementation were specific to strings. If it's really needed as a general
  purpose method, it could be restored. Also, no unittest was broken because of
  this change, which sort-of reinforces my point.
- Renamed (previously mentioned) StringField.deiter() method to
  StringField.to_single(), for better consistency with to_python() method
- Removed Field class as it was useless without the deiter() functionality (now
  belonging to StringField class)
- Moved ansi_date_re module variable to DateField class attribute
- Simplified implementation of DecimalField, FloatField and IntegerField to one
  line of code (using tests to make sure not to break any functionality)
- Renamed ItemMeta class (in models.py) to _ItemMeta to highlight its protected
  state (should not be externally imported)
- Added support for instantiating new items with dicts, to support
  deserializing items with their repr() string
- Added unittests for new functionality introduced
2009-07-11 22:19:56 -03:00
Pablo Hoffman
5054b67a02 improved newitems doc and marked robust scraped items as deprecated 2009-07-11 21:26:52 -03:00
Pablo Hoffman
1a153d47f3 improved invalid xpath exception message in xpath selectors, and added unittests 2009-07-11 17:19:20 -03:00
Pablo Hoffman
fc64360a34 removed unused lines 2009-07-11 16:42:32 -03:00
Pablo Hoffman
e3caf00d7a simplified implementation of spider manager by removing knowledge of enabled spiders 2009-07-10 16:41:02 -03:00
dgrana
4cd1fa9c32 generate dropin.cache for spiders under tests 2009-07-10 05:29:27 +01:00
Pablo Hoffman
9270810840 improved usage of urljoin_rfc function, adding unittests and encoding where needed 2009-07-09 18:45:40 -03:00
Daniel Grana
d5d2c5c924 update documentation to recent pydispatcher import path change 2009-07-09 17:13:30 -03:00
Daniel Grana
18fbd7c7eb Automated merge with ssh://hg.scrapy.org/scrapy 2009-07-09 16:58:07 -03:00
Daniel Grana
eff8ea6173 remove response from item_passed and item_dropped signal api 2009-07-09 16:57:03 -03:00
Pablo Hoffman
5da32d9f6d fixed Sphinx warning 2009-07-09 16:50:13 -03:00
Pablo Hoffman
ae7333d598 added simplejson optional dependency to doc 2009-07-09 16:49:20 -03:00
Daniel Grana
aba16c20c4 Automated merge with ssh://hg.scrapy.org/scrapy 2009-07-09 14:38:56 -03:00
Daniel Grana
a8de5cef6e remove xlib hack that appends scrapy/xlib to sys.path 2009-07-09 14:37:59 -03:00
Ismael Carnales
32c25f5a36 complete the newitem tests 2009-07-09 13:03:54 -03:00
Ismael Carnales
25b53df191 merge with trunk 2009-07-09 13:02:49 -03:00
Pablo Hoffman
60e7b80798 removed signal docs from core.signals module, to leave them only in one place (the doc) 2009-07-09 12:57:10 -03:00
Ismael Carnales
f31f75c0e2 remove required attribute from newitem (until we add a validation framework) 2009-07-09 12:54:02 -03:00
Ismael Carnales
9e3e41f946 added more newitem documentation in proposed 2009-07-09 11:29:04 -03:00
Pablo Hoffman
b071681cd4 removed duplicated spiders doc (which used autodoc) 2009-07-09 11:14:33 -03:00
Pablo Hoffman
4f19115a80 removed old setting from default_settings.py, updated doc of CONCURRENT_ITEMS setting 2009-07-09 10:56:15 -03:00
Pablo Hoffman
a4b728f2b2 Scraper: added lower limit for responses sizes, removed redundant line 2009-07-09 10:55:30 -03:00
Pablo Hoffman
8b26e49636 Added new ItemProcessor component to Scraper component 2009-07-08 23:48:06 -03:00
Pablo Hoffman
42b86a385f removed wtf line 2009-07-08 18:19:54 -03:00
pablo
5cbafaea7f StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation) 2009-07-08 09:19:35 -03:00
Daniel Grana
b83851dcc3 remove unused lines from shell command 2009-07-07 16:24:59 -03:00
Daniel Grana
8e5ede7179 shell command was broken by recent commits because scrapyengine.crawl does not return a deferred anymore; now we use scrapyengine.schedule, which returns the deferred of the download response 2009-07-07 16:22:23 -03:00
damian
1ba98606c2 test.test_utils_url: update parameter name; utils.url: minor code clean up 2009-07-07 12:35:24 -03:00
damian
460f690c5c utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added 2009-07-07 11:20:26 -03:00
pablo
c205f7d8e5 added missing comment for non-trivial code 2009-07-06 20:38:39 -03:00
Daniel Grana
a15dc94340 images: images uploaded through amazon s3 special spider must be scheduled 2009-07-06 16:16:49 -03:00
Daniel Grana
2e52005847 rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
2009-07-06 15:35:36 -03:00
Pablo Hoffman
31b3d7ce1e Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders) 2009-07-06 15:31:50 -03:00
Daniel Grana
4f1d388733 Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper 2009-07-06 15:31:50 -03:00
Daniel Grana
3cb18dbbbb Move itempipeline functionality outside of engine as a spidermiddleware 2009-07-06 15:31:50 -03:00
pablo
2ce43ebbec made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them 2009-07-06 01:07:45 -03:00
Daniel Grana
f467c233b2 downloader: process queue immediately after downloading the response 2009-07-03 01:32:24 -03:00
Pablo Hoffman
0c4c153819 improved Scrapy documentation index for better usability 2009-07-01 09:51:57 -03:00