Pablo Hoffman
5c39d173a5
merged proposed and experimental documentation, as it didn't make sense to keep two separate sections
...
--HG--
rename : docs/proposed/_images/scrapy_architecture.odg => docs/experimental/_images/scrapy_architecture.odg
rename : docs/proposed/_images/scrapy_architecture.png => docs/experimental/_images/scrapy_architecture.png
rename : docs/proposed/index.rst => docs/experimental/index.rst
rename : docs/proposed/newitem-fields.rst => docs/experimental/newitem-fields.rst
rename : docs/proposed/newitem.rst => docs/experimental/newitem.rst
2009-07-13 22:15:54 -03:00
Pablo Hoffman
26bb8ef608
doc: improved newitem fields reference
2009-07-13 22:05:48 -03:00
Pablo Hoffman
2bf39b7cdb
minor layout cleanups to newitem doc
2009-07-13 22:05:18 -03:00
Pablo Hoffman
74fcfc2cfc
deprecated old adaptors documentation
2009-07-13 22:03:56 -03:00
Pablo Hoffman
c73ff8198b
newitem fields: dropped support in to_python() for converting from None for default value, improved raising of TypeError instead of ValueError when appropriate, added and adapted unittests
2009-07-13 21:10:29 -03:00
Ismael Carnales
72457c3e4e
better handling of default value in newitem
2009-07-13 17:03:38 -03:00
Ismael Carnales
47d937f36b
only accept unicode strings in text fields
2009-07-13 15:54:48 -03:00
Ismael Carnales
d75afaa161
renamed StringField to TextField
2009-07-13 15:54:46 -03:00
Pablo Hoffman
8634a0d181
more efficient Item implementation and added support for using custom methods (unittests included)
2009-07-13 14:00:41 -03:00
Pablo Hoffman
dff510384b
doc: updated SCHEDULER_MIDDLEWARES_BASE setting
2009-07-13 13:33:47 -03:00
Ismael Carnales
b44409a203
added TimeField to newitem
2009-07-13 10:31:32 -03:00
Pablo Hoffman
5eebe1f405
fixed bug in fetcher caused by recent spider manager changes (thanks andres)
2009-07-13 00:04:00 -03:00
Pablo Hoffman
e3fe0ef297
Some changes to newitem API and implementation:
...
- Dropped support for wildcard importing from newitem package (you must now
import from newitem.fields, without a wildcard)
- Removed assign() method from Fields as it was apparently redundant (with
to_python() method) and I couldn't find any reason for keeping it (neither in
the docs nor in the tests)
- Moved deiter() method of Field to StringField, as both its purpose and
implementation were specific to strings. If it's really needed as a general
purpose method, it could be restored. Also, no unittest was broken by
this change, which sort of reinforces my point.
- Renamed (previously mentioned) StringField.deiter() method to
StringField.to_single(), for better consistency with to_python() method
- Removed Field class as it was useless without the deiter() functionality (now
belonging to StringField class)
- Moved ansi_date_re module variable to DateField class attribute
- Simplified implementation of DecimalField, FloatField and IntegerField to one
line of code (using tests to make sure not to break any functionality)
- Renamed ItemMeta class (in models.py) to _ItemMeta to highlight its protected
state (should not be externally imported)
- Added support for instantiating new items with dicts, to support
deserializing items with their repr() string
- Added unittests for new functionality introduced
2009-07-11 22:19:56 -03:00
Pablo Hoffman
5054b67a02
improved newitems doc and marked robust scraped items as deprecated
2009-07-11 21:26:52 -03:00
Pablo Hoffman
1a153d47f3
improved invalid xpath exception message in xpath selectors, and added unittests
2009-07-11 17:19:20 -03:00
Pablo Hoffman
fc64360a34
removed unused lines
2009-07-11 16:42:32 -03:00
Pablo Hoffman
e3caf00d7a
simplified implementation of spider manager by removing knowledge of enabled spiders
2009-07-10 16:41:02 -03:00
dgrana
4cd1fa9c32
generate dropin.cache for spiders under tests
2009-07-10 05:29:27 +01:00
Pablo Hoffman
9270810840
improved usage of urljoin_rfc function, adding unittests and encoding where needed
2009-07-09 18:45:40 -03:00
Daniel Grana
d5d2c5c924
update documentation to recent pydispatcher import path change
2009-07-09 17:13:30 -03:00
Daniel Grana
18fbd7c7eb
Automated merge with ssh://hg.scrapy.org/scrapy
2009-07-09 16:58:07 -03:00
Daniel Grana
eff8ea6173
remove response from item_passed and item_dropped signal api
2009-07-09 16:57:03 -03:00
Pablo Hoffman
5da32d9f6d
fixed Sphinx warning
2009-07-09 16:50:13 -03:00
Pablo Hoffman
ae7333d598
added simplejson optional dependency to doc
2009-07-09 16:49:20 -03:00
Daniel Grana
aba16c20c4
Automated merge with ssh://hg.scrapy.org/scrapy
2009-07-09 14:38:56 -03:00
Daniel Grana
a8de5cef6e
remove xlib hack that appends scrapy/xlib to sys.path
2009-07-09 14:37:59 -03:00
Ismael Carnales
32c25f5a36
complete the newitem tests
2009-07-09 13:03:54 -03:00
Ismael Carnales
25b53df191
merge with trunk
2009-07-09 13:02:49 -03:00
Pablo Hoffman
60e7b80798
removed signal docs from core.signals module, to leave them only in one place (the doc)
2009-07-09 12:57:10 -03:00
Ismael Carnales
f31f75c0e2
remove required attribute from newitem (until we add a validation framework)
2009-07-09 12:54:02 -03:00
Ismael Carnales
9e3e41f946
added more newitem documentation in proposed
2009-07-09 11:29:04 -03:00
Pablo Hoffman
b071681cd4
removed duplicated spiders doc (which used autodoc)
2009-07-09 11:14:33 -03:00
Pablo Hoffman
4f19115a80
removed old setting from default_settings.py, updated doc of CONCURRENT_ITEMS setting
2009-07-09 10:56:15 -03:00
Pablo Hoffman
a4b728f2b2
Scraper: added lower limit for response sizes, removed redundant line
2009-07-09 10:55:30 -03:00
Pablo Hoffman
8b26e49636
Added new ItemProcessor component to Scraper component
2009-07-08 23:48:06 -03:00
Pablo Hoffman
42b86a385f
removed wtf line
2009-07-08 18:19:54 -03:00
pablo
5cbafaea7f
StackTraceDump extension: using USR2 signal to avoid collision with other stuff that uses USR1 (such as twistd log rotation)
2009-07-08 09:19:35 -03:00
Daniel Grana
b83851dcc3
remove unused lines from shell command
2009-07-07 16:24:59 -03:00
Daniel Grana
8e5ede7179
shell command was broken by recent commits because scrapyengine.crawl no longer returns a deferred; it now uses scrapyengine.schedule, which returns the deferred of the download response
2009-07-07 16:22:23 -03:00
damian
1ba98606c2
test.test_utils_url: update parameter name; utils.url: minor code clean up
2009-07-07 12:35:24 -03:00
damian
460f690c5c
utils.url: add_or_replace_parameter function fixed, quoted urls support and test cases added
2009-07-07 11:20:26 -03:00
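To illustrate what an add-or-replace helper for query parameters does, here is a minimal sketch built only on the Python standard library. It is not the code from this commit: the signature and the handling of duplicate parameters are assumptions made for the example.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_or_replace_parameter(url, name, new_value):
    """Return `url` with query parameter `name` set to `new_value`.

    Assumed behaviour (not taken from the commit): the first occurrence of
    `name` is replaced in place, later duplicates are dropped, and the
    parameter is appended when it is not present at all.
    """
    parts = urlsplit(url)
    # keep_blank_values preserves parameters such as "?debug=".
    params = parse_qsl(parts.query, keep_blank_values=True)

    updated = []
    replaced = False
    for key, value in params:
        if key != name:
            updated.append((key, value))
        elif not replaced:
            updated.append((key, new_value))
            replaced = True
        # Any further occurrence of `name` is skipped.

    if not replaced:
        updated.append((name, new_value))

    # urlencode re-quotes every value, so quoted input stays valid.
    return urlunsplit(parts._replace(query=urlencode(updated)))
```

Note that decoding and re-encoding the query normalizes its quoting: an input value written as `a%20b` comes back as `a+b`.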
pablo
c205f7d8e5
added missing comment for non-trivial code
2009-07-06 20:38:39 -03:00
Daniel Grana
a15dc94340
images: images uploaded through amazon s3 special spider must be scheduled
2009-07-06 16:16:49 -03:00
Daniel Grana
2e52005847
rewrite RequestLimitMiddleware spidermw so it does not consume spider output at once
...
--HG--
rename : scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/requestlimit.py
2009-07-06 15:35:36 -03:00
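The idea of limiting requests without consuming the spider output at once can be sketched with a plain generator. This is an illustration only, not the RequestLimitMiddleware code: `limit_requests`, `max_requests` and the `is_request` predicate are hypothetical names introduced for the example.

```python
def limit_requests(results, max_requests, is_request):
    """Lazily pass through spider output, dropping surplus requests.

    `results` is any iterable of spider output. Each element is pulled only
    when the caller asks for the next value, so the whole output is never
    held in memory at once.
    """
    seen = 0
    for result in results:
        if is_request(result):
            if seen >= max_requests:
                continue  # over the limit: drop this request
            seen += 1
        yield result  # items and allowed requests pass through unchanged
```

Because the function is a generator, calling `next()` on it advances the underlying spider output by only as many elements as needed to produce one result.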
Pablo Hoffman
31b3d7ce1e
Added flow control mechanism to new Scraper component, to prevent cases where memory fills because of requests being downloaded much faster than they can be processed (by the spiders)
2009-07-06 15:31:50 -03:00
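One way to picture such a flow control mechanism is a buffer that tracks how many bytes of downloaded responses are still waiting to be processed and signals when the downloader should pause. The sketch below rests on that assumption; the class, its method names and the byte-based limit are hypothetical and are not taken from the Scraper component itself.

```python
from collections import deque


class ResponseBuffer:
    """Hypothetical buffer that applies back-pressure by total body size."""

    def __init__(self, max_active_size):
        self.max_active_size = max_active_size  # byte budget (assumed unit)
        self.active_size = 0                    # bytes currently buffered
        self.queue = deque()

    def add(self, body):
        """Store a downloaded response body until a spider can process it."""
        self.queue.append(body)
        self.active_size += len(body)

    def pop(self):
        """Hand the oldest body to the spider and release its share."""
        body = self.queue.popleft()
        self.active_size -= len(body)
        return body

    def needs_backout(self):
        """True while buffered data exceeds the budget, so downloads should wait."""
        return self.active_size > self.max_active_size
```

A download loop built on this sketch would check `needs_backout()` before fetching the next request and resume only after enough calls to `pop()` bring the buffered size back under the limit.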
Daniel Grana
4f1d388733
Cleanup scrapyengine.crawl by moving functionality inside a new component named Scraper
2009-07-06 15:31:50 -03:00
Daniel Grana
3cb18dbbbb
Move itempipeline functionality outside of engine as a spidermiddleware
2009-07-06 15:31:50 -03:00
pablo
2ce43ebbec
made downloader/scheduler/spider middlewares code more consistent, added enabled/disabled/loaded informational attributes to all of them
2009-07-06 01:07:45 -03:00
Daniel Grana
f467c233b2
downloader: process queue immediately after downloading the response
2009-07-03 01:32:24 -03:00
Pablo Hoffman
0c4c153819
improved Scrapy documentation index for better usability
2009-07-01 09:51:57 -03:00