Pablo Hoffman
a23ff37050
ItemLoader: added one more test and improved other test names
2009-08-05 11:38:01 -03:00
Pablo Hoffman
7bc7af0162
ItemLoader: some more code cleanups, and added many more tests
2009-08-05 00:41:02 -03:00
Pablo Hoffman
9081e84e27
ItemLoader: sorted out module locations, and added more tests
...
--HG--
rename : scrapy/newitem/reducers.py => scrapy/newitem/loader/reducers.py
2009-08-04 20:02:49 -03:00
Pablo Hoffman
ac9f4c9cc2
Refactored scrapy.newitem package:
...
- left only one type of Field - just a dict wrapper to contain field metadata
- removed Item Builder and tests
- adapted Item Loader to work with new Field class
2009-08-04 19:26:31 -03:00
Pablo Hoffman
114dba2850
some minor simplifications to tree_expander() function
2009-08-04 09:10:21 -03:00
Pablo Hoffman
8d705ec302
added ItemLoader class, an alternative implementation of ItemBuilder with a slightly different API
2009-08-03 22:53:08 -03:00
Ismael Carnales
f05695d75e
added first implementation of ItemBuilder
2009-08-03 17:27:57 -03:00
Ismael Carnales
32894643a0
added ListField documentation, ordered field reference alphabetically
2009-08-03 15:00:04 -03:00
Ismael Carnales
cf638e682c
made ListField init with an instance (not a class) of Field
2009-08-03 15:00:02 -03:00
Ismael Carnales
d8b85ae7ad
moved MultiValuedField to ListField
2009-08-02 18:43:35 -03:00
Pablo Hoffman
fcb33c7988
removed obsolete pipeline
2009-07-31 17:24:30 -03:00
Pablo Hoffman
7c049d2ef5
use standard 'mcs' for first argument of meta class __new__ method
2009-07-31 16:51:29 -03:00
Pablo Hoffman
02c454c26e
newitem: added warning when trying to access item field value via getattr instead of getitem
2009-07-31 16:49:27 -03:00
Pablo Hoffman
c3427e075c
added domain_stats parameter to stats_domain_closed signal
2009-07-31 16:36:35 -03:00
Pablo Hoffman
73172b244d
added from_unicode_list() method to Field objects
2009-07-30 16:58:24 -03:00
Pablo Hoffman
c0dcd76424
moved unused scrapy.core.scheduler.store module to scrapy.contrib_exp.history
...
--HG--
rename : scrapy/core/scheduler/store.py => scrapy/contrib_exp/history/memorystore.py
rename : scrapy/contrib_exp/history/store.py => scrapy/contrib_exp/history/sqlstore.py
2009-07-30 13:51:43 -03:00
Daniel Grana
8ecd16b5e3
remove obsolete monkey patch for twisted 2.5
2009-07-29 21:02:15 -03:00
Pablo Hoffman
b336de7302
fixed minor bugs with spiderctl webconsole extension
2009-07-29 19:50:38 -03:00
Pablo Hoffman
4a1fc74d69
fixed spiderstats webconsole extension
2009-07-29 19:26:12 -03:00
Pablo Hoffman
01b79e386f
fixed StatsCollector webconsole extension
2009-07-29 19:15:19 -03:00
Pablo Hoffman
a86b12a879
moved patches.py to xlib/patches.py to avoid import errors
...
--HG--
rename : scrapy/patches.py => scrapy/xlib/patches.py
2009-07-29 19:01:36 -03:00
Pablo Hoffman
8ec7c9e01c
WEBCONSOLE_PORT setting now defaults to 6080
2009-07-29 18:59:34 -03:00
Pablo Hoffman
fde4f219a8
don't use packages when modules are enough
...
--HG--
rename : scrapy/extension/__init__.py => scrapy/extension.py
rename : scrapy/fetcher/__init__.py => scrapy/fetcher.py
rename : scrapy/log/__init__.py => scrapy/log.py
rename : scrapy/mail/__init__.py => scrapy/mail.py
rename : scrapy/patches/monkeypatches.py => scrapy/patches.py
2009-07-29 18:41:15 -03:00
Pablo Hoffman
230bcef7b6
some refactor of settings manager (without changing API): don't fail if no settings module is found (fail on scrapy.cmdline instead). also, some code improvements for clarity.
2009-07-29 18:26:29 -03:00
Pablo Hoffman
d57c0100db
another minor fix to exporters
2009-07-29 08:23:38 -03:00
Pablo Hoffman
c3d732f28a
another bug fix in pprint item exporter
2009-07-28 23:00:35 -03:00
Pablo Hoffman
643f93721b
removed wrong self from base constructor calls
2009-07-28 20:53:03 -03:00
Pablo Hoffman
bb7b6815c0
added Item Exporters
2009-07-28 20:20:44 -03:00
Pablo Hoffman
75cf903e24
adapted project template to use the new Link Extractors location
2009-07-28 12:27:25 -03:00
Pablo Hoffman
64d9155572
CloseDomain extension: fixed bug on domain close when not using CLOSEDOMAIN_TIMEOUT
2009-07-28 12:23:13 -03:00
Pablo Hoffman
fcc91901eb
finished cleaning up closedomain documentation, and updated default settings
2009-07-27 15:42:35 -03:00
Pablo Hoffman
09ba6927d7
Some changes to CloseDomain extension:
...
- added support for closing by item passed count (CLOSEDOMAIN_ITEMPASSED)
- removed support for sending notification emails (since that's the job of
another extension)
2009-07-27 15:23:50 -03:00
Pablo Hoffman
9da66698f3
moved httprepr() method (from Request and Response objects) to scrapy.utils functions
2009-07-25 18:56:12 -03:00
Daniel Grana
bed3c38014
fix delayedclosedomain extension bug due to changing lastseen from datetime to time.time
2009-07-25 15:22:15 -03:00
Daniel Grana
fdcdc307da
ignore docs/build
2009-07-25 15:21:22 -03:00
Pablo Hoffman
c615ace6bd
doc: updated google directory links in firebug guide
2009-07-24 13:14:36 -03:00
Daniel Grana
96c3cdbec2
improve OffsiteMiddleware reference docs
...
--HG--
extra : rebase_source : 3ed3f23fc1ec63b521ead029c5749898f3ab05d7
2009-07-24 12:59:38 -03:00
Pablo Hoffman
d21a22eab5
fixed stats collector bug which wasn't throwing the stats_domain_closing signal (on subclasses) before the persisting stage
2009-07-23 13:03:25 -03:00
Pablo Hoffman
38c3f7d0b4
Some changes to logging of scraped items:
...
1. "Scraped Item" log level changed to DEBUG
2. "Dropped Item" log level changed to WARNING
3. added "Passed Item" log message with INFO level
2009-07-23 11:49:48 -03:00
Pablo Hoffman
e43e28bf1d
minimal doc improvement
2009-07-23 09:12:49 -03:00
Ismael Carnales
6d24ae5920
added reference to working with relative xpaths in the tutorial
2009-07-23 09:05:14 -03:00
Pablo Hoffman
9baa6bb2a8
minor selectors doc fix
2009-07-23 01:56:56 -03:00
Ismael Carnales
6eb2609d1e
removed old serializers from newitem
2009-07-22 15:34:56 -03:00
Ismael Carnales
3c4afb23be
moved newitem from scrapy.contrib_exp to scrapy.newitem
2009-07-22 15:13:36 -03:00
Ismael Carnales
eb3a1dc16c
return default values for newitem in __getitem__
2009-07-22 15:13:33 -03:00
Ismael Carnales
202894dd8f
fixes to newitem doc
2009-07-22 10:25:22 -03:00
Pablo Hoffman
7d8ba0542c
fixed typo in doc
2009-07-21 17:38:46 -03:00
Pablo Hoffman
aa345e116a
Added spider middleware documentation
2009-07-21 17:19:19 -03:00
Ismael Carnales
90f1d9e489
Added Scheduler middleware reference documentation
2009-07-21 16:52:27 -03:00
Ismael Carnales
7afc915717
added BaseItem as base item class, and moved extra functionality of ScrapedItem to RobustScrapedItem
2009-07-21 16:11:21 -03:00