mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 11:03:45 +00:00

1378 Commits

Author SHA1 Message Date
Pablo Hoffman
a23ff37050 ItemLoader: added one more test and improved other test names 2009-08-05 11:38:01 -03:00
Pablo Hoffman
7bc7af0162 ItemLoader: some more code cleanups, and added many more tests 2009-08-05 00:41:02 -03:00
Pablo Hoffman
9081e84e27 ItemLoader: sorted out module locations, and added more tests
--HG--
rename : scrapy/newitem/reducers.py => scrapy/newitem/loader/reducers.py
2009-08-04 20:02:49 -03:00
Pablo Hoffman
ac9f4c9cc2 Refactored scrapy.newitem package:
- left only one type of Field - just a dict wrapper to contain field metadata
- removed Item Builder and tests
- adapted Item Loader to work with new Field class
2009-08-04 19:26:31 -03:00
Pablo Hoffman
114dba2850 some minor simplifications to tree_expander() function 2009-08-04 09:10:21 -03:00
Pablo Hoffman
8d705ec302 added ItemLoader class, an alternative implementation of ItemBuilder with a slightly different API 2009-08-03 22:53:08 -03:00
Ismael Carnales
f05695d75e added first implementation of ItemBuilder 2009-08-03 17:27:57 -03:00
Ismael Carnales
32894643a0 added ListField documentation, ordered field reference alphabetically 2009-08-03 15:00:04 -03:00
Ismael Carnales
cf638e682c made ListField init with an instance (not a class) of Field 2009-08-03 15:00:02 -03:00
Ismael Carnales
d8b85ae7ad moved MultiValuedField to ListField 2009-08-02 18:43:35 -03:00
Pablo Hoffman
fcb33c7988 removed obsolete pipeline 2009-07-31 17:24:30 -03:00
Pablo Hoffman
7c049d2ef5 use standard 'mcs' for first argument of meta class __new__ method 2009-07-31 16:51:29 -03:00
Pablo Hoffman
02c454c26e newitem: added warning when trying to access item field value via getattr instead of getitem 2009-07-31 16:49:27 -03:00
Pablo Hoffman
c3427e075c added domain_stats parameter to stats_domain_closed signal 2009-07-31 16:36:35 -03:00
Pablo Hoffman
73172b244d added from_unicode_list() method to Field objects 2009-07-30 16:58:24 -03:00
Pablo Hoffman
c0dcd76424 moved unused scrapy.core.scheduler.store module to scrapy.contrib_exp.history
--HG--
rename : scrapy/core/scheduler/store.py => scrapy/contrib_exp/history/memorystore.py
rename : scrapy/contrib_exp/history/store.py => scrapy/contrib_exp/history/sqlstore.py
2009-07-30 13:51:43 -03:00
Daniel Grana
8ecd16b5e3 remove obsolete monkey patch for twisted 2.5 2009-07-29 21:02:15 -03:00
Pablo Hoffman
b336de7302 fixed minor bugs with spiderctl webconsole extension 2009-07-29 19:50:38 -03:00
Pablo Hoffman
4a1fc74d69 fixed spiderstats webconsole extension 2009-07-29 19:26:12 -03:00
Pablo Hoffman
01b79e386f fixed StatsCollector webconsole extension 2009-07-29 19:15:19 -03:00
Pablo Hoffman
a86b12a879 moved patches.py to xlib/patches.py to avoid import errors
--HG--
rename : scrapy/patches.py => scrapy/xlib/patches.py
2009-07-29 19:01:36 -03:00
Pablo Hoffman
8ec7c9e01c WEBCONSOLE_PORT setting now defaults to 6080 2009-07-29 18:59:34 -03:00
Pablo Hoffman
fde4f219a8 don't use packages when modules are enough
--HG--
rename : scrapy/extension/__init__.py => scrapy/extension.py
rename : scrapy/fetcher/__init__.py => scrapy/fetcher.py
rename : scrapy/log/__init__.py => scrapy/log.py
rename : scrapy/mail/__init__.py => scrapy/mail.py
rename : scrapy/patches/monkeypatches.py => scrapy/patches.py
2009-07-29 18:41:15 -03:00
Pablo Hoffman
230bcef7b6 some refactor of settings manager (without changing API): don't fail if no settings module is found (fail on scrapy.cmdline instead). also, some code improvements for clarity. 2009-07-29 18:26:29 -03:00
Pablo Hoffman
d57c0100db another minor fix to exporters 2009-07-29 08:23:38 -03:00
Pablo Hoffman
c3d732f28a another bug fix in pprint item exporter 2009-07-28 23:00:35 -03:00
Pablo Hoffman
643f93721b removed wrong self from base constructor calls 2009-07-28 20:53:03 -03:00
Pablo Hoffman
bb7b6815c0 added Item Exporters 2009-07-28 20:20:44 -03:00
Pablo Hoffman
75cf903e24 adapted project template to use the new Link Extractors location 2009-07-28 12:27:25 -03:00
Pablo Hoffman
64d9155572 CloseDomain extension: fixed bug on domain close when not using CLOSEDOMAIN_TIMEOUT 2009-07-28 12:23:13 -03:00
Pablo Hoffman
fcc91901eb finished cleaning up closedomain documentation, and updated default settings 2009-07-27 15:42:35 -03:00
Pablo Hoffman
09ba6927d7 Some changes to CloseDomain extension:
- added support for closing by item passed count (CLOSEDOMAIN_ITEMPASSED)
- removed support for sending notification emails (since that's the job of
  another extension)
2009-07-27 15:23:50 -03:00
Pablo Hoffman
9da66698f3 moved httprepr() method (from Request and Response objects) to scrapy.utils functions 2009-07-25 18:56:12 -03:00
Daniel Grana
bed3c38014 fix delayedclosedomain extension bug due to changing lastseen from datetime to time.time 2009-07-25 15:22:15 -03:00
Daniel Grana
fdcdc307da ignore docs/build 2009-07-25 15:21:22 -03:00
Pablo Hoffman
c615ace6bd doc: updated google directory links in firebug guide 2009-07-24 13:14:36 -03:00
Daniel Grana
96c3cdbec2 improve OffsiteMiddleware reference docs
--HG--
extra : rebase_source : 3ed3f23fc1ec63b521ead029c5749898f3ab05d7
2009-07-24 12:59:38 -03:00
Pablo Hoffman
d21a22eab5 fixed stats collector bug which wasn't throwing the stats_domain_closing signal (on subclasses) before the persisting stage 2009-07-23 13:03:25 -03:00
Pablo Hoffman
38c3f7d0b4 Some changes to logging of scraped items:
1. "Scraped Item" log level changed to DEBUG
2. "Dropped Item" log level changed to WARNING
3. added "Passed Item" log message with INFO level
2009-07-23 11:49:48 -03:00
Pablo Hoffman
e43e28bf1d minimal doc improvement 2009-07-23 09:12:49 -03:00
Ismael Carnales
6d24ae5920 added reference to working with relative xpaths in the tutorial 2009-07-23 09:05:14 -03:00
Pablo Hoffman
9baa6bb2a8 minor selectors doc fix 2009-07-23 01:56:56 -03:00
Ismael Carnales
6eb2609d1e removed old serializators from newitem 2009-07-22 15:34:56 -03:00
Ismael Carnales
3c4afb23be moved newitem from scrapy.contrib_exp to scrapy.newitem 2009-07-22 15:13:36 -03:00
Ismael Carnales
eb3a1dc16c return default values for newitem in __getitem__ 2009-07-22 15:13:33 -03:00
Ismael Carnales
202894dd8f fixes to newitem doc 2009-07-22 10:25:22 -03:00
Pablo Hoffman
7d8ba0542c fixed typo in doc 2009-07-21 17:38:46 -03:00
Pablo Hoffman
aa345e116a Added spider middleware documentation 2009-07-21 17:19:19 -03:00
Ismael Carnales
90f1d9e489 Added Scheduler middleware reference documentation 2009-07-21 16:52:27 -03:00
Ismael Carnales
7afc915717 added BaseItem as base item class, and moved extra functionality of ScrapedItem to RobustScrapedItem 2009-07-21 16:11:21 -03:00