Mirror of https://github.com/scrapy/scrapy.git, synced 2025-02-28 15:07:05 +00:00
--HG-- rename : scrapy/trunk/AUTHORS => AUTHORS rename : scrapy/trunk/INSTALL => INSTALL rename : scrapy/trunk/LICENSE => LICENSE rename : scrapy/trunk/README => README rename : scrapy/trunk/bin/runtests.sh => bin/runtests.sh rename : scrapy/trunk/docs/Makefile => docs/Makefile rename : scrapy/trunk/docs/README => docs/README rename : scrapy/trunk/docs/_ext/scrapydocs.py => docs/_ext/scrapydocs.py rename : scrapy/trunk/docs/_static/items_adaptors-sample1.html => docs/_static/items_adaptors-sample1.html rename : scrapy/trunk/docs/_static/scrapydoc.css => docs/_static/scrapydoc.css rename : scrapy/trunk/docs/_static/selectors-sample1.html => docs/_static/selectors-sample1.html rename : scrapy/trunk/docs/conf.py => docs/conf.py rename : scrapy/trunk/docs/faq.rst => docs/faq.rst rename : scrapy/trunk/docs/index.rst => docs/index.rst rename : scrapy/trunk/docs/intro/index.rst => docs/intro/index.rst rename : scrapy/trunk/docs/intro/install.rst => docs/intro/install.rst rename : scrapy/trunk/docs/intro/overview.rst => docs/intro/overview.rst rename : scrapy/trunk/docs/intro/tutorial.rst => docs/intro/tutorial.rst rename : scrapy/trunk/docs/media/scrapy-architecture.dia => docs/media/scrapy-architecture.dia rename : scrapy/trunk/docs/misc/api-stability.rst => docs/misc/api-stability.rst rename : scrapy/trunk/docs/misc/index.rst => docs/misc/index.rst rename : scrapy/trunk/docs/proposed/_images/scrapy_architecture.odg => docs/proposed/_images/scrapy_architecture.odg rename : scrapy/trunk/docs/proposed/_images/scrapy_architecture.png => docs/proposed/_images/scrapy_architecture.png rename : scrapy/trunk/docs/proposed/index.rst => docs/proposed/index.rst rename : scrapy/trunk/docs/proposed/introduction.rst => docs/proposed/introduction.rst rename : scrapy/trunk/docs/proposed/newitem.rst => docs/proposed/newitem.rst rename : scrapy/trunk/docs/proposed/spiders.rst => docs/proposed/spiders.rst rename : scrapy/trunk/docs/ref/downloader-middleware.rst => 
docs/ref/downloader-middleware.rst rename : scrapy/trunk/docs/ref/email.rst => docs/ref/email.rst rename : scrapy/trunk/docs/ref/exceptions.rst => docs/ref/exceptions.rst rename : scrapy/trunk/docs/ref/extension-manager.rst => docs/ref/extension-manager.rst rename : scrapy/trunk/docs/ref/extensions.rst => docs/ref/extensions.rst rename : scrapy/trunk/docs/ref/index.rst => docs/ref/index.rst rename : scrapy/trunk/docs/ref/link-extractors.rst => docs/ref/link-extractors.rst rename : scrapy/trunk/docs/ref/logging.rst => docs/ref/logging.rst rename : scrapy/trunk/docs/ref/request-response.rst => docs/ref/request-response.rst rename : scrapy/trunk/docs/ref/selectors.rst => docs/ref/selectors.rst rename : scrapy/trunk/docs/ref/settings.rst => docs/ref/settings.rst rename : scrapy/trunk/docs/ref/signals.rst => docs/ref/signals.rst rename : scrapy/trunk/docs/ref/spiders.rst => docs/ref/spiders.rst rename : scrapy/trunk/docs/topics/_images/adaptors_diagram.png => docs/topics/_images/adaptors_diagram.png rename : scrapy/trunk/docs/topics/_images/adaptors_diagram.svg => docs/topics/_images/adaptors_diagram.svg rename : scrapy/trunk/docs/topics/_images/firebug1.png => docs/topics/_images/firebug1.png rename : scrapy/trunk/docs/topics/_images/firebug2.png => docs/topics/_images/firebug2.png rename : scrapy/trunk/docs/topics/_images/firebug3.png => docs/topics/_images/firebug3.png rename : scrapy/trunk/docs/topics/_images/scrapy_architecture.odg => docs/topics/_images/scrapy_architecture.odg rename : scrapy/trunk/docs/topics/_images/scrapy_architecture.png => docs/topics/_images/scrapy_architecture.png rename : scrapy/trunk/docs/topics/adaptors.rst => docs/topics/adaptors.rst rename : scrapy/trunk/docs/topics/architecture.rst => docs/topics/architecture.rst rename : scrapy/trunk/docs/topics/downloader-middleware.rst => docs/topics/downloader-middleware.rst rename : scrapy/trunk/docs/topics/extensions.rst => docs/topics/extensions.rst rename : scrapy/trunk/docs/topics/firebug.rst 
=> docs/topics/firebug.rst rename : scrapy/trunk/docs/topics/firefox.rst => docs/topics/firefox.rst rename : scrapy/trunk/docs/topics/index.rst => docs/topics/index.rst rename : scrapy/trunk/docs/topics/item-pipeline.rst => docs/topics/item-pipeline.rst rename : scrapy/trunk/docs/topics/items.rst => docs/topics/items.rst rename : scrapy/trunk/docs/topics/link-extractors.rst => docs/topics/link-extractors.rst rename : scrapy/trunk/docs/topics/robotstxt.rst => docs/topics/robotstxt.rst rename : scrapy/trunk/docs/topics/selectors.rst => docs/topics/selectors.rst rename : scrapy/trunk/docs/topics/settings.rst => docs/topics/settings.rst rename : scrapy/trunk/docs/topics/shell.rst => docs/topics/shell.rst rename : scrapy/trunk/docs/topics/spider-middleware.rst => docs/topics/spider-middleware.rst rename : scrapy/trunk/docs/topics/spiders.rst => docs/topics/spiders.rst rename : scrapy/trunk/docs/topics/stats.rst => docs/topics/stats.rst rename : scrapy/trunk/docs/topics/webconsole.rst => docs/topics/webconsole.rst rename : scrapy/trunk/examples/experimental/googledir/googledir/__init__.py => examples/experimental/googledir/googledir/__init__.py rename : scrapy/trunk/examples/experimental/googledir/googledir/items.py => examples/experimental/googledir/googledir/items.py rename : scrapy/trunk/examples/experimental/googledir/googledir/pipelines.py => examples/experimental/googledir/googledir/pipelines.py rename : scrapy/trunk/examples/experimental/googledir/googledir/settings.py => examples/experimental/googledir/googledir/settings.py rename : scrapy/trunk/examples/experimental/googledir/googledir/spiders/__init__.py => examples/experimental/googledir/googledir/spiders/__init__.py rename : scrapy/trunk/examples/experimental/googledir/googledir/spiders/google_directory.py => examples/experimental/googledir/googledir/spiders/google_directory.py rename : scrapy/trunk/examples/experimental/googledir/googledir/templates/spider_basic.tmpl => 
examples/experimental/googledir/googledir/templates/spider_basic.tmpl rename : scrapy/trunk/examples/experimental/googledir/googledir/templates/spider_crawl.tmpl => examples/experimental/googledir/googledir/templates/spider_crawl.tmpl rename : scrapy/trunk/examples/experimental/googledir/googledir/templates/spider_csvfeed.tmpl => examples/experimental/googledir/googledir/templates/spider_csvfeed.tmpl rename : scrapy/trunk/examples/experimental/googledir/googledir/templates/spider_xmlfeed.tmpl => examples/experimental/googledir/googledir/templates/spider_xmlfeed.tmpl rename : scrapy/trunk/examples/experimental/googledir/scrapy-ctl.py => examples/experimental/googledir/scrapy-ctl.py rename : scrapy/trunk/examples/googledir/googledir/__init__.py => examples/googledir/googledir/__init__.py rename : scrapy/trunk/examples/googledir/googledir/items.py => examples/googledir/googledir/items.py rename : scrapy/trunk/examples/googledir/googledir/pipelines.py => examples/googledir/googledir/pipelines.py rename : scrapy/trunk/examples/googledir/googledir/settings.py => examples/googledir/googledir/settings.py rename : scrapy/trunk/examples/googledir/googledir/spiders/__init__.py => examples/googledir/googledir/spiders/__init__.py rename : scrapy/trunk/examples/googledir/googledir/spiders/google_directory.py => examples/googledir/googledir/spiders/google_directory.py rename : scrapy/trunk/examples/googledir/scrapy-ctl.py => examples/googledir/scrapy-ctl.py rename : scrapy/trunk/extras/sql/scraping.sql => extras/sql/scraping.sql rename : scrapy/trunk/profiling/priorityqueue/pq_classes.py => profiling/priorityqueue/pq_classes.py rename : scrapy/trunk/profiling/priorityqueue/run.py => profiling/priorityqueue/run.py rename : scrapy/trunk/profiling/priorityqueue/test_cases.py => profiling/priorityqueue/test_cases.py rename : scrapy/trunk/scrapy/__init__.py => scrapy/__init__.py rename : scrapy/trunk/scrapy/bin/scrapy-admin.py => scrapy/bin/scrapy-admin.py rename : 
scrapy/trunk/scrapy/command/__init__.py => scrapy/command/__init__.py rename : scrapy/trunk/scrapy/command/cmdline.py => scrapy/command/cmdline.py rename : scrapy/trunk/scrapy/command/commands/__init__.py => scrapy/command/commands/__init__.py rename : scrapy/trunk/scrapy/command/commands/crawl.py => scrapy/command/commands/crawl.py rename : scrapy/trunk/scrapy/command/commands/download.py => scrapy/command/commands/download.py rename : scrapy/trunk/scrapy/command/commands/genspider.py => scrapy/command/commands/genspider.py rename : scrapy/trunk/scrapy/command/commands/help.py => scrapy/command/commands/help.py rename : scrapy/trunk/scrapy/command/commands/list.py => scrapy/command/commands/list.py rename : scrapy/trunk/scrapy/command/commands/log.py => scrapy/command/commands/log.py rename : scrapy/trunk/scrapy/command/commands/parse.py => scrapy/command/commands/parse.py rename : scrapy/trunk/scrapy/command/commands/shell.py => scrapy/command/commands/shell.py rename : scrapy/trunk/scrapy/command/commands/start.py => scrapy/command/commands/start.py rename : scrapy/trunk/scrapy/command/commands/stats.py => scrapy/command/commands/stats.py rename : scrapy/trunk/scrapy/command/models.py => scrapy/command/models.py rename : scrapy/trunk/scrapy/conf/__init__.py => scrapy/conf/__init__.py rename : scrapy/trunk/scrapy/conf/commands/__init__.py => scrapy/conf/commands/__init__.py rename : scrapy/trunk/scrapy/conf/commands/crawl.py => scrapy/conf/commands/crawl.py rename : scrapy/trunk/scrapy/conf/commands/help.py => scrapy/conf/commands/help.py rename : scrapy/trunk/scrapy/conf/commands/list.py => scrapy/conf/commands/list.py rename : scrapy/trunk/scrapy/conf/commands/log.py => scrapy/conf/commands/log.py rename : scrapy/trunk/scrapy/conf/commands/scrape.py => scrapy/conf/commands/scrape.py rename : scrapy/trunk/scrapy/conf/commands/shell.py => scrapy/conf/commands/shell.py rename : scrapy/trunk/scrapy/conf/commands/stats.py => scrapy/conf/commands/stats.py rename : 
scrapy/trunk/scrapy/conf/commands/test.py => scrapy/conf/commands/test.py rename : scrapy/trunk/scrapy/conf/default_settings.py => scrapy/conf/default_settings.py rename : scrapy/trunk/scrapy/contrib/__init__.py => scrapy/contrib/__init__.py rename : scrapy/trunk/scrapy/contrib/aws.py => scrapy/contrib/aws.py rename : scrapy/trunk/scrapy/contrib/closedomain.py => scrapy/contrib/closedomain.py rename : scrapy/trunk/scrapy/contrib/cluster/__init__.py => scrapy/contrib/cluster/__init__.py rename : scrapy/trunk/scrapy/contrib/cluster/crawler/__init__.py => scrapy/contrib/cluster/crawler/__init__.py rename : scrapy/trunk/scrapy/contrib/cluster/crawler/manager.py => scrapy/contrib/cluster/crawler/manager.py rename : scrapy/trunk/scrapy/contrib/cluster/hooks/__init__.py => scrapy/contrib/cluster/hooks/__init__.py rename : scrapy/trunk/scrapy/contrib/cluster/hooks/svn.py => scrapy/contrib/cluster/hooks/svn.py rename : scrapy/trunk/scrapy/contrib/cluster/master/__init__.py => scrapy/contrib/cluster/master/__init__.py rename : scrapy/trunk/scrapy/contrib/cluster/master/manager.py => scrapy/contrib/cluster/master/manager.py rename : scrapy/trunk/scrapy/contrib/cluster/master/web.py => scrapy/contrib/cluster/master/web.py rename : scrapy/trunk/scrapy/contrib/cluster/master/ws_api.txt => scrapy/contrib/cluster/master/ws_api.txt rename : scrapy/trunk/scrapy/contrib/cluster/tools/scrapy-cluster-ctl.py => scrapy/contrib/cluster/tools/scrapy-cluster-ctl.py rename : scrapy/trunk/scrapy/contrib/cluster/tools/test-worker.py => scrapy/contrib/cluster/tools/test-worker.py rename : scrapy/trunk/scrapy/contrib/cluster/worker/__init__.py => scrapy/contrib/cluster/worker/__init__.py rename : scrapy/trunk/scrapy/contrib/cluster/worker/manager.py => scrapy/contrib/cluster/worker/manager.py rename : scrapy/trunk/scrapy/contrib/codecs/__init__.py => scrapy/contrib/codecs/__init__.py rename : scrapy/trunk/scrapy/contrib/codecs/x_mac_roman.py => scrapy/contrib/codecs/x_mac_roman.py rename : 
scrapy/trunk/scrapy/contrib/debug.py => scrapy/contrib/debug.py rename : scrapy/trunk/scrapy/contrib/delayedclosedomain.py => scrapy/contrib/delayedclosedomain.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/__init__.py => scrapy/contrib/downloadermiddleware/__init__.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/cache.py => scrapy/contrib/downloadermiddleware/cache.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/common.py => scrapy/contrib/downloadermiddleware/common.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/cookies.py => scrapy/contrib/downloadermiddleware/cookies.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/debug.py => scrapy/contrib/downloadermiddleware/debug.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/errorpages.py => scrapy/contrib/downloadermiddleware/errorpages.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/httpauth.py => scrapy/contrib/downloadermiddleware/httpauth.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/httpcompression.py => scrapy/contrib/downloadermiddleware/httpcompression.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/redirect.py => scrapy/contrib/downloadermiddleware/redirect.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/retry.py => scrapy/contrib/downloadermiddleware/retry.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/robotstxt.py => scrapy/contrib/downloadermiddleware/robotstxt.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/stats.py => scrapy/contrib/downloadermiddleware/stats.py rename : scrapy/trunk/scrapy/contrib/downloadermiddleware/useragent.py => scrapy/contrib/downloadermiddleware/useragent.py rename : scrapy/trunk/scrapy/contrib/groupsettings.py => scrapy/contrib/groupsettings.py rename : scrapy/trunk/scrapy/contrib/item/__init__.py => scrapy/contrib/item/__init__.py rename : scrapy/trunk/scrapy/contrib/item/models.py => scrapy/contrib/item/models.py rename : 
scrapy/trunk/scrapy/contrib/itemsampler.py => scrapy/contrib/itemsampler.py rename : scrapy/trunk/scrapy/contrib/link_extractors.py => scrapy/contrib/link_extractors.py rename : scrapy/trunk/scrapy/contrib/memdebug.py => scrapy/contrib/memdebug.py rename : scrapy/trunk/scrapy/contrib/memusage.py => scrapy/contrib/memusage.py rename : scrapy/trunk/scrapy/contrib/pipeline/__init__.py => scrapy/contrib/pipeline/__init__.py rename : scrapy/trunk/scrapy/contrib/pipeline/images.py => scrapy/contrib/pipeline/images.py rename : scrapy/trunk/scrapy/contrib/pipeline/media.py => scrapy/contrib/pipeline/media.py rename : scrapy/trunk/scrapy/contrib/pipeline/s3images.py => scrapy/contrib/pipeline/s3images.py rename : scrapy/trunk/scrapy/contrib/pipeline/show.py => scrapy/contrib/pipeline/show.py rename : scrapy/trunk/scrapy/contrib/prioritizers.py => scrapy/contrib/prioritizers.py rename : scrapy/trunk/scrapy/contrib/response/__init__.py => scrapy/contrib/response/__init__.py rename : scrapy/trunk/scrapy/contrib/response/soup.py => scrapy/contrib/response/soup.py rename : scrapy/trunk/scrapy/contrib/schedulermiddleware/__init__.py => scrapy/contrib/schedulermiddleware/__init__.py rename : scrapy/trunk/scrapy/contrib/schedulermiddleware/duplicatesfilter.py => scrapy/contrib/schedulermiddleware/duplicatesfilter.py rename : scrapy/trunk/scrapy/contrib/spider/__init__.py => scrapy/contrib/spider/__init__.py rename : scrapy/trunk/scrapy/contrib/spider/profiler.py => scrapy/contrib/spider/profiler.py rename : scrapy/trunk/scrapy/contrib/spider/reloader.py => scrapy/contrib/spider/reloader.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/__init__.py => scrapy/contrib/spidermiddleware/__init__.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/depth.py => scrapy/contrib/spidermiddleware/depth.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/duplicatesfilter.py => scrapy/contrib/spidermiddleware/duplicatesfilter.py rename : 
scrapy/trunk/scrapy/contrib/spidermiddleware/limit.py => scrapy/contrib/spidermiddleware/limit.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/offsite.py => scrapy/contrib/spidermiddleware/offsite.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/referer.py => scrapy/contrib/spidermiddleware/referer.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/restrict.py => scrapy/contrib/spidermiddleware/restrict.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/urlfilter.py => scrapy/contrib/spidermiddleware/urlfilter.py rename : scrapy/trunk/scrapy/contrib/spidermiddleware/urllength.py => scrapy/contrib/spidermiddleware/urllength.py rename : scrapy/trunk/scrapy/contrib/spiders/__init__.py => scrapy/contrib/spiders/__init__.py rename : scrapy/trunk/scrapy/contrib/spiders/crawl.py => scrapy/contrib/spiders/crawl.py rename : scrapy/trunk/scrapy/contrib/spiders/feed.py => scrapy/contrib/spiders/feed.py rename : scrapy/trunk/scrapy/contrib/spiders/generic.py => scrapy/contrib/spiders/generic.py rename : scrapy/trunk/scrapy/contrib/web/__init__.py => scrapy/contrib/web/__init__.py rename : scrapy/trunk/scrapy/contrib/web/http.py => scrapy/contrib/web/http.py rename : scrapy/trunk/scrapy/contrib/web/json.py => scrapy/contrib/web/json.py rename : scrapy/trunk/scrapy/contrib/web/service.py => scrapy/contrib/web/service.py rename : scrapy/trunk/scrapy/contrib/web/site.py => scrapy/contrib/web/site.py rename : scrapy/trunk/scrapy/contrib/web/stats.py => scrapy/contrib/web/stats.py rename : scrapy/trunk/scrapy/contrib/webconsole/__init__.py => scrapy/contrib/webconsole/__init__.py rename : scrapy/trunk/scrapy/contrib/webconsole/enginestatus.py => scrapy/contrib/webconsole/enginestatus.py rename : scrapy/trunk/scrapy/contrib/webconsole/livestats.py => scrapy/contrib/webconsole/livestats.py rename : scrapy/trunk/scrapy/contrib/webconsole/scheduler.py => scrapy/contrib/webconsole/scheduler.py rename : scrapy/trunk/scrapy/contrib/webconsole/spiderctl.py => 
scrapy/contrib/webconsole/spiderctl.py rename : scrapy/trunk/scrapy/contrib/webconsole/spiderstats.py => scrapy/contrib/webconsole/spiderstats.py rename : scrapy/trunk/scrapy/contrib/webconsole/stats.py => scrapy/contrib/webconsole/stats.py rename : scrapy/trunk/scrapy/contrib_exp/__init__.py => scrapy/contrib_exp/__init__.py rename : scrapy/trunk/scrapy/contrib_exp/adaptors/__init__.py => scrapy/contrib_exp/adaptors/__init__.py rename : scrapy/trunk/scrapy/contrib_exp/adaptors/date.py => scrapy/contrib_exp/adaptors/date.py rename : scrapy/trunk/scrapy/contrib_exp/adaptors/extraction.py => scrapy/contrib_exp/adaptors/extraction.py rename : scrapy/trunk/scrapy/contrib_exp/adaptors/markup.py => scrapy/contrib_exp/adaptors/markup.py rename : scrapy/trunk/scrapy/contrib_exp/adaptors/misc.py => scrapy/contrib_exp/adaptors/misc.py rename : scrapy/trunk/scrapy/contrib_exp/downloadermiddleware/__init__.py => scrapy/contrib_exp/downloadermiddleware/__init__.py rename : scrapy/trunk/scrapy/contrib_exp/downloadermiddleware/decompression.py => scrapy/contrib_exp/downloadermiddleware/decompression.py rename : scrapy/trunk/scrapy/contrib_exp/history/__init__.py => scrapy/contrib_exp/history/__init__.py rename : scrapy/trunk/scrapy/contrib_exp/history/history.py => scrapy/contrib_exp/history/history.py rename : scrapy/trunk/scrapy/contrib_exp/history/middleware.py => scrapy/contrib_exp/history/middleware.py rename : scrapy/trunk/scrapy/contrib_exp/history/scheduler.py => scrapy/contrib_exp/history/scheduler.py rename : scrapy/trunk/scrapy/contrib_exp/history/store.py => scrapy/contrib_exp/history/store.py rename : scrapy/trunk/scrapy/contrib_exp/link/__init__.py => scrapy/contrib_exp/link/__init__.py rename : scrapy/trunk/scrapy/contrib_exp/newitem/__init__.py => scrapy/contrib_exp/newitem/__init__.py rename : scrapy/trunk/scrapy/contrib_exp/newitem/adaptors.py => scrapy/contrib_exp/newitem/adaptors.py rename : scrapy/trunk/scrapy/contrib_exp/newitem/fields.py => 
scrapy/contrib_exp/newitem/fields.py rename : scrapy/trunk/scrapy/contrib_exp/newitem/models.py => scrapy/contrib_exp/newitem/models.py rename : scrapy/trunk/scrapy/contrib_exp/pipeline/shoveitem.py => scrapy/contrib_exp/pipeline/shoveitem.py rename : scrapy/trunk/scrapy/core/__init__.py => scrapy/core/__init__.py rename : scrapy/trunk/scrapy/core/downloader/__init__.py => scrapy/core/downloader/__init__.py rename : scrapy/trunk/scrapy/core/downloader/dnscache.py => scrapy/core/downloader/dnscache.py rename : scrapy/trunk/scrapy/core/downloader/handlers.py => scrapy/core/downloader/handlers.py rename : scrapy/trunk/scrapy/core/downloader/manager.py => scrapy/core/downloader/manager.py rename : scrapy/trunk/scrapy/core/downloader/middleware.py => scrapy/core/downloader/middleware.py rename : scrapy/trunk/scrapy/core/downloader/responsetypes/__init__.py => scrapy/core/downloader/responsetypes/__init__.py rename : scrapy/trunk/scrapy/core/downloader/responsetypes/mime.types => scrapy/core/downloader/responsetypes/mime.types rename : scrapy/trunk/scrapy/core/downloader/webclient.py => scrapy/core/downloader/webclient.py rename : scrapy/trunk/scrapy/core/engine.py => scrapy/core/engine.py rename : scrapy/trunk/scrapy/core/exceptions.py => scrapy/core/exceptions.py rename : scrapy/trunk/scrapy/core/manager.py => scrapy/core/manager.py rename : scrapy/trunk/scrapy/core/prioritizers.py => scrapy/core/prioritizers.py rename : scrapy/trunk/scrapy/core/scheduler/__init__.py => scrapy/core/scheduler/__init__.py rename : scrapy/trunk/scrapy/core/scheduler/middleware.py => scrapy/core/scheduler/middleware.py rename : scrapy/trunk/scrapy/core/scheduler/schedulers.py => scrapy/core/scheduler/schedulers.py rename : scrapy/trunk/scrapy/core/scheduler/store.py => scrapy/core/scheduler/store.py rename : scrapy/trunk/scrapy/core/signals.py => scrapy/core/signals.py rename : scrapy/trunk/scrapy/dupefilter/__init__.py => scrapy/dupefilter/__init__.py rename : 
scrapy/trunk/scrapy/extension/__init__.py => scrapy/extension/__init__.py rename : scrapy/trunk/scrapy/fetcher/__init__.py => scrapy/fetcher/__init__.py rename : scrapy/trunk/scrapy/http/__init__.py => scrapy/http/__init__.py rename : scrapy/trunk/scrapy/http/cookies.py => scrapy/http/cookies.py rename : scrapy/trunk/scrapy/http/headers.py => scrapy/http/headers.py rename : scrapy/trunk/scrapy/http/request/__init__.py => scrapy/http/request/__init__.py rename : scrapy/trunk/scrapy/http/request/form.py => scrapy/http/request/form.py rename : scrapy/trunk/scrapy/http/request/rpc.py => scrapy/http/request/rpc.py rename : scrapy/trunk/scrapy/http/response/__init__.py => scrapy/http/response/__init__.py rename : scrapy/trunk/scrapy/http/response/html.py => scrapy/http/response/html.py rename : scrapy/trunk/scrapy/http/response/text.py => scrapy/http/response/text.py rename : scrapy/trunk/scrapy/http/response/xml.py => scrapy/http/response/xml.py rename : scrapy/trunk/scrapy/http/url.py => scrapy/http/url.py rename : scrapy/trunk/scrapy/item/__init__.py => scrapy/item/__init__.py rename : scrapy/trunk/scrapy/item/adaptors.py => scrapy/item/adaptors.py rename : scrapy/trunk/scrapy/item/models.py => scrapy/item/models.py rename : scrapy/trunk/scrapy/item/pipeline.py => scrapy/item/pipeline.py rename : scrapy/trunk/scrapy/link/__init__.py => scrapy/link/__init__.py rename : scrapy/trunk/scrapy/link/extractors.py => scrapy/link/extractors.py rename : scrapy/trunk/scrapy/log/__init__.py => scrapy/log/__init__.py rename : scrapy/trunk/scrapy/mail/__init__.py => scrapy/mail/__init__.py rename : scrapy/trunk/scrapy/management/__init__.py => scrapy/management/__init__.py rename : scrapy/trunk/scrapy/management/telnet.py => scrapy/management/telnet.py rename : scrapy/trunk/scrapy/management/web.py => scrapy/management/web.py rename : scrapy/trunk/scrapy/patches/__init__.py => scrapy/patches/__init__.py rename : scrapy/trunk/scrapy/patches/monkeypatches.py => 
scrapy/patches/monkeypatches.py rename : scrapy/trunk/scrapy/spider/__init__.py => scrapy/spider/__init__.py rename : scrapy/trunk/scrapy/spider/manager.py => scrapy/spider/manager.py rename : scrapy/trunk/scrapy/spider/middleware.py => scrapy/spider/middleware.py rename : scrapy/trunk/scrapy/spider/models.py => scrapy/spider/models.py rename : scrapy/trunk/scrapy/stats/__init__.py => scrapy/stats/__init__.py rename : scrapy/trunk/scrapy/stats/corestats.py => scrapy/stats/corestats.py rename : scrapy/trunk/scrapy/stats/statscollector.py => scrapy/stats/statscollector.py rename : scrapy/trunk/scrapy/store/__init__.py => scrapy/store/__init__.py rename : scrapy/trunk/scrapy/store/db.py => scrapy/store/db.py rename : scrapy/trunk/scrapy/templates/project/module/__init__.py => scrapy/templates/project/module/__init__.py rename : scrapy/trunk/scrapy/templates/project/module/items.py.tmpl => scrapy/templates/project/module/items.py.tmpl rename : scrapy/trunk/scrapy/templates/project/module/pipelines.py.tmpl => scrapy/templates/project/module/pipelines.py.tmpl rename : scrapy/trunk/scrapy/templates/project/module/settings.py.tmpl => scrapy/templates/project/module/settings.py.tmpl rename : scrapy/trunk/scrapy/templates/project/module/spiders/__init__.py => scrapy/templates/project/module/spiders/__init__.py rename : scrapy/trunk/scrapy/templates/project/module/templates/spider_basic.tmpl => scrapy/templates/project/module/templates/spider_basic.tmpl rename : scrapy/trunk/scrapy/templates/project/module/templates/spider_crawl.tmpl => scrapy/templates/project/module/templates/spider_crawl.tmpl rename : scrapy/trunk/scrapy/templates/project/module/templates/spider_csvfeed.tmpl => scrapy/templates/project/module/templates/spider_csvfeed.tmpl rename : scrapy/trunk/scrapy/templates/project/module/templates/spider_xmlfeed.tmpl => scrapy/templates/project/module/templates/spider_xmlfeed.tmpl rename : scrapy/trunk/scrapy/templates/project/root/scrapy-ctl.py => 
scrapy/templates/project/root/scrapy-ctl.py rename : scrapy/trunk/scrapy/tests/__init__.py => scrapy/tests/__init__.py rename : scrapy/trunk/scrapy/tests/run.py => scrapy/tests/run.py rename : scrapy/trunk/scrapy/tests/sample_data/adaptors/enc-ascii.html => scrapy/tests/sample_data/adaptors/enc-ascii.html rename : scrapy/trunk/scrapy/tests/sample_data/adaptors/enc-cp1252.html => scrapy/tests/sample_data/adaptors/enc-cp1252.html rename : scrapy/trunk/scrapy/tests/sample_data/adaptors/enc-latin1.html => scrapy/tests/sample_data/adaptors/enc-latin1.html rename : scrapy/trunk/scrapy/tests/sample_data/adaptors/enc-utf8-meta-latin1.html => scrapy/tests/sample_data/adaptors/enc-utf8-meta-latin1.html rename : scrapy/trunk/scrapy/tests/sample_data/adaptors/enc-utf8.html => scrapy/tests/sample_data/adaptors/enc-utf8.html rename : scrapy/trunk/scrapy/tests/sample_data/adaptors/extr_unquoted.xml => scrapy/tests/sample_data/adaptors/extr_unquoted.xml rename : scrapy/trunk/scrapy/tests/sample_data/compressed/feed-sample1.tar => scrapy/tests/sample_data/compressed/feed-sample1.tar rename : scrapy/trunk/scrapy/tests/sample_data/compressed/feed-sample1.xml => scrapy/tests/sample_data/compressed/feed-sample1.xml rename : scrapy/trunk/scrapy/tests/sample_data/compressed/feed-sample1.xml.bz2 => scrapy/tests/sample_data/compressed/feed-sample1.xml.bz2 rename : scrapy/trunk/scrapy/tests/sample_data/compressed/feed-sample1.xml.gz => scrapy/tests/sample_data/compressed/feed-sample1.xml.gz rename : scrapy/trunk/scrapy/tests/sample_data/compressed/feed-sample1.zip => scrapy/tests/sample_data/compressed/feed-sample1.zip rename : scrapy/trunk/scrapy/tests/sample_data/compressed/html-gzip.bin => scrapy/tests/sample_data/compressed/html-gzip.bin rename : scrapy/trunk/scrapy/tests/sample_data/compressed/html-rawdeflate.bin => scrapy/tests/sample_data/compressed/html-rawdeflate.bin rename : scrapy/trunk/scrapy/tests/sample_data/compressed/html-zlibdeflate.bin => 
scrapy/tests/sample_data/compressed/html-zlibdeflate.bin rename : scrapy/trunk/scrapy/tests/sample_data/feeds/feed-sample1.xml => scrapy/tests/sample_data/feeds/feed-sample1.xml rename : scrapy/trunk/scrapy/tests/sample_data/feeds/feed-sample2.xml => scrapy/tests/sample_data/feeds/feed-sample2.xml rename : scrapy/trunk/scrapy/tests/sample_data/feeds/feed-sample3.csv => scrapy/tests/sample_data/feeds/feed-sample3.csv rename : scrapy/trunk/scrapy/tests/sample_data/feeds/feed-sample4.csv => scrapy/tests/sample_data/feeds/feed-sample4.csv rename : scrapy/trunk/scrapy/tests/sample_data/feeds/feed-sample5.csv => scrapy/tests/sample_data/feeds/feed-sample5.csv rename : scrapy/trunk/scrapy/tests/sample_data/link_extractor/image_linkextractor.html => scrapy/tests/sample_data/link_extractor/image_linkextractor.html rename : scrapy/trunk/scrapy/tests/sample_data/link_extractor/linkextractor_latin1.html => scrapy/tests/sample_data/link_extractor/linkextractor_latin1.html rename : scrapy/trunk/scrapy/tests/sample_data/link_extractor/linkextractor_noenc.html => scrapy/tests/sample_data/link_extractor/linkextractor_noenc.html rename : scrapy/trunk/scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/regex_linkextractor.html rename : scrapy/trunk/scrapy/tests/sample_data/test_site/index.html => scrapy/tests/sample_data/test_site/index.html rename : scrapy/trunk/scrapy/tests/sample_data/test_site/item1.html => scrapy/tests/sample_data/test_site/item1.html rename : scrapy/trunk/scrapy/tests/sample_data/test_site/item2.html => scrapy/tests/sample_data/test_site/item2.html rename : scrapy/trunk/scrapy/tests/test_adaptors.py => scrapy/tests/test_adaptors.py rename : scrapy/trunk/scrapy/tests/test_aws.py => scrapy/tests/test_aws.py rename : scrapy/trunk/scrapy/tests/test_c14nurls.py => scrapy/tests/test_c14nurls.py rename : scrapy/trunk/scrapy/tests/test_contrib_response_soup.py => scrapy/tests/test_contrib_response_soup.py rename : 
206 lines
8.2 KiB
ReStructuredText
.. _topics-adaptors:

=======================
Adaptors (experimental)
=======================

.. warning::

    Adaptors are an experimental feature of Scrapy, which means their API is
    not yet stable and could undergo minor changes before the next stable
    release.

Quick overview
==============

Scrapy's adaptors are a feature attached to :class:`RobustScrapedItem` that
lets you modify (adapt to your needs) any kind of information you want to put
in your items at assignment time.

The following diagram shows the data flow from the moment you call the
``attribute`` method until the attribute is actually set.

.. image:: _images/adaptors_diagram.png

As you can see, adaptor pipelines are executed in tree form: the first adaptor
is applied to each of the values you pass to the ``attribute`` method. Then,
for each of the values resulting from the first adaptor, the second adaptor is
called, and so on. The process ends with a list of adapted values, which may
contain zero, one, or many values.

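The tree-form execution described above can be sketched in a few lines of
plain Python. This is only an illustration, not Scrapy's implementation:
``run_pipeline``, ``split_words`` and ``capitalize`` are hypothetical names,
and the sketch assumes that a ``None`` result drops a value and that a list or
tuple result fans out into several values (which is consistent with the
vowel-stripping example later in this document).

```python
def run_pipeline(adaptors, values):
    """Apply each adaptor in turn to every value produced by the previous one."""
    current = list(values)
    for adaptor in adaptors:
        next_values = []
        for value in current:
            result = adaptor(value)
            if result is None:
                # Assumption: a None result drops the value from the pipeline.
                continue
            if isinstance(result, (list, tuple)):
                # An adaptor returning several values fans out into branches.
                next_values.extend(result)
            else:
                next_values.append(result)
        current = next_values
    return current


def split_words(text):
    return tuple(text.split())


def capitalize(word):
    return word.capitalize()


adapted = run_pipeline([split_words, capitalize], ["hello world", "scrapy"])
print(adapted)  # ['Hello', 'World', 'Scrapy']
```

Two input values become three adapted values here, because the first adaptor
splits one of them in two before the second adaptor runs on each piece.
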
If the attribute is single-valued (this is defined in the item's
``ATTRIBUTES`` dictionary), the first element of this list is set, unless you
call the ``attribute`` method with the ``add`` parameter set to ``True``. In
that case the item's ``_add_single_attributes`` method is called with the
attribute's name, its type, and the list of attributes to join as parameters.
By default, this method raises ``NotImplementedError``, so you should override
it in your items in order to join any kind of objects.

If the attribute is multivalued, the resulting list is set on the item as is,
unless you use ``add=True`` again, in which case the list of already-existing
values (if any) is extended with the new ones.

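The assignment rules above can be summarized as a small decision function.
This is a rough sketch under stated assumptions, not Scrapy's code:
``resolve_value`` and ``join_single`` are hypothetical names, the join callback
is simplified to take only the list of values (the real
``_add_single_attributes`` also receives the attribute's name and type), and an
empty result list is assumed to leave a single-valued attribute unset.

```python
def resolve_value(existing, adapted, multivalued, add=False, join_single=None):
    """Return the value to store, given the list of adapted values."""
    if multivalued:
        if add:
            # Extend the already-existing values (if any) with the new ones.
            return list(existing or []) + list(adapted)
        return list(adapted)
    if add:
        if join_single is None:
            # Mirrors the default behaviour described above.
            raise NotImplementedError("no way to join single-valued attributes")
        return join_single(adapted)
    # Assumption: an empty result list leaves the attribute unset (None here).
    return adapted[0] if adapted else None


first = resolve_value(None, ["a", "b"], multivalued=False)
joined = resolve_value(None, ["a", "b"], multivalued=False, add=True,
                       join_single="".join)
extended = resolve_value(["x"], ["a", "b"], multivalued=True, add=True)
print(first, joined, extended)
```
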
Adaptor Pipelines
=================

.. class:: AdaptorPipe(adaptors=None)

    An instance of this class represents an adaptor pipeline to be set for
    adapting a certain item's attribute. It provides some useful methods for
    adding and removing adaptors, and takes care of executing them properly.
    Usually this class is not used directly, since items already provide
    ways to manage adaptors without having to handle AdaptorPipes.

    :param adaptors: A list of callables to be added as adaptors at
        instantiation time.

    Methods:

    .. method:: add_adaptor(adaptor, position=None)

        Adds an adaptor to the pipeline at the given position.

        :param adaptor: Any callable that works as an adaptor.
        :param position: An integer giving the position at which the adaptor
            will be inserted. If it is ``None``, the adaptor is appended at
            the end of the pipeline.

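To make the ``position`` parameter concrete, here is a minimal stand-in for an
adaptor pipeline. ``SketchPipe`` is a hypothetical class written for this
illustration, not Scrapy's :class:`AdaptorPipe`, and it assumes that
``position`` behaves like the index passed to ``list.insert``.

```python
class SketchPipe:
    """Hypothetical stand-in for an adaptor pipeline; not Scrapy's AdaptorPipe."""

    def __init__(self, adaptors=None):
        self.adaptors = list(adaptors or [])

    def add_adaptor(self, adaptor, position=None):
        if position is None:
            # No position given: append at the end of the pipeline.
            self.adaptors.append(adaptor)
        else:
            # Assumption: position follows list.insert semantics.
            self.adaptors.insert(position, adaptor)


pipe = SketchPipe([str.strip])
pipe.add_adaptor(str.upper)              # appended at the end
pipe.add_adaptor(str.lower, position=0)  # inserted at the front
names = [adaptor.__name__ for adaptor in pipe.adaptors]
print(names)  # ['lower', 'strip', 'upper']
```
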
Usage
=====

As mentioned above, in order to use adaptor pipelines your items must inherit
from the :class:`RobustScrapedItem` class. If you don't know anything about
these items, read the :ref:`topics-items` reference first.

Once you've created your own item class (inherited from
:class:`RobustScrapedItem`) with the attributes you're going to use, you have
to add adaptor pipelines to each attribute you'd like to adapt data for. To do
so, RobustScrapedItems provide some useful methods like ``set_adaptors``,
``set_attrib_adaptors``, and more (which are also described in their
reference), so that you don't need to work with :class:`AdaptorPipe` objects
directly.

Adaptors
--------

Let's now talk about individual adaptors: what are they, and how should they
be implemented?

An adaptor is basically any callable that receives a value, modifies it, and
returns a new value (or several), so that the next adaptor can go on with
another adapting task (or not). It is done this way to make the process of
modifying information highly customizable, and also to make adaptors reusable,
since they are intended to be small functions designed for simple purposes
that can be applied in many different cases. For example, you could make an
adaptor for removing any ``<b>`` tags in a text, like this::

    >>> import re
    >>> B_TAG_RE = re.compile(r'</?b\s*>')
    >>> def remove_b_tags(text):
    ...     return B_TAG_RE.sub('', text)

Then you could easily add this adaptor to a certain attribute's pipeline like
this::

    >>> item = MyItem()
    >>> item.add_adaptor('text', remove_b_tags)
    >>> item.attribute('text', u'<b>some random text in bold</b> and some random text in normal font')
    >>> item.text
    u'some random text in bold and some random text in normal font'

As you can see, any value that you set on the item through the ``attribute``
method first passes through the ``remove_b_tags`` adaptor, which replaces
every matching tag with an empty string.

----

Now let's think of a slightly more complicated (and useless) example: say you
want to scrape a text, split it into single letters, strip the vowels, turn
the rest into capital letters, and join them again. In this case, we could use
three simple adaptors to process our data, plus a customized
:class:`RobustScrapedItem` for joining single text attributes; let's see an
example::

    >>> # First of all, we define the item class we're going to use
    >>> from string import ascii_letters
    >>> from scrapy.contrib.item import RobustScrapedItem
    >>> class MyItem(RobustScrapedItem):
    ...     ATTRIBUTES = {
    ...         'text': basestring,
    ...     }
    ...
    ...     def _add_single_attributes(self, attrname, attrtype, attributes):
    ...         return ''.join(attributes)

    >>> # Now we'll write the needed adaptors
    >>> def to_letters(text):
    ...     return tuple(letter for letter in text)

    >>> def drop_vowels(letter):
    ...     # Keep only consonants; anything else yields None
    ...     if letter in ascii_letters and letter.lower() not in ('a', 'e', 'i', 'o', 'u'):
    ...         return letter

    >>> def to_upper(letter):
    ...     return letter.upper()

    >>> # Finally, we'll join all the pieces and see how it works
    >>> item = MyItem()
    >>> item.set_attrib_adaptors('text', [
    ...     to_letters,
    ...     drop_vowels,
    ...     to_upper,
    ... ])

Let's now try it with an example text to see what happens::

    >>> item.attribute('text', 'pi', 'wind', add=True)
    >>> item.text
    'PWND'

More complex adaptors
---------------------

After using adaptors for a while, you may find yourself in situations where
you need adaptors that receive other parameters from the ``attribute`` method
apart from the value to adapt.

For example, imagine you have an adaptor that removes certain characters from
the strings you provide. Would you make an adaptor for each combination of
characters you'd like to strip? Of course not!

The way to handle these cases is to make an adaptor that, apart from receiving
a value like any other adaptor, receives a parameter called ``adaptor_args``.
It's important that the parameter has exactly this name, since Scrapy finds
out whether an adaptor is able to receive extra parameters by introspection:
it looks for a parameter with this name in the adaptor's parameter list.

The parameter receives the same dictionary of keyword arguments that you pass
to the ``attribute`` method when calling it.

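The name-based detection described above can be sketched with the standard
library. This is an illustration of the idea rather than Scrapy's actual code:
``accepts_adaptor_args``, ``call_adaptor``, ``shout`` and ``add_suffix`` are
hypothetical names, and the sketch uses ``inspect.signature`` from Python 3,
which may differ from what Scrapy itself uses.

```python
import inspect


def accepts_adaptor_args(adaptor):
    """Return True if the callable declares a parameter named adaptor_args."""
    return "adaptor_args" in inspect.signature(adaptor).parameters


def call_adaptor(adaptor, value, adaptor_args):
    # Pass the extra arguments only to adaptors that ask for them by name.
    if accepts_adaptor_args(adaptor):
        return adaptor(value, adaptor_args=adaptor_args)
    return adaptor(value)


def shout(value):
    return value.upper()


def add_suffix(value, adaptor_args):
    return value + adaptor_args.get("suffix", "")


plain = call_adaptor(shout, "hi", {"suffix": "!"})
with_args = call_adaptor(add_suffix, "hi", {"suffix": "!"})
print(plain, with_args)  # HI hi!
```

Because the check is made on the parameter name, an adaptor that spells it
differently (say ``args``) would be called with the value only.
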
But let's get back to the characters example. How would we implement it? Much
like any other adaptor; let's see::

    def strip_chars(value, adaptor_args):
        chars = adaptor_args.get('strip_chars', [])
        for char in chars:
            value = value.replace(char, '')
        return value

Then, after creating an item and adding the adaptor to one of its pipelines, we could do::

    >>> item.attribute('text', 'Hi, my name is John', strip_chars=['a', 'i', 'm'])
    >>> item.text
    'H, y ne s John'

Debugging
=========

While you're coding spiders and adaptors, you usually need to know exactly
what Scrapy does under the hood with the values you provide. For this purpose
there's a setting called :setting:`ADAPTORS_DEBUG`, which makes Scrapy print
debugging messages each time an adaptor pipeline is run, specifying which
attribute data is being adapted for, the input/output values of each adaptor
in the pipeline, and (in some cases) the input/output of
``_add_single_attributes``.

You can enable this setting like any other, either by adding it to your
settings file or by setting the ``SCRAPY_ADAPTORS_DEBUG`` environment variable.