Mirror of https://github.com/scrapy/scrapy.git (last synced 2025-02-26 10:04:16 +00:00)
533 lines
21 KiB
ReStructuredText
533 lines
21 KiB
ReStructuredText
.. _ref-request-response:

============================
Request and Response objects
============================

.. module:: scrapy.http
   :synopsis: Request and Response classes

Quick overview
==============
Scrapy uses :class:`Request` and :class:`Response` objects for crawling web
sites.

Typically, :class:`Request` objects are generated in the spiders and pass
across the system until they reach the Downloader, which executes the request
and returns a :class:`Response` object that travels back to the spider that
issued the request.

Both the :class:`Request` and :class:`Response` classes have subclasses which
add functionality not required in the base classes. These are described below
in :ref:`ref-request-subclasses` and :ref:`ref-response-subclasses`.

Request objects
===============
.. class:: Request(url[, callback, method='GET', body, headers, cookies, meta, encoding='utf-8', dont_filter=False, errback])

    A :class:`Request` object represents an HTTP request, which is usually
    generated in the Spider and executed by the Downloader, thus generating a
    :class:`Response`.

    :param url: the URL of this request
    :type url: string

    :param callback: the function that will be called with the response of
        this request (once it's downloaded) as its first parameter. For more
        information see :ref:`ref-request-callback-arguments` below.
    :type callback: callable

    :param method: the HTTP method of this request. Defaults to ``'GET'``.
    :type method: string

    :param meta: the initial values for the :attr:`Request.meta` attribute.
        If given, the dict passed in this parameter will be shallow copied.
    :type meta: dict

    :param body: the request body. If a ``unicode`` is passed, it is encoded
        to ``str`` using the `encoding` passed (which defaults to ``utf-8``).
        If ``body`` is not given, an empty string is stored. Regardless of the
        type of this argument, the final value stored will be a ``str`` (never
        ``unicode`` or ``None``).
    :type body: str or unicode

    :param headers: the headers of this request. The dict values can be
        strings (for single-valued headers) or lists (for multi-valued
        headers).
    :type headers: dict

    :param cookies: the request cookies. Example::

            request_with_cookies = Request(url="http://www.example.com",
                                           cookies={'currency': 'USD', 'country': 'UY'})

        When some site returns cookies (in a response) those are stored in
        the cookies for that domain and will be sent again in future
        requests. That's the typical behaviour of any regular web browser.
        However, if, for some reason, you want to avoid merging with
        existing cookies, you can instruct Scrapy to do so by setting the
        ``dont_merge_cookies`` key in :attr:`Request.meta`.

        Example of a request without merging cookies::

            request_with_cookies = Request(url="http://www.example.com",
                                           cookies={'currency': 'USD', 'country': 'UY'},
                                           meta={'dont_merge_cookies': True})

    :type cookies: dict

    :param encoding: the encoding of this request (defaults to ``'utf-8'``).
        This encoding will be used to percent-encode the URL and to convert
        the body to ``str`` (if given as ``unicode``).
    :type encoding: string

    :param dont_filter: indicates that this request should not be filtered
        by the scheduler. This is used when you want to perform an identical
        request multiple times, ignoring the duplicates filter. Use it with
        care, or you will get into crawling loops. Defaults to ``False``.
    :type dont_filter: boolean

    :param errback: a function that will be called if any exception is
        raised while processing the request. This includes pages that failed
        with 404 HTTP errors and such. It receives a `Twisted Failure`_
        instance as its first parameter.
    :type errback: callable

.. _Twisted Failure: http://twistedmatrix.com/documents/8.2.0/api/twisted.python.failure.Failure.html
.. attribute:: Request.url

    A string containing the URL of this request. Keep in mind that this
    attribute contains the escaped URL, so it can differ from the URL passed
    in the constructor.

.. attribute:: Request.method

    A string representing the HTTP method in the request. This is guaranteed
    to be uppercase. Example: ``"GET"``, ``"POST"``, ``"PUT"``, etc.

.. attribute:: Request.headers

    A dictionary-like object which contains the request headers.

.. attribute:: Request.body

    A str that contains the request body.

.. attribute:: Request.meta

    A dict that contains arbitrary metadata for this request. This dict is
    empty for new Requests, and is usually populated by different Scrapy
    components (extensions, middlewares, etc), so the data contained in this
    dict depends on the extensions you have enabled.

    This dict is `shallow copied`_ when the request is cloned using the
    ``copy()`` or ``replace()`` methods.

.. _shallow copied: http://docs.python.org/library/copy.html
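The practical consequence of shallow copying is that top-level keys of the
cloned dict become independent, while mutable values inside it are still
shared between the original and the copy. A quick stdlib-only illustration
(plain Python, not Scrapy code):

```python
import copy

# Simulate a request's meta dict being shallow copied, as copy()/replace() do.
meta = {"depth": 1, "history": ["http://www.example.com/"]}
meta_copy = copy.copy(meta)

meta_copy["depth"] = 2                                    # top-level: independent
meta_copy["history"].append("http://www.example.com/2")   # nested value: shared!

assert meta["depth"] == 1         # original top-level value is untouched
assert len(meta["history"]) == 2  # but the nested list was mutated through the copy
```

If you need cloned requests with fully independent metadata, avoid mutating
nested objects stored in ``meta`` after cloning.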
.. attribute:: Request.cache

    A dict that contains arbitrary cached data for this request. This dict
    is empty for new Requests, and is usually populated by different Scrapy
    components (extensions, middlewares, etc) to avoid duplicate processing,
    so the data contained in this dict depends on the extensions you have
    enabled.

    Unlike the ``meta`` attribute, this dict is not copied at all when the
    request is cloned using the ``copy()`` or ``replace()`` methods.

.. method:: Request.copy()

    Return a new Request which is a copy of this Request. The attribute
    :attr:`Request.meta` is copied, while :attr:`Request.cache` is not. See
    also :ref:`ref-request-callback-copy`.

.. method:: Request.replace([url, callback, method, headers, body, cookies, meta, encoding, dont_filter])

    Return a Request object with the same members, except for those members
    given new values by whichever keyword arguments are specified. The
    attribute :attr:`Request.meta` is copied by default (unless a new value
    is given in the ``meta`` argument). The :attr:`Request.cache` attribute
    is always cleared. See also :ref:`ref-request-callback-copy`.

.. method:: Request.httprepr()

    Return a string with the raw HTTP representation of this request.
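For illustration, a raw HTTP/1.1 request representation consists of the
request line, the headers, a blank line, and then the body. The helper below
is a hypothetical sketch of that wire format (``raw_http_repr`` is not part
of Scrapy's API), assuming ``\r\n`` line endings:

```python
# Hypothetical sketch of a raw HTTP request representation -- not Scrapy API.
def raw_http_repr(method, path, headers, body=""):
    lines = ["%s %s HTTP/1.1" % (method, path)]
    lines.extend("%s: %s" % (name, value) for name, value in headers.items())
    # Headers are separated from the body by a blank line.
    return "\r\n".join(lines) + "\r\n\r\n" + body

print(raw_http_repr("GET", "/index.html", {"Host": "www.example.com"}))
```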
.. _ref-request-callback-copy:

Caveats with copying Requests and callbacks
-------------------------------------------

When you copy a request using the :meth:`Request.copy` or
:meth:`Request.replace` methods, the callback of the request is not copied by
default. This is due to legacy reasons, along with limitations in the
underlying network library, which doesn't allow sharing `Twisted deferreds`_.

.. _Twisted deferreds: http://twistedmatrix.com/projects/core/documentation/howto/defer.html

For example::

    request = Request("http://www.example.com", callback=myfunc)
    request2 = request.copy() # doesn't copy the callback
    request3 = request.replace(callback=request.callback)

In the above example, ``request2`` is a copy of ``request`` but has no
callback, while ``request3`` is a copy of ``request`` that also contains the
callback.
.. _ref-request-callback-arguments:

Passing arguments to callback functions
---------------------------------------

The callback of a request is a function that will be called when the response
of that request is downloaded. The callback function will be called with the
downloaded :class:`Response` object as its first argument.

Example::

    def parse_page1(self, response):
        request = Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)

    def parse_page2(self, response):
        # this would log http://www.example.com/some_page.html
        self.log("Visited %s" % response.url)

In some cases you may be interested in passing arguments to those callback
functions, so you can receive them later, when the response is downloaded.
There are two ways of doing this:

1. using a lambda function (or any other function/callable)

2. using the :attr:`Request.meta` attribute

Here's an example of logging the referer URL of each page using each
mechanism. Keep in mind, however, that the referer URL could be accessed more
easily via ``response.request.url``.

Using a lambda function::

    def parse_page1(self, response):
        myarg = response.url
        request = Request("http://www.example.com/some_page.html",
                          callback=lambda r: self.parse_page2(r, myarg))

    def parse_page2(self, response, referer_url):
        self.log("Visited page %s from %s" % (response.url, referer_url))

Using Request.meta::

    def parse_page1(self, response):
        request = Request("http://www.example.com/some_page.html",
                          callback=self.parse_page2)
        request.meta['referer_url'] = response.url

    def parse_page2(self, response):
        referer_url = response.request.meta['referer_url']
        self.log("Visited page %s from %s" % (response.url, referer_url))
.. _ref-request-subclasses:

Request subclasses
==================

Here is the list of built-in :class:`Request` subclasses. You can also
subclass :class:`Request` to implement your own custom functionality.

FormRequest objects
-------------------

The FormRequest class extends the base :class:`Request` with functionality
for dealing with HTML forms. It uses the `ClientForm`_ library (bundled with
Scrapy) to pre-populate form fields with form data from :class:`Response`
objects.

.. _ClientForm: http://wwwsearch.sourceforge.net/ClientForm/

.. class:: FormRequest(url, [formdata, ...])

    The :class:`FormRequest` class adds a new argument to the constructor.
    The remaining arguments are the same as for the :class:`Request` class
    and are not documented here.

    :param formdata: a dictionary (or iterable of (key, value) tuples)
        containing HTML form data which will be url-encoded and assigned to
        the body of the request.
    :type formdata: dict or iterable of tuples

:class:`FormRequest` objects support the following class method in addition
to the standard :class:`Request` methods:

.. classmethod:: FormRequest.from_response(response, [formnumber=0, formdata, ...])

    Returns a new :class:`FormRequest` object with its form field values
    pre-populated with those found in the HTML ``<form>`` element contained
    in the given response. For an example see :ref:`ref-request-userlogin`.

    :param response: the response containing the HTML form which will be
        used to pre-populate the form fields
    :type response: :class:`Response` object

    :param formnumber: the number of the form to use, when the response
        contains multiple forms. The first one (and also the default) is
        ``0``.
    :type formnumber: integer

    :param formdata: fields to override in the form data. If a field was
        already present in the response ``<form>`` element, its value is
        overridden by the one passed in this parameter.
    :type formdata: dict

    The other parameters of this class method are passed directly to the
    :class:`FormRequest` constructor.

Request usage examples
======================

Using FormRequest to send data via HTTP POST
--------------------------------------------

If you want to simulate an HTML form POST in your spider, and send a couple
of key-value fields, you can return a :class:`FormRequest` object (from your
spider) like this::

    return [FormRequest(url="http://www.example.com/post/action",
                        formdata={'name': 'John Doe', 'age': '27'},
                        callback=self.after_post)]
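What happens to ``formdata`` can be approximated with the stdlib: the
key-value pairs are url-encoded and used as the request body. The sketch
below uses the Python 3 ``urllib.parse`` module for illustration (this
codebase targeted Python 2, where the equivalent function is
``urllib.urlencode``):

```python
from urllib.parse import urlencode

# Roughly what FormRequest does internally: url-encode the form fields
# and use the result as the POST body.
formdata = {'name': 'John Doe', 'age': '27'}
body = urlencode(formdata)
print(body)  # name=John+Doe&age=27
```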
.. _ref-request-userlogin:

Using FormRequest.from_response() to simulate a user login
----------------------------------------------------------

It is usual for web sites to provide pre-populated form fields through
``<input type="hidden">`` elements, such as session-related data or
authentication tokens (for login pages). When scraping, you'll want these
fields to be automatically pre-populated, overriding only a couple of them,
such as the user name and password. You can use the
:meth:`FormRequest.from_response` method for this job. Here's an example
spider which uses it::

    class LoginSpider(BaseSpider):
        domain_name = 'example.com'
        start_urls = ['http://www.example.com/users/login.php']

        def parse(self, response):
            return [FormRequest.from_response(response,
                        formdata={'username': 'john', 'password': 'secret'},
                        callback=self.after_login)]

        def after_login(self, response):
            # check login succeeded before going on
            if "authentication failed" in response.body:
                self.log("Login failed", level=log.ERROR)
                return

            # continue scraping with authenticated session...
Response objects
================

.. class:: Response(url, [status=200, headers, body, meta, flags])

    A :class:`Response` object represents an HTTP response, which is usually
    downloaded (by the Downloader) and fed to the Spiders for processing.

    :param url: the URL of this response
    :type url: string

    :param headers: the headers of this response. The dict values can be
        strings (for single-valued headers) or lists (for multi-valued
        headers).
    :type headers: dict

    :param status: the HTTP status of the response. Defaults to ``200``.
    :type status: integer

    :param body: the response body. It must be str, not unicode, unless
        you're using an encoding-aware :ref:`Response subclass
        <ref-response-subclasses>`, such as :class:`TextResponse`.
    :type body: str

    :param meta: the initial values for the :attr:`Response.meta` attribute.
        If given, the dict will be shallow copied.
    :type meta: dict

    :param flags: a list containing the initial values for the
        :attr:`Response.flags` attribute. If given, the list will be shallow
        copied.
    :type flags: list
.. attribute:: Response.url

    A string containing the URL of the response.

.. attribute:: Response.status

    An integer representing the HTTP status of the response. Example:
    ``200``, ``404``.

.. attribute:: Response.headers

    A dictionary-like object which contains the response headers.

.. attribute:: Response.body

    A str containing the body of this Response. Keep in mind that
    :attr:`Response.body` is always a str. If you want the unicode version,
    use :meth:`TextResponse.body_as_unicode` (only available in
    :class:`TextResponse` and subclasses).

.. attribute:: Response.request

    The :class:`Request` object that generated this response. This attribute
    is assigned in the Scrapy engine, after the response and the request
    have passed through all :ref:`Downloader Middlewares
    <topics-downloader-middleware>`. In particular, this means that:

    - HTTP redirections will cause the original request (to the URL before
      redirection) to be assigned to the redirected response (with the final
      URL after redirection).

    - ``Response.request.url`` doesn't always equal ``Response.url``.

    - This attribute is only available in the spider code and in the
      :ref:`Spider Middlewares <topics-spider-middleware>`, but not in
      Downloader Middlewares (although you have the Request available there
      by other means) or in handlers of the :signal:`response_downloaded`
      signal.

.. attribute:: Response.meta

    A dict that contains arbitrary metadata for this response, similar to
    the :attr:`Request.meta` attribute. See the :attr:`Request.meta`
    attribute for more info.

.. attribute:: Response.flags

    A list that contains flags for this response. Flags are labels used for
    tagging Responses, for example: ``'cached'``, ``'redirected'``, etc.
    They're shown on the string representation of the Response (``__str__``
    method), which is used by the engine for logging.

.. attribute:: Response.cache

    A dict that contains arbitrary cached data for this response, similar to
    the :attr:`Request.cache` attribute. See the :attr:`Request.cache`
    attribute for more info.

.. method:: Response.copy()

    Return a new Response which is a copy of this Response. The attribute
    :attr:`Response.meta` is copied, while :attr:`Response.cache` is not.

.. method:: Response.replace([url, status, headers, body, meta, flags, cls])

    Return a Response object with the same members, except for those members
    given new values by whichever keyword arguments are specified. The
    attribute :attr:`Response.meta` is copied by default (unless a new value
    is given in the ``meta`` argument). The :attr:`Response.cache` attribute
    is always cleared.

.. method:: Response.httprepr()

    Return a string with the raw HTTP representation of this response.
.. _ref-response-subclasses:

Response subclasses
===================

Here is the list of available built-in Response subclasses. You can also
subclass the Response class to implement your own functionality.

TextResponse objects
--------------------

.. class:: TextResponse(url, [encoding[, ...]])

    :class:`TextResponse` objects add encoding capabilities to the base
    :class:`Response` class, which is only suitable for binary data, such as
    images, sounds or any media file.

    :class:`TextResponse` objects support a new constructor argument, in
    addition to the base :class:`Response` arguments. The remaining
    functionality is the same as for the :class:`Response` class and is not
    documented here.

    :param encoding: a string which contains the encoding to use for this
        response. If you create a :class:`TextResponse` object with a
        unicode body, it will be encoded using this encoding (remember the
        body attribute is always a str). If ``encoding`` is ``None`` (the
        default value), the encoding will be looked up in the response
        headers and body instead.
    :type encoding: string

:class:`TextResponse` objects support the following attributes in addition
to the standard :class:`Response` ones:

.. attribute:: TextResponse.encoding

    A string with the encoding of this response. The encoding is resolved in
    the following order:

    1. the encoding passed in the constructor ``encoding`` argument

    2. the encoding declared in the Content-Type HTTP header

    3. the encoding declared in the response body. The TextResponse class
       doesn't provide any special functionality for this. However, the
       :class:`HtmlResponse` and :class:`XmlResponse` classes do.

    4. the encoding inferred by looking at the response body. This is the
       more fragile method but also the last one tried.
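The resolution order above can be sketched as a plain function. This is an
illustrative approximation, not Scrapy's actual implementation, and the
regexes are deliberately simplified:

```python
import re

# Illustrative approximation of the encoding resolution order -- not
# Scrapy's actual implementation.
def resolve_encoding(declared=None, content_type=None, body=b""):
    # 1. the encoding passed in the constructor
    if declared:
        return declared
    # 2. the encoding declared in the Content-Type HTTP header
    if content_type:
        m = re.search(r"charset=([\w-]+)", content_type)
        if m:
            return m.group(1)
    # 3. the encoding declared in the response body (e.g. a meta tag or
    #    XML declaration) -- simplified here to a single pattern
    m = re.search(rb"charset=[\"']?([\w-]+)", body)
    if m:
        return m.group(1).decode("ascii")
    # 4. fall back to inference / a default
    return "utf-8"

print(resolve_encoding(content_type="text/html; charset=iso-8859-1"))  # iso-8859-1
```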
:class:`TextResponse` objects support the following methods in addition to
the standard :class:`Response` ones:

.. method:: TextResponse.headers_encoding()

    Returns a string with the encoding declared in the headers (ie. the
    Content-Type HTTP header).

.. method:: TextResponse.body_encoding()

    Returns a string with the encoding of the body, either declared or
    inferred from its contents. The body encoding declaration is implemented
    in :class:`TextResponse` subclasses, such as :class:`HtmlResponse` or
    :class:`XmlResponse`.

.. method:: TextResponse.body_as_unicode()

    Returns the body of the response as unicode. This is equivalent to::

        response.body.decode(response.encoding)

    But **not** equivalent to::

        unicode(response.body)

    Since, in the latter case, you would be using your system default
    encoding (typically ``ascii``) to convert the body to unicode, instead
    of the response encoding.
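The difference is easy to reproduce with plain Python 3 bytes/str objects,
standing in for the Python 2 str/unicode pair described above:

```python
body = b"caf\xc3\xa9"   # UTF-8 bytes, as a response body would hold them
encoding = "utf-8"      # what response.encoding would resolve to

# Decoding with the response encoding gives the correct text.
assert body.decode(encoding) == "caf\u00e9"   # 'café'

# Decoding with a default such as ascii fails on non-ASCII bytes --
# which is why relying on the system default encoding is wrong.
try:
    body.decode("ascii")
except UnicodeDecodeError:
    print("ascii cannot decode this body")
```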
HtmlResponse objects
--------------------

.. class:: HtmlResponse(url[, ...])

    The :class:`HtmlResponse` class is a subclass of :class:`TextResponse`
    which adds encoding auto-discovering support by looking into the HTML
    `meta http-equiv`_ attribute. See :attr:`TextResponse.encoding`.

.. _meta http-equiv: http://www.w3schools.com/TAGS/att_meta_http_equiv.asp

XmlResponse objects
-------------------

.. class:: XmlResponse(url[, ...])

    The :class:`XmlResponse` class is a subclass of :class:`TextResponse`
    which adds encoding auto-discovering support by looking into the XML
    declaration line. See :attr:`TextResponse.encoding`.