mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 20:04:06 +00:00

62 Commits

Author SHA1 Message Date
Julia Medina
bdca06240c Fix settings repr on the logs of the shell and tutorial docs topics 2014-06-10 11:26:50 -03:00
Daniel Graña
1117687c47 update docs 2014-04-23 23:39:58 -03:00
Mikhail Korobov
2d3803672b DOC use top-level shortcuts in docs 2014-04-15 01:09:35 +06:00
Alexey Bezhan
210a0a6fe1 Fix some typos, whitespace and small errors in docs 2014-02-27 18:02:22 +00:00
Paul Tremberth
57f30bcb04 Docs: 4-space indent for final spider example 2014-02-01 23:34:55 +01:00
Rolando Espinoza
4255e12bc7 Updated the tutorial crawl output with latest output. 2014-01-23 18:18:56 -04:00
Rolando Espinoza
9aab9224cb Updated shell docs with the crawler reference and fixed the actual shell output.
Also updated the shell example with a reproducible code example.
2014-01-23 18:04:57 -04:00
Mikhail Korobov
a27d91f0a6 Rename BaseSpider to Spider. See GH-495. 2013-12-30 19:46:41 +06:00
RasPat1
ff21281b95 Note about selector class import
This is the salient point of this code compared to the last example.  We have a selector now and this is how we use it.  Especially since the user has just come from the shell where the pre-instantiated selector is taken for granted.
2013-12-15 13:46:42 -05:00
Pablo Hoffman
f2741c413e fix method name in tutorial. closes GH-480 2013-12-02 13:24:12 -02:00
Daniel Graña
155ea08ea1 use sel name for Selector's instances in docs, internals and shell 2013-10-15 15:58:42 -02:00
Daniel Graña
1abb1af0c6 fix typos and wording on selector's introduction 2013-10-15 10:13:43 -02:00
Daniel Graña
4645f9e03c Updates docs to reflect unified selectors api 2013-10-14 16:31:20 -02:00
Pablo Hoffman
e1683ddf9b fix doc typo 2013-10-09 17:24:12 -02:00
Pablo Hoffman
b1d1a36a1e add note about enclosing urls with quotes when running from command-line. closes GH-384 2013-09-18 18:01:28 -03:00
Kumara Tharmalingam
bbb0603091 Fixed directory location for dmoz_spider.py file
It should be under 'tutorial/spiders' not 'dmoz/spiders'
2013-09-15 21:55:52 -07:00
Pablo Hoffman
22edc44c6c doc: remove links to diveintopython.org, which is no longer available. closes #246 2013-02-14 11:09:40 -02:00
Valentin-Costel Hăloiu
00bfb37e79 Update master 2012-07-04 06:55:01 +03:00
Pablo Hoffman
2fb5e62c39 doc: update overview page to point to the genspider command. refs #107 2012-04-19 02:37:22 -03:00
Pablo Hoffman
0be421fbf0 fixed reference to tutorial directory 2011-12-23 18:57:11 -02:00
Daniel Graña
bcb31988f2 change tutorial to follow changes on dmoz site 2011-12-14 13:03:31 -02:00
Pablo Hoffman
ade5efdc61 added -o option to scrapy crawl, a convenient shortcut for using feed exports 2011-10-22 20:53:49 -02:00
Pablo Hoffman
76af0cdd44 updated documentation and code to use -s instead of --set option 2011-09-01 14:35:37 -03:00
Pablo Hoffman
5bf733b6f6 Changed default representation of items to pretty-printed dicts. This improves
default logging by making log more readable in the default case, for both Scraped and Dropped lines.

Projects can still customize how items are represented by overriding the item's __str__ method, as usual.
2011-06-03 01:13:01 -03:00
Pablo Hoffman
951ba507f9 Removed support for default values in Scrapy items, which have proven confusing in the past 2011-05-19 21:42:46 -03:00
Pablo Hoffman
503f302010 removed remaining references to scheduler middleware from doc, as it will be removed on next release 2011-05-18 19:48:48 -03:00
Pablo Hoffman
bb2b67c862 updated tutorial to use 'dmoz' as the name of the spider instead of 'dmoz.org', so that it's more similar to the dirbot example project 2011-04-28 09:31:57 -03:00
Pablo Hoffman
bf73002428 removed googledir example, replaced by dirbot project on github. updated docs accordingly 2011-04-28 02:28:39 -03:00
Pablo Hoffman
181d1c09ae Fixed typo and code indentation in the doc. Closes #307 and #308 2011-02-09 11:19:46 -02:00
Pablo Hoffman
b4fbc6c5fa Updated Scrapy Tutorial to reference feed exports instead of a custom-written pipeline, and extended item pipeline documentation to include a JSON writer. 2010-10-10 20:31:05 -02:00
Pablo Hoffman
f4accb6c7f Updated dmoz xpaths of Scrapy tutorial 2010-10-07 18:22:01 -02:00
Pablo Hoffman
9aefa242d5 Applied documentation patch provided by Lucian Ursu (closes #207) 2010-08-21 01:26:35 -03:00
Pablo Hoffman
1d3b9e2ca8 Scrapy shell refactoring 2010-08-20 11:26:14 -03:00
Pablo Hoffman
7858244dca Scrapy shell: moved python console starting code to scrapy.utils.console and get rid of noisy console banners 2010-08-20 01:33:02 -03:00
Pablo Hoffman
34554da201 Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows.
--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
2010-08-18 19:48:32 -03:00
Pablo Hoffman
43d47e5d9b Some improvements to Item Pipeline (closes #195):
* Made Item Pipeline Manager a subclass of scrapy.middleware.MiddlewareManager
* Added open_spider/close_spider methods with support for returning deferreds from them
* Inverted the process_item() arguments to be more friendly with deferred
  callbacks (backwards compatibility kept through arguments introspection)
* Updated documentation with new methods and process_item() arguments change
2010-08-12 10:48:37 -03:00
Ismael Carnales
e145ec686c Replaced default spider manager (TwistedPluginSpiderManager) with a simpler one that doesn't depend on Twisted Plugins infrastructure. 2010-07-30 17:30:32 -03:00
Pablo Hoffman
9e37ec4230 fixed documentation typo (closes #151) 2010-07-13 19:03:02 -03:00
Rolando Espinoza La fuente
db5c3df679 SEP12 implementation
  * Rename BaseSpider.domain_name to BaseSpider.name

    This patch implements the domain_name to name change in BaseSpider class and
    change all spider instantiations to use the new attribute.

  * Add allowed_domains to spider

    This patch implements the merging of spider.domain_name and
    spider.extra_domain_names in spider.allowed_domains for offsite checking
    purposes.

    Note that spider.domain_name is not touched by this patch, only not used.

  * Remove spider.domain_name references from scrapy.stats

    * Rename domain_stats to spider_stats in MemoryStatsCollector
    * Use ``spider`` instead of ``domain`` in SimpledbStatsCollector
    * Rename domain_stats_history table to spider_data_history and rename domain
    field to spider in MysqlStatsCollector

  * Refactor genspider command

    The new signature for genspider is: genspider [options] <domain_name>.

    Genspider uses domain_name for spider name and for the module name.

  * Remove spider.domain_name references

  * Update crawl command signature <spider|url>

  * docs: updated references to domain_name

  * examples/experimental: use spider.name

  * genspider: require <name> <domain>

  * spidermanager: renamed crawl_domain to crawl_spider_name

  * spiderctl: updated references of *domain* to spider

  * added backward compatibility with legacy spider's attributes
    'domain_name' and 'extra_domain_names'
2010-04-01 18:27:22 -03:00
Pablo Hoffman
264cd2e035 Automated merge with http://hg.scrapy.org/scrapy-0.8 2010-03-19 10:32:42 -03:00
Daniel Grana
17091902f3 Explicitly say where to save item class in "Defining our item" section of tutorial 2010-03-12 14:12:49 -02:00
Rolando Espinoza La fuente
1402da31c5 docs: fixed typos and updated code examples 2010-01-11 12:28:22 -04:00
Pablo Hoffman
7728a23e99 Changed item pipeline API to pass spider references (instead of domain names) to process_item() method 2009-11-06 13:46:36 -02:00
Pablo Hoffman
37d9e015bb minor fix to tutorial 2009-10-07 20:15:49 -02:00
Ismael Carnales
5862ba7db7 modified doc to reflect the new spider callback return policy (lists not needed) 2009-09-22 11:25:40 -03:00
Pablo Hoffman
8a074c9cb5 removed scrapy-admin.py command, leaving scrapy-ctl as the only scrapy command 2009-08-24 15:43:36 -03:00
Ismael Carnales
4b5aa30867 minor update to tutorial 2009-08-24 14:34:17 -03:00
Pablo Hoffman
9635a7839c rearranged documentation into a better organization
--HG--
rename : docs/topics/index.rst => docs/index.rst
2009-08-21 21:49:54 -03:00
Ismael Carnales
c08d3aa9cc updated tutorial to use new items api 2009-08-21 14:16:27 -03:00
Pablo Hoffman
33b53c59d5 moved scrapy.xpath to scrapy.selector
--HG--
rename : scrapy/xpath/__init__.py => scrapy/selector/__init__.py
rename : scrapy/xpath/document.py => scrapy/selector/document.py
rename : scrapy/xpath/factories.py => scrapy/selector/factories.py
2009-08-19 21:50:52 -03:00