mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 13:04:01 +00:00

54 Commits

Author SHA1 Message Date
Elias Dorneles
4dcecc98f9 moved example data to a better place 2015-03-26 15:45:17 -03:00
Elias Dorneles
7402e27230 fix community link 2015-03-26 15:35:31 -03:00
Elias Dorneles
729861c864 fixing indentation 2015-03-26 15:31:42 -03:00
Elias Dorneles
13d0ecde77 addressing more review comments, to avoid ambiguity on desired reading flow 2015-03-26 15:26:16 -03:00
Elias Dorneles
76e3bf1250 addressing comments from the review plus further editing 2015-03-26 14:26:20 -03:00
Elias Dorneles
8f4a268f37 added bit about async requests, improved phrasing 2015-03-26 12:14:56 -03:00
Elias Dorneles
32423d4a33 some improvements to overview page 2015-03-25 19:27:52 -03:00
Shadab Zafar
5a58d64131 Fix some redirection links in documentation
Fixes #606
2015-03-18 19:41:26 -03:00
Daniel Graña
a9292cfab7 jsonrpc webservice moved to https://github.com/scrapy/scrapy-jsonrpc repository 2014-08-15 23:28:13 -03:00
Daniel Graña
2ad8db6ae6 Merge pull request #761 from dangra/lxmlextractor
Promote LxmlLinkExtractor as LxmlExtractor
2014-06-25 15:07:02 -03:00
Daniel Graña
a9ecef5662 promote LxmlLinkExtractor as default in docs 2014-06-25 14:34:30 -03:00
Daniel Graña
5b2faf61c3 recognize jl extension as jsonlines exporter and update docs 2014-06-25 13:55:15 -03:00
Carlos Rivera
946b854ddf grammatical issue 2014-05-06 15:41:59 -05:00
Daniel Graña
1117687c47 update docs 2014-04-23 23:39:58 -03:00
Mikhail Korobov
2d3803672b DOC use top-level shortcuts in docs 2014-04-15 01:09:35 +06:00
Julia Medina
80081054a2 Fix broken links in documentation 2014-04-09 18:57:52 -03:00
Pablo Hoffman
ed6fd4933f Merge pull request #524 from hobsonlane/master
documentation code example corrections per pablohoffman
2014-01-16 06:44:51 -08:00
Hobson Lane
85a80d0752 remove "for brevity's sake" line and correct "Torrent item"
Torrent item -> TorrentItem class
2014-01-15 17:29:23 -08:00
Hobson Lane
a3db95985b another import name correction by pablo 2014-01-14 21:04:15 -08:00
Ferdy Rodriguez
807dd25324 fixed error on tor's name 2014-01-13 00:03:58 -06:00
Ferdy Rodriguez
8b9348cfaf Changed TOR Info as previous was removed from www.mininova.org 2014-01-12 23:46:04 -06:00
Hobson Lane
6ba0857a5c documentation code example correction corrections per pablohoffman 2014-01-10 10:37:27 -08:00
Pablo Hoffman
e8ee449a2a Merge pull request #432 from darkrho/crawl-url
Removed URL reference in crawl command and .tld suffix in docs for spider names
2013-10-21 09:40:58 -07:00
Rolando Espinoza La fuente
34543c2b2e DOCS removed .tld suffix for spider names for the sake of consistency. 2013-10-19 23:03:20 -04:00
Daniel Graña
155ea08ea1 use sel name for Selector's instances in docs, internals and shell 2013-10-15 15:58:42 -02:00
Daniel Graña
4645f9e03c Updates docs to reflect unified selectors api 2013-10-14 16:31:20 -02:00
Hart
c00c4d7148 correction to description of example XPath retrieval in overview doc 2013-08-03 17:08:58 -07:00
Juan M Uys
4de3aa4932 Update overview.rst 2013-04-08 14:13:15 +02:00
Pablo Hoffman
2fb5e62c39 doc: update overview page to point to the genspider command. refs #107 2012-04-19 02:37:22 -03:00
Pablo Hoffman
ade5efdc61 added -o option to scrapy crawl, a convenient shortcut for using feed exports 2011-10-22 20:53:49 -02:00
Pablo Hoffman
76af0cdd44 updated documentation and code to use -s instead of --set option 2011-09-01 14:35:37 -03:00
Pablo Hoffman
5da6ffb57b Automated merge with ssh://hg.scrapy.org:2222/scrapy-0.12 2011-08-11 09:11:19 -03:00
Pablo Hoffman
bc2d2183e9 fixed import in doc 2011-08-11 09:11:08 -03:00
Pablo Hoffman
c59340150f Added cached DNS resolver based on old caching resolver extension from scrapy.contrib.resolver. This new one is *not* an extension, it comes builtin and always enabled. 2011-07-27 03:45:15 -03:00
Pablo Hoffman
57c43fdce6 added SitemapSpider, with tests and doc 2011-06-15 11:54:34 -03:00
Pablo Hoffman
5f65c26080 Some minor improvements to feature list in Scrapy at a Glance documentation page 2010-10-16 19:02:08 -02:00
Pablo Hoffman
e3d67d74f7 docs/intro/overview.rst: add example of scraped data and introduce loaders 2010-09-06 10:04:00 -03:00
Pablo Hoffman
00d55fbbd1 Updated 'Scrapy at a glance' document replacing item pipeline example by a simpler usage of feed exports 2010-09-05 23:38:37 -03:00
Pablo Hoffman
9aefa242d5 Applied documentation patch provided by Lucian Ursu (closes #207) 2010-08-21 01:26:35 -03:00
Pablo Hoffman
e741a807d2 Added new Feed exports extension with documentation and storage tests. Closes #197.
Also deprecated File export pipeline (to be removed in Scrapy 0.11).

Still need to add tests for FeedExport main extension code.
2010-08-17 14:27:48 -03:00
Pablo Hoffman
43d47e5d9b Some improvements to Item Pipeline (closes #195):
* Made Item Pipeline Manager a subclass of scrapy.middleware.MiddlewareManager
* Added open_spider/close_spider methods with support for returning deferreds from them
* Inverted the process_item() arguments to be more friendly with deferred
  callbacks (backwards compatibility kept through arguments introspection)
* Updated documentation with new methods and process_item() arguments change
2010-08-12 10:48:37 -03:00
Pablo Hoffman
6a33d6c4d0 * Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Rolando Espinoza La fuente
db5c3df679 SEP12 implementation
  * Rename BaseSpider.domain_name to BaseSpider.name

    This patch implements the domain_name to name change in BaseSpider class and
    change all spider instantiations to use the new attribute.

  * Add allowed_domains to spider

    This patch implements the merging of spider.domain_name and
    spider.extra_domain_names in spider.allowed_domains for offsite checking
    purposes.

    Note that spider.domain_name is not touched by this patch, only not used.

  * Remove spider.domain_name references from scrapy.stats

    * Rename domain_stats to spider_stats in MemoryStatsCollector
    * Use ``spider`` instead of ``domain`` in SimpledbStatsCollector
    * Rename domain_stats_history table to spider_data_history and rename domain
    field to spider in MysqlStatsCollector

  * Refactor genspider command

    The new signature for genspider is: genspider [options] <domain_name>.

    Genspider uses domain_name for spider name and for the module name.

  * Remove spider.domain_name references

  * Update crawl command signature <spider|url>

  * docs: updated references to domain_name

  * examples/experimental: use spider.name

  * genspider: require <name> <domain>

  * spidermanager: renamed crawl_domain to crawl_spider_name

  * spiderctl: updated references of *domain* to spider

  * added backward compatibility with legacy spider's attributes
    'domain_name' and 'extra_domain_names'
2010-04-01 18:27:22 -03:00
Pablo Hoffman
99a876754c Improved "What else?" section of "Scrapy at a glance" overview 2010-03-20 20:24:18 -03:00
Pablo Hoffman
7728a23e99 Changed item pipeline API to pass spider references (instead of domain names) to process_item() method 2009-11-06 13:46:36 -02:00
Ismael Carnales
5862ba7db7 modified doc to reflect the new spider callback return policy (lists not needed) 2009-09-22 11:25:40 -03:00
Ismael Carnales
39540b188a changed torrent in overview doc 2009-08-24 15:11:04 -03:00
Pablo Hoffman
9635a7839c rearranged documentation into a better organization
--HG--
rename : docs/topics/index.rst => docs/index.rst
2009-08-21 21:49:54 -03:00
Pablo Hoffman
e8504a054c moved scrapy.newitem to scrapy.item and declared newitem api officially stable. updated docs and example project. deprecated old ScrapedItem 2009-08-19 21:39:58 -03:00
Ismael Carnales
48b40bd620 renamed x method of selectors to select 2009-08-17 15:58:06 -03:00