mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 18:44:12 +00:00

531 Commits

Author SHA1 Message Date
Pablo Hoffman
9d38a99aa8 updated missing doc reference from previous commit 2010-08-10 17:47:04 -03:00
Pablo Hoffman
784722774b moved scrapy.core.signals to scrapy.signals, keeping backwards compatibility 2010-08-10 17:40:53 -03:00
Pablo Hoffman
c359a34d7d moved scrapy.core.exceptions to scrapy.exceptions, keeping backwards compatibility
--HG--
rename : scrapy/core/exceptions.py => scrapy/exceptions.py
2010-08-10 17:36:48 -03:00
Pablo Hoffman
b1c0280616 removed scheduler middleware doc, as scheduler middleware will be removed soon 2010-08-10 16:59:49 -03:00
Pablo Hoffman
c7d9f6e270 Added JSON item exporter with doc and unittests (closes #192), and also:
* put all json exporters in scrapy.contrib.exporters and deprecated
  scrapy.contrib.exporters.jsonlines to reduce module nesting
* use JSON exporter with EXPORT_FORMAT=json in file export pipeline
2010-08-07 15:52:59 -03:00
Pablo Hoffman
49851d7f55 Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-08-02 17:20:55 -03:00
Pablo Hoffman
6c68e4ce15 fixed documentation typo 2010-08-02 17:20:13 -03:00
Pablo Hoffman
453e7bf38c Scrapy logging refactoring (closes #188):
* added Twisted log observer for Scrapy, with unittests
* use numeric values from Python logging module for log levels
* removed scrapy.log.exc() function - use scrapy.log.err() instead
* removed logmessage_received signal - write a (twisted) log observer instead
* dropped support for obsolete `domain` argument
* dropped support for old setting names: LOGLEVEL, LOGFILE (replaced by LOG_LEVEL, LOG_FILE)
* deprecated `component` argument
2010-08-02 08:49:14 -03:00
Ismael Carnales
e145ec686c Replaced default spider manager (TwistedPluginSpiderManager) with a simpler one that doesn't depend on Twisted Plugins infrastructure. 2010-07-30 17:30:32 -03:00
Pablo Hoffman
e2290a5359 Some changes to Crawl spider:
* added process_request attribute to rules
* removed docstrings, since it duplicates documentation
2010-07-22 18:40:35 -03:00
Daniel Grana
3e013f564b update docs for defaultheaders middleware and change spider attribute to match global setting name 2010-07-16 16:17:08 -03:00
Daniel Grana
6883a99c1e Automated merge with ssh://hg.scrapy.org/scrapy-0.9 2010-07-16 14:56:00 -03:00
Pablo Hoffman
b91d40ba78 Fixed grammar error in doc (patch by stav) - closes #176 2010-07-16 11:34:18 -03:00
Pablo Hoffman
90a04f0530 Automated merge with http://hg.scrapy.org/scrapy-0.9 2010-07-13 19:47:55 -03:00
Pablo Hoffman
9e37ec4230 fixed documentation typo (closes #151) 2010-07-13 19:03:02 -03:00
Ping Yin
b3a65d3313 HTTPCACHE: Don't cache response with codes in HTTPCACHE_IGNORE_HTTP_CODES 2010-07-09 13:14:25 -03:00
Ismael Carnales
2571e1b7aa docs: Some DjangoItem docs improvements, closes #134. Thanks tn! 2010-06-27 09:09:54 -03:00
Pablo Hoffman
115e9f2162 Added FAQ entry about running Scrapy deployment. 2010-06-14 18:21:12 -03:00
Pablo Hoffman
ede1df4b4f updated copyright year, and indentation space 2010-06-14 07:16:51 -03:00
Pablo Hoffman
bd16d1cd48 Added SMTP-AUTH support to scrapy.mail (closes #149) 2010-06-13 17:14:46 -03:00
Pablo Hoffman
6a33d6c4d0 * Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Pablo Hoffman
73305b1eb3 Added support for Requests without callbacks (#166) - the Spider.parse() method
is used in those cases.

Also removed Request.deferred attribute.
2010-06-08 18:18:02 -03:00
Pablo Hoffman
38b5793152 Some changes to telnet console:
* moved module from scrapy.management.telnet to scrapy.telnet (to minimize
  nested modules)
* added signal for updating telnet console variables (fixes #165)

--HG--
rename : scrapy/management/telnet.py => scrapy/telnet.py
2010-06-02 17:49:18 -03:00
Pablo Hoffman
031eb1e5ed removed no longer used SpiderScheduler (obsoleted by ExecutionQueue) 2010-05-28 17:27:15 -03:00
Ismael Carnales
a71dc295af Some mail improvements and tests.
* Add mail_sent signal and use it in MailSender
* Add MAIL_DEBUG setting to not send mails when testing
* Add MailSender tests
2010-05-28 16:51:47 -03:00
Ping Yin
6059221716 Compose: stop process on None value by default
By doing this, we can use str.lower as a processor safely without
checking whether the given value is None.

By passing stop_on_none=False as keyword argument, this behaviour can be changed.

Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-08 10:59:47 +08:00
Ping Yin
15b879f845 ItemLoader: Update docs for {add,replace,get}_{value,xpath}
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-05-18 17:54:25 +08:00
Pablo Hoffman
bfd9cb42e5 Automated merge with http://hg.scrapy.org/scrapy-0.8 2010-05-17 20:11:27 -03:00
Pablo Hoffman
076cdfd585 Added documentation about contributing to Scrapy 2010-05-17 20:10:46 -03:00
Pablo Hoffman
7a55158fed fixed documentation bug (thanks rhill for reporting) 2010-05-11 11:25:03 -03:00
Steven Almeroth
5d03405cac FormRequest.from_response doc fix. closes #155
--HG--
extra : rebase_source : d54979f6a15e5e997072dcbbc6d43b426189312b
2010-04-26 22:28:07 -03:00
Pablo Hoffman
2121a30c74 added note about installing Zope.Interface in windows platforms 2010-04-24 18:19:52 -03:00
Daniel Grana
6c12106803 Remove Sphinx warning introduced by shorter title overline 2010-04-18 23:42:56 -03:00
Lucian Ursu
2f8c052484 #154: Language fixes to the documentation 2010-04-18 23:39:54 -03:00
Pablo Hoffman
dfdac356af added missing default values to file exporter doc 2010-04-02 02:49:18 -03:00
Pablo Hoffman
f19c939925 fixed doc typo 2010-03-26 08:28:32 -03:00
Pablo Hoffman
99a876754c Improved "What else?" section of "Scrapy at a glance" overview 2010-03-20 20:24:18 -03:00
Pablo Hoffman
234fd709ad fixed doc typo (thanks Victor) 2010-03-19 10:32:17 -03:00
Daniel Grana
184cf6684f Remove HttpException references from docs. Since 0.7, scrapy returns non-200 as Response objects and does not raise HttpException anymore 2010-03-18 10:05:33 -03:00
Daniel Grana
17091902f3 Explicitly say where to save item class in "Defining our item" section of tutorial 2010-03-12 14:12:49 -02:00
Daniel Grana
c925c9e9a0 Notify spider when requests are ignored by HttpErrorMiddleware, and generally when any call to process_spider_input raises an exception 2010-05-12 16:41:06 -03:00
Daniel Grana
c0d45846b8 Automated merge with ssh://hg.scrapy.org/scrapy-0.8 2010-04-26 22:29:45 -03:00
Pablo Hoffman
81f6502e37 Automated merge with http://hg.scrapy.org/scrapy-0.8/ 2010-04-24 18:22:13 -03:00
Daniel Grana
658e6f15e9 Automated merge with ssh://hg.scrapy.org/scrapy-0.8 2010-04-18 23:44:59 -03:00
Daniel Grana
68a875edb0 update ENCODING_ALIASES setting default value in settings documentation topic 2010-04-07 10:54:54 -03:00
Pablo Hoffman
de32612c99 Automated merge with http://hg.scrapy.org/scrapy-0.8 2010-04-02 02:49:51 -03:00
Rolando Espinoza La fuente
db5c3df679 SEP12 implementation
  * Rename BaseSpider.domain_name to BaseSpider.name

    This patch implements the domain_name to name change in BaseSpider class and
    change all spider instantiations to use the new attribute.

  * Add allowed_domains to spider

    This patch implements the merging of spider.domain_name and
    spider.extra_domain_names in spider.allowed_domains for offsite checking
    purposes.

    Note that spider.domain_name is not touched by this patch, only not used.

  * Remove spider.domain_name references from scrapy.stats

    * Rename domain_stats to spider_stats in MemoryStatsCollector
    * Use ``spider`` instead of ``domain`` in SimpledbStatsCollector
    * Rename domain_stats_history table to spider_data_history and rename domain
    field to spider in MysqlStatsCollector

  * Refactor genspider command

    The new signature for genspider is: genspider [options] <domain_name>.

    Genspider uses domain_name for spider name and for the module name.

  * Remove spider.domain_name references

  * Update crawl command signature <spider|url>

  * docs: updated references to domain_name

  * examples/experimental: use spider.name

  * genspider: require <name> <domain>

  * spidermanager: renamed crawl_domain to crawl_spider_name

  * spiderctl: updated references of *domain* to spider

  * added backward compatibility with legacy spider's attributes
    'domain_name' and 'extra_domain_names'
2010-04-01 18:27:22 -03:00
Pablo Hoffman
2299deda66 updated wrong link in doc 2010-03-26 14:02:33 -03:00
Pablo Hoffman
7cf2f87e27 Automated merge with http://hg.scrapy.org/scrapy-0.8 2010-03-26 08:29:34 -03:00
Pablo Hoffman
1330697c3d Some improvements to Response encoding support:
* added encoding aliases, configurable through a new ENCODING_ALIASES setting
* Response.encoding now returns the real encoding detected for the body
* simplified TextResponse API by removing body_encoding() and
  headers_encoding() methods
* Response.encoding now tries to infer the encoding from the body always (it
  was done before only on HtmlResponse and TextResponse)
* removed scrapy.utils.encoding.add_encoding_alias() function
* updated implementation of scrapy.utils.response function to reflect these API
  changes
* updated documentation to reflect API changes
2010-03-25 15:47:10 -03:00