scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-03-01 18:08:14 +00:00

Author	SHA1	Message	Date
Pablo Hoffman	e2ed27e4fd	Added documentation for Ubuntu packages. Refs #211	2010-08-23 21:28:32 -03:00
Pablo Hoffman	9aefa242d5	Applied documentation patch provided by Lucian Ursu (closes #207)	2010-08-21 01:26:35 -03:00
Pablo Hoffman	1d3b9e2ca8	Scrapy shell refactoring	2010-08-20 11:26:14 -03:00
Pablo Hoffman	7858244dca	Scrapy shell: moved python console starting code to scrapy.utils.console and get rid of noisy console banners	2010-08-20 01:33:02 -03:00
Pablo Hoffman	34554da201	Deprecated scrapy-ctl.py command in favour of simpler "scrapy" command. Closes #199. Also updated documentation accordingly and added convenient scrapy.bat script for running from Windows. --HG-- rename : debian/scrapy-ctl.1 => debian/scrapy.1 rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst	2010-08-18 19:48:32 -03:00
Pablo Hoffman	e741a807d2	Added new Feed exports extension with documentation and storage tests. Closes #197. Also deprecated File export pipeline (to be removed in Scrapy 0.11). Still need to add tests for FeedExport main extension code.	2010-08-17 14:27:48 -03:00
Pablo Hoffman	43d47e5d9b	Some improvements to Item Pipeline (closes #195): * Made Item Pipeline Manager a subclass of scrapy.middleware.MiddlewareManager * Added open_spider/close_spider methods with support for returning deferreds from them * Inverted the process_item() arguments to be more friendly with deferred callbacks (backwards compatibility kept through arguments introspection) * Updated documentation with new methods and process_item() arguments change	2010-08-12 10:48:37 -03:00
Ismael Carnales	e145ec686c	Replaced default spider manager (TwistedPluginSpiderManger) with a simpler one that doesn't depend on Twisted Plugins infrastructure.	2010-07-30 17:30:32 -03:00
Pablo Hoffman	9e37ec4230	fixed documentation typo (closes #151)	2010-07-13 19:03:02 -03:00
Pablo Hoffman	6a33d6c4d0	* Added Scrapy Web Service with documentation and tests. * Marked Web Console as deprecated. * Removed Web Console documentation to discourage its use.	2010-06-09 13:46:22 -03:00
Pablo Hoffman	81f6502e37	Automated merge with http://hg.scrapy.org/scrapy-0.8/	2010-04-24 18:22:13 -03:00
Pablo Hoffman	2121a30c74	added note about installing Zope.Interface in windows platforms	2010-04-24 18:19:52 -03:00
Rolando Espinoza La fuente	db5c3df679	SEP12 implementation * Rename BaseSpider.domain_name to BaseSpider.name This patch implements the domain_name to name change in BaseSpider class and changes all spider instantiations to use the new attribute. * Add allowed_domains to spider This patch implements the merging of spider.domain_name and spider.extra_domain_names in spider.allowed_domains for offsite checking purposes. Note that spider.domain_name is not touched by this patch, only not used. * Remove spider.domain_name references from scrapy.stats * Rename domain_stats to spider_stats in MemoryStatsCollector * Use ``spider`` instead of ``domain`` in SimpledbStatsCollector * Rename domain_stats_history table to spider_data_history and rename domain field to spider in MysqlStatsCollector * Refactor genspider command The new signature for genspider is: genspider [options] <domain_name>. Genspider uses domain_name for spider name and for the module name. * Remove spider.domain_name references * Update crawl command signature <spider|url> * docs: updated references to domain_name * examples/experimental: use spider.name * genspider: require <name> <domain> * spidermanager: renamed crawl_domain to crawl_spider_name * spiderctl: updated references of domain to spider * added backward compatibility with legacy spider's attributes 'domain_name' and 'extra_domain_names'	2010-04-01 18:27:22 -03:00
Pablo Hoffman	1dfc79b5d0	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-20 20:48:11 -03:00
Pablo Hoffman	99a876754c	Improved "What else?" section of "Scrapy at a glance" overview	2010-03-20 20:24:18 -03:00
Pablo Hoffman	264cd2e035	Automated merge with http://hg.scrapy.org/scrapy-0.8	2010-03-19 10:32:42 -03:00
Daniel Grana	17091902f3	Explicitly say where to save item class in "Defining our item" section of tutorial	2010-03-12 14:12:49 -02:00
Rolando Espinoza La fuente	7235040936	merged upstream	2010-02-19 17:41:45 -04:00
Pablo Hoffman	48739ae60c	install.rst: added explanation about why libxml2 2.6.28 or above is required	2010-01-13 12:20:24 -02:00
Rolando Espinoza La fuente	1402da31c5	docs: fixed typos and updated code examples	2010-01-11 12:28:22 -04:00
Pablo Hoffman	d60412ce19	titlecased Scrapy easy_install and some fixes to sign_release.sh script	2009-12-13 14:23:31 -02:00
Pablo Hoffman	7728a23e99	Changed item pipeline API to pass spider references (instead of domain names) to process_item() method	2009-11-06 13:46:36 -02:00
Pablo Hoffman	9b5fef4f48	fixed typo in intro/install doc (thanks phaithful)	2009-10-28 09:34:31 -02:00
Pablo Hoffman	37d9e015bb	minor fix to tutorial	2009-10-07 20:15:49 -02:00
Pablo Hoffman	a0eec7eaf6	some typo fixes and updates to install doc	2009-09-29 09:44:02 -03:00
Ismael Carnales	1646482bef	reformatted installation guide	2009-09-29 08:41:34 -03:00
Ismael Carnales	5862ba7db7	modified doc to reflect the new spider callback return policy (lists not needed)	2009-09-22 11:25:40 -03:00
Pablo Hoffman	6e93872955	updated installation guide for using releases	2009-09-17 11:06:55 -03:00
Pablo Hoffman	8a074c9cb5	removed scrapy-admin.py command, leaving scrapy-ctl as the only scrapy command	2009-08-24 15:43:36 -03:00
Ismael Carnales	39540b188a	changed torrent in overview doc	2009-08-24 15:11:04 -03:00
Ismael Carnales	4b5aa30867	minor update to tutorial	2009-08-24 14:34:17 -03:00
Pablo Hoffman	9635a7839c	rearranged documentation into a better organization --HG-- rename : docs/topics/index.rst => docs/index.rst	2009-08-21 21:49:54 -03:00
Ismael Carnales	c08d3aa9cc	updated tutorial to use new items api	2009-08-21 14:16:27 -03:00
Pablo Hoffman	33b53c59d5	moved scrapy.xpath to scrapy.selector --HG-- rename : scrapy/xpath/__init__.py => scrapy/selector/__init__.py rename : scrapy/xpath/document.py => scrapy/selector/document.py rename : scrapy/xpath/factories.py => scrapy/selector/factories.py	2009-08-19 21:50:52 -03:00
Pablo Hoffman	e8504a054c	moved scrapy.newitem to scrapy.item and declared newitem api officially stable. updated docs and example project. deprecated old ScrapedItem	2009-08-19 21:39:58 -03:00
Ismael Carnales	48b40bd620	renamed x method of selectors to select	2009-08-17 15:58:06 -03:00
Pablo Hoffman	5aeab5b291	converted scrapy.item package to module --HG-- rename : scrapy/item/models.py => scrapy/item.py	2009-08-12 21:31:50 -03:00
Pablo Hoffman	b296d4169e	minor doc update for making it more windows-friendly	2009-08-09 17:08:42 -03:00
Pablo Hoffman	38c3f7d0b4	Some changes to logging of scraped items: 1. "Scraped Item" log level changed to DEBUG 2. "Dropped Item" log level changed to WARNING 3. added "Passed Item" log message with INFO level	2009-07-23 11:49:48 -03:00
Pablo Hoffman	e43e28bf1d	minimal doc improvement	2009-07-23 09:12:49 -03:00
Ismael Carnales	6d24ae5920	added reference to working with relative xpaths in the tutorial	2009-07-23 09:05:14 -03:00
Pablo Hoffman	1125825996	doc: minor updates to tutorial	2009-07-15 22:10:00 -03:00
Pablo Hoffman	ae7333d598	added simplejson optional dependency to doc	2009-07-09 16:49:20 -03:00
Pablo Hoffman	0c4c153819	improved Scrapy documentation index for better usability	2009-07-01 09:51:57 -03:00
Pablo Hoffman	80cd534f92	removed redundant botname from log lines	2009-06-25 16:48:04 -03:00
Pablo Hoffman	b1dad251ae	Deprecated Common Downloader Middleware and added DefaultHeaders Downloader Middleware	2009-05-25 14:41:06 -03:00
Pablo Hoffman	befd28eef4	docs/tutorial: added reminder about adding pipeline to ITEM_PIPELINES settings (thanks jamie)	2009-05-20 00:57:44 -03:00
Pablo Hoffman	04610a25dc	fixed bug in tutorial regarding csv writer pipeline, and other minor corrections	2009-05-19 03:07:08 -03:00
Daniel Grana	abfc52cd17	docs: modify install document to mercurial based installation instructions	2009-05-19 01:50:44 -03:00
Pablo Hoffman	86498abdf1	Sorted out Link Extractors organization by moving all of them to scrapy.contrib.linkextractors. The most relevant being: scrapy.link.extractors.RegexLinkExtractor which was moved to: scrapy.contrib.linkextractors.sgml.SgmlLinkExtractor. The old location still works but throws a deprecation warning. It will be removed before the 0.7 release. Documentation and tests were also updated. Also, in this changeset, a new regex-based link extractor was added to scrapy.contrib.linkextractors.regex. --HG-- rename : scrapy/tests/sample_data/link_extractor/regex_linkextractor.html => scrapy/tests/sample_data/link_extractor/sgml_linkextractor.html rename : scrapy/tests/test_link.py => scrapy/tests/test_contrib_linkextractors.py	2009-05-18 19:19:37 -03:00

151 Commits