mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 12:03:58 +00:00

767 Commits

Author SHA1 Message Date
Pablo Hoffman
9ea309c354 rename deployment.rst -> deploy.rst (consistent with others like debug.rst) 2015-04-09 16:56:35 -03:00
Pablo Hoffman
d8184a7239 Merge pull request #1124 from rdowinton/deployment-doc
Added deployment section covering scrapyd-deploy and shub
2015-04-09 16:53:25 -03:00
Alexander Sibiryakov
85aa3c7596 Dns cache size and timeout options 2015-04-02 18:30:59 +02:00
nyov
92b574309e documentation build warning fixes 2015-04-01 19:46:21 +00:00
Daniel Graña
27591b55fc Merge pull request #1123 from sibiryakov/reactor-threadpool-size
[MRG+1] Reactor threadpool max size setting
2015-04-01 15:08:03 -03:00
Alexander Sibiryakov
b794cdaf4b Broad crawls notes. 2015-04-01 12:07:03 +02:00
Alexander Sibiryakov
e7b274edf3 Reformat to 80 characters per line. 2015-04-01 11:49:55 +02:00
Alexander Sibiryakov
94fceb4c15 Fixing underscore size. 2015-04-01 11:25:10 +02:00
Richard Dowinton
2d142d6401 Added deployment section covering scrapyd-deploy and shub 2015-03-31 12:17:31 +01:00
Alexander Sibiryakov
5864d291d4 Setting documentation. 2015-03-31 11:10:56 +02:00
Pablo Hoffman
bb4c922d85 Merge pull request #1081 from scrapy/dict-items
Allow spiders to return dicts.
2015-03-27 15:19:27 -03:00
Daniel Graña
55a23d102f Merge pull request #1086 from Curita/response-urljoin
Add Response.urljoin() helper
2015-03-27 15:17:54 -03:00
Julia Medina
f4e241a018 Merge pull request #1106 from eliasdorneles/overview-page-improvements
[MRG+1] some improvements to overview page
2015-03-27 15:16:33 -03:00
Mikhail Korobov
39085ae18f Merge pull request #1098 from nyov/nyov/userconfig
[+1 MRG]look in ~/.config/scrapy.cfg for user config
2015-03-27 02:11:35 +05:00
nyov
1134a9cab0 config: look in ~/.config/scrapy.cfg as well 2015-03-26 20:36:14 +00:00
Elias Dorneles
76e3bf1250 addressing comments from the review plus further editing 2015-03-26 14:26:20 -03:00
Ramiro Morales
933dbc6be6 Oops 2015-03-25 18:33:17 -03:00
Ramiro Morales
ca2575001e Add missing callback arg in jobs topic example. 2015-03-25 18:32:20 -03:00
Mikhail Korobov
5ac91e4883 DOC remove Dynamic Creation of Item Classes section
It was a hack, and dicts-as-items cover most use cases.

Dicts don't allow attaching metadata to fields,
but e.g. adding "_meta" key and removing it in a custom serializer
is no worse than creating classes dynamically.
2015-03-23 18:11:35 +05:00
Julia Medina
cda3922507 Add Response.urljoin() helper 2015-03-19 19:07:52 -03:00
Pablo Hoffman
c81eefaf81 fix doc links 2015-03-19 17:42:48 -03:00
Mikhail Korobov
8ac397670f DOC move .. module: declaration to a proper place 2015-03-19 21:41:36 +05:00
Faisal Anees
643984e1b4 Updated architecture.rst
Added http://krondo.com/blog/?page_id=1327 as a resource
2015-03-18 23:55:22 -03:00
Mikhail Korobov
f16a33f34e DOC change structure of spider docs:
* start with scrapy.Spider, then mention spider arguments,
  then describe generic spiders;
* change wording regarding start_urls/start_requests;
* show an example of start_requests vs start_urls;
* show an example of dicts as items;
* as defining Item is an optional step now, docs for Items are
  moved below Spider docs.
2015-03-19 05:25:15 +05:00
Mikhail Korobov
817dbc6cbd DOC mention dicts in documentation; explain better what Items are for 2015-03-19 05:16:14 +05:00
Julia Medina
959aaad205 Document re_first 2015-03-18 21:11:08 -03:00
Mateusz Golewski
127c6c694a Fix extract_first() docs 2015-03-18 21:11:08 -03:00
Mateusz Golewski
012211accd Add docs for extract_first() 2015-03-18 21:11:08 -03:00
Julia Medina
4fb818a250 Run linkfix over current docs 2015-03-18 20:04:14 -03:00
Shadab Zafar
5a58d64131 Fix some redirection links in documentation
Fixes #606
2015-03-18 19:41:26 -03:00
Nicolás Alejandro Ramírez Quiros
ee82fe0e24 Merge pull request #1016 from SudShekhar/jsonProcessor
[MRG+1] Added JmesSelect
2015-03-18 08:11:25 -03:00
Mikhail Korobov
39635e5f55 Allow spiders to return dicts. See GH-1064. 2015-03-18 07:26:56 +05:00
Pablo Hoffman
934584a355 Merge pull request #1020 from jojje/gzip_http_cache
[MRG+1] add gzip compression to filesystem http cache backend
2015-03-17 14:32:06 -03:00
Pablo Hoffman
f924567591 Merge pull request #983 from ArturGaspar/linkextractor_css
[MRG+1] CSS support in link extractors
2015-03-17 01:07:47 -03:00
nramirezuy
c13e23641b httpcache dont_cache meta #19 #689 2015-03-16 11:50:04 -03:00
Mikhail Korobov
baf5c59386 Merge pull request #1071 from eliasdorneles/updating-request-meta-special-keys
updating list of Request.meta special keys
2015-03-13 16:38:19 +05:00
Elias Dorneles
57a5ee0097 added example value to set for proxy meta key 2015-03-12 23:20:44 -03:00
Elias Dorneles
f7031c08ff updating list of Request.meta special keys 2015-03-10 22:29:07 -03:00
Sudhanshu Shekhar
839ffba971 Added the first version of SelectJmes
Utilizes jmespath. Also, added tests and documentation for the same.
2015-02-24 22:59:01 +05:30
Nicolás Alejandro Ramírez Quiros
8a3b9b6131 Merge pull request #1011 from SudShekhar/master
Extension example fix to something that makes more sense
2015-01-30 15:45:52 -02:00
Sudhanshu Shekhar
e42a1ac1a1 Reset items_scraped instead of item_count
items_scraped is the counter that needs to be reset each time we have scraped a specific number of items in the code instead of item_count (which represents the specific number of items needed before a message is logged). Updating the source code to reflect this.
Removed some irrelevant words from the log message.
Signed-off-by: Sudhanshu Shekhar <sudshekhar02@gmail.com>
2015-01-30 23:13:06 +05:30
Jonas Tingeborn
bd5d99a2d2 add gzip compression to filesystem http cache backend 2015-01-21 20:18:11 +01:00
Capi Etheriel
4bc14da59e Updates documentation on dynamic item classes.
Fixes #398
2015-01-19 17:21:56 -02:00
Mikhail Korobov
283d6a5344 DOC a couple more references are fixed 2015-01-19 22:07:03 +05:00
Mikhail Korobov
73e6b35622 DOC fix a reference 2015-01-19 22:02:46 +05:00
Artur Gaspar
b0730a1d16 documentation for CSS support in link extractors 2014-12-11 18:22:08 -02:00
Stefan
3602fc4fcb fixed the variable types in mailsender documentation 2014-12-10 22:48:09 +01:00
Lev Berman
e04b0aff74 An attempt to resolve #977, add signal to be sent when request is dropped by the scheduler 2014-11-27 15:10:15 +03:00
tpeng
a69f042d10 add 2 more test cases and minor doc fixes 2014-11-19 15:31:07 +01:00
tpeng
fa84730e70 avoid downloading large responses
introduce DOWNLOAD_MAXSIZE and DOWNLOAD_WARNSIZE in settings and
download_maxsize/download_warnsize in spider/request meta, so the
downloader stops downloading as soon as the received data exceeds the
limit. also check the Twisted response's length in advance to stop
downloading as early as possible.
2014-11-12 12:28:02 +01:00