Ismael Carnales
5c1dd25284
removed spiders cache from example project, thanks Michael
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40858
2009-02-16 13:15:55 +00:00
Ismael Carnales
3e8bcafd8b
added copytree from python 2.6 to utils.misc and made startproject use it to ignore .svn and .pyc files
...
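A minimal sketch of the idea, using the standard library's `shutil.copytree` and `shutil.ignore_patterns` (both available since Python 2.6) rather than the backported copy in utils.misc. The helper name `copy_template` is hypothetical and is not the function used by startproject.

```python
import shutil


def copy_template(src, dst):
    """Copy a project template tree, skipping VCS metadata and bytecode.

    Hypothetical helper: `dst` must not exist yet, as copytree requires.
    """
    # ignore_patterns builds a callable that copytree consults per directory
    ignore = shutil.ignore_patterns(".svn", "*.pyc")
    shutil.copytree(src, dst, ignore=ignore)
```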
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40857
2009-02-16 12:50:00 +00:00
Ismael Carnales
522fe78c2b
remove old project example
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40856
2009-02-16 11:52:49 +00:00
Ismael Carnales
bb9a732edf
updated tutorial with new googledir project from r853
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40855
2009-02-16 11:51:07 +00:00
Ismael Carnales
eb1e62a28c
add new googledir example project with new structure
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40854
2009-02-16 11:49:44 +00:00
Daniel Grana
755235b568
utils: renamed load_class function as load_object
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40853
2009-02-13 17:21:50 +00:00
Daniel Grana
3876a1827f
cluster: move branched cluster as experimental contrib
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40852
2009-02-13 17:20:10 +00:00
Pablo Hoffman
47cf45c916
removed redundant part of Scrapy introduction to make it simpler. thanks Ismael for pointing that out
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40851
2009-02-12 20:58:42 +00:00
Daniel Grana
8557f6bc57
duplicatefilter: lower log level of skipped requests message
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40850
2009-02-12 15:55:11 +00:00
Daniel Grana
2b14251510
cache: read metadata only when looking for cached items. refs #61
...
thanks Patrick Mezard for patch.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40849
2009-02-12 08:00:07 +00:00
Daniel Grana
d68646422d
storedb: gracefully fail test if mysql is not installed.
...
thanks Patrick Mezard for patch.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40848
2009-02-12 07:59:40 +00:00
Daniel Grana
3e4bc6141a
core: remove obsolete groupfilter code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40847
2009-02-12 07:39:57 +00:00
Daniel Grana
c9f2865c83
core: Get rid of duplicate filtering as a scheduler builtin feature. closes #49.
...
Implements a DuplicatesFilterMiddleware as a spidermiddleware: a wrapper
around a filtering class with a minimal defined API, configurable by
settings.
Enabling this middleware does not give us the same functionality as the
scheduler's builtin duplicate filter, but it filters the most important
source of duplicate requests: the spiders.
Which requests aren't filtered by the new middleware? Those originating
from any part of scrapy outside of spiders, such as S3 image requests or
any other request manually scheduled using the ``scrapyengine.schedule()``
method.
Previously, we usually added dont_filter=True to requests created
outside of spiders to avoid collisions with pages also downloaded by the
spider. This is no longer required, because the new middleware filters
only the spider-generated requests.
There is a caveat: downloadmiddlewares can return a Request object at
any point of the chain, and that request is scheduled and downloaded as
usual too. One of the downloadmiddlewares using this feature is
RedirectMiddleware, which counts on the scheduler's builtin filtering to
avoid redirection loops. I think we can implement a decreasing
time-to-live counter stored in the request's ``meta`` attribute, set to
a default value if not present, and decremented each time the request is
redirected.
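A rough sketch of the two ideas above, written against plain URLs and dicts rather than Scrapy's classes. Every name here (`SeenUrlFilter`, `filter_spider_output`, `next_redirect_meta`, the `redirect_ttl` key and its default of 20) is hypothetical and only illustrates the approach; it is not the middleware's actual API.

```python
class SeenUrlFilter:
    """Hypothetical filtering class with a one-method API."""

    def __init__(self):
        self._seen = set()

    def add(self, url):
        """Return True if the URL is new, False if it was seen before."""
        if url in self._seen:
            return False
        self._seen.add(url)
        return True


def filter_spider_output(urls, dupefilter):
    """Yield only the spider-generated URLs the filter has not seen yet."""
    for url in urls:
        if dupefilter.add(url):
            yield url


def next_redirect_meta(meta, default_ttl=20):
    """Return the meta dict for a redirected request, or None to stop.

    The counter starts at default_ttl when absent and is decremented on
    each redirect; once it reaches zero the redirect is not followed.
    """
    ttl = meta.get("redirect_ttl", default_ttl)
    if ttl <= 0:
        return None
    new_meta = dict(meta)
    new_meta["redirect_ttl"] = ttl - 1
    return new_meta
```

Requests scheduled outside of spiders never pass through `filter_spider_output`, which mirrors why dont_filter=True is no longer needed for them; the counter is what would bound redirection loops instead.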
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40846
2009-02-12 07:07:13 +00:00
Daniel Grana
0aba276a64
tutorial: indent class docstring
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40845
2009-02-12 04:39:49 +00:00
Daniel Grana
73d8177ecc
tutorial: lots of line wrapping, and switched emphasized words to double backticks
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40844
2009-02-12 04:38:13 +00:00
Daniel Grana
7c056c620e
tutorial: fix out-of-date and broken tutorial after adaptors were moved to experimental
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40843
2009-02-12 03:58:54 +00:00
Daniel Grana
f43c58da1f
adds a reference about empty request body and Content-Length header fix, closes #60.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40842
2009-02-10 18:27:59 +00:00
Daniel Grana
8fdf06ed08
Avoid sending Content-Length header when body is an empty string.
...
Some sites can't handle a "Content-Length: 0" header, but twisted's
HTTPClientFactory adds a Content-Length header unless body is None.
scrapy enforces request.body to be a string, so using None is not
possible.
Thanks Matt for the report. See:
http://groups.google.com/group/scrapy-users/browse_thread/thread/380ffa111879989e?hl=en
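The behaviour described above can be sketched as follows. This is an illustration only: `build_headers` is a hypothetical helper, not the code in scrapy or twisted, and it assumes the body is a byte string.

```python
def build_headers(body, headers=None):
    """Return request headers, adding Content-Length only for a non-empty body.

    Hypothetical helper: an empty body is treated like "no body", so the
    "Content-Length: 0" header that some sites reject is never sent.
    """
    result = dict(headers or {})
    if body:
        result["Content-Length"] = str(len(body))
    return result
```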
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40841
2009-02-10 13:23:38 +00:00
Pablo Hoffman
f4224be411
- added get_meta_refresh to scrapy.utils.response
...
- added tests for get_meta_refresh_url
- added RedirectMiddleware tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40840
2009-02-10 06:20:43 +00:00
Pablo Hoffman
d376b20996
minor fix to doctest
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40839
2009-02-10 04:46:07 +00:00
Pablo Hoffman
90c1597cef
fixed bug with items_to_csv and Excel. thanks Mat for the patch. See:
...
http://groups.google.com/group/scrapy-developers/browse_thread/thread/9e2b80d226333011
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40838
2009-02-10 02:43:19 +00:00
Daniel Grana
1b2de80124
remove scrapy/dos external referenced from scrapy.org site
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40837
2009-02-09 17:20:41 +00:00
Ismael Carnales
844b59b5b3
reduced introduction text in proposed doc
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40836
2009-02-09 15:06:22 +00:00
Ismael Carnales
d98f35af94
added items section (ScrapedItems) to proposed doc
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40835
2009-02-09 14:29:48 +00:00
Ismael Carnales
3a3723ba2e
removed reference to guid in ScrapedItem
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40834
2009-02-09 13:59:15 +00:00
Pablo Hoffman
38c62ca357
settings must be documented in alphabetical order
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40833
2009-02-08 21:00:26 +00:00
Ismael Carnales
45340d3d87
better index for proposed documentation
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40832
2009-02-06 20:19:17 +00:00
Ismael Carnales
5ca4728805
added complete spiders topic and reference (in one file) using an autodoc and manual doc mix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40831
2009-02-06 20:15:34 +00:00
Ismael Carnales
3fb945be7b
fixed doc on BaseSpider
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40830
2009-02-06 20:02:13 +00:00
Ismael Carnales
485b9c9ed5
updated docstrings of feedspiders for autodoc
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40829
2009-02-06 20:01:27 +00:00
Ismael Carnales
9ba07653c6
added a proposed introduction (with all the main topics) to the proposed documentation, this would be the start for the new tutorial
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40828
2009-02-06 16:20:49 +00:00
Ismael Carnales
be04a7f328
modify docs configuration for autodoc to work
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40827
2009-02-05 15:39:52 +00:00
Ismael Carnales
96be669af2
fix tutorial1 errors
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40826
2009-02-05 13:44:32 +00:00
Daniel Grana
fe2f018b1a
FormRequest: urlencode multiples values of a single key using doseq
...
this prevents urllib.urlencode from sending the repr of the value when
it finds a list or tuple.
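A short illustration of the difference. It uses `urllib.parse.urlencode`, the Python 3 location of the function the commit calls `urllib.urlencode`; the form data is made up for the example.

```python
from urllib.parse import urlencode

# One key with several values, as a multi-select form field would produce
formdata = {"tag": ["a", "b"], "q": "x"}

# Without doseq the whole list is stringified and percent-encoded as one value
without_doseq = urlencode(formdata)

# With doseq each element becomes its own key=value pair
with_doseq = urlencode(formdata, doseq=True)

print(without_doseq)
print(with_doseq)  # tag=a&tag=b&q=x
```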
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40825
2009-02-05 13:41:10 +00:00
Ismael Carnales
6650e56354
changed representation of the project tree in the tutorial
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40824
2009-02-05 13:35:14 +00:00
Daniel Grana
8f349737f8
add missing dashes to docs
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40823
2009-02-05 12:43:30 +00:00
Daniel Grana
bc2ea867ca
Adds docs for PROJECT_NAME setting. refs #58
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40822
2009-02-05 12:39:43 +00:00
Daniel Grana
bf87fb8455
add proper docstring to string_camelcase. refs #58
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40821
2009-02-05 12:20:25 +00:00
Ismael Carnales
751a844e36
updated tutorial to reflect project's structure change
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40820
2009-02-05 12:14:49 +00:00
Daniel Grana
f7dac0e449
Creates a usable project structure on startproject. closes #58 and closes #54
...
--HG--
rename : scrapy/trunk/scrapy/conf/project_template/__init__.py => scrapy/trunk/scrapy/templates/project/module/__init__.py
rename : scrapy/trunk/scrapy/conf/project_template/items.py => scrapy/trunk/scrapy/templates/project/module/items.py
rename : scrapy/trunk/scrapy/conf/project_template/scrapy_settings.py => scrapy/trunk/scrapy/templates/project/module/settings.py
rename : scrapy/trunk/scrapy/conf/project_template/spiders/__init__.py => scrapy/trunk/scrapy/templates/project/module/spiders/__init__.py
rename : scrapy/trunk/scrapy/conf/project_template/templates/spider_basic.tmpl => scrapy/trunk/scrapy/templates/project/module/templates/spider_basic.tmpl
rename : scrapy/trunk/scrapy/conf/project_template/templates/spider_crawl.tmpl => scrapy/trunk/scrapy/templates/project/module/templates/spider_crawl.tmpl
rename : scrapy/trunk/scrapy/conf/project_template/templates/spider_csvfeed.tmpl => scrapy/trunk/scrapy/templates/project/module/templates/spider_csvfeed.tmpl
rename : scrapy/trunk/scrapy/conf/project_template/templates/spider_xmlfeed.tmpl => scrapy/trunk/scrapy/templates/project/module/templates/spider_xmlfeed.tmpl
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40819
2009-02-05 12:12:15 +00:00
Ismael Carnales
bccef8a463
fixed basespider links in topics/spider
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40818
2009-01-31 06:09:50 +00:00
Ismael Carnales
fa8ebd1147
corrected naming and references in ref/spiders doc
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40817
2009-01-31 06:01:33 +00:00
Pablo Hoffman
6220609718
added bin/runtests.sh script to run tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40816
2009-01-30 23:20:49 +00:00
Pablo Hoffman
20a61caa2c
fixed some doc typos
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40815
2009-01-30 23:20:21 +00:00
Pablo Hoffman
005a642240
added SETTINGS_DISABLED environment variable to turn off custom settings (and only use Scrapy defaults)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40814
2009-01-30 23:18:23 +00:00
Pablo Hoffman
16efbf87fc
added some references to documentation and fixed some doc typos (thanks Patrick)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40813
2009-01-30 22:33:50 +00:00
Pablo Hoffman
c672223091
split spiders doc from link extractor docs, moved the corresponding parts to ref and topics
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40812
2009-01-30 21:53:40 +00:00
Pablo Hoffman
483ef3ba7f
fixed typo in Items docs
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40811
2009-01-30 21:52:41 +00:00
Ismael Carnales
a834eef6c9
added BaseSpider attributes and method references and metadata (from tutorial)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40810
2009-01-30 19:24:43 +00:00
Ismael Carnales
7e6c9c2e25
formatting changes and references to spiders added in the tutorial
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40809
2009-01-30 19:14:16 +00:00