Andres Moreira
f4f9626c3f
Remove old code.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40204
2008-09-03 19:11:52 +00:00
Andres Moreira
3cb1ab8794
Add rule engine to the framework. Rules are executed in a pipeline.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40203
2008-09-03 19:06:34 +00:00
olveyra
1fa947dd67
- Improved attribute name checks
...
- added support for tuple definition of pipeline
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40202
2008-09-03 13:53:28 +00:00
olveyra
82a9fa9ffc
more efficient name attribute check in adaptors pipeline
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40201
2008-09-02 18:52:04 +00:00
olveyra
6b16ee5e67
- ensure deferred_degenerate takes an iterable (bug raised when no
...
spider middleware is enabled)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40200
2008-09-02 17:39:22 +00:00
olveyra
f54ba9f7e9
removed unused import
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40199
2008-09-02 15:41:22 +00:00
olveyra
972896cd87
removed canonicalize from get function in shell
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40198
2008-09-02 15:20:16 +00:00
olveyra
2a30073ece
fix get function (strip and canonicalize url)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40197
2008-09-02 14:12:17 +00:00
olveyra
7913a86250
updated settings template
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40196
2008-09-02 12:38:59 +00:00
olveyra
472a0de139
- Fixes in adaptors code, after testing
...
- added attrs_list param to insertadaptor method
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40195
2008-09-01 20:06:10 +00:00
Pablo Hoffman
41fa98801c
removed unneeded exception code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40194
2008-09-01 19:28:30 +00:00
Pablo Hoffman
a88fb416c7
changed Referer middleware class name
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40193
2008-09-01 04:28:18 +00:00
Pablo Hoffman
037e6c2125
improved SpiderMiddleware's docstrings
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40192
2008-09-01 04:18:12 +00:00
Pablo Hoffman
c9cafd5c43
added UrlFilterMiddleware
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40191
2008-09-01 04:16:51 +00:00
Pablo Hoffman
d7d94482a9
added update_fingerprint method to Request
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40190
2008-09-01 04:09:51 +00:00
Pablo Hoffman
96d24c7640
fixed some documentation errors
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40189
2008-09-01 03:34:53 +00:00
Pablo Hoffman
29c3715c5a
changed remove_fragments argument to keep_fragments, for consistency with the other canonicalize_url arguments
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40188
2008-09-01 03:33:18 +00:00
Pablo Hoffman
30803b9e89
added canonicalize_url function to scrapy.utils.url, along with a complete suite of tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40187
2008-09-01 03:31:11 +00:00
Pablo Hoffman
34355048c3
some functions were added to scrapy.utils.url without following our policies for adding tests (to scrapy.tests) and documentation (as docstrings). fixed that.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40186
2008-09-01 01:19:32 +00:00
olveyra
f3013bb9ad
Improved Adaptors code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40185
2008-08-31 00:25:13 +00:00
olveyra
0e6562cb47
moved some url utils from decobot to scrapy
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40184
2008-08-27 17:37:32 +00:00
olveyra
9164150bed
- avoid raising an exception when no arg is given to the replay command
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40183
2008-08-27 17:21:12 +00:00
olveyra
b11b84fff1
- moved scrape command to shell
...
- fixes
- get and scrapehelp functions added as ipython magic commands
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40182
2008-08-27 13:52:45 +00:00
Pablo Hoffman
bfe6168f3b
cleaned up simpages code a bit, added some documentation
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40181
2008-08-25 00:00:14 +00:00
Pablo Hoffman
eee86b9827
added prototype page similarity code, to detect different layouts
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40180
2008-08-24 19:10:27 +00:00
olveyra
397d3ff247
Added a synchronous get method which also updates console user namespace.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40179
2008-08-23 18:21:47 +00:00
olveyra
e83dcb588e
allow using the scrape command without a url
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40178
2008-08-22 13:38:29 +00:00
olveyra
77053113cd
reverted clean_markup code movement
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40177
2008-08-21 17:12:32 +00:00
olveyra
a2bd70ba21
moved clean_markup to scrapy.utils.markup
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40176
2008-08-21 15:07:55 +00:00
olveyra
643ea99f36
fixed a clean code movement error: forgot to apply remove tags when
...
text does not contain cdata
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40175
2008-08-20 12:52:31 +00:00
Pablo Hoffman
1c8c73ebfa
added some validation to new spider module names
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40174
2008-08-19 19:52:31 +00:00
olveyra
17dec39c29
removed temporary fix in 171
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40173
2008-08-19 13:42:32 +00:00
olveyra
0f49c7c0d4
temporary fix to avoid exceptions before commit in decobot
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40172
2008-08-18 15:53:53 +00:00
olveyra
5b3662ee89
- Added generic clean adaptors
...
- removed attribute name from adaptor function method (adaptors should
not know, nor need to know, attribute names)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40171
2008-08-18 15:18:40 +00:00
olveyra
8877426a13
minor fixes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40170
2008-08-15 17:04:36 +00:00
Andres Moreira
801b804a4d
Added support for replay update to re-crawl all the pages downloaded in the replay file.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40169
2008-08-15 14:59:48 +00:00
olveyra
9b0dd66ec1
improved explanation comment of the RequestLimitMiddleware
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40168
2008-08-15 12:35:09 +00:00
Andres Moreira
ee59bd87ab
Changed messages of downloaded responses to received responses.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40167
2008-08-14 12:23:37 +00:00
Andres Moreira
cdd8895614
Downloaded responses are now managed by a new signal, response_received, and I changed the methods associated with it. Changed the method response_download to response_received. Added code to support the update in response_received.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40166
2008-08-14 12:22:07 +00:00
anibal
d95e542374
ignoring temp directories for spider tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40165
2008-08-14 11:53:29 +00:00
Pablo Hoffman
3975bf95c7
added response_received signal
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40164
2008-08-14 09:57:17 +00:00
olveyra
632b975ce7
import fix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40163
2008-08-14 01:13:14 +00:00
Pablo Hoffman
280ad944ea
changed setting default value
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40162
2008-08-13 21:06:49 +00:00
olveyra
8e72e4e60e
- Introduction of class BaseAdaptor
...
- Contrib Adaptors
- location_str moved from decobot to scrapy
- Added setting DEFAULT_DATA_ENCODING
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40161
2008-08-13 19:49:25 +00:00
olveyra
3a018cadff
avoid trying to stop a task that is not running (this bug
...
caused stalled processes in production servers)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40160
2008-08-12 17:55:54 +00:00
olveyra
d0f12be4c0
removed a bad character from comment that caused an encoding error
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40159
2008-08-12 16:18:03 +00:00
anibal
baf32540d6
we should add some svn hook to use pylint before commit :)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40158
2008-08-12 15:59:05 +00:00
olveyra
e3f70a8101
commented debug lines in last commit
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40157
2008-08-12 15:13:48 +00:00
olveyra
a690d33f24
added scheduler request queue limit for spiders (spider middleware)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40156
2008-08-12 15:11:15 +00:00
olveyra
290702d988
Cluster crawler fixes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40155
2008-08-09 02:01:41 +00:00