olveyra
6b16ee5e67
- ensure deferred_degenerate accepts an iterable (bug raised when no spider middleware is enabled)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40200
2008-09-02 17:39:22 +00:00
olveyra
f54ba9f7e9
removed unused import
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40199
2008-09-02 15:41:22 +00:00
olveyra
972896cd87
removed canonicalize from get function in shell
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40198
2008-09-02 15:20:16 +00:00
olveyra
2a30073ece
fix get function (strip and canonicalize url)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40197
2008-09-02 14:12:17 +00:00
olveyra
7913a86250
updated settings template
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40196
2008-09-02 12:38:59 +00:00
olveyra
472a0de139
- Fixes in adaptors code, after testing
- added attrs_list param to insertadaptor method
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40195
2008-09-01 20:06:10 +00:00
Pablo Hoffman
41fa98801c
removed unneeded exception code
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40194
2008-09-01 19:28:30 +00:00
Pablo Hoffman
a88fb416c7
changed Referer middleware class name
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40193
2008-09-01 04:28:18 +00:00
Pablo Hoffman
037e6c2125
improved SpiderMiddleware's docstrings
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40192
2008-09-01 04:18:12 +00:00
Pablo Hoffman
c9cafd5c43
added UrlFilterMiddleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40191
2008-09-01 04:16:51 +00:00
Pablo Hoffman
d7d94482a9
added update_fingerprint method to Request
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40190
2008-09-01 04:09:51 +00:00
Pablo Hoffman
96d24c7640
fixed some documentation errors
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40189
2008-09-01 03:34:53 +00:00
Pablo Hoffman
29c3715c5a
changed remove_fragments argument to keep_fragments, for consistency with the other canonicalize_url arguments
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40188
2008-09-01 03:33:18 +00:00
Pablo Hoffman
30803b9e89
added canonicalize_url function to scrapy.utils.url, along with a complete suite of tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40187
2008-09-01 03:31:11 +00:00
Pablo Hoffman
34355048c3
some functions were added to scrapy.utils.url without following our policies for adding tests (to scrapy.tests) and documentation (as docstrings). fixed that.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40186
2008-09-01 01:19:32 +00:00
olveyra
f3013bb9ad
Improved Adaptors code
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40185
2008-08-31 00:25:13 +00:00
olveyra
0e6562cb47
moved some url utils from decobot to scrapy
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40184
2008-08-27 17:37:32 +00:00
olveyra
9164150bed
- avoid raising an exception when no argument is given to the replay command
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40183
2008-08-27 17:21:12 +00:00
olveyra
b11b84fff1
- moved scrape command to shell
- fixes
- get and scrapehelp functions added as ipython magic commands
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40182
2008-08-27 13:52:45 +00:00
Pablo Hoffman
bfe6168f3b
cleaned up simpages code a bit, added some documentation
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40181
2008-08-25 00:00:14 +00:00
Pablo Hoffman
eee86b9827
added prototype page similarity code, to detect different layouts
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40180
2008-08-24 19:10:27 +00:00
olveyra
397d3ff247
Added a synchronous get method which also updates the console user namespace.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40179
2008-08-23 18:21:47 +00:00
olveyra
e83dcb588e
allow using the scrape command without a URL
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40178
2008-08-22 13:38:29 +00:00
olveyra
77053113cd
reverted clean_markup code movement
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40177
2008-08-21 17:12:32 +00:00
olveyra
a2bd70ba21
moved clean_markup to scrapy.utils.markup
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40176
2008-08-21 15:07:55 +00:00
olveyra
643ea99f36
fixed a clean code movement error: forgot to apply remove tags when text does not contain cdata
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40175
2008-08-20 12:52:31 +00:00
Pablo Hoffman
1c8c73ebfa
added some validation to new spider module names
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40174
2008-08-19 19:52:31 +00:00
olveyra
17dec39c29
removed temporary fix from 171
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40173
2008-08-19 13:42:32 +00:00
olveyra
0f49c7c0d4
temporary fix to avoid exceptions before commit in decobot
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40172
2008-08-18 15:53:53 +00:00
olveyra
5b3662ee89
- Added generic clean adaptors
- removed attribute name from adaptor function method (adaptors should
not know, nor need to know, attribute names)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40171
2008-08-18 15:18:40 +00:00
olveyra
8877426a13
minor fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40170
2008-08-15 17:04:36 +00:00
Andres Moreira
801b804a4d
Added support for replay update, to recrawl all the pages downloaded in the replay file.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40169
2008-08-15 14:59:48 +00:00
olveyra
9b0dd66ec1
improved the explanatory comment of RequestLimitMiddleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40168
2008-08-15 12:35:09 +00:00
Andres Moreira
ee59bd87ab
Changed messages for downloaded responses to received responses.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40167
2008-08-14 12:23:37 +00:00
Andres Moreira
cdd8895614
Downloaded responses are now managed by a new signal, response_received, and the associated methods were changed accordingly. Changed the method response_download to response_received. Added code to support the update in response_received.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40166
2008-08-14 12:22:07 +00:00
anibal
d95e542374
ignoring temp directories for spider tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40165
2008-08-14 11:53:29 +00:00
Pablo Hoffman
3975bf95c7
added response_received signal
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40164
2008-08-14 09:57:17 +00:00
olveyra
632b975ce7
import fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40163
2008-08-14 01:13:14 +00:00
Pablo Hoffman
280ad944ea
changed setting default value
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40162
2008-08-13 21:06:49 +00:00
olveyra
8e72e4e60e
- Introduction of class BaseAdaptor
- Contrib Adaptors
- location_str moved from decobot to scrapy
- Added setting DEFAULT_DATA_ENCODING
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40161
2008-08-13 19:49:25 +00:00
olveyra
3a018cadff
avoid trying to stop a task that is not running (this bug
caused stalled processes on production servers)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40160
2008-08-12 17:55:54 +00:00
olveyra
d0f12be4c0
removed a bad character from comment that caused an encoding error
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40159
2008-08-12 16:18:03 +00:00
anibal
baf32540d6
we should add some svn hook to use pylint before commit :)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40158
2008-08-12 15:59:05 +00:00
olveyra
e3f70a8101
commented out debug lines from last commit
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40157
2008-08-12 15:13:48 +00:00
olveyra
a690d33f24
added scheduler request queue limit for spiders (spider middleware)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40156
2008-08-12 15:11:15 +00:00
olveyra
290702d988
Cluster crawler fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40155
2008-08-09 02:01:41 +00:00
olveyra
5b4d9f8f85
fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40154
2008-08-08 17:16:27 +00:00
olveyra
29cd1bc3cb
Added pb-capable crawler. The idea is to improve cluster performance
by adding communication between crawler and master.
For now, a remote stop method was added to the crawler to replace
the previous stop based on a kernel signal.
Monitoring functionality will be added later, because the processes are
very silent, especially when the unavailable report is not issued,
and many things often happen that nobody would notice if some
fortuitous events did not occur.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40153
2008-08-08 16:57:33 +00:00
olveyra
6cfbe78d63
Added a periodic ping to keep the MySQL connection alive
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40152
2008-08-07 16:01:25 +00:00
olveyra
4606b7f9c5
deleting old cluster branch
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40151
2008-08-07 12:26:43 +00:00