elpolilla
7df63477bf
added response decompression tool
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40213
2008-09-09 12:56:01 +00:00
olveyra
3a2981801c
minor code fix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40212
2008-09-08 12:06:53 +00:00
olveyra
3fe36e77d2
Removed PRIORITY constants, added DEFAULT_PRIORITY setting
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40211
2008-09-06 17:03:47 +00:00
olveyra
8b09be8601
adaptors with generic matching function
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40210
2008-09-05 21:51:53 +00:00
Andres Moreira
09b11a3a7b
Added an histogram plot to simpages group to the report. Added quantities of html elements to symbol of the simhash. Now is more effective.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40209
2008-09-05 14:45:24 +00:00
olveyra
c2af3124ba
- added negative attribute name match
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40208
2008-09-04 19:50:25 +00:00
olveyra
4e61f05761
second security fix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40207
2008-09-04 19:33:36 +00:00
olveyra
51bdd6944b
- removed contrib/adaptors.py
...
- display adaptor name when an exception raises inside the adaptor
pipeline
- add debug parameter to item attribute method to display input/output
of each adaptor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40206
2008-09-04 19:11:53 +00:00
olveyra
daf51203e3
security fix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40205
2008-09-04 18:15:45 +00:00
Andres Moreira
f4f9626c3f
Remove old code.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40204
2008-09-03 19:11:52 +00:00
Andres Moreira
3cb1ab8794
Add rule engine to the framework. Rules are executed in a pipeline.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40203
2008-09-03 19:06:34 +00:00
olveyra
1fa947dd67
- Improved attribute name checks
...
- added support to tuple definition of pipeline
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40202
2008-09-03 13:53:28 +00:00
olveyra
82a9fa9ffc
more efficient name attribute check in adaptors pipeline
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40201
2008-09-02 18:52:04 +00:00
olveyra
6b16ee5e67
- assure deferred_degenerate will take an iterable (bug raised when no
...
spider middleware is enabled)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40200
2008-09-02 17:39:22 +00:00
olveyra
f54ba9f7e9
removed unused import
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40199
2008-09-02 15:41:22 +00:00
olveyra
972896cd87
removed canonicalize from get function in shell
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40198
2008-09-02 15:20:16 +00:00
olveyra
2a30073ece
fix get function (strip and canonicalize url)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40197
2008-09-02 14:12:17 +00:00
olveyra
7913a86250
updated settings template
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40196
2008-09-02 12:38:59 +00:00
olveyra
472a0de139
- Fixes in adaptors code, after testing
...
- added attrs_list param to insertadaptor method
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40195
2008-09-01 20:06:10 +00:00
Pablo Hoffman
41fa98801c
removed unneeded exception code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40194
2008-09-01 19:28:30 +00:00
Pablo Hoffman
a88fb416c7
changed Referer middleware class name
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40193
2008-09-01 04:28:18 +00:00
Pablo Hoffman
037e6c2125
improved SpiderMiddleware's docstrings
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40192
2008-09-01 04:18:12 +00:00
Pablo Hoffman
c9cafd5c43
added UrlFilterMiddleware
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40191
2008-09-01 04:16:51 +00:00
Pablo Hoffman
d7d94482a9
added update_fingerprint method to Request
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40190
2008-09-01 04:09:51 +00:00
Pablo Hoffman
96d24c7640
fixed some documentation errors
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40189
2008-09-01 03:34:53 +00:00
Pablo Hoffman
29c3715c5a
changed remove_fragments argument to keep_fragments, for consistency with the other canonicalize_url arguments
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40188
2008-09-01 03:33:18 +00:00
Pablo Hoffman
30803b9e89
added canonicalize_url function to scrapy.utils.url, along with a complete suite of tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40187
2008-09-01 03:31:11 +00:00
Pablo Hoffman
34355048c3
some functions were added to scrapy.utils.url without following our policies for adding tests (to scrapy.tests) and documentation (as docstrings). fixed that.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40186
2008-09-01 01:19:32 +00:00
olveyra
f3013bb9ad
Improved Adaptors code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40185
2008-08-31 00:25:13 +00:00
olveyra
0e6562cb47
moved some url utils from decobot to scrapy
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40184
2008-08-27 17:37:32 +00:00
olveyra
9164150bed
- avoid to raise an exception when no arg is given to replay command
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40183
2008-08-27 17:21:12 +00:00
olveyra
b11b84fff1
- moved scrape command to shell
...
- fixes
- get and scrapehelp functions added as ipython magic commands
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40182
2008-08-27 13:52:45 +00:00
Pablo Hoffman
bfe6168f3b
cleaned up simpages code a bit, added some documentation
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40181
2008-08-25 00:00:14 +00:00
Pablo Hoffman
eee86b9827
added prototype page similarity code, to detect different layouts
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40180
2008-08-24 19:10:27 +00:00
olveyra
397d3ff247
Added a synchronous get method which also updates console user namespace.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40179
2008-08-23 18:21:47 +00:00
olveyra
e83dcb588e
allow to use scrape command without an url
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40178
2008-08-22 13:38:29 +00:00
olveyra
77053113cd
reverted clean_markup code movement
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40177
2008-08-21 17:12:32 +00:00
olveyra
a2bd70ba21
moved clean_markup to scrapy.utils.markup
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40176
2008-08-21 15:07:55 +00:00
olveyra
643ea99f36
fixed a clean code movement error: forget to apply remove tags when
...
text does not contains cdata
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40175
2008-08-20 12:52:31 +00:00
Pablo Hoffman
1c8c73ebfa
added some validation to new spider module names
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40174
2008-08-19 19:52:31 +00:00
olveyra
17dec39c29
removed temporal fix in 171
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40173
2008-08-19 13:42:32 +00:00
olveyra
0f49c7c0d4
temporal fix to avoid exceptions before commit in decobot
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40172
2008-08-18 15:53:53 +00:00
olveyra
5b3662ee89
- Added generic clean adaptors
...
- removed attribute name from adaptor function method (adaptors should
not nor need to know attribute names)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40171
2008-08-18 15:18:40 +00:00
olveyra
8877426a13
minor fixes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40170
2008-08-15 17:04:36 +00:00
Andres Moreira
801b804a4d
Added support to replay update to crawl again all the pages downloaded in the replay file.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40169
2008-08-15 14:59:48 +00:00
olveyra
9b0dd66ec1
improved explanation comment of the RequestLimitMiddleware
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40168
2008-08-15 12:35:09 +00:00
Andres Moreira
ee59bd87ab
Changed messages of downloaded respones to received respones.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40167
2008-08-14 12:23:37 +00:00
Andres Moreira
cdd8895614
The response downloadeds are manage by a new signal, response_received and I changed the methods associated that. Changed the method response_download to response_received. Added code to support the update in response_received.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40166
2008-08-14 12:22:07 +00:00
anibal
d95e542374
ignoring temp directories for spider tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40165
2008-08-14 11:53:29 +00:00
Pablo Hoffman
3975bf95c7
added response_received signal
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40164
2008-08-14 09:57:17 +00:00