mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 15:43:44 +00:00

4354 Commits

Author SHA1 Message Date
Andres Moreira
f4f9626c3f Remove old code.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40204
2008-09-03 19:11:52 +00:00
Andres Moreira
3cb1ab8794 Add rule engine to the framework. Rules are executed in a pipeline.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40203
2008-09-03 19:06:34 +00:00
olveyra
1fa947dd67 - Improved attribute name checks
- added support for tuple definition of pipeline

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40202
2008-09-03 13:53:28 +00:00
olveyra
82a9fa9ffc more efficient name attribute check in adaptors pipeline
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40201
2008-09-02 18:52:04 +00:00
olveyra
6b16ee5e67 - ensure deferred_degenerate will take an iterable (bug raised when no
spider middleware is enabled)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40200
2008-09-02 17:39:22 +00:00
olveyra
f54ba9f7e9 removed unused import
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40199
2008-09-02 15:41:22 +00:00
olveyra
972896cd87 removed canonicalize from get function in shell
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40198
2008-09-02 15:20:16 +00:00
olveyra
2a30073ece fix get function (strip and canonicalize url)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40197
2008-09-02 14:12:17 +00:00
olveyra
7913a86250 updated settings template
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40196
2008-09-02 12:38:59 +00:00
olveyra
472a0de139 - Fixes in adaptors code, after testing
- added attrs_list param to insertadaptor method

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40195
2008-09-01 20:06:10 +00:00
Pablo Hoffman
41fa98801c removed unneeded exception code
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40194
2008-09-01 19:28:30 +00:00
Pablo Hoffman
a88fb416c7 changed Referer middleware class name
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40193
2008-09-01 04:28:18 +00:00
Pablo Hoffman
037e6c2125 improved SpiderMiddleware's docstrings
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40192
2008-09-01 04:18:12 +00:00
Pablo Hoffman
c9cafd5c43 added UrlFilterMiddleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40191
2008-09-01 04:16:51 +00:00
Pablo Hoffman
d7d94482a9 added update_fingerprint method to Request
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40190
2008-09-01 04:09:51 +00:00
Pablo Hoffman
96d24c7640 fixed some documentation errors
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40189
2008-09-01 03:34:53 +00:00
Pablo Hoffman
29c3715c5a changed remove_fragments argument to keep_fragments, for consistency with the other canonicalize_url arguments
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40188
2008-09-01 03:33:18 +00:00
Pablo Hoffman
30803b9e89 added canonicalize_url function to scrapy.utils.url, along with a complete suite of tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40187
2008-09-01 03:31:11 +00:00
Pablo Hoffman
34355048c3 some functions were added to scrapy.utils.url without following our policies for adding tests (to scrapy.tests) and documentation (as docstrings). fixed that.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40186
2008-09-01 01:19:32 +00:00
olveyra
f3013bb9ad Improved Adaptors code
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40185
2008-08-31 00:25:13 +00:00
olveyra
0e6562cb47 moved some url utils from decobot to scrapy
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40184
2008-08-27 17:37:32 +00:00
olveyra
9164150bed - avoid raising an exception when no arg is given to replay command
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40183
2008-08-27 17:21:12 +00:00
olveyra
b11b84fff1 - moved scrape command to shell
- fixes
- get and scrapehelp functions added as ipython magic commands

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40182
2008-08-27 13:52:45 +00:00
Pablo Hoffman
bfe6168f3b cleaned up simpages code a bit, added some documentation
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40181
2008-08-25 00:00:14 +00:00
Pablo Hoffman
eee86b9827 added prototype page similarity code, to detect different layouts
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40180
2008-08-24 19:10:27 +00:00
olveyra
397d3ff247 Added a synchronous get method which also updates console user namespace.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40179
2008-08-23 18:21:47 +00:00
olveyra
e83dcb588e allow using the scrape command without a url
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40178
2008-08-22 13:38:29 +00:00
olveyra
77053113cd reverted clean_markup code movement
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40177
2008-08-21 17:12:32 +00:00
olveyra
a2bd70ba21 moved clean_markup to scrapy.utils.markup
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40176
2008-08-21 15:07:55 +00:00
olveyra
643ea99f36 fixed a clean code movement error: forgot to apply remove tags when
text does not contain cdata

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40175
2008-08-20 12:52:31 +00:00
Pablo Hoffman
1c8c73ebfa added some validation to new spider module names
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40174
2008-08-19 19:52:31 +00:00
olveyra
17dec39c29 removed temporary fix in 171
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40173
2008-08-19 13:42:32 +00:00
olveyra
0f49c7c0d4 temporary fix to avoid exceptions before commit in decobot
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40172
2008-08-18 15:53:53 +00:00
olveyra
5b3662ee89 - Added generic clean adaptors
- removed attribute name from adaptor function method (adaptors should
neither know nor need to know attribute names)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40171
2008-08-18 15:18:40 +00:00
olveyra
8877426a13 minor fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40170
2008-08-15 17:04:36 +00:00
Andres Moreira
801b804a4d Added support for replay update, to crawl again all the pages downloaded in the replay file.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40169
2008-08-15 14:59:48 +00:00
olveyra
9b0dd66ec1 improved explanation comment of the RequestLimitMiddleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40168
2008-08-15 12:35:09 +00:00
Andres Moreira
ee59bd87ab Changed messages of downloaded responses to received responses.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40167
2008-08-14 12:23:37 +00:00
Andres Moreira
cdd8895614 Downloaded responses are now managed by a new signal, response_received, and I changed the associated methods. Changed the method response_download to response_received. Added code to support the update in response_received.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40166
2008-08-14 12:22:07 +00:00
anibal
d95e542374 ignoring temp directories for spider tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40165
2008-08-14 11:53:29 +00:00
Pablo Hoffman
3975bf95c7 added response_received signal
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40164
2008-08-14 09:57:17 +00:00
olveyra
632b975ce7 import fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40163
2008-08-14 01:13:14 +00:00
Pablo Hoffman
280ad944ea changed setting default value
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40162
2008-08-13 21:06:49 +00:00
olveyra
8e72e4e60e - Introduction of class BaseAdaptor
- Contrib Adaptors
- location_str moved from decobot to scrapy
- Added setting DEFAULT_DATA_ENCODING

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40161
2008-08-13 19:49:25 +00:00
olveyra
3a018cadff avoid trying to stop a task that is not running (this bug
caused stalled processes in production servers)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40160
2008-08-12 17:55:54 +00:00
olveyra
d0f12be4c0 removed a bad character from comment that caused an encoding error
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40159
2008-08-12 16:18:03 +00:00
anibal
baf32540d6 we should add some svn hook to use pylint before commit :)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40158
2008-08-12 15:59:05 +00:00
olveyra
e3f70a8101 commented debug lines in last commit
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40157
2008-08-12 15:13:48 +00:00
olveyra
a690d33f24 added scheduler request queue limit for spiders (spider middleware)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40156
2008-08-12 15:11:15 +00:00
olveyra
290702d988 Cluster crawler fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40155
2008-08-09 02:01:41 +00:00