Pablo Hoffman
34355048c3
some functions were added to scrapy.utils.url without following our policies for adding tests (to scrapy.tests) and documentation (as docstrings). fixed that.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40186
2008-09-01 01:19:32 +00:00
olveyra
f3013bb9ad
Improved Adaptors code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40185
2008-08-31 00:25:13 +00:00
olveyra
0e6562cb47
moved some url utils from decobot to scrapy
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40184
2008-08-27 17:37:32 +00:00
olveyra
9164150bed
- avoid raising an exception when no arg is given to the replay command
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40183
2008-08-27 17:21:12 +00:00
olveyra
b11b84fff1
- moved scrape command to shell
...
- fixes
- get and scrapehelp functions added as ipython magic commands
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40182
2008-08-27 13:52:45 +00:00
Pablo Hoffman
bfe6168f3b
cleaned up simpages code a bit, added some documentation
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40181
2008-08-25 00:00:14 +00:00
Pablo Hoffman
eee86b9827
added prototype page similarity code, to detect different layouts
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40180
2008-08-24 19:10:27 +00:00
olveyra
397d3ff247
Added a synchronous get method which also updates console user namespace.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40179
2008-08-23 18:21:47 +00:00
olveyra
e83dcb588e
allow using the scrape command without a url
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40178
2008-08-22 13:38:29 +00:00
olveyra
77053113cd
reverted clean_markup code movement
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40177
2008-08-21 17:12:32 +00:00
olveyra
a2bd70ba21
moved clean_markup to scrapy.utils.markup
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40176
2008-08-21 15:07:55 +00:00
olveyra
643ea99f36
fixed a clean code movement error: forgot to apply remove tags when
...
text does not contain cdata
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40175
2008-08-20 12:52:31 +00:00
Pablo Hoffman
1c8c73ebfa
added some validation to new spider module names
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40174
2008-08-19 19:52:31 +00:00
olveyra
17dec39c29
removed temporary fix in 171
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40173
2008-08-19 13:42:32 +00:00
olveyra
0f49c7c0d4
temporary fix to avoid exceptions before commit in decobot
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40172
2008-08-18 15:53:53 +00:00
olveyra
5b3662ee89
- Added generic clean adaptors
...
- removed attribute name from adaptor function method (adaptors should
not know, nor need to know, attribute names)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40171
2008-08-18 15:18:40 +00:00
olveyra
8877426a13
minor fixes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40170
2008-08-15 17:04:36 +00:00
Andres Moreira
801b804a4d
Added support for replay update, to crawl again all the pages downloaded in the replay file.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40169
2008-08-15 14:59:48 +00:00
olveyra
9b0dd66ec1
improved explanation comment of the RequestLimitMiddleware
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40168
2008-08-15 12:35:09 +00:00
Andres Moreira
ee59bd87ab
Changed messages of downloaded responses to received responses.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40167
2008-08-14 12:23:37 +00:00
Andres Moreira
cdd8895614
Downloaded responses are now managed by a new signal, response_received, and I changed the associated methods accordingly. Changed the method response_download to response_received. Added code to support the update in response_received.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40166
2008-08-14 12:22:07 +00:00
anibal
d95e542374
ignoring temp directories for spider tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40165
2008-08-14 11:53:29 +00:00
Pablo Hoffman
3975bf95c7
added response_received signal
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40164
2008-08-14 09:57:17 +00:00
olveyra
632b975ce7
import fix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40163
2008-08-14 01:13:14 +00:00
Pablo Hoffman
280ad944ea
changed setting default value
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40162
2008-08-13 21:06:49 +00:00
olveyra
8e72e4e60e
- Introduction of class BaseAdaptor
...
- Contrib Adaptors
- location_str moved from decobot to scrapy
- Added setting DEFAULT_DATA_ENCODING
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40161
2008-08-13 19:49:25 +00:00
olveyra
3a018cadff
avoid trying to stop a task that is not running (this bug
...
caused stalled processes on production servers)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40160
2008-08-12 17:55:54 +00:00
olveyra
d0f12be4c0
removed a bad character from comment that caused an encoding error
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40159
2008-08-12 16:18:03 +00:00
anibal
baf32540d6
we should add some svn hook to use pylint before commit :)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40158
2008-08-12 15:59:05 +00:00
olveyra
e3f70a8101
commented debug lines in last commit
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40157
2008-08-12 15:13:48 +00:00
olveyra
a690d33f24
added scheduler request queue limit for spiders (spider middleware)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40156
2008-08-12 15:11:15 +00:00
olveyra
290702d988
Cluster crawler fixes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40155
2008-08-09 02:01:41 +00:00
olveyra
5b4d9f8f85
fix
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40154
2008-08-08 17:16:27 +00:00
olveyra
29cd1bc3cb
Added pb-capable crawler. The idea is to improve cluster performance
...
by adding communication between crawler and master.
At the moment, a remote stop method was added to the crawler to substitute
the previous stop based on a kernel signal.
Monitoring functionality will be added later, because the processes are
very silent, mainly when the unavailable report is not issued,
and often lots of things happen that nobody would notice unless some
fortuitous event occurred
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40153
2008-08-08 16:57:33 +00:00
olveyra
6cfbe78d63
Added a periodic ping to maintain mysql connection active
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40152
2008-08-07 16:01:25 +00:00
olveyra
4606b7f9c5
deleting old cluster branch
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40151
2008-08-07 12:26:43 +00:00
olveyra
59d6e92582
upss...removing a print...
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40150
2008-08-07 11:59:44 +00:00
olveyra
3f7100fe1e
improved last change so it can be left there permanently
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40149
2008-08-07 11:56:12 +00:00
olveyra
6b890c1b28
added a log to see if db settings are really being passed
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40148
2008-08-07 11:49:00 +00:00
olveyra
246bdedcfb
fixed a get setting in UrlToGuidService
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40147
2008-08-05 18:32:03 +00:00
olveyra
22eb48dae5
added MYSQL_CONNECTION_SETTINGS settings
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40146
2008-08-05 16:51:12 +00:00
olveyra
388f7641cf
reverted last change. Seems this option is available only in very
...
recent versions of mysql-python
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40145
2008-08-05 12:05:28 +00:00
olveyra
7fc5923249
add reconnect=1 parameter to mysql_connect in order to reconnect after
...
a long inactivity period
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40144
2008-08-05 11:41:49 +00:00
olveyra
7a195a9794
allow an integer "dont_filter", meaning the number of times
...
dont_filter will apply when redirecting.
Under this scheme, dont_filter=True is the same as dont_filter=1
and (convenient?) dont_filter <= 0 means dont_filter = True forever
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40143
2008-08-01 18:13:07 +00:00
elpolilla
3badb39ed7
removed report feature for not being generic
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40142
2008-07-31 23:08:54 +00:00
samus_
380a65e721
fix for the replays and cache timeout
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40141
2008-07-31 17:59:35 +00:00
elpolilla
0f65d2f208
total of variants added to the report feature
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40140
2008-07-31 15:42:50 +00:00
Matias Aguirre
f92538430b
Adding missing templates
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40139
2008-07-30 23:47:28 +00:00
elpolilla
f955c9693b
smallest change ever in the reports, but improves the act of reading them
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40138
2008-07-30 16:01:28 +00:00
elpolilla
c76200561d
bugfix in scraping report
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40137
2008-07-30 11:20:22 +00:00