mirror of https://github.com/scrapy/scrapy.git synced 2025-02-24 08:03:59 +00:00

4186 Commits

Author SHA1 Message Date
Pablo Hoffman
34355048c3 some functions were added to scrapy.utils.url without following our policies for adding tests (to scrapy.tests) and documentation (as docstrings). fixed that.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40186
2008-09-01 01:19:32 +00:00
olveyra
f3013bb9ad Improved Adaptors code
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40185
2008-08-31 00:25:13 +00:00
olveyra
0e6562cb47 moved some url utils from decobot to scrapy
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40184
2008-08-27 17:37:32 +00:00
olveyra
9164150bed - avoid raising an exception when no arg is given to the replay command
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40183
2008-08-27 17:21:12 +00:00
olveyra
b11b84fff1 - moved scrape command to shell
- fixes
- get and scrapehelp functions added as ipython magic commands

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40182
2008-08-27 13:52:45 +00:00
Pablo Hoffman
bfe6168f3b cleaned up simpages code a bit, added some documentation
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40181
2008-08-25 00:00:14 +00:00
Pablo Hoffman
eee86b9827 added prototype page similarity code, to detect different layouts
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40180
2008-08-24 19:10:27 +00:00
olveyra
397d3ff247 Added a synchronous get method which also updates console user namespace.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40179
2008-08-23 18:21:47 +00:00
olveyra
e83dcb588e allow using the scrape command without a url
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40178
2008-08-22 13:38:29 +00:00
olveyra
77053113cd reverted clean_markup code movement
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40177
2008-08-21 17:12:32 +00:00
olveyra
a2bd70ba21 moved clean_markup to scrapy.utils.markup
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40176
2008-08-21 15:07:55 +00:00
olveyra
643ea99f36 fixed an error from the clean code movement: forgot to apply remove tags when
text does not contain cdata

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40175
2008-08-20 12:52:31 +00:00
Pablo Hoffman
1c8c73ebfa added some validation to new spider module names
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40174
2008-08-19 19:52:31 +00:00
olveyra
17dec39c29 removed temporary fix in 171
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40173
2008-08-19 13:42:32 +00:00
olveyra
0f49c7c0d4 temporary fix to avoid exceptions before commit in decobot
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40172
2008-08-18 15:53:53 +00:00
olveyra
5b3662ee89 - Added generic clean adaptors
- removed attribute name from adaptor function method (adaptors should
not, and need not, know attribute names)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40171
2008-08-18 15:18:40 +00:00
olveyra
8877426a13 minor fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40170
2008-08-15 17:04:36 +00:00
Andres Moreira
801b804a4d Added support to replay update to crawl again all the pages downloaded in the replay file.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40169
2008-08-15 14:59:48 +00:00
olveyra
9b0dd66ec1 improved explanation comment of the RequestLimitMiddleware
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40168
2008-08-15 12:35:09 +00:00
Andres Moreira
ee59bd87ab Changed messages of downloaded responses to received responses.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40167
2008-08-14 12:23:37 +00:00
Andres Moreira
cdd8895614 Downloaded responses are now managed by a new signal, response_received, and I changed the methods associated with it. Renamed the method response_download to response_received. Added code to support the update in response_received.
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40166
2008-08-14 12:22:07 +00:00
anibal
d95e542374 ignoring temp directories for spider tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40165
2008-08-14 11:53:29 +00:00
Pablo Hoffman
3975bf95c7 added response_received signal
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40164
2008-08-14 09:57:17 +00:00
olveyra
632b975ce7 import fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40163
2008-08-14 01:13:14 +00:00
Pablo Hoffman
280ad944ea changed setting default value
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40162
2008-08-13 21:06:49 +00:00
olveyra
8e72e4e60e - Introduction of class BaseAdaptor
- Contrib Adaptors
- location_str moved from decobot to scrapy
- Added setting DEFAULT_DATA_ENCODING

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40161
2008-08-13 19:49:25 +00:00
olveyra
3a018cadff avoid trying to stop a task that is not running (this bug
caused stalled processes in production servers)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40160
2008-08-12 17:55:54 +00:00
olveyra
d0f12be4c0 removed a bad character from comment that caused an encoding error
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40159
2008-08-12 16:18:03 +00:00
anibal
baf32540d6 we should add some svn hook to use pylint before commit :)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40158
2008-08-12 15:59:05 +00:00
olveyra
e3f70a8101 commented debug lines in last commit
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40157
2008-08-12 15:13:48 +00:00
olveyra
a690d33f24 added scheduler request queue limit for spiders (spider middleware)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40156
2008-08-12 15:11:15 +00:00
olveyra
290702d988 Cluster crawler fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40155
2008-08-09 02:01:41 +00:00
olveyra
5b4d9f8f85 fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40154
2008-08-08 17:16:27 +00:00
olveyra
29cd1bc3cb Added pb-capable crawler. The idea is to improve cluster performance
by adding communication between crawler and master.
At the moment, a remote stop method was added to the crawler to substitute
the previous stop based on a kernel signal.
Monitoring functionality will be added later, because the processes are
very silent, mainly when the unavailable report is not issued,
and lots of things often happen that nobody would notice if some
fortuitous events did not happen

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40153
2008-08-08 16:57:33 +00:00
olveyra
6cfbe78d63 Added a periodic ping to maintain mysql connection active
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40152
2008-08-07 16:01:25 +00:00
olveyra
4606b7f9c5 deleting old cluster branch
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40151
2008-08-07 12:26:43 +00:00
olveyra
59d6e92582 upss...removing a print...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40150
2008-08-07 11:59:44 +00:00
olveyra
3f7100fe1e improved last change to leave it there definitively
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40149
2008-08-07 11:56:12 +00:00
olveyra
6b890c1b28 added a log to see if db settings are really being passed
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40148
2008-08-07 11:49:00 +00:00
olveyra
246bdedcfb fixed a get setting in UrlToGuidService
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40147
2008-08-05 18:32:03 +00:00
olveyra
22eb48dae5 added MYSQL_CONNECTION_SETTINGS settings
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40146
2008-08-05 16:51:12 +00:00
olveyra
388f7641cf reverted last change. Seems this option is available only in very
recent versions of mysql-python

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40145
2008-08-05 12:05:28 +00:00
olveyra
7fc5923249 add reconnect=1 parameter to mysql_connect in order to reconnect after
a long inactivity period

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40144
2008-08-05 11:41:49 +00:00
olveyra
7a195a9794 allow usage of integer "dont_filter", meaning number of times
dont_filter will apply when redirecting.

Under this scheme, dont_filter=True is the same as dont_filter=1
and (convenient?) dont_filter <= 0 means dont_filter = True forever

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40143
2008-08-01 18:13:07 +00:00
elpolilla
3badb39ed7 removed report feature for not being generic
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40142
2008-07-31 23:08:54 +00:00
samus_
380a65e721 fix for the replays and cache timeout
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40141
2008-07-31 17:59:35 +00:00
elpolilla
0f65d2f208 total of variants added to the report feature
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40140
2008-07-31 15:42:50 +00:00
Matias Aguirre
f92538430b Adding missing templates
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40139
2008-07-30 23:47:28 +00:00
elpolilla
f955c9693b smallest change ever in the reports, but improves the act of reading them
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40138
2008-07-30 16:01:28 +00:00
elpolilla
c76200561d bugfix in scraping report
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40137
2008-07-30 11:20:22 +00:00