mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 23:23:48 +00:00

2663 Commits

Author SHA1 Message Date
olveyra
632b975ce7 import fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40163
2008-08-14 01:13:14 +00:00
Pablo Hoffman
280ad944ea changed setting default value
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40162
2008-08-13 21:06:49 +00:00
olveyra
8e72e4e60e - Introduction of class BaseAdaptor
- Contrib Adaptors
- location_str moved from decobot to scrapy
- Added setting DEFAULT_DATA_ENCODING

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40161
2008-08-13 19:49:25 +00:00
olveyra
3a018cadff avoid trying to stop a task that is not running (this bug
caused stalled processes on production servers)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40160
2008-08-12 17:55:54 +00:00
olveyra
d0f12be4c0 removed a bad character from comment that caused an encoding error
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40159
2008-08-12 16:18:03 +00:00
anibal
baf32540d6 we should add some svn hook to use pylint before commit :)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40158
2008-08-12 15:59:05 +00:00
olveyra
e3f70a8101 commented debug lines in last commit
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40157
2008-08-12 15:13:48 +00:00
olveyra
a690d33f24 added scheduler request queue limit for spiders (spider middleware)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40156
2008-08-12 15:11:15 +00:00
olveyra
290702d988 Cluster crawler fixes
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40155
2008-08-09 02:01:41 +00:00
olveyra
5b4d9f8f85 fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40154
2008-08-08 17:16:27 +00:00
olveyra
29cd1bc3cb Added pb-capable crawler. The idea is to improve cluster performance
by adding communication between crawler and master.
At the moment, a remote stop method was added to the crawler to substitute
the previous stop based on a kernel signal.
Monitoring functionality will be added later, because the processes are
very silent, mainly when the unavailable report is not issued,
and often many things happen that nobody would notice if some
fortuitous events did not happen

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40153
2008-08-08 16:57:33 +00:00
olveyra
6cfbe78d63 Added a periodic ping to maintain mysql connection active
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40152
2008-08-07 16:01:25 +00:00
olveyra
4606b7f9c5 deleting old cluster branch
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40151
2008-08-07 12:26:43 +00:00
olveyra
59d6e92582 upss...removing a print...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40150
2008-08-07 11:59:44 +00:00
olveyra
3f7100fe1e improved last change so it can be left there permanently
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40149
2008-08-07 11:56:12 +00:00
olveyra
6b890c1b28 added a log to see if db settings are really being passed
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40148
2008-08-07 11:49:00 +00:00
olveyra
246bdedcfb fixed a get setting in UrlToGuidService
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40147
2008-08-05 18:32:03 +00:00
olveyra
22eb48dae5 added MYSQL_CONNECTION_SETTINGS settings
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40146
2008-08-05 16:51:12 +00:00
olveyra
388f7641cf reverted last change. It seems this option is available only in very
recent versions of mysql-python

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40145
2008-08-05 12:05:28 +00:00
olveyra
7fc5923249 add reconnect=1 parameter to mysql_connect in order to reconnect after
a long inactivity period

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40144
2008-08-05 11:41:49 +00:00
olveyra
7a195a9794 allow usage of integer "dont_filter", meaning number of times
dont_filter will apply when redirecting.

Under this scheme, dont_filter=True is the same as dont_filter=1
and (convenient?) dont_filter <= 0 means dont_filter = True forever

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40143
2008-08-01 18:13:07 +00:00
elpolilla
3badb39ed7 removed report feature for not being generic
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40142
2008-07-31 23:08:54 +00:00
samus_
380a65e721 fix for the replays and cache timeout
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40141
2008-07-31 17:59:35 +00:00
elpolilla
0f65d2f208 total of variants added to the report feature
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40140
2008-07-31 15:42:50 +00:00
Matias Aguirre
f92538430b Adding missing templates
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40139
2008-07-30 23:47:28 +00:00
elpolilla
f955c9693b smallest change ever in the reports, but improves the act of reading them
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40138
2008-07-30 16:01:28 +00:00
elpolilla
c76200561d bugfix in scraping report
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40137
2008-07-30 11:20:22 +00:00
Matias Aguirre
7b0877c50e Changes:
    * Simplify article app: it isn't necessary to save articles
      in the db; instead this tool should render static templates
      directly based on the url.
      Example: if the url is "/article/today" it will look for
      the template "articles/today.html" in the articles templates
      directory. This app is configured to handle any url, so
      it will render a url like "/about" (if no other url is
      defined to handle "about" before the article definition),
      and in this case will try to render the template
      "article.html" in the articles templates dir

    * Removed models, not necessary now

    * Removed templatetags, not necessary now

    * Removed flatpages middleware ??

    * Added url to articles app, this will be used as a last resort
      to handle undefined urls.

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40136
2008-07-29 13:54:20 +00:00
Daniel Grana
28bb53fa22 process result's items using generators to give pipeline a chance to consume while parsing new items
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40135
2008-07-29 12:27:55 +00:00
elpolilla
640e8b9131 scrapy report util improved
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40134
2008-07-29 01:14:01 +00:00
olveyra
02b87f7d49 in settings template, set DEFAULT_ITEM_CLASS to
scrapy.item.ScrapedItem to enable console

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40133
2008-07-28 22:29:55 +00:00
olveyra
1cbfe46161 disabled webconsole by default
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40132
2008-07-28 22:11:19 +00:00
elpolilla
ea9571aaa0 --report option modified (i had forgotten to report the variants)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40131
2008-07-28 13:01:34 +00:00
elpolilla
f86324fac3 --report option added to crawl command
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40130
2008-07-28 12:42:00 +00:00
Pablo Hoffman
4b2e20abfd added scrapy.utils.markup module
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40129
2008-07-28 04:15:51 +00:00
olveyra
c9c624dd66 minor adjustments
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40128
2008-07-27 22:00:08 +00:00
olveyra
86c7f37d6a moved options nocache and nopipeline from decobot to scrapy
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40127
2008-07-27 16:11:23 +00:00
olveyra
b59f62dc91 Added function convert_entity from decobot.utils.text_extraction
(to complete and fix revision 124)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40126
2008-07-27 14:54:11 +00:00
olveyra
1e2ddb47a3 removed wrong import from decobot and added unquote_html
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40125
2008-07-27 02:49:20 +00:00
olveyra
137ec64318 minor adjustments and some fixes, readded scrapy-admin.py with
execution permission

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40124
2008-07-27 02:23:14 +00:00
olveyra
b851ec5f10 deleted scrapy-admin to commit again with execution access
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40123
2008-07-27 01:38:53 +00:00
samus_
fbb5860f49 re-enabling replays with the new mechanism
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40122
2008-07-26 15:51:39 +00:00
olveyra
bc00f8cce2 re-reverted commit 119 back again to 118.
The code removed is confusing.

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40121
2008-07-25 22:42:57 +00:00
Pablo Hoffman
67ee6bff2e restored code removed in r118. there's nothing wrong with that code
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40120
2008-07-25 21:28:49 +00:00
olveyra
c97c495d16 removed code that generates a confusing and misleading import error
message when the import error is raised inside scrapy_settings.

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40119
2008-07-25 19:40:43 +00:00
samus_
0bb68f34f6 forgot to import sys module :P
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40118
2008-07-25 19:26:46 +00:00
olveyra
8e18ecd5ce first version of scrapy-admin
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40117
2008-07-25 19:11:01 +00:00
samus_
bd7c80ed0a adding new replay method (beta)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40116
2008-07-25 18:50:45 +00:00
Matias Aguirre
22e3d3a02e Add save_on_top to admin sections
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40115
2008-07-25 17:36:34 +00:00
Matias Aguirre
e421206573 Add publish field
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40114
2008-07-25 16:59:27 +00:00