Pablo Hoffman
247fc26598
moved scrapy.tac to extras/
--HG--
rename : bin/scrapy.tac => extras/scrapy.tac
2010-06-13 23:09:08 -03:00
Pablo Hoffman
09182efaff
added scrapy-sqs.py to deployed scripts
2010-06-13 19:17:17 -03:00
Pablo Hoffman
37f71a9957
upstart script: exec twistd and use pidfile
2010-06-13 18:59:52 -03:00
Pablo Hoffman
91e1e0aff3
fixed bug and updated old code in googledir example project
2010-06-13 17:31:33 -03:00
Pablo Hoffman
bd16d1cd48
Added SMTP-AUTH support to scrapy.mail ( closes #149 )
2010-06-13 17:14:46 -03:00
Pablo Hoffman
495f23dea2
utils.serialize: added support for encoding Deferreds, and to refer spiders by name using 'spider::name'
2010-06-11 18:16:09 -03:00
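Below is a minimal sketch of the 'spider::name' reference convention. It uses only Python's standard json module. The Spider stand-in, SpiderReferenceEncoder and decode_spider_ref are hypothetical names and are not the scrapy.utils.serialize API. The sketch also does not cover Deferred encoding.

```python
import json


class Spider:
    """Hypothetical stand-in for a spider object; only `name` matters here."""

    def __init__(self, name):
        self.name = name


class SpiderReferenceEncoder(json.JSONEncoder):
    """JSON encoder that refers to spiders by name as 'spider::<name>'."""

    def default(self, o):
        if isinstance(o, Spider):
            return "spider::%s" % o.name
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(o)


def decode_spider_ref(value, spiders_by_name):
    """Resolve a 'spider::<name>' string back to a spider object, if it is one."""
    prefix = "spider::"
    if isinstance(value, str) and value.startswith(prefix):
        return spiders_by_name[value[len(prefix):]]
    return value
```

For example, json.dumps({"spider": Spider("example.com")}, cls=SpiderReferenceEncoder) gives '{"spider": "spider::example.com"}'.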
Pablo Hoffman
1b083911e6
scrapy-ws.py: added stop command
2010-06-11 18:14:01 -03:00
Pablo Hoffman
ed5d7561f9
Added SQS Execution Queue, and example script to add spiders to the queue
2010-06-11 17:22:14 -03:00
olveyra
efe9811d92
Populate annotation metadata with data not used by IBL extractor.
2010-06-11 13:09:56 -03:00
Pablo Hoffman
ea8b5ddfd5
debian package: fix dh_auto_build getting confused by the Makefile, added scrapy-ws.py to deployed scripts
2010-06-11 12:48:35 -03:00
Pablo Hoffman
03912a6504
Added Ping Yin to AUTHORS
2010-06-11 11:33:02 -03:00
Pablo Hoffman
d13b50a234
Added sources and Makefile for building Debian package
2010-06-11 01:18:16 -03:00
Pablo Hoffman
d76276408e
scrapy.service: fixed minor logging bug on win32 platform with different line endings
2010-06-10 14:50:06 -03:00
Pablo Hoffman
a8b80f3e2f
scrapy.service: added support for logging stdout/stderr tails of finished processes
2010-06-10 14:08:54 -03:00
Pablo Hoffman
a33e8b507f
scrapy.service: fixed bug with process respawning
2010-06-10 13:39:45 -03:00
Pablo Hoffman
075b59f4af
some improvements and fixes to scrapy.service
2010-06-10 11:51:46 -03:00
Pablo Hoffman
6a33d6c4d0
* Added Scrapy Web Service with documentation and tests.
* Marked Web Console as deprecated.
* Removed Web Console documentation to discourage its use.
2010-06-09 13:46:22 -03:00
Pablo Hoffman
2499dfee5e
removed obsolete test
2010-06-09 13:06:05 -03:00
Daniel Grana
62f5c61a9d
fix broken request tests. refs #166
2010-06-09 00:44:18 -03:00
Pablo Hoffman
73305b1eb3
Added support for Requests without callbacks ( #166 ) - the Spider.parse() method
is used in those cases.
Also removed Request.deferred attribute.
2010-06-08 18:18:02 -03:00
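Below is a minimal sketch of the callback fallback described above. Request, Spider and call_spider are hypothetical stand-ins and not Scrapy's actual scraper code. When a request carries no callback, the spider's parse() method handles the response.

```python
class Request:
    """Hypothetical minimal request: a URL plus an optional callback."""

    def __init__(self, url, callback=None):
        self.url = url
        self.callback = callback


class Spider:
    """Hypothetical spider whose parse() acts as the default callback."""

    def parse(self, response):
        return "parse(%s)" % response


def call_spider(request, response, spider):
    # Fall back to Spider.parse() when the request has no callback.
    callback = request.callback or spider.parse
    return callback(response)
```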
Pablo Hoffman
76ed9d442b
Relocated some modules:
* scrapy.spider.middleware moved to scrapy.core.spidermw
* scrapy.core.scheduler.schedulers moved to scrapy.core.scheduler
* scrapy.core.scheduler.middleware moved to scrapy.core.schedulermw
Also removed the scrapy/core/scheduler/ directory.
--HG--
rename : scrapy/core/scheduler/schedulers.py => scrapy/core/scheduler.py
rename : scrapy/core/scheduler/middleware.py => scrapy/core/schedulermw.py
rename : scrapy/spider/middleware.py => scrapy/core/spidermw.py
2010-06-07 15:11:25 -03:00
Pablo Hoffman
72df5cb7ef
removed unused code
2010-06-03 01:07:40 -03:00
Pablo Hoffman
38b5793152
Some changes to telnet console:
* moved module from scrapy.management.telnet to scrapy.telnet (to minimize
nested modules)
* added signal for updating telnet console variables (fixes #165 )
--HG--
rename : scrapy/management/telnet.py => scrapy/telnet.py
2010-06-02 17:49:18 -03:00
Pablo Hoffman
4595c92cc2
Core logic improvement: wait for the Downloader and Scraper to close the spiders before proceeding to finish closing them
2010-06-01 13:49:01 -03:00
Pablo Hoffman
9523cab25c
Fixed bug that was causing the engine to notify the manager of spider closes too early
2010-06-01 11:07:04 -03:00
Ping Yin
fcdc4ee7d9
downloadermiddleware/redirect: always use HEAD when the original request method is HEAD
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-05-04 16:11:45 +08:00
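Below is a minimal sketch of the redirect rule above, written as a standalone function. The name redirect_method is hypothetical. Only the HEAD rule comes from the commit. The 302/303 to GET conversion and keeping the method for other statuses are assumptions about the surrounding middleware.

```python
def redirect_method(original_method, status):
    """Pick the HTTP method used to follow a redirect.

    HEAD requests stay HEAD regardless of status (the rule in this commit).
    Assumption: 302/303 otherwise switch to GET, and other redirect
    statuses keep the original method.
    """
    if original_method == "HEAD":
        return "HEAD"
    if status in (302, 303):
        return "GET"
    return original_method
```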
Pablo Hoffman
031eb1e5ed
removed no longer used SpiderScheduler (obsoleted by ExecutionQueue)
2010-05-28 17:27:15 -03:00
Rolando Espinoza La fuente
e995c5c7ff
Skipped IBL tests if nltk/numpy are not available.
2010-05-28 16:53:17 -03:00
Ismael Carnales
a71dc295af
Some mail improvements and tests.
* Add mail_sent signal and use it in MailSender
* Add MAIL_DEBUG setting to not send mails when testing
* Add MailSender tests
2010-05-28 16:51:47 -03:00
Pablo Hoffman
dfa7b23959
Fixed SpiderManager tests that failed with dropin.cache write permissions errors in some cases
--HG--
rename : scrapy/tests/test_contrib_spidermanager/spider1.py => scrapy/tests/test_contrib_spidermanager/test_spiders/spider1.py
rename : scrapy/tests/test_contrib_spidermanager/spider2.py => scrapy/tests/test_contrib_spidermanager/test_spiders/spider2.py
2010-05-26 11:58:31 -03:00
Pablo Hoffman
dff763c683
Removed Scrapy engine singleton from scrapy.core.engine.scrapyengine. Now the
engine can only be accessed through the Scrapy Manager 'engine' attribute, i.e.
scrapy.core.manager.engine.
2010-05-26 10:29:32 -03:00
Pablo Hoffman
2d3135603e
added scrapy-ctl view command
2010-05-26 10:29:32 -03:00
Pablo Hoffman
2905a2083b
moved scrapy.command.models module to scrapy.command
2010-05-26 10:29:32 -03:00
Pablo Hoffman
14bfeabede
moved scrapy.command.cmdline module to scrapy.cmdline (keeping backwards compatibility until 0.10)
--HG--
rename : scrapy/command/cmdline.py => scrapy/cmdline.py
2010-05-26 10:29:32 -03:00
Pablo Hoffman
56abafec61
moved scrapy.command.commands module to scrapy.commands
--HG--
rename : scrapy/command/commands/__init__.py => scrapy/commands/__init__.py
rename : scrapy/command/commands/crawl.py => scrapy/commands/crawl.py
rename : scrapy/command/commands/fetch.py => scrapy/commands/fetch.py
rename : scrapy/command/commands/genspider.py => scrapy/commands/genspider.py
rename : scrapy/command/commands/list.py => scrapy/commands/list.py
rename : scrapy/command/commands/parse.py => scrapy/commands/parse.py
rename : scrapy/command/commands/runspider.py => scrapy/commands/runspider.py
rename : scrapy/command/commands/settings.py => scrapy/commands/settings.py
rename : scrapy/command/commands/shell.py => scrapy/commands/shell.py
rename : scrapy/command/commands/start.py => scrapy/commands/start.py
rename : scrapy/command/commands/startproject.py => scrapy/commands/startproject.py
2010-05-26 10:29:32 -03:00
Pablo Hoffman
cae22930c8
Added ExecutionQueue class for feeding spiders and requests to scrape. This
class can (and is meant to) be subclassed by projects that want to use a custom
mechanism for feeding spiders to crawl. For example, a queue that pulls spiders
to scrape from Amazon SQS (an example will be added soon).
Also introduced a rather big core refactoring of Scrapy manager and Scrapy
engine.
2010-05-26 10:29:32 -03:00
Pablo Hoffman
8c1feb7ae4
Ported S3ImagesStore to use boto threads. This simplifies the code and makes
the following things no longer needed:
1. custom spider for S3 requests (ex. _S3AmazonAWSSpider)
2. scrapy.contrib.aws.AWSMiddleware
3. scrapy.utils.aws
2010-05-26 10:29:32 -03:00
Daniel Grana
c8c19a8e53
Automated merge with ssh://hg.scrapy.org/scrapy
2010-05-21 17:54:41 -03:00
Daniel Grana
cce9c4da49
silence HttpError exceptions raised by httperror spidermiddleware if not handled by spider
2010-05-21 17:54:32 -03:00
Ping Yin
f2363afe6f
LinkExtractor: split _process_links from _extract_links
Separate the extraction and processing logic, so that subclasses can override them more easily.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-27 14:58:11 +08:00
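Below is a minimal sketch of why splitting _process_links from _extract_links helps subclassing. A subclass can replace only the processing step. The two method names come from the commit. The class names, the public extract_links method and the regex-based href extraction are hypothetical and are not Scrapy's LinkExtractor implementation.

```python
import re


class BaseLinkExtractor:
    """Hypothetical extractor split into an extraction and a processing step."""

    def extract_links(self, html):
        return self._process_links(self._extract_links(html))

    def _extract_links(self, html):
        # Naive href extraction, for illustration only.
        return re.findall(r'href="([^"]+)"', html)

    def _process_links(self, links):
        # Default processing: drop duplicates while keeping order.
        seen = set()
        unique = []
        for link in links:
            if link not in seen:
                seen.add(link)
                unique.append(link)
        return unique


class HttpsOnlyLinkExtractor(BaseLinkExtractor):
    """Subclass that overrides only the processing step."""

    def _process_links(self, links):
        links = super()._process_links(links)
        return [link for link in links if link.startswith("https://")]
```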
Ping Yin
6059221716
Compose: stop processing on None value by default
By doing this, we can use str.lower as a processor safely without
checking whether the given value is None.
By passing stop_on_none=False as keyword argument, this behaviour can be changed.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-08 10:59:47 +08:00
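Below is a minimal sketch of a Compose processor with the stop_on_none behaviour described above. It is a standalone illustration, not Scrapy's own Compose class. Functions run left to right. When stop_on_none is true (the default), a None value short-circuits the chain, so str.lower never receives None.

```python
class Compose:
    """Chain functions left to right, stopping early on None by default."""

    def __init__(self, *functions, stop_on_none=True):
        self.functions = functions
        self.stop_on_none = stop_on_none

    def __call__(self, value):
        for func in self.functions:
            if value is None and self.stop_on_none:
                # Short-circuit so later functions never see None.
                break
            value = func(value)
        return value
```

For example, Compose(str.strip, str.lower)(None) returns None instead of raising TypeError.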
Ping Yin
15b879f845
ItemLoader: Update docs for {add,replace,get}_{value,xpath}
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-05-18 17:54:25 +08:00
Ping Yin
8f53a72306
ItemLoader: add test for adding a dict value
After arg_to_iter is changed to return [arg] if arg is a dict,
the added test will pass.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 21:21:12 +08:00
Ping Yin
8497301784
arg_to_iter: return [arg] if arg is a dict
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-24 21:20:23 +08:00
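Below is a minimal sketch of the arg_to_iter behaviour after this change. It is a standalone illustration, not the code from the commit. A dict is iterable, but it is treated as a single value and wrapped in a list, just like a string.

```python
def arg_to_iter(arg):
    """Return arg as an iterable: [] for None, [arg] for single values."""
    if arg is None:
        return []
    # Strings and dicts are iterable but should be treated as single values.
    if isinstance(arg, (str, bytes, dict)):
        return [arg]
    if hasattr(arg, "__iter__"):
        return arg
    return [arg]
```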
Ping Yin
bd844f690b
{add,replace}_xpath: add processors, kw args and allow field_name to be None
Also add method get_xpath.
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:34:55 +08:00
Ping Yin
a6c315552c
ItemLoader: Update tests for {add,replace,get}_value
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:49:25 +08:00
Ping Yin
913b5db242
{add,replace,get}_value: accept keyword args (currently only 're')
if re is given, extract data from the given value using that regex
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:45:01 +08:00
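Below is a minimal sketch of the 're' keyword argument, written as a standalone get_value function rather than an ItemLoader method. The function name and the re keyword come from the commit. Applying the regex before the positional processors is an assumption.

```python
import re as _re


def get_value(value, *processors, re=None):
    """Extract data from value with an optional regex, then apply processors."""
    if re is not None:
        texts = value if isinstance(value, list) else [value]
        pattern = _re.compile(re)
        matches = []
        for text in texts:
            # findall yields whole matches, or the capture group if there is one.
            matches.extend(pattern.findall(text))
        value = matches
    for proc in processors:
        value = proc(value)
    return value
```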
Ping Yin
ddfaf6049f
{add,replace}_value: add processors args and allow field_name to be None
* value is first processed by the given processors before being passed to the
input processor
* if field_name is None, values for multiple fields may be
added/replaced at once; the keys of the processed value are used as the field names
* add get_value function for the processor logic
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:42:55 +08:00
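Below is a minimal sketch of add_value with field_name=None, covering the add case only. MiniLoader is a hypothetical class and not Scrapy's ItemLoader. It runs the given processors first, then treats the keys of the resulting dict as field names. A real loader would also apply each field's input processor, which this sketch omits.

```python
class MiniLoader:
    """Hypothetical loader illustrating add_value with field_name=None."""

    def __init__(self):
        self.values = {}

    def add_value(self, field_name, value, *processors):
        # Processors run first; a real loader would then pass the result
        # through the field's input processor (omitted here).
        for proc in processors:
            value = proc(value)
        if field_name is None:
            # The processed value is a dict whose keys name the fields.
            for name, field_value in value.items():
                self._add(name, field_value)
        else:
            self._add(field_name, value)

    def _add(self, field_name, value):
        self.values.setdefault(field_name, []).append(value)
```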
Ping Yin
cf35e09d35
ItemLoader: don't limit item to Item object
Now, for example, item can be a dict
Signed-off-by: Ping Yin <pkufranky@gmail.com>
2010-04-23 01:28:57 +08:00
Pablo Hoffman
bfd9cb42e5
Automated merge with http://hg.scrapy.org/scrapy-0.8
2010-05-17 20:11:27 -03:00