elpolilla
f5eb71fb69
Fixed bug in url_query_cleaner that returned wrong parameters for URLs with fragments, and added a test
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40379
2008-11-14 01:32:49 +00:00
Pablo Hoffman
79c9ac388d
updated scrapy.utils.misc.hash_values to drop usage of the deprecated sha module in favor of hashlib, added tests for the hash_values function
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40378
2008-11-13 15:50:27 +00:00
elpolilla
01b157d10b
Reverted delist adaptor
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40377
2008-11-13 12:27:36 +00:00
elpolilla
5f4053d9fc
Reverted to revision 370
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40376
2008-11-13 12:20:32 +00:00
elpolilla
fab0e6938c
Removed unnecessary check in RobustScrapedItem's getattr
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40375
2008-11-13 11:01:44 +00:00
elpolilla
4afa29b26a
Fixed adaptors test
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40374
2008-11-13 10:46:43 +00:00
elpolilla
4ef7c46303
Added ItemAttribute objects
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40373
2008-11-13 10:35:24 +00:00
elpolilla
90d28badf6
Turned unquote adaptor into a function, and added adaptor factory
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40372
2008-11-13 10:13:11 +00:00
elpolilla
36080ada86
Added some missing docstrings to adaptors
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40371
2008-11-13 10:04:04 +00:00
elpolilla
323241b6d6
Reverted changes in r368
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40370
2008-11-11 12:51:39 +00:00
elpolilla
c5d134053b
Added the possibility of checking whether a response should or shouldn't be parsed with the corresponding parse_suffix method (by defining check_suffix) in CrawlSpider
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40369
2008-11-10 15:30:45 +00:00
elpolilla
6db6b8c52e
GUID wasn't being set when calling a spider's parse_url method (from the parse command). Fixed that.
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40368
2008-11-10 12:26:11 +00:00
samus_
22d713bba9
seems scrapy uses a test spider (scrapy.tests.test_spiders.testplugin.TestSpider); moving the test to decobot
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40367
2008-11-08 13:34:02 +00:00
samus_
3e88890bc3
added check for valid code on spiders
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40366
2008-11-08 12:26:54 +00:00
Damian Canabal
5e54ac52ff
fixed new_response_from_xpaths function: a unicode string was passed as body instead of a ResponseBody object
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40365
2008-11-04 17:45:26 +00:00
elpolilla
a2461fbeea
Wrong attribute assignment fixed again
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40364
2008-11-04 14:06:28 +00:00
elpolilla
224d3c5185
Bugfix in parse command
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40363
2008-11-04 13:27:31 +00:00
elpolilla
5bad79836f
Fixed bad checking while setting attributes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40362
2008-11-04 11:26:23 +00:00
elpolilla
bd38a312d4
- Fixed bug in attribute assignment (empty attributes being set)
...
- Added GUID setting to FeedSpider
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40361
2008-11-04 10:57:59 +00:00
samus_
9b46c20da2
moved xpathselector_iternodes from scrapy.utils.xml to scrapy.utils.iterators and renamed it to "xmliter", also renamed csv_iter to csviter and added tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40360
2008-11-03 16:10:43 +00:00
elpolilla
1ef65b97b5
Fixed typo
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40359
2008-11-03 14:00:33 +00:00
elpolilla
1a45754cf2
Removed an out-of-scrapy reference in the BasicSpider
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40358
2008-11-03 13:57:21 +00:00
elpolilla
bc13a5924a
Removed ugly loading of string codecs
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40357
2008-11-03 12:28:00 +00:00
elpolilla
defcb45120
Implemented CrawlSpider and XMLFeedSpider classes
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40356
2008-11-03 11:48:43 +00:00
elpolilla
f2bab50979
- Fixed bad implementation of the SetGUIDPipeline
...
- Modified item's attribute method to have an optional 'add' argument
- Renamed normalize_urls adaptor to canonicalize_urls
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40355
2008-11-03 11:02:58 +00:00
elpolilla
776818db71
Little change in test spider that broke the engine test
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40354
2008-11-03 10:43:40 +00:00
elpolilla
0574bbd44a
Added some improvements to LinkExtractor
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40353
2008-11-03 10:24:50 +00:00
elpolilla
6a2e288f22
Added support for x-mac-roman string codec
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40352
2008-10-31 11:52:35 +00:00
elpolilla
9882679bbb
Bugfix in ExtractImages adaptor
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40351
2008-10-29 02:25:32 +00:00
elpolilla
d61cd60756
Moved SetGUIDPipeline to contrib/item because it was unnecessary to keep it in a separate module
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40350
2008-10-29 01:35:23 +00:00
elpolilla
c73dc5ad6c
Added normalize_urls adaptor, which was mentioned in the previous changeset but not actually committed
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40349
2008-10-29 01:28:04 +00:00
elpolilla
377bea4976
- Added SetGUIDPipeline and the guid generation helper for the BasicSpider
...
- Fixed some issues with BasicSpider
- Added a normalize_url adaptor
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40348
2008-10-29 01:25:59 +00:00
elpolilla
4288cb3f17
- Removed AdaptorFunc objects
...
- Changed "AdaptorPipe" to "AdaptorDict"
- Moved adaptors to contrib/adaptors
- Fixed some tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40347
2008-10-27 11:58:56 +00:00
Pablo Hoffman
ed98a84235
item sampler: added ITEMSAMPLER_MAX_RESPONSE_SIZE support, keeping only the first item scraped (in spidermiddleware)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40346
2008-10-27 11:20:32 +00:00
Pablo Hoffman
1b4d41321e
added note for debian distros
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40345
2008-10-27 03:39:11 +00:00
Pablo Hoffman
f4dab4eb45
added ItemSamplerPipeline
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40344
2008-10-27 03:38:03 +00:00
Pablo Hoffman
968a55eaf3
improved RequestLimitMiddleware to conform to better programming standards
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40343
2008-10-27 01:24:19 +00:00
Pablo Hoffman
23524ccc86
ScrapedItem cannot have a constructor that receives the adaptor_pipe. adaptor_pipe must be assigned by calling set_adaptors() from the outside (typically in a spider)
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40342
2008-10-24 03:42:05 +00:00
elpolilla
ebe847bc6b
Activated adaptor pipeline creation at items __init__
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40341
2008-10-24 00:37:51 +00:00
elpolilla
42e6ed74e5
Modified some code to avoid problems with the _adaptor_pipe attribute and the replays, or with the items themselves
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40340
2008-10-24 00:29:27 +00:00
elpolilla
06c2509634
Improved adaptors code and fixed some related tests
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40339
2008-10-24 00:20:14 +00:00
Pablo Hoffman
91ddfd6f80
removed debugging code
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40338
2008-10-23 12:44:24 +00:00
Pablo Hoffman
ce3bbd1a2b
enabled unsafeTracebacks in master for sending full tracebacks to workers, split master scheduled() method into 2 methods: schedule() and reschedule()
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40337
2008-10-23 12:43:31 +00:00
Pablo Hoffman
215151dd86
improved worker error logging for communication with master
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40336
2008-10-23 12:41:49 +00:00
Pablo Hoffman
005044b0d8
web console: added support for logging to a different file using WEBCONSOLE_LOGFILE setting
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40335
2008-10-23 11:39:03 +00:00
Pablo Hoffman
874ac0c256
changes to logging and DEFAULT_PRIORITY removed
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40334
2008-10-23 04:44:40 +00:00
Pablo Hoffman
52596f350c
some more fixes to cluster worker
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40333
2008-10-23 04:43:41 +00:00
Pablo Hoffman
3e1ad8d653
removed commas from log messages
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40332
2008-10-23 04:00:13 +00:00
Pablo Hoffman
928112a989
added ResponseCode class to contain all response codes, and other assorted code improvements
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40331
2008-10-23 01:26:48 +00:00
Pablo Hoffman
4b03435ca0
enabled unsafeTracebacks in worker to send full tracebacks to master
...
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40330
2008-10-23 01:15:37 +00:00