mirror of https://github.com/scrapy/scrapy.git synced 2025-02-27 16:04:24 +00:00

849 Commits

Author SHA1 Message Date
elpolilla
e5a764cd2a . Modified LinkExtractors extract_links for being inconsistent. Moved extra parameters to the constructors.
. Renamed ImageLinkExtractor to HtmlLinkExtractor and moved it to scrapy.contrib.link_extractors.

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40599
2009-01-02 14:06:51 +00:00
elpolilla
2979521b63 Reverted change in r590 for inconsistency in the addition operation (doesn't work equally in both ways)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40598
2009-01-02 13:07:18 +00:00
samus_
ebd5f465aa this changeset improves the extractors' implementation:
* moved LinkExtractor.extract_links to a private method and created wrapper in order to be able to work with text directly
* removed fugly new_response_from_xpaths from scrapy.utils.response and replaced it with a better internal algorithm
* moved former _normalize_input from scrapy.utils.iterators to scrapy.utils.response to fill the hole
* turned extractors' output lists into generators; this is safe because the result is always used in for..in constructs
* adapted test for generators (test should be rewritten anyway)

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40597
2009-01-02 08:42:36 +00:00
elpolilla
c9e48dc5ed Disabled ImageLinkExtractor test for triggering mysterious leaks in libxml2
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40596
2009-01-02 04:12:59 +00:00
elpolilla
e0cb3c1ec4 Added test's missing sample file
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40595
2009-01-02 02:36:27 +00:00
elpolilla
91a23e61bf Renamed LinkExtractors extract_urls method to extract_links
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40594
2009-01-02 02:34:44 +00:00
elpolilla
c82c799d07 Added ImageLinkExtractor's missing docstring
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40593
2009-01-02 02:25:17 +00:00
elpolilla
5757e54a28 - Added repr method to Link objects
- Added ImageLinkExtractor and tests

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40592
2009-01-02 02:18:11 +00:00
elpolilla
7bd73e0c09 Modified XPathSelectorLists: adding them should return a new XPathSelectorList
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40591
2008-12-31 13:23:25 +00:00
olveyra
99143ff357 doc string fix
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40590
2008-12-30 19:38:15 +00:00
Ismael Carnales
7e4c9200c2 corrected code blocks
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40589
2008-12-30 15:04:59 +00:00
olveyra
1433af5561 fix to commit (r586)
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40588
2008-12-30 14:51:37 +00:00
olveyra
710247e096 Allow loading a default spider when no spider is found for a given url. Also, a generic spider
was added in contrib/spiders

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40587
2008-12-30 13:49:00 +00:00
Pablo Hoffman
d047409579 rearranged and sorted out default scrapy settings. closes #20
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40586
2008-12-30 13:34:57 +00:00
Pablo Hoffman
1f247b1efc added settings documentation topic, and completed available settings reference. closes #30
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40585
2008-12-30 13:28:36 +00:00
olveyra
af79bedf8f Added doc line in RegexLinkExtractor for the case when no allow/deny argument is given
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40584
2008-12-30 13:04:06 +00:00
Ismael Carnales
8ef8766070 the badge is back to green ... hulk is angry
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40583
2008-12-30 11:27:19 +00:00
Ismael Carnales
ad2c2368a7 using small gray badge
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40582
2008-12-30 11:25:42 +00:00
Ismael Carnales
c149e2de2c removed menu from footer, added django badge
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40581
2008-12-30 11:16:32 +00:00
Ismael Carnales
c9491b6041 replicating header menu in footer
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40580
2008-12-30 10:56:15 +00:00
Ismael Carnales
81fecc0067 fixed settings indentation and added reference
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40579
2008-12-30 10:45:19 +00:00
Ismael Carnales
36277f3a32 changed settings from description unit to crossreference
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40578
2008-12-29 19:02:56 +00:00
Ismael Carnales
5a038e7d27 added docs breadcrumb
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40577
2008-12-29 18:46:43 +00:00
Ismael Carnales
e36f5624a7 fixed caps in news, reordered menu items
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40576
2008-12-29 18:35:13 +00:00
Ismael Carnales
d619a11f88 make sidebar bigger
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40575
2008-12-29 18:28:40 +00:00
Ismael Carnales
2f2c5cf19d removed django.contrib.comments requirement
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40574
2008-12-29 18:27:16 +00:00
Ismael Carnales
3d85932ae0 moved blog to news
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40573
2008-12-29 18:20:58 +00:00
Ismael Carnales
48682189b4 using a modified version of django simple blog
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40572
2008-12-29 16:50:49 +00:00
Ismael Carnales
a5b609ac5f moved blog templates to backup folder
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40571
2008-12-29 16:13:19 +00:00
elpolilla
25a38e53d5 Fixed minor encoding issues in adaptors
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40570
2008-12-29 15:54:16 +00:00
elpolilla
bbd2a9d6f0 Fixed typo in test
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40569
2008-12-29 15:17:34 +00:00
elpolilla
5eb7e5ba89 Fixed lots of encoding issues, and improved some adaptors tests
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40568
2008-12-29 15:10:21 +00:00
Pablo Hoffman
fc3f66bd1c fixed grammar error
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40567
2008-12-29 12:10:01 +00:00
Pablo Hoffman
b5a34dd21d started writing settings documentation
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40566
2008-12-29 11:38:34 +00:00
Pablo Hoffman
a593af5f53 fixed cyclic import between scrapy.core.engine and scrapy.utils.db
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40565
2008-12-28 08:45:27 +00:00
Pablo Hoffman
8a885e4dc8 moved scrapy-docs svn:external where it belongs
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40564
2008-12-27 21:38:27 +00:00
Pablo Hoffman
06ad08c681 moved dia diagram to docs/media
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40563
2008-12-27 21:32:18 +00:00
Pablo Hoffman
7e5b85ca9e moved scrapy docs from website source to scrapy source, since it makes more sense there
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40562
2008-12-27 21:27:22 +00:00
Pablo Hoffman
97d4754200 moved docs to docs-old
--HG--
rename : scrapy/trunk/docs/scrapy-architecture.dia => scrapy/trunk/docs-old/scrapy-architecture.dia
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40561
2008-12-27 21:20:29 +00:00
Pablo Hoffman
913c34a828 added AUTHORS file
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40560
2008-12-27 19:57:10 +00:00
Pablo Hoffman
77a99999f8 improved text
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40559
2008-12-27 19:16:12 +00:00
Pablo Hoffman
a617c2cb36 some updates to home and download page
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40558
2008-12-27 19:15:13 +00:00
Pablo Hoffman
ed5e25c913 removed tagline from scrapy logo
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40557
2008-12-27 18:21:37 +00:00
Pablo Hoffman
f932a3d842 added spider_exceptions to scrapy stats
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40556
2008-12-27 17:37:05 +00:00
Pablo Hoffman
6fa33f4463 fixed minor bug in scrapy manager
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40555
2008-12-27 02:06:05 +00:00
Pablo Hoffman
e89ad0be8c fixed minor bug in scrapy manager
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40554
2008-12-27 02:05:05 +00:00
Pablo Hoffman
9557e24203 added start_requests method to BaseSpider, made start_urls empty by default instead of a required attribute in every spider
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40553
2008-12-27 01:07:29 +00:00
elpolilla
5b159d0f04 - Improved unquote_markup by using generators instead of lists
- Added possibility of specifying headers in items_to_csv

--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40552
2008-12-26 18:53:15 +00:00
elpolilla
f59f1c8bc0 Added items_to_csv function
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40551
2008-12-26 16:10:03 +00:00
elpolilla
c7332cd372 Updated scrapy tutorial
--HG--
extra : convert_revision : svn%3Ab85faa78-f9eb-468e-a121-7cced6da292c%40550
2008-12-26 14:03:45 +00:00