Adrián Chaves
174769a3f0
Use a better name for the LxmlLinkExtractor subclassing test
2019-12-18 12:09:03 +01:00
Adrián Chaves
a4ef9750f9
Fix Flake8-reported issues
2019-12-13 14:32:06 +01:00
Adrián Chaves
1fc2b140c1
Merge branch 'master' into documentation-coverage
2019-12-05 14:43:36 +01:00
Adrián Chaves
33ef24c757
Add missing whitespace after ‘,’, ‘;’ or ‘:’
2019-11-13 10:52:05 +01:00
Adrián Chaves
7f4f98fd38
Provide complete API documentation coverage of scrapy.linkextractors
2019-09-30 18:22:28 +02:00
Eugenio Lacuesta
40086dabb8
Prevent more DeprecationWarnings
2019-07-13 22:14:00 -03:00
Matthieu Grandrie
e3b15252c8
New constructor arg *restrict_text* for FilteringLinkExtractor.
Same as the allow and deny args, it holds a string, a regex, or an iterable of these. Links whose text does not match any of the regexes are filtered out.
DOC restrict_text in LxmlLinkExtractor
2019-02-28 17:21:17 +01:00
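The restrict_text commit describes a text-based filter. The following is a minimal stdlib sketch of that behavior, not Scrapy's implementation: it assumes links are plain (url, text) tuples and that, as with allow and deny, a link is kept when its text matches at least one pattern (re.search semantics assumed). `compile_patterns` and `filter_by_text` are hypothetical names used only for illustration.

```python
import re

def compile_patterns(arg):
    """Normalize None, a single pattern, or an iterable of patterns
    (strings or compiled regexes) into a list of compiled regexes."""
    if arg is None:
        return []
    if isinstance(arg, (str, re.Pattern)):
        arg = [arg]
    return [p if isinstance(p, re.Pattern) else re.compile(p) for p in arg]

def filter_by_text(links, restrict_text=None):
    """Keep (url, text) pairs whose text matches at least one pattern."""
    patterns = compile_patterns(restrict_text)
    if not patterns:
        return list(links)  # no restriction configured: keep every link
    return [(url, text) for url, text in links
            if any(p.search(text) for p in patterns)]

links = [("/a", "Next page"), ("/b", "Contact"), ("/c", "Previous page")]
print(filter_by_text(links, r"page"))
# [('/a', 'Next page'), ('/c', 'Previous page')]
```

Accepting a single string, a compiled regex, or an iterable keeps the argument shape consistent with allow and deny.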
nctl144
4c05441450
add ftp to the scheme list
2018-03-03 00:00:03 -05:00
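Adding ftp to the scheme list implies that extracted links are checked against an allow-list of URL schemes. A minimal sketch of such a check follows; the exact contents of `ALLOWED_SCHEMES` and the helper name `is_followable` are assumptions for illustration, not taken from Scrapy.

```python
from urllib.parse import urlsplit

# Assumed allow-list for illustration; the commit only states that ftp was added.
ALLOWED_SCHEMES = {"http", "https", "file", "ftp"}

def is_followable(url):
    """Return True if an absolute URL's scheme is in ALLOWED_SCHEMES."""
    # urlsplit() lowercases the scheme, so "FTP://..." is accepted too.
    # Relative URLs have an empty scheme, so resolve them with urljoin() first.
    return urlsplit(url).scheme in ALLOWED_SCHEMES

for url in ["ftp://ftp.example.com/pub/file.txt",
            "mailto:someone@example.com",
            "javascript:void(0)"]:
    print(url, is_followable(url))
```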
Mikhail Korobov
2b4d46315f
TST fixed compatibility with new link extractor whitespace handling
2017-02-21 00:05:40 +05:00
Mikhail Korobov
47f7da8724
canonicalize=False by default for LinkExtractor. Fixes GH-1941.
2017-02-20 22:58:11 +05:00
Mikhail Korobov
d079e15fe2
Strip leading/trailing whitespace in link extractors. Fixes GH-838.
2017-02-16 02:22:17 +05:00
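A minimal sketch of stripping whitespace from an href before resolving it, assuming the HTML5 definition of ASCII whitespace (space, tab, LF, FF, CR). `resolve_href` is a hypothetical helper, not Scrapy's code; stripping explicitly avoids depending on how a given Python version's URL parser treats surrounding whitespace.

```python
from urllib.parse import urljoin

# HTML5 "ASCII whitespace": space, tab, line feed, form feed, carriage return.
HTML5_WHITESPACE = " \t\n\f\r"

def resolve_href(base_url, href):
    """Strip leading/trailing HTML5 whitespace from href, then resolve it."""
    return urljoin(base_url, href.strip(HTML5_WHITESPACE))

print(resolve_href("http://example.com/dir/", "  page.html\n"))
# http://example.com/dir/page.html
```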
Mikhail Korobov
44bfcbcf0f
TST split LinkExtractorTestCase.test_extraction into several methods; remove duplicated test
2015-08-31 00:49:38 +05:00
Mikhail Korobov
9bfe6ece59
Merge branch 'master' into py3-linkextractors
Conflicts:
scrapy/linkextractors/lxmlhtml.py
tests/test_linkextractors.py
2015-08-28 04:53:32 +05:00
Mikhail Korobov
f2edbd05de
PY3 port LinkExtractor
* tests for other link extractors are moved to test_linkextractors_deprecated.py
* in Python 3 Link is converted to use native strings for urls
* minor cleanups
2015-08-28 04:11:30 +05:00
Mikhail Korobov
f46a450080
refactor test_linkextractors
* rename LinkExtractorTestCase to BaseSgmlLinkExtractorTestCase
* add BaseLinkExtractorTestCase, which link extractor tests can inherit from,
and decouple it from SgmlLinkExtractor
* add an extra check for deny_extensions
* xfail test_restrict_xpaths_with_html_entities for LxmlLinkExtractor explicitly
2015-08-28 04:11:30 +05:00
Rafał Gutkowski
cb3007c066
support link rel attribute with multiple values
2015-08-27 20:13:47 +02:00
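Because rel holds a space-separated list of tokens (for example `rel="noopener nofollow"`), checking for nofollow means splitting the value rather than comparing the whole string. A minimal sketch follows; `rel_has_nofollow` is a hypothetical name, and treating commas as extra separators is an assumption added for leniency.

```python
def rel_has_nofollow(rel):
    """Return True if a rel attribute value contains the "nofollow" token."""
    if rel is None:
        return False
    # rel is a set of space-separated tokens; tokens are compared
    # case-insensitively. Commas are also treated as separators (assumption).
    tokens = rel.replace(",", " ").lower().split()
    return "nofollow" in tokens

for value in ["nofollow", "noopener nofollow", "nofollowme", None]:
    print(repr(value), rel_has_nofollow(value))
```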
Andrew Scorpil
de15fcdf33
[LinkExtractors] Ignore bogus links
(rebased the code for scrapy 1.0 and made a few code improvements --nyov)
2015-08-15 00:16:39 +00:00
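One way to ignore bogus links is to skip any href that the URL parser rejects instead of letting the exception abort extraction. The sketch below assumes "bogus" means hrefs for which `urljoin` raises `ValueError` (such as an unbalanced IPv6 bracket); `resolve_links` is a hypothetical helper, not the code from this commit.

```python
from urllib.parse import urljoin

def resolve_links(base_url, hrefs):
    """Resolve hrefs against base_url, silently skipping ones that fail to parse."""
    resolved = []
    for href in hrefs:
        try:
            resolved.append(urljoin(base_url, href))
        except ValueError:
            # e.g. "http://[::1" raises ValueError("Invalid IPv6 URL")
            continue
    return resolved

print(resolve_links("http://example.com/", ["a.html", "http://[::1", "/b"]))
# ['http://example.com/a.html', 'http://example.com/b']
```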
Julia Medina
cf064b1437
Move scrapy/contrib/linkextractors to scrapy/linkextractors
2015-04-29 21:24:30 -03:00