mirror of https://github.com/scrapy/scrapy.git synced 2025-02-23 17:23:44 +00:00

14 Commits

Author SHA1 Message Date
Adrián Chaves
33ef24c757 Add missing whitespace after ‘,’, ‘;’ or ‘:’ 2019-11-13 10:52:05 +01:00
Eugenio Lacuesta
40086dabb8 Prevent more DeprecationWarnings 2019-07-13 22:14:00 -03:00
Matthieu Grandrie
e3b15252c8 New constructor arg *restrict_text* for FilteringLinkExtractor.
Like the allow and deny args, it accepts a string, a regex, or an iterable of them. Links whose text does not match any of the regexes are filtered out.
DOC restrict_text in LxmlLinkExtractor
2019-02-28 17:21:17 +01:00
nctl144
4c05441450 add ftp to the scheme list 2018-03-03 00:00:03 -05:00
Mikhail Korobov
2b4d46315f TST fixed compatibility with new link extractor whitespace handling 2017-02-21 00:05:40 +05:00
Mikhail Korobov
47f7da8724 canonicalize=False by default for LinkExtractor. Fixes GH-1941. 2017-02-20 22:58:11 +05:00
Mikhail Korobov
d079e15fe2 Strip leading/trailing whitespaces in link extractors. Fixes GH-838. 2017-02-16 02:22:17 +05:00
Mikhail Korobov
44bfcbcf0f TST split LinkExtractorTestCase.test_extraction into several methods; remove duplicated test 2015-08-31 00:49:38 +05:00
Mikhail Korobov
9bfe6ece59 Merge branch 'master' into py3-linkextractors
Conflicts:
	scrapy/linkextractors/lxmlhtml.py
	tests/test_linkextractors.py
2015-08-28 04:53:32 +05:00
Mikhail Korobov
f2edbd05de PY3 port LinkExtractor
* tests for other link extractors are moved to test_linkextractors_deprecated.py
* in Python 3 Link is converted to use native strings for urls
* minor cleanups
2015-08-28 04:11:30 +05:00
Mikhail Korobov
f46a450080 refactor test_linkextractors
* rename LinkExtractorTestCase to BaseSgmlLinkExtractorTestCase
* add BaseLinkExtractorTestCase link extractor tests can inherit from
  and decouple it from SgmlLinkExtractor
* add an extra check for deny_extensions
* xfail test_restrict_xpaths_with_html_entities for LxmlLinkExtractor explicitly
2015-08-28 04:11:30 +05:00
Rafał Gutkowski
cb3007c066 support link rel attribute with multiple values 2015-08-27 20:13:47 +02:00
Andrew Scorpil
de15fcdf33 [LinkExtractors] Ignore bogus links
(rebased the code for scrapy 1.0 and made a few code improvements --nyov)
2015-08-15 00:16:39 +00:00
Julia Medina
cf064b1437 Move scrapy/contrib/linkextractors to scrapy/linkextractors 2015-04-29 21:24:30 -03:00