scrapy

mirror of https://github.com/scrapy/scrapy.git synced 2025-02-23 08:03:53 +00:00

Author	SHA1	Message	Date
Adrián Chaves	174769a3f0	Use a better name for the LxmlLinkExtractor subclassing test	2019-12-18 12:09:03 +01:00
Adrián Chaves	a4ef9750f9	Fix Flake8-reported issues	2019-12-13 14:32:06 +01:00
Adrián Chaves	1fc2b140c1	Merge branch 'master' into documentation-coverage	2019-12-05 14:43:36 +01:00
Adrián Chaves	33ef24c757	Add missing whitespace after ‘,’, ‘;’ or ‘:’	2019-11-13 10:52:05 +01:00
Adrián Chaves	7f4f98fd38	Provide complete API documentation coverage of scrapy.linkextractors	2019-09-30 18:22:28 +02:00
Eugenio Lacuesta	40086dabb8	Prevent more DeprecationWarnings	2019-07-13 22:14:00 -03:00
Matthieu Grandrie	e3b15252c8	New constructor arg restrict_text for FilteringLinkExtractor. Same as allow and deny args, it holds a string, a regex or an iterable of. Links whose text don't match one of the regex are filtered out. DOC restrict_text in LxmlLinkExtractor	2019-02-28 17:21:17 +01:00
nctl144	4c05441450	add ftp to the scheme list	2018-03-03 00:00:03 -05:00
Mikhail Korobov	2b4d46315f	TST fixed compatibility with new link extractor whitespace handling	2017-02-21 00:05:40 +05:00
Mikhail Korobov	47f7da8724	canonicalize=False by default for LinkExtractor. Fixes GH-1941.	2017-02-20 22:58:11 +05:00
Mikhail Korobov	d079e15fe2	Strip leading/trailing whitespaces in link extractors. Fixes GH-838.	2017-02-16 02:22:17 +05:00
Mikhail Korobov	44bfcbcf0f	TST split LinkExtractorTestCase.test_extraction into several methods; remove duplicated test	2015-08-31 00:49:38 +05:00
Mikhail Korobov	9bfe6ece59	Merge branch 'master' into py3-linkextractors Conflicts: scrapy/linkextractors/lxmlhtml.py tests/test_linkextractors.py	2015-08-28 04:53:32 +05:00
Mikhail Korobov	f2edbd05de	PY3 port LinkExtractor * tests for other link extractors are moved to test_linkextractors_deprecated.py * in Python 3 Link is converted to use native strings for urls * minor cleanups	2015-08-28 04:11:30 +05:00
Mikhail Korobov	f46a450080	refactor test_linkextractors * rename LinkExtractorTestCase to BaseSgmlLinkExtractorTestCase * add BaseLinkExtractorTestCase link extractor tests can inherit from and decouple it from SgmlLinkExtractor * add an extra check for deny_extensions * xfail test_restrict_xpaths_with_html_entities for LxmlLinkExtractor explicitly	2015-08-28 04:11:30 +05:00
Rafał Gutkowski	cb3007c066	support link rel attribute with multiple values	2015-08-27 20:13:47 +02:00
Andrew Scorpil	de15fcdf33	[LinkExtractors] Ignore bogus links (rebased the code for scrapy 1.0 and made a few code improvements --nyov)	2015-08-15 00:16:39 +00:00
Julia Medina	cf064b1437	Move scrapy/contrib/linkextractors to scrapy/linkextractors	2015-04-29 21:24:30 -03:00

18 Commits