mirror of https://github.com/scrapy/scrapy.git synced 2025-02-26 06:23:41 +00:00

3943 Commits

Author SHA1 Message Date
Paul Tremberth
fd5b40593a Always enable offsite stats + refactor test to initialize crawler 2014-01-29 17:37:24 +01:00
Paul Tremberth
1a545157c6 Offsite: add 2 stats counters 2014-01-29 17:37:24 +01:00
Pablo Hoffman
21f0e4048b Merge pull request #565 from dangra/562-sgmlinkextractor
replace unencodeable codepoints with html entities
2014-01-27 22:01:39 -08:00
Pablo Hoffman
116a1df4b0 Merge pull request #557 from darkrho/expose-crawler-in-shell
Expose current crawler in the scrapy shell.
2014-01-27 21:08:02 -08:00
Daniel Graña
66829c962f replace unencodeable codepoints with html entities. fixes #562 and #285 2014-01-27 11:37:09 -02:00
Daniel Graña
8ecf0b786d Merge pull request #556 from darkrho/item-loader-nones
Make `ItemLoader` ignore `None` values from processors.
2014-01-27 04:21:09 -08:00
Daniel Graña
b14dabb281 Merge pull request #561 from redapple/regexlx-encoding
RegexLinkExtractor: encode URL unicode value when creating Links
2014-01-24 07:33:23 -08:00
Rolando Espinoza
4255e12bc7 Updated the tutorial crawl output with latest output. 2014-01-23 18:18:56 -04:00
Rolando Espinoza
9aab9224cb Updated shell docs with the crawler reference and fixed the actual shell output.
Also updated the shell example with a reproducible code example.
2014-01-23 18:04:57 -04:00
Paul Tremberth
f87859a627 RegexLinkExtractor: encode URL unicode value when creating Links 2014-01-23 19:41:18 +01:00
Rolando Espinoza
4081ba238d PEP8 minor edits. 2014-01-23 11:13:54 -04:00
Rolando Espinoza
240fdde667 Expose current crawler in the scrapy shell. 2014-01-23 11:08:56 -04:00
Rolando Espinoza
b93412059d Unused re import and PEP8 minor edits. 2014-01-23 10:37:33 -04:00
Rolando Espinoza
420efe77b2 Ignore None's values when using the ItemLoader. 2014-01-23 10:36:06 -04:00
Pablo Hoffman
2d60f86084 Merge pull request #535 from redapple/xpath-smartstrings
Disable smart strings in lxml XPath evaluations
2014-01-22 07:41:31 -08:00
Daniel Graña
677afe7e54 show ubuntu setup instructions as literal code 2014-01-20 16:22:53 -02:00
Daniel Graña
1c514c5fe9 Merge pull request #549 from dangra/509-scrapy-apt-repo
[MRG] update instruction to install using ubuntu packages
2014-01-20 09:27:57 -08:00
Daniel Graña
ebfb5b7096 replace warning about updating package lists by a note on package upgrade 2014-01-20 15:18:34 -02:00
Daniel Graña
52c3ff9190 fix apt-get line 2014-01-20 14:39:26 -02:00
Paul Tremberth
a0e25aec00 Use assertTrue/False 2014-01-20 17:29:16 +01:00
Daniel Graña
eb73ddd301 Update Ubuntu installation instructions 2014-01-20 14:15:01 -02:00
stray-leone
7f30a671c3 modify the version of scrapy ubuntu package
latest version is 0.22.
with scrapy-0.18, tutorial project provides error
relative issue : https://github.com/scrapy/scrapy/issues/511
2014-01-20 10:24:34 -02:00
Daniel Graña
431f2d109f fix 0.22.0 release date 2014-01-17 17:54:01 -02:00
Mikhail Korobov
dfd13f9941 fix typos in news.rst and remove (not released yet) header 2014-01-18 01:50:19 +06:00
Paul Tremberth
001cf39ff4 Add testcase to check if default Selector doesn't return smart strings 2014-01-17 20:34:59 +01:00
Paul Tremberth
5eb336215c Make lxml smart strings functionality customizable 2014-01-17 20:34:59 +01:00
Paul Tremberth
1f184ed7bf Disable smart strings in lxml XPath evaluations 2014-01-17 20:33:32 +01:00
Daniel Graña
d8164bd5f4 bump version to 0.23 0.23.0 2014-01-17 15:57:10 -02:00
Daniel Graña
d1f1b074e1 Merge 0.22.0 release notes 2014-01-17 15:55:46 -02:00
Daniel Graña
3c1e22618b Merge pull request #541 from kmike/fs-as-default-cache
[MRG] Make Filesystem storage backend default again.
2014-01-17 09:10:26 -08:00
Mikhail Korobov
7b7a1d8dfd Make Filesystem storage backend default again. See GH-500. 2014-01-17 04:32:08 +06:00
Daniel Graña
f0851e41ec Merge pull request #478 from redapple/offsitetests
Add tests for OffsiteMiddleware() + use re.escape() in domains regexp
2014-01-16 14:02:26 -08:00
Daniel Graña
151f9478c1 Merge pull request #538 from kmike/ajaxcrawlable-rename
[MRG] Rename AjaxCrawlableMiddleware to AjaxCrawlMiddleware
2014-01-16 10:33:46 -08:00
Mikhail Korobov
b03fe04999 Rename AjaxCrawlableMiddleware to AjaxCrawlMiddleware 2014-01-16 23:09:37 +06:00
Pablo Hoffman
ed6fd4933f Merge pull request #524 from hobsonlane/master
documentation code example corrections per pablohoffman
2014-01-16 06:44:51 -08:00
Pablo Hoffman
71ada5476e Merge pull request #472 from redapple/exslt
Register EXSLT namespaces by default (resolves #470)
2014-01-16 06:32:05 -08:00
Daniel Graña
b9bb9bed6b Merge pull request #343 from kmike/ajax-crawlable
[MRG] AjaxCrawlableMiddleware
2014-01-16 05:07:39 -08:00
Daniel Graña
e2a5310f92 Merge pull request #537 from dangra/warn-xpathselector-subclass
warn XPathSelector deprecation on subclassing and direct instance
2014-01-16 04:14:27 -08:00
Daniel Graña
5a175ad287 Merge pull request #519 from dangra/warn-once
Warn BaseSpider deprecation only once
2014-01-16 04:07:21 -08:00
Daniel Graña
b3be6e210d warn XPathSelector deprecation on subclassing and direct instance 2014-01-16 09:53:59 -02:00
Daniel Graña
3e42646ce1 do not test multiple instantiation warnings 2014-01-16 09:49:51 -02:00
Hobson Lane
85a80d0752 remove "for brevity's sake" line and correct "Torrent item"
Torrent item -> TorrentItem class
2014-01-15 17:29:23 -08:00
Mikhail Korobov
03dab0117d Merge pull request #536 from manfre/patch-1
Fix comment typo
2014-01-15 16:39:33 -08:00
Michael Manfre
4a2a45b495 Fix comment typo 2014-01-15 12:35:34 -05:00
Paul Tremberth
827c0cf51f Rename "regexp" prefix to "re" 2014-01-15 15:00:25 +01:00
Mikhail Korobov
c92f52ce2c Merge pull request #525 from alexanderlukanin13/urllib_test_coverage
Improved test coverage
2014-01-15 04:05:37 -08:00
Mikhail Korobov
af16fa326f Merge pull request #533 from chekunkov/make_dupefilter_easily_subclassable
Make dupefilter easily subclassable
2014-01-15 03:56:45 -08:00
Paul Tremberth
88c8a523a7 Add warning in docs on performance when using EXSLT regexp functions 2014-01-15 12:52:10 +01:00
Alexander Chekunkov
5411695905 removed request_fingerprint method from BaseDupeFilter, removed unnecessary docstring 2014-01-15 13:44:46 +02:00
Paul Tremberth
a3eba68aca Drop EXSLT strings and math extensions 2014-01-15 12:28:25 +01:00