.. _intro-install:

==================
Installation guide
==================

This document describes how to install Scrapy on Linux, Windows and Mac OS X
systems. The installation consists of the following three steps:

* :ref:`intro-install-step1`
* :ref:`intro-install-step2`
* :ref:`intro-install-step3`

.. _intro-install-requirements:

Requirements
============
* `Python`_ 2.5 or 2.6 (3.x is not yet supported)

* `Twisted`_ 2.5.0, 8.0 or above (Windows users: you'll need to install
  `Zope.Interface`_ and maybe `pywin32`_ because of `this Twisted bug`_)

* `libxml2`_ (versions prior to 2.6.28 are known to have problems parsing
  certain malformed HTML, and have also been reported to contain leaks, so
  2.6.28 or above is highly recommended)

.. _Python: http://www.python.org
.. _Twisted: http://twistedmatrix.com
.. _libxml2: http://xmlsoft.org
.. _pywin32: http://sourceforge.net/projects/pywin32/
.. _Zope.Interface: http://pypi.python.org/pypi/zope.interface#download
.. _this Twisted bug: http://twistedmatrix.com/trac/ticket/3707

Optional:

* `pyopenssl <http://pyopenssl.sourceforge.net>`_ (for HTTPS support, highly recommended)
* `simplejson <http://undefined.org/python/#simplejson>`_ (for (de)serializing JSON)
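
To see which of these libraries are already available to your Python
interpreter, something like the following sketch may help. It is not part of
Scrapy: the helper name ``check_modules`` is hypothetical, and the import names
used here (``twisted``, ``libxml2``, ``OpenSSL``, ``simplejson``) are assumed to
be the module names these packages normally install.

```python
def check_modules(names):
    """Return a dict mapping each module name to True if it can be imported."""
    results = {}
    for name in names:
        try:
            __import__(name)
            results[name] = True
        except ImportError:
            results[name] = False
    return results


# Assumed import names for the libraries listed above.
REQUIRED = ["twisted", "libxml2"]
OPTIONAL = ["OpenSSL", "simplejson"]

if __name__ == "__main__":
    for label, names in (("required", REQUIRED), ("optional", OPTIONAL)):
        for name, ok in sorted(check_modules(names).items()):
            print("%-10s %-12s %s" % (label, name, "ok" if ok else "MISSING"))
```

Run it with the same ``python`` executable you intend to use for Scrapy, since
each interpreter has its own set of installed packages.
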

.. _intro-install-step1:

Step 1. Install Python
======================

Scrapy works with Python 2.5 or 2.6. You can get it at http://www.python.org/download/

.. highlight:: sh

.. _intro-install-step2:

Step 2. Install required libraries
==================================
The procedure for installing the required third party libraries depends on the
platform and operating system you use.

Ubuntu/Debian
-------------
If you're running Ubuntu/Debian Linux, run the following command as root::

    apt-get install python-twisted python-libxml2

To install optional libraries::

    apt-get install python-pyopenssl python-simplejson

Arch Linux
----------
If you are running Arch Linux, run the following command as root::

    pacman -S twisted libxml2

To install optional libraries::

    pacman -S pyopenssl python-simplejson

Mac OS X
--------
First, download `Twisted for Mac`_.

.. _Twisted for Mac: http://twistedmatrix.com/trac/wiki/Downloads#MacOSX

Mac OS X ships a ``libxml2`` version that is too old to be used by Scrapy, and
installing a newer ``libxml2`` on Mac OS X is reportedly a bit of a challenge.
Here is one way to do it, though it is not an acceptable solution in the long
run:

1. Fetch the following libxml2 and libxslt packages:

   ftp://xmlsoft.org/libxml2/libxml2-2.7.3.tar.gz

   ftp://xmlsoft.org/libxml2/libxslt-1.1.24.tar.gz

2. Extract, build and install them both with::

       ./configure --with-python=/Library/Frameworks/Python.framework/Versions/2.5/
       make
       sudo make install

   Replace ``/Library/Frameworks/Python.framework/Versions/2.5/`` with the
   location of your current Python framework.

3. Install the libxml2 Python bindings with::

       cd libxml2-2.7.3/python
       sudo make install

   The libraries and modules should be installed in a directory such as
   ``/usr/local/lib/python2.5/site-packages``. Add it to your ``PYTHONPATH``
   and you are done.

4. Check that the ``libxml2`` library was installed properly with::

       python -c 'import libxml2'

Windows
-------
Download and install:

1. `Twisted for Windows <http://twistedmatrix.com/trac/wiki/Downloads>`_ - you
   may need to install `pywin32`_ because of `this Twisted bug`_

2. `Zope.Interface`_ (required by Twisted)

3. `libxml2 for Windows <http://users.skynet.be/sbi/libxml-python/>`_

4. `PyOpenSSL for Windows <http://sourceforge.net/project/showfiles.php?group_id=31249>`_

.. _intro-install-step3:

Step 3. Install Scrapy
======================
There are three ways to download and install Scrapy:

1. :ref:`intro-install-release`
2. :ref:`intro-install-easy`
3. :ref:`intro-install-dev`

.. _intro-install-release:

Installing an official release
------------------------------
Download Scrapy from the `Download page`_. Scrapy is distributed in two ways: a
source code tarball (for Unix and Mac OS X systems) and a Windows installer
(for Windows). If you downloaded the tarball, you can install it like any other
Python package using ``setup.py``::

    tar zxf scrapy-X.X.X.tar.gz
    cd scrapy-X.X.X
    python setup.py install

If you downloaded the Windows installer, just run it.

.. warning:: In Windows, you may need to add the ``C:\Python25\Scripts`` (or
   ``C:\Python26\Scripts``) folder to the system path by adding that directory
   to the ``PATH`` environment variable from the `Control Panel`_.

.. _Download page: http://scrapy.org/download/

.. _intro-install-easy:

Installing with `easy_install`_
-------------------------------
You can install Scrapy by running `easy_install`_ like this::

    easy_install -U Scrapy

.. _easy_install: http://peak.telecommunity.com/DevCenter/EasyInstall

.. _intro-install-dev:

Installing the development version
----------------------------------
.. note:: If you use the development version of Scrapy, you should subscribe
   to the mailing lists to get notified of any changes to the API.

1. Check out the latest development code from the `Mercurial`_ repository (you
   need to install `Mercurial`_ first)::

       hg clone http://hg.scrapy.org/scrapy scrapy-trunk

   .. _Mercurial: http://www.selenic.com/mercurial/

2. Add Scrapy to your Python path

   If you're on Linux, Mac or any Unix-like system, you can make a symbolic link
   to your system ``site-packages`` directory like this::

       ln -s /path/to/scrapy-trunk/scrapy SITE-PACKAGES/scrapy

   Here, ``SITE-PACKAGES`` is the location of your system ``site-packages``
   directory. To find this out, execute the following::

       python -c "from distutils.sysconfig import get_python_lib; print(get_python_lib())"

   Alternatively, you can define your ``PYTHONPATH`` environment variable so that
   it includes the ``scrapy-trunk`` directory. This solution also works on Windows
   systems, which don't support symbolic links. (Environment variables can be
   defined on Windows systems from the `Control Panel`_).

   Unix-like example::

       export PYTHONPATH=/path/to/scrapy-trunk

   Windows example (from command line, but you should probably use the `Control
   Panel`_)::

       set PYTHONPATH=C:\path\to\scrapy-trunk

3. Make the ``scrapy`` command available

   On Unix-like systems, create a symbolic link to the file
   ``scrapy-trunk/bin/scrapy`` in a directory on your system path,
   such as ``/usr/local/bin``. For example::

       ln -s `pwd`/scrapy-trunk/bin/scrapy /usr/local/bin

   This simply lets you type ``scrapy`` from within any directory, rather
   than having to qualify the command with the full path to the file.

   On Windows systems, the same result can be achieved by copying the file
   ``scrapy-trunk/bin/scrapy`` to somewhere on your system path,
   for example ``C:\Python25\Scripts``, which is customary for Python scripts.

.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx
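
Whichever installation method you used, it can be useful to confirm which copy
of Scrapy your interpreter actually picks up, particularly after linking or
adding ``scrapy-trunk`` to ``PYTHONPATH``. The following is only a sketch under
that assumption; the helper name ``locate_package`` is hypothetical and not part
of Scrapy.

```python
import os


def locate_package(name):
    """Return the directory a package is imported from, or None if it is missing."""
    try:
        module = __import__(name)
    except ImportError:
        return None
    return os.path.dirname(os.path.abspath(module.__file__))


if __name__ == "__main__":
    location = locate_package("scrapy")
    if location is None:
        print("scrapy is not importable; check site-packages or PYTHONPATH")
    else:
        print("scrapy is imported from %s" % location)
```

For a development checkout, the reported directory would be expected to point
inside ``scrapy-trunk`` (or at the symbolic link created in ``site-packages``).
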