1
0
mirror of https://github.com/scrapy/scrapy.git synced 2025-02-25 03:24:33 +00:00
scrapy/docs/intro/install.rst

230 lines
6.8 KiB
ReStructuredText

.. _intro-install:
==================
Installation guide
==================
This document describes how to install Scrapy in Linux, Windows and Mac OS X
systems and it consists on the following 3 steps:
* :ref:`intro-install-step1`
* :ref:`intro-install-step2`
* :ref:`intro-install-step3`
.. _intro-install-requirements:
Requirements
============
* `Python`_ 2.5 or 2.6 (3.x is not yet supported)
* `Twisted`_ 2.5.0, 8.0 or above (Windows users: you may need to install
`pywin32`_ because of `this Twisted bug`_)
* `libxml2`_ (2.6.28 or above is recommended)
.. _Python: http://www.python.org
.. _Twisted: http://twistedmatrix.com
.. _libxml2: http://xmlsoft.org
.. _pywin32: http://sourceforge.net/projects/pywin32/
.. _this Twisted bug: http://twistedmatrix.com/trac/ticket/3707
Optional:
* `pyopenssl <http://pyopenssl.sourceforge.net>`_ (for HTTPS support, highly recommended)
* `spidermonkey <http://www.mozilla.org/js/spidermonkey/>`_ (for parsing Javascript)
.. _intro-install-step1:
Step 1. Install Python
======================
Scrapy works with Python 2.5 or 2.6, you can get it at http://www.python.org/download/
.. highlight:: sh
.. _intro-install-step2:
Step 2. Install required libraries
==================================
The procedure for installing the required third party libraries depends on the
platform and operating system you use.
Ubuntu/Debian
-------------
If you're running Ubuntu/Debian Linux run the following command as root::
apt-get install python-twisted python-libxml2
To install optional libraries::
apt-get install python-pyopenssl spidermonkey-bin
Arch Linux
----------
If you are running Arch Linux run the following command as root::
pacman -S twisted libxml2
To install optional libraries::
pacman -S pyopenssl spidermonkey
Mac OS X
--------
First, download `Twisted for Mac`_.
.. _Twisted for Mac: http://twistedmatrix.com/trac/wiki/Downloads#MacOSX
Mac OS X ships an ``libxml2`` version too old to be used by Scrapy. Also, by
looking on the web it seems that installing ``libxml2`` on MacOSX is a bit of a
challenge. Here is a way to achieve this, though not acceptable on the long
run:
1. Fetch the following libxml2 and libxslt packages:
ftp://xmlsoft.org/libxml2/libxml2-2.7.3.tar.gz
ftp://xmlsoft.org/libxml2/libxslt-1.1.24.tar.gz
2. Extract, build and install them both with::
./configure --with-python=/Library/Frameworks/Python.framework/Versions/2.5/
make
sudo make install
Replacing ``/Library/Frameworks/Python.framework/Version/2.5/`` with your
current python framework location.
3. Install libxml2 Python bidings with::
cd libxml2-2.7.3/python
sudo make install
The libraries and modules should be installed in something like
/usr/local/lib/python2.5/site-packages. Add it to your ``PYTHONPATH`` and
you are done.
4. Check the ``libxml2`` library was installed propertly with::
python -c 'import libxml2'
Windows
-------
Download and install:
1. `Twisted for Windows <http://twistedmatrix.com/trac/wiki/Downloads>`_ - you
may need to install `pywin32`_ because of `this Twisted bug`_
2. `libxml2 for Windows <http://users.skynet.be/sbi/libxml-python/>`_
3. `PyOpenSSL for Windows <http://sourceforge.net/project/showfiles.php?group_id=31249>`_
.. _intro-install-step3:
Step 3. Install Scrapy
======================
We're working hard to get the first release of Scrapy out. In the meantime,
please download the latest development version from the `Mercurial`_
repository.
.. _Mercurial: http://www.selenic.com/mercurial/
Just follow these steps:
3.1. Install Mercurial
-----------------------
Make sure that you have `Mercurial`_ installed, and that you can run its
commands from a shell. (Enter ``hg help`` at a shell prompt to test this.)
3.2. Check out the Scrapy source code
-------------------------------------
By running the following command::
hg clone http://hg.scrapy.org/scrapy scrapy-trunk
3.3. Install the Scrapy module
------------------------------
Install the Scrapy module by running the following commands::
cd scrapy-trunk
python setup.py install
If you're on Unix-like systems (Linux, Mac, etc) you may need to run the second
command with root privileges, for example by running::
sudo python setup.py install
.. warning:: In Windows, you may need to add the ``C:\Python25\Scripts`` folder
to the system path by adding that directory to the ``PATH`` environment
variable from the `Control Panel`_.
.. warning:: Keep in mind that Scrapy is still being changed, as we haven't yet
released the first stable version. So it's important that you keep updating
the code periodically and reinstalling the Scrapy module. A more convenient
way is to use Scrapy module without installing it (see below).
Use Scrapy without installing it
================================
Another alternative is to use the Scrapy module without installing it which
makes it easier to keep using the last Development code without having to
reinstall it everytime you do a ``hg pull -u``.
You can do this by following the next steps:
Add Scrapy to your Python path
------------------------------
If you're on Linux, Mac or any Unix-like system, you can make a symbolic link
to your system ``site-packages`` directory like this::
ln -s /path/to/scrapy-trunk/scrapy SITE-PACKAGES/scrapy
Where ``SITE-PACKAGES`` is the location of your system ``site-packages``
directory. To find this out execute the following::
python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()"
Alternatively, you can define your ``PYTHONPATH`` environment variable so that
it includes the ``scrapy-trunk`` directory. This solution also works on Windows
systems, which don't support symbolic links. (Environment variables can be
defined on Windows systems from the `Control Panel`_).
Unix-like example::
PYTHONPATH=/path/to/scrapy-trunk
Windows example (from command line, but you should probably use the `Control
Panel`_)::
set PYTHONPATH=C:\path\to\scrapy-trunk
Make the scrapy-admin.py script available
-----------------------------------------
On Unix-like systems, create a symbolic link to the file
``scrapy-trunk/scrapy/bin/scrapy-admin.py`` in a directory on your system path,
such as ``/usr/local/bin``. For example::
ln -s `pwd`/scrapy-trunk/scrapy/bin/scrapy-admin.py /usr/local/bin
This simply lets you type scrapy-admin.py from within any directory, rather
than having to qualify the command with the full path to the file.
On Windows systems, the same result can be achieved by copying the file
``scrapy-trunk/scrapy/bin/scrapy-admin.py`` to somewhere on your system path,
for example ``C:\Python25\Scripts``, which is customary for Python scripts.
.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx