.. _intro-install:

==================
Installation guide
==================

This document describes how to install Scrapy on Linux, Windows and Mac OS X
systems. It consists of the following three steps:

* :ref:`intro-install-step1`
* :ref:`intro-install-step2`
* :ref:`intro-install-step3`

.. _intro-install-requirements:

Requirements
============

* `Python`_ 2.5 or 2.6 (3.x is not yet supported)
* `Twisted`_ 2.5.0, 8.0 or above (Windows users: you may need to install
  `pywin32`_ because of `this Twisted bug`_)
* `libxml2`_ (2.6.28 or above is recommended)

.. _Python: http://www.python.org
.. _Twisted: http://twistedmatrix.com
.. _libxml2: http://xmlsoft.org
.. _pywin32: http://sourceforge.net/projects/pywin32/
.. _this Twisted bug: http://twistedmatrix.com/trac/ticket/3707

Optional:

* `pyopenssl <http://pyopenssl.sourceforge.net>`_ (for HTTPS support, highly recommended)
* `simplejson <http://undefined.org/python/#simplejson>`_ (for (de)serializing JSON)

.. _intro-install-step1:

Step 1. Install Python
======================

Scrapy works with Python 2.5 or 2.6. You can get it at http://www.python.org/download/

.. highlight:: sh
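
As a quick sanity check before continuing, the following sketch tests whether
an interpreter version falls in the supported range. The helper name
``is_supported_python`` is made up for this example and is not part of Scrapy;
it only encodes the requirement stated above (Python 2.5 or 2.6).

.. code-block:: python

   import sys

   def is_supported_python(version_info=None):
       """Return True if the (major, minor, ...) version is 2.5 or 2.6.

       Hypothetical helper, shown for illustration only.
       """
       if version_info is None:
           version_info = sys.version_info
       # Compare only the (major, minor) prefix of the version tuple.
       return tuple(version_info[:2]) in ((2, 5), (2, 6))

   if __name__ == "__main__":
       if is_supported_python():
           print("This Python version is supported.")
       else:
           print("Unsupported Python version: %s" % sys.version.split()[0])

Passing an explicit tuple, such as ``is_supported_python((2, 6, 0))``, lets you
check a version other than the one running the script.
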
.. _intro-install-step2:

Step 2. Install required libraries
==================================

The procedure for installing the required third-party libraries depends on the
platform and operating system you use.

Ubuntu/Debian
-------------

If you're running Ubuntu/Debian Linux, run the following command as root::

    apt-get install python-twisted python-libxml2

To install optional libraries::

    apt-get install python-pyopenssl python-simplejson

Arch Linux
----------

If you are running Arch Linux, run the following command as root::

    pacman -S twisted libxml2

To install optional libraries::

    pacman -S pyopenssl python-simplejson

Mac OS X
--------

First, download `Twisted for Mac`_.

.. _Twisted for Mac: http://twistedmatrix.com/trac/wiki/Downloads#MacOSX

Mac OS X ships a ``libxml2`` version that is too old to be used by Scrapy, and
installing a newer ``libxml2`` on Mac OS X appears to be somewhat of a
challenge. Here is one way to do it, though it is not acceptable in the long
run:

1. Fetch the following libxml2 and libxslt packages:

   ftp://xmlsoft.org/libxml2/libxml2-2.7.3.tar.gz

   ftp://xmlsoft.org/libxml2/libxslt-1.1.24.tar.gz

2. Extract, build and install them both with::

       ./configure --with-python=/Library/Frameworks/Python.framework/Versions/2.5/
       make
       sudo make install

   Replace ``/Library/Frameworks/Python.framework/Versions/2.5/`` with your
   current Python framework location.

3. Install the libxml2 Python bindings with::

       cd libxml2-2.7.3/python
       sudo make install

   The libraries and modules should be installed in something like
   ``/usr/local/lib/python2.5/site-packages``. Add it to your ``PYTHONPATH`` and
   you are done.

4. Check that the ``libxml2`` library was installed properly with::

       python -c 'import libxml2'

Windows
-------

Download and install:

1. `Twisted for Windows <http://twistedmatrix.com/trac/wiki/Downloads>`_ - you
   may need to install `pywin32`_ because of `this Twisted bug`_
2. `libxml2 for Windows <http://users.skynet.be/sbi/libxml-python/>`_
3. `PyOpenSSL for Windows <http://sourceforge.net/project/showfiles.php?group_id=31249>`_
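
Whichever platform you use, you can check that the libraries are importable
before moving on. The sketch below is illustrative only: ``missing_modules`` is
a hypothetical helper, and the import names (``twisted``, ``libxml2``,
``OpenSSL`` for pyopenssl, and ``simplejson``) are the names these packages are
normally imported under.

.. code-block:: python

   # Import names for the libraries listed in the requirements.
   REQUIRED = ("twisted", "libxml2")
   OPTIONAL = ("OpenSSL", "simplejson")

   def missing_modules(names):
       """Return the subset of module names that cannot be imported."""
       missing = []
       for name in names:
           try:
               __import__(name)
           except ImportError:
               missing.append(name)
       return missing

   if __name__ == "__main__":
       print("Missing required libraries: %s" % (missing_modules(REQUIRED) or "none"))
       print("Missing optional libraries: %s" % (missing_modules(OPTIONAL) or "none"))

An empty result for ``REQUIRED`` suggests that Step 2 is complete. Missing
optional libraries only disable the features noted in the requirements.
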
.. _intro-install-step3:

Step 3. Install Scrapy
======================

There are three ways to download and install Scrapy:

1. :ref:`intro-install-release`
2. :ref:`intro-install-easy`
3. :ref:`intro-install-dev`

.. _intro-install-release:

Installing an official release
------------------------------

Download Scrapy from the `Download page`_. Scrapy is distributed in two ways: a
source code tarball (for Unix and Mac OS X systems) and a Windows installer.
If you downloaded the tarball, you can install it like any other Python
package using ``setup.py``::

    tar zxf scrapy-X.X.X.tar.gz
    cd scrapy-X.X.X
    python setup.py install

If you downloaded the Windows installer, just run it.

.. warning:: In Windows, you may need to add the ``C:\Python25\Scripts`` (or
   ``C:\Python26\Scripts``) folder to the system path by adding that directory
   to the ``PATH`` environment variable from the `Control Panel`_.

.. _Download page: http://scrapy.org/download/

.. _intro-install-easy:

Installing with `easy_install`_
-------------------------------

You can install Scrapy by running `easy_install`_ like this::

    easy_install -U scrapy

.. _easy_install: http://peak.telecommunity.com/DevCenter/EasyInstall

.. _intro-install-dev:

Installing the development version
-----------------------------------

.. note:: If you use the development version of Scrapy, you should subscribe
   to the mailing lists to get notified of any changes to the API.

1. Check out the latest development code from the `Mercurial`_ repository (you
   need to install `Mercurial`_ first)::

       hg clone http://hg.scrapy.org/scrapy scrapy-trunk

.. _Mercurial: http://www.selenic.com/mercurial/

2. Add Scrapy to your Python path

   If you're on Linux, Mac or any Unix-like system, you can make a symbolic link
   in your system ``site-packages`` directory like this::

       ln -s /path/to/scrapy-trunk/scrapy SITE-PACKAGES/scrapy

   Here ``SITE-PACKAGES`` is the location of your system ``site-packages``
   directory. To find it, run the following::

       python -c "from distutils.sysconfig import get_python_lib; print get_python_lib()"

   Alternatively, you can define your ``PYTHONPATH`` environment variable so that
   it includes the ``scrapy-trunk`` directory. This solution also works on Windows
   systems, which don't support symbolic links. (Environment variables can be
   defined on Windows systems from the `Control Panel`_.)

   Unix-like example::

       export PYTHONPATH=/path/to/scrapy-trunk

   Windows example (from the command line, but you should probably use the
   `Control Panel`_)::

       set PYTHONPATH=C:\path\to\scrapy-trunk

3. Make the ``scrapy-ctl.py`` script available

   On Unix-like systems, create a symbolic link to the file
   ``scrapy-trunk/scrapy/bin/scrapy-ctl.py`` in a directory on your system path,
   such as ``/usr/local/bin``. For example::

       ln -s `pwd`/scrapy-trunk/scrapy/bin/scrapy-ctl.py /usr/local/bin

   This simply lets you type ``scrapy-ctl.py`` from within any directory, rather
   than having to qualify the command with the full path to the file.

   On Windows systems, the same result can be achieved by copying the file
   ``scrapy-trunk/scrapy/bin/scrapy-ctl.py`` to somewhere on your system path,
   for example ``C:\Python25\Scripts``, which is customary for Python scripts.

.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx
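
To confirm that the development checkout is the copy Python actually picks up,
you can ask the interpreter where it imports a package from. This is a minimal
sketch; ``module_location`` is a hypothetical helper, not something Scrapy
provides.

.. code-block:: python

   import os

   def module_location(name):
       """Return the directory a module is imported from, or None if the
       module cannot be imported or has no file on disk."""
       try:
           module = __import__(name)
       except ImportError:
           return None
       path = getattr(module, "__file__", None)
       if path is None:
           # Built-in modules have no __file__ attribute.
           return None
       return os.path.dirname(os.path.abspath(path))

   if __name__ == "__main__":
       print("scrapy is imported from: %s" % module_location("scrapy"))

With the ``PYTHONPATH`` approach the printed path should point inside
``scrapy-trunk``; with the symbolic link approach it should point at the link
in ``site-packages``. A result of ``None`` means the package is not on the
Python path yet.
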