
Deprecated the scrapy-ctl.py command in favour of the simpler "scrapy" command. Closes #199. Also updated the documentation accordingly and added a convenient scrapy.bat script for running it from Windows.

--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
Pablo Hoffman 2010-08-18 19:48:32 -03:00
parent 5522afca8d
commit 34554da201
22 changed files with 151 additions and 157 deletions


@ -1,4 +0,0 @@
#!/usr/bin/env python
from scrapy.cmdline import execute
execute()
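The replacement ``bin/scrapy`` script referenced later in this commit (by ``setup.py`` and ``extras/scrapy.bat``) is not shown in the diff; presumably it mirrors the deleted ``scrapy-ctl.py`` above and is just a thin hand-off to ``scrapy.cmdline.execute()``, roughly:

#!/usr/bin/env python
# Sketch of the assumed bin/scrapy entry point (not shown in this diff); like
# the deleted scrapy-ctl.py it only delegates to scrapy.cmdline.execute(),
# which parses sys.argv and dispatches the requested command.
from scrapy.cmdline import execute
execute()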

debian/rules

@ -7,6 +7,5 @@
override_dh_auto_install:
dh_auto_install
mkdir -p $(CURDIR)/debian/scrapy/usr/bin
mv $(CURDIR)/debian/tmp/usr/bin/scrapy-ctl.py $(CURDIR)/debian/scrapy/usr/bin/scrapy-ctl
mv $(CURDIR)/debian/tmp/usr/bin/scrapy-ws.py $(CURDIR)/debian/scrapy/usr/bin/scrapy-ws
mv $(CURDIR)/debian/tmp/usr/bin/scrapy-sqs.py $(CURDIR)/debian/scrapy/usr/bin/scrapy-sqs


@ -1,12 +1,12 @@
.TH SCRAPY-CTL 1 "October 17, 2009"
.TH SCRAPY 1 "October 17, 2009"
.SH NAME
scrapy-ctl \- Python Scrapy control script
scrapy \- Python Scrapy control script
.SH SYNOPSIS
.B scrapy-ctl
.B scrapy
[\fIcommand\fR] [\fIOPTIONS\fR] ...
.SH DESCRIPTION
.PP
Scrapy is controlled through the \fBscrapy-ctl\fR control script. The script provides several commands, for different purposes. Each command supports its own particular syntax. In other words, each command supports a different set of arguments and options.
Scrapy is controlled through the \fBscrapy\fR control script. The script provides several commands, for different purposes. Each command supports its own particular syntax. In other words, each command supports a different set of arguments and options.
.SH OPTIONS
.SS fetch\fR [\fIOPTION\fR] \fIURL\fR
.TP
@ -72,7 +72,7 @@ Set/override setting (may be repeated)
Python path to the Scrapy project settings
.SH AUTHOR
Scrapy-ctl was written by the Scrapy Developers
Scrapy was written by the Scrapy Developers
<scrapy-developers@googlegroups.com>.
.PP
This manual page was written by Ignace Mouzannar <mouzannar@gmail.com>,


@ -1 +1 @@
debian/scrapy-ctl.1
debian/scrapy.1


@ -116,7 +116,7 @@ Can I run a spider without creating a project?
Yes. You can use the ``runspider`` command. For example, if you have a spider
written in a ``my_spider.py`` file, you can run it with::
scrapy-ctl.py runspider my_spider.py
scrapy runspider my_spider.py
I get "Filtered offsite request" messages. How can I fix them?
--------------------------------------------------------------


@ -169,15 +169,15 @@ Reference
.. toctree::
:hidden:
topics/scrapy-ctl
topics/cmdline
topics/request-response
topics/settings
topics/signals
topics/exceptions
topics/exporters
:doc:`topics/scrapy-ctl`
Understand the command used to control your Scrapy project.
:doc:`topics/cmdline`
Understand the command-line tool used to control your Scrapy project.
:doc:`topics/request-response`
Understand the classes used to represent HTTP requests and responses.


@ -214,19 +214,19 @@ Installing the development version
set PYTHONPATH=C:\path\to\scrapy-trunk
3. Make the ``scrapy-ctl.py`` script available
3. Make the ``scrapy`` command available
On Unix-like systems, create a symbolic link to the file
``scrapy-trunk/bin/scrapy-ctl.py`` in a directory on your system path,
``scrapy-trunk/bin/scrapy`` in a directory on your system path,
such as ``/usr/local/bin``. For example::
ln -s `pwd`/scrapy-trunk/bin/scrapy-ctl.py /usr/local/bin
ln -s `pwd`/scrapy-trunk/bin/scrapy /usr/local/bin
This simply lets you type ``scrapy-ctl.py`` from within any directory, rather
This simply lets you type ``scrapy`` from within any directory, rather
than having to qualify the command with the full path to the file.
On Windows systems, the same result can be achieved by copying the file
``scrapy-trunk/bin/scrapy-ctl.py`` to somewhere on your system path,
``scrapy-trunk/bin/scrapy`` to somewhere on your system path,
for example ``C:\Python25\Scripts``, which is customary for Python scripts.
.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx


@ -36,12 +36,12 @@ Creating a project
Before you start scraping, you will have to set up a new Scrapy project. Enter a
directory where you'd like to store your code and then run::
python scrapy-ctl.py startproject dmoz
scrapy startproject dmoz
This will create a ``dmoz`` directory with the following contents::
dmoz/
scrapy-ctl.py
scrapy.cfg
dmoz/
__init__.py
items.py
@ -53,7 +53,7 @@ This will create a ``dmoz`` directory with the following contents::
These are basically:
* ``scrapy-ctl.py``: the project's control script.
* ``scrapy.cfg``: the project configuration file
* ``dmoz/``: the project's Python module; you'll later import your code from
here.
* ``dmoz/items.py``: the project's items file.
@ -144,7 +144,7 @@ Crawling
To put our spider to work, go to the project's top level directory and run::
python scrapy-ctl.py crawl dmoz.org
scrapy crawl dmoz.org
The ``crawl dmoz.org`` command runs the spider for the ``dmoz.org`` domain. You
will get an output similar to this::
@ -244,7 +244,7 @@ installed on your system.
To start a shell you must go to the project's top level directory and run::
python scrapy-ctl.py shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
scrapy shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
This is what the shell looks like::
@ -372,7 +372,7 @@ Let's add this code to our spider::
Now try crawling the dmoz.org domain again and you'll see sites being printed
in your output. Run::
python scrapy-ctl.py crawl dmoz.org
scrapy crawl dmoz.org
Using our item
--------------

docs/topics/cmdline.rst (new file)

@ -0,0 +1,94 @@
.. _topics-cmdline:
========================
Scrapy command line tool
========================
Scrapy is controlled through the ``scrapy`` command, which we'll refer to as
the "Scrapy tool" from now on to differentiate it from Scrapy commands.
The Scrapy tool provides several commands, for different purposes. Each command
supports its own particular syntax. In other words, each command supports a
different set of arguments and options.
This page doesn't describe each command and its syntax, but instead provides an
introduction to how the ``scrapy`` tool is used. After you learn the basics,
you can get help for each particular command using the ``scrapy`` tool itself.
Using the ``scrapy`` tool
=========================
The first thing you would do with the ``scrapy`` tool is to create your Scrapy
project::
scrapy startproject myproject
That will create a Scrapy project under the ``myproject`` directory.
Next, you go inside the new project directory::
cd myproject
And you're ready to use the ``scrapy`` command to manage and control your
project from there. For example, to create a new spider::
scrapy genspider mydomain mydomain.com
See all available commands
--------------------------
To see all available commands type::
scrapy -h
That will print a summary of all available Scrapy commands.
The first line will print the currently active project, if you're inside a
Scrapy project.
Example (with an active project)::
Scrapy X.X.X - project: myproject
Usage
=====
...
Example (with no active project)::
Scrapy X.X.X - no active project
Usage
=====
...
Get help for a particular command
---------------------------------
To get help about a particular command, including its description, usage, and
available options, type::
scrapy <command> -h
Example::
scrapy crawl -h
Using ``scrapy`` tool outside your project
==========================================
Not all commands must be run from "inside" a Scrapy project. You can, for
example, use the ``fetch`` command to download a page (using the Scrapy built-in
downloader) from outside a project. Other commands that can be used outside a
project are ``startproject`` (obviously) and ``shell``, to launch a
:ref:`Scrapy Shell <topics-shell>`.
Also, keep in mind that some commands may behave slightly differently when run
from inside a project. For example, the ``fetch`` command will use spider
arguments (such as the ``user_agent`` attribute) if the URL being fetched is
handled by a specific project spider that defines a custom ``user_agent``
attribute. This is intentional, as the ``fetch`` command is meant to download
pages just as the spider would download them.
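To make that ``fetch`` behaviour concrete, here is a hedged sketch of such a project spider; apart from ``user_agent``, the import path and attribute names assume the 0.10-era spider API and are illustrative, not part of this commit:

# Hypothetical project spider (illustrative names, assumed 0.10-era API).
from scrapy.spider import BaseSpider

class MyDomainSpider(BaseSpider):
    domain_name = 'mydomain.com'   # spider identifier; attribute name assumed
    # Running `scrapy fetch http://mydomain.com/some/page` from inside the
    # project would presumably send this User-Agent instead of the default one.
    user_agent = 'MyDomainBot/1.0'

    def parse(self, response):
        pass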


@ -1,103 +0,0 @@
.. _topics-scrapy-ctl:
=============
scrapy-ctl.py
=============
Scrapy is controlled through the ``scrapy-ctl.py`` control script. The script
provides several commands, for different purposes. Each command supports its
own particular syntax. In other words, each command supports a different set of
arguments and options.
This page doesn't describe each command and its syntax, but provides an
introduction to how the ``scrapy-ctl.py`` script is used. After you learn how
to use it, you can get help for each particular command using the same
``scrapy-ctl.py`` script.
Global and project-specific ``scrapy-ctl.py``
=============================================
There is one global ``scrapy-ctl.py`` script shipped with Scrapy and another
``scrapy-ctl.py`` script automatically created inside your Scrapy project. The
project-specific ``scrapy-ctl.py`` is just a thin wrapper around the global
``scrapy-ctl.py`` which populates the settings of your project, so you don't
have to specify them every time through the ``--settings`` argument.
Using the ``scrapy-ctl.py`` script
==================================
The first thing you would do with the ``scrapy-ctl.py`` script is create your
Scrapy project::
scrapy-ctl.py startproject myproject
That will create a Scrapy project under the ``myproject`` directory and will
put a new ``scrapy-ctl.py`` inside that directory.
So, you go inside the new project directory::
cd myproject
And you're ready to use your project's ``scrapy-ctl.py``. For example, to
create a new spider::
python scrapy-ctl.py genspider mydomain mydomain.com
This is the same as using the global ``scrapy-ctl.py`` script and passing the
project settings module in the ``--settings`` argument::
scrapy-ctl.py --settings=myproject.settings genspider mydomain mydomain.com
You'll typically use the project-specific ``scrapy-ctl.py``, for convenience.
See all available commands
--------------------------
To see all available commands type::
scrapy-ctl.py -h
That will print a summary of all available Scrapy commands.
The first line will print the currently active project (if any).
Example (active project)::
Scrapy X.X.X - project: myproject
Usage
=====
...
Example (no active project)::
Scrapy X.X.X - no active project
Usage
=====
...
Get help for a particular command
---------------------------------
To get help about a particular command, including its description, usage and
available options type::
scrapy-ctl.py <command> -h
Example::
scrapy-ctl.py crawl -h
Using ``scrapy-ctl.py`` outside your project
============================================
Not all commands must be run from "inside" a Scrapy project. You can, for
example, use the ``fetch`` command to download a page (using Scrapy built-in
downloader) from outside a project. Other commands that can be used outside a
project are ``startproject`` (obviously) and ``shell``, to launch a
:ref:`Scrapy Shell <topics-shell>`.


@ -83,7 +83,7 @@ Here's its HTML code:
First, let's open the shell::
scrapy-ctl.py shell http://doc.scrapy.org/_static/selectors-sample1.html
scrapy shell http://doc.scrapy.org/_static/selectors-sample1.html
Then, after the shell loads, you'll have some selectors already instantiated and
ready to use.


@ -24,8 +24,7 @@ Designating the settings
When you use Scrapy, you have to tell it which settings you're using. You can
do this by using an environment variable, ``SCRAPY_SETTINGS_MODULE``, or the
``--settings`` argument of the :doc:`scrapy-ctl.py script
</topics/scrapy-ctl>`.
``--settings`` argument of the :doc:`scrapy command </topics/cmdline>`.
The value of ``SCRAPY_SETTINGS_MODULE`` should be in Python path syntax, e.g.
``myproject.settings``. Note that the settings module should be on the
@ -65,7 +64,7 @@ You can also override one (or more) settings from command line using the
Example::
scrapy-ctl.py crawl domain.com --set LOG_FILE=scrapy.log
scrapy crawl domain.com --set LOG_FILE=scrapy.log
2. Environment variables
------------------------
@ -74,7 +73,7 @@ You can populate settings using environment variables prefixed with
``SCRAPY_``. For example, to change the log file location on Unix systems::
$ export SCRAPY_LOG_FILE=scrapy.log
$ scrapy-ctl.py crawl example.com
$ scrapy crawl example.com
In Windows systems, you can change the environment variables from the Control
Panel following `these guidelines`_.
@ -90,7 +89,7 @@ It's where most of your custom settings will be populated.
4. Default settings per-command
-------------------------------
Each :doc:`/topics/scrapy-ctl` command can have its own default settings, which
Each :doc:`/topics/cmdline` command can have its own default settings, which
override the global default settings. Those custom command settings are
specified in the ``default_settings`` attribute of the command class.
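As an illustration of the per-command defaults described above, a minimal command class might look like the sketch below; the import path and chosen setting are assumptions, while ``default_settings``, ``syntax()`` and ``long_desc()`` are the hooks visible in the ``scrapy/cmdline.py`` changes further down:

# Hypothetical command sketch; module path and setting value are assumptions.
from scrapy.command import ScrapyCommand

class Command(ScrapyCommand):
    # Per-command defaults; execute() merges them into the global defaults via
    # settings.defaults.update(cmd.default_settings), as shown further down.
    default_settings = {'LOG_FILE': 'scrapy.log'}

    def syntax(self):
        return "[options]"

    def long_desc(self):
        return "Example command carrying its own default settings"

    def run(self, args, opts):  # run() signature assumed
        pass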
@ -224,7 +223,7 @@ project name). This will be used to construct the User-Agent by default, and
also for logging.
It's automatically populated with your project name when you create your
project with the :doc:`scrapy-ctl.py </topics/scrapy-ctl>` ``startproject``
project with the :doc:`scrapy </topics/cmdline>` ``startproject``
command.
.. setting:: BOT_VERSION
@ -998,7 +997,7 @@ TEMPLATES_DIR
Default: ``templates`` dir inside scrapy module
The directory where to look for templates when creating new projects with the
:doc:`scrapy-ctl.py startproject </topics/scrapy-ctl>` command.
:doc:`scrapy startproject </topics/cmdline>` command.
.. setting:: URLLENGTH_LIMIT


@ -33,7 +33,7 @@ Launch the shell
To launch the shell type::
scrapy-ctl.py shell <url>
scrapy shell <url>
Where ``<url>`` is the URL you want to scrape.
@ -106,7 +106,7 @@ shell works.
First, we launch the shell::
python scrapy-ctl.py shell http://scrapy.org --nolog
scrapy shell http://scrapy.org --nolog
Then, the shell fetches the URL (using the Scrapy downloader) and prints the
list of available objects and some help::


@ -2,7 +2,7 @@
#
# To create the first spider for your project use this command:
#
# scrapy-ctl.py genspider myspider myspider-domain.com
# scrapy genspider myspider myspider-domain.com
#
# For more info see:
# http://doc.scrapy.org/topics/spiders.html


@ -2,7 +2,7 @@
#
# To create the first spider for your project use this command:
#
# scrapy-ctl.py genspider myspider myspider-domain.com
# scrapy genspider myspider myspider-domain.com
#
# For more info see:
# http://doc.scrapy.org/topics/spiders.html

extras/scrapy.bat (new file)

@ -0,0 +1,10 @@
@echo off
rem Windows command-line tool for Scrapy
setlocal
rem Use a full path to Python (relative to this script) as the standard Python
rem install does not put python.exe on the PATH...
rem %~dp0 is the directory of this script
%~dp0..\python "%~dp0scrapy" %*
endlocal


@ -51,9 +51,9 @@ def _print_usage(inside_project):
print "Usage"
print "=====\n"
print "To run a command:"
print " scrapy-ctl.py <command> [options] [args]\n"
print " scrapy <command> [options] [args]\n"
print "To get help:"
print " scrapy-ctl.py <command> -h\n"
print " scrapy <command> -h\n"
print "Available commands"
print "==================\n"
cmds = _get_commands_dict()
@ -66,6 +66,10 @@ def _print_usage(inside_project):
def execute(argv=None):
if argv is None:
argv = sys.argv
if any('scrapy-ctl' in x for x in argv):
import warnings
warnings.warn("`scrapy-ctl.py` command-line tool is deprecated and will be removed in Scrapy 0.11, use `scrapy` instead",
DeprecationWarning, stacklevel=2)
cmds = _get_commands_dict()
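The ``scrapy-ctl`` check added above fires for legacy invocations because the old wrapper script's name ends up in ``sys.argv[0]``. A small self-contained illustration of the condition, with an assumed legacy argv:

# Standalone sketch of the deprecation check added in execute(); the argv
# value is an assumed example of a legacy scrapy-ctl.py invocation.
import warnings

argv = ['./scrapy-ctl.py', 'crawl', 'example.com']
if any('scrapy-ctl' in x for x in argv):
    warnings.warn("`scrapy-ctl.py` command-line tool is deprecated and will be "
                  "removed in Scrapy 0.11, use `scrapy` instead",
                  DeprecationWarning, stacklevel=2)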
@ -82,7 +86,7 @@ def execute(argv=None):
parser.usage = "%%prog %s %s" % (cmdname, cmd.syntax())
parser.description = cmd.long_desc()
if cmd.requires_project and not settings.settings_module:
print "Error running: scrapy-ctl.py %s\n" % cmdname
print "Error running: scrapy %s\n" % cmdname
print "Cannot find project settings module in python path: %s" % \
settings.settings_module_path
sys.exit(1)
@ -98,7 +102,7 @@ def execute(argv=None):
sys.exit(2)
else:
print "Unknown command: %s\n" % cmdname
print 'Use "scrapy-ctl.py -h" for help'
print 'Use "scrapy -h" for help'
sys.exit(2)
settings.defaults.update(cmd.default_settings)


@ -13,7 +13,6 @@ TEMPLATES_PATH = join(scrapy.__path__[0], 'templates', 'project')
TEMPLATES_TO_RENDER = (
('scrapy.cfg',),
('scrapy-ctl.py',),
('${project_name}', 'settings.py.tmpl'),
('${project_name}', 'items.py.tmpl'),
('${project_name}', 'pipelines.py.tmpl'),
@ -47,7 +46,6 @@ class Command(ScrapyCommand):
moduletpl = join(TEMPLATES_PATH, 'module')
copytree(moduletpl, join(project_name, project_name), ignore=IGNORE)
shutil.copy(join(TEMPLATES_PATH, 'scrapy.cfg'), project_name)
shutil.copy(join(TEMPLATES_PATH, 'scrapy-ctl.py'), project_name)
for paths in TEMPLATES_TO_RENDER:
path = join(*paths)
tplfile = join(project_name,


@ -2,7 +2,7 @@
#
# To create the first spider for your project use this command:
#
# scrapy-ctl.py genspider myspider myspider-domain.com
# scrapy genspider myspider myspider-domain.com
#
# For more info see:
# http://doc.scrapy.org/topics/spiders.html


@ -1,7 +0,0 @@
#!/usr/bin/env python
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', '${project_name}.settings')
from scrapy.cmdline import execute
execute()
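Projects generated before this commit keep a copy of this wrapper; it still works, since it only sets ``SCRAPY_SETTINGS_MODULE`` and defers to ``scrapy.cmdline.execute()``, but it now triggers the ``DeprecationWarning`` added above. A hedged sketch of driving the same machinery without the wrapper, relying only on ``execute()`` accepting an explicit argv list as shown in the ``scrapy/cmdline.py`` hunk (project name and command are illustrative):

# Hypothetical programmatic invocation; project and command names are examples.
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'myproject.settings')

from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'example.com'])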


@ -35,7 +35,7 @@ class StartprojectTest(ProjectTest):
def test_startproject(self):
self.assertEqual(0, self.call('startproject', self.project_name))
assert exists(join(self.proj_path, 'scrapy-ctl.py'))
assert exists(join(self.proj_path, 'scrapy.cfg'))
assert exists(join(self.proj_path, 'testproject'))
assert exists(join(self.proj_mod_path, '__init__.py'))
assert exists(join(self.proj_mod_path, 'items.py'))


@ -74,6 +74,10 @@ if len(sys.argv) > 1 and sys.argv[1] == 'bdist_wininst':
for file_info in data_files:
file_info[0] = '\\PURELIB\\%s' % file_info[0]
scripts = ['bin/scrapy', 'bin/scrapy-ws.py', 'bin/scrapy-sqs.py']
if os.name == 'nt':
scripts.append('extras/scrapy.bat')
# Dynamically calculate the version based on scrapy.__version__
version = ".".join(map(str, __import__('scrapy').version_info[:2]))
@ -90,7 +94,7 @@ setup_args = {
'packages': packages,
'cmdclass': cmdclasses,
'data_files': data_files,
'scripts': ['bin/scrapy', 'bin/scrapy-ctl.py', 'bin/scrapy-ws.py', 'bin/scrapy-sqs.py'],
'scripts': scripts,
'classifiers': [
'Programming Language :: Python',
'Programming Language :: Python :: 2.5',