
Deprecated the scrapy-ctl.py command in favour of the simpler "scrapy" command. Closes #199. Also updated the documentation accordingly and added a convenient scrapy.bat script for running it from Windows.

--HG--
rename : debian/scrapy-ctl.1 => debian/scrapy.1
rename : docs/topics/scrapy-ctl.rst => docs/topics/cmdline.rst
Pablo Hoffman 2010-08-18 19:48:32 -03:00
parent 5522afca8d
commit 34554da201
22 changed files with 151 additions and 157 deletions


@ -1,4 +0,0 @@
#!/usr/bin/env python
from scrapy.cmdline import execute
execute()
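The replacement ``bin/scrapy`` script referenced later in this commit (by ``setup.py`` and ``extras/scrapy.bat``) is not shown in the diff; presumably it mirrors the deleted ``scrapy-ctl.py`` above and is just a thin hand-off to ``scrapy.cmdline.execute()``, roughly:

#!/usr/bin/env python
# Sketch of the assumed bin/scrapy entry point (not shown in this diff); like
# the deleted scrapy-ctl.py it only delegates to scrapy.cmdline.execute(),
# which parses sys.argv and dispatches the requested command.
from scrapy.cmdline import execute
execute()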

debian/rules

@ -7,6 +7,5 @@
override_dh_auto_install:
dh_auto_install
mkdir -p $(CURDIR)/debian/scrapy/usr/bin
mv $(CURDIR)/debian/tmp/usr/bin/scrapy-ctl.py $(CURDIR)/debian/scrapy/usr/bin/scrapy-ctl
mv $(CURDIR)/debian/tmp/usr/bin/scrapy-ws.py $(CURDIR)/debian/scrapy/usr/bin/scrapy-ws
mv $(CURDIR)/debian/tmp/usr/bin/scrapy-sqs.py $(CURDIR)/debian/scrapy/usr/bin/scrapy-sqs


@ -1,12 +1,12 @@
.TH SCRAPY-CTL 1 "October 17, 2009"
.TH SCRAPY 1 "October 17, 2009"
.SH NAME
scrapy-ctl \- Python Scrapy control script
scrapy \- Python Scrapy control script
.SH SYNOPSIS
.B scrapy-ctl
.B scrapy
[\fIcommand\fR] [\fIOPTIONS\fR] ...
.SH DESCRIPTION
.PP
Scrapy is controlled through the \fBscrapy-ctl\fR control script. The script provides several commands, for different purposes. Each command supports its own particular syntax. In other words, each command supports a different set of arguments and options.
Scrapy is controlled through the \fBscrapy\fR control script. The script provides several commands, for different purposes. Each command supports its own particular syntax. In other words, each command supports a different set of arguments and options.
.SH OPTIONS
.SS fetch\fR [\fIOPTION\fR] \fIURL\fR
.TP
@ -72,7 +72,7 @@ Set/override setting (may be repeated)
Python path to the Scrapy project settings
.SH AUTHOR
Scrapy-ctl was written by the Scrapy Developers
Scrapy was written by the Scrapy Developers
<scrapy-developers@googlegroups.com>.
.PP
This manual page was written by Ignace Mouzannar <mouzannar@gmail.com>,


@ -1 +1 @@
debian/scrapy-ctl.1
debian/scrapy.1


@ -116,7 +116,7 @@ Can I run a spider without creating a project?
Yes. You can use the ``runspider`` command. For example, if you have a spider
written in a ``my_spider.py`` file, you can run it with::
scrapy-ctl.py runspider my_spider.py
scrapy runspider my_spider.py
I get "Filtered offsite request" messages. How can I fix them?
--------------------------------------------------------------


@ -169,15 +169,15 @@ Reference
.. toctree::
:hidden:
topics/scrapy-ctl
topics/cmdline
topics/request-response
topics/settings
topics/signals
topics/exceptions
topics/exporters
:doc:`topics/scrapy-ctl`
Understand the command used to control your Scrapy project.
:doc:`topics/cmdline`
Understand the command-line tool used to control your Scrapy project.
:doc:`topics/request-response`
Understand the classes used to represent HTTP requests and responses.


@ -214,19 +214,19 @@ Installing the development version
set PYTHONPATH=C:\path\to\scrapy-trunk
3. Make the ``scrapy-ctl.py`` script available
3. Make the ``scrapy`` command available
On Unix-like systems, create a symbolic link to the file
``scrapy-trunk/bin/scrapy-ctl.py`` in a directory on your system path,
``scrapy-trunk/bin/scrapy`` in a directory on your system path,
such as ``/usr/local/bin``. For example::
ln -s `pwd`/scrapy-trunk/bin/scrapy-ctl.py /usr/local/bin
ln -s `pwd`/scrapy-trunk/bin/scrapy /usr/local/bin
This simply lets you type ``scrapy-ctl.py`` from within any directory, rather
This simply lets you type ``scrapy`` from within any directory, rather
than having to qualify the command with the full path to the file.
On Windows systems, the same result can be achieved by copying the file
``scrapy-trunk/bin/scrapy-ctl.py`` to somewhere on your system path,
``scrapy-trunk/bin/scrapy`` to somewhere on your system path,
for example ``C:\Python25\Scripts``, which is customary for Python scripts.
.. _Control Panel: http://www.microsoft.com/resources/documentation/windows/xp/all/proddocs/en-us/sysdm_advancd_environmnt_addchange_variable.mspx


@ -36,12 +36,12 @@ Creating a project
Before you start scraping, you will have to set up a new Scrapy project. Enter a
directory where you'd like to store your code and then run::
python scrapy-ctl.py startproject dmoz
scrapy startproject dmoz
This will create a ``dmoz`` directory with the following contents::
dmoz/
scrapy-ctl.py
scrapy.cfg
dmoz/
__init__.py
items.py
@ -53,7 +53,7 @@ This will create a ``dmoz`` directory with the following contents::
These are basically:
* ``scrapy-ctl.py``: the project's control script.
* ``scrapy.cfg``: the project configuration file
* ``dmoz/``: the project's Python module; you'll later import your code from
here.
* ``dmoz/items.py``: the project's items file.
@ -144,7 +144,7 @@ Crawling
To put our spider to work, go to the project's top level directory and run::
python scrapy-ctl.py crawl dmoz.org
scrapy crawl dmoz.org
The ``crawl dmoz.org`` command runs the spider for the ``dmoz.org`` domain. You
will get an output similar to this::
@ -244,7 +244,7 @@ installed on your system.
To start a shell you must go to the project's top level directory and run::
python scrapy-ctl.py shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
scrapy shell http://www.dmoz.org/Computers/Programming/Languages/Python/Books/
This is what the shell looks like::
@ -372,7 +372,7 @@ Let's add this code to our spider::
Now try crawling the dmoz.org domain again and you'll see sites being printed
in your output. Run::
python scrapy-ctl.py crawl dmoz.org
scrapy crawl dmoz.org
Using our item
--------------

docs/topics/cmdline.rst (new file)

@ -0,0 +1,94 @@
.. _topics-cmdline:
========================
Scrapy command line tool
========================
Scrapy is controlled through the ``scrapy`` command, which we'll refer to as
the "Scrapy tool" from now on to differentiate it from Scrapy commands.
The Scrapy tool provides several commands, for different purposes. Each command
supports its own particular syntax. In other words, each command supports a
different set of arguments and options.
This page doesn't describe each command and its syntax, but instead provides an
introduction to how the ``scrapy`` tool is used. After you learn the basics,
you can get help for each particular command using the ``scrapy`` tool itself.
Using the ``scrapy`` tool
=========================
The first thing you would do with the ``scrapy`` tool is to create your Scrapy
project::
scrapy startproject myproject
That will create a Scrapy project under the ``myproject`` directory.
Next, you go inside the new project directory::
cd myproject
And you're ready to use the ``scrapy`` command to manage and control your
project from there. For example, to create a new spider::
scrapy genspider mydomain mydomain.com
See all available commands
--------------------------
To see all available commands type::
scrapy -h
That will print a summary of all available Scrapy commands.
The first line will print the currently active project, if you're inside a
Scrapy project.
Example (with an active project)::
Scrapy X.X.X - project: myproject
Usage
=====
...
Example (with no active project)::
Scrapy X.X.X - no active project
Usage
=====
...
Get help for a particular command
---------------------------------
To get help about a particular command, including its description, usage, and
available options, type::
scrapy <command> -h
Example::
scrapy crawl -h
Using ``scrapy`` tool outside your project
==========================================
Not all commands must be run from "inside" a Scrapy project. You can, for
example, use the ``fetch`` command to download a page (using the Scrapy built-in
downloader) from outside a project. Other commands that can be used outside a
project are ``startproject`` (obviously) and ``shell``, to launch a
:ref:`Scrapy Shell <topics-shell>`.
Also, keep in mind that some commands may behave slightly differently when run
from inside a project. For example, the ``fetch`` command will use spider
arguments (such as the ``user_agent`` attribute) if the URL being fetched is
handled by a specific project spider that defines a custom ``user_agent``
attribute. This is intentional, as the ``fetch`` command is meant to download
pages just as the spider would download them.
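To make that ``fetch`` behaviour concrete, here is a hedged sketch of such a project spider; apart from ``user_agent``, the import path and attribute names assume the 0.10-era spider API and are illustrative, not part of this commit:

# Hypothetical project spider (illustrative names, assumed 0.10-era API).
from scrapy.spider import BaseSpider

class MyDomainSpider(BaseSpider):
    domain_name = 'mydomain.com'   # spider identifier; attribute name assumed
    # Running `scrapy fetch http://mydomain.com/some/page` from inside the
    # project would presumably send this User-Agent instead of the default one.
    user_agent = 'MyDomainBot/1.0'

    def parse(self, response):
        pass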


@ -1,103 +0,0 @@
.. _topics-scrapy-ctl:
=============
scrapy-ctl.py
=============
Scrapy is controlled through the ``scrapy-ctl.py`` control script. The script
provides several commands, for different purposes. Each command supports its
own particular syntax. In other words, each command supports a different set of
arguments and options.
This page doesn't describe each command and its syntax, but provides an
introduction to how the ``scrapy-ctl.py`` script is used. After you learn how
to use it, you can get help for each particular command using the same
``scrapy-ctl.py`` script.
Global and project-specific ``scrapy-ctl.py``
=============================================
There is one global ``scrapy-ctl.py`` script shipped with Scrapy and another
``scrapy-ctl.py`` script automatically created inside your Scrapy project. The
project-specific ``scrapy-ctl.py`` is just a thin wrapper around the global
``scrapy-ctl.py`` which populates the settings of your project, so you don't
have to specify them every time through the ``--settings`` argument.
Using the ``scrapy-ctl.py`` script
==================================
The first thing you would do with the ``scrapy-ctl.py`` script is create your
Scrapy project::
scrapy-ctl.py startproject myproject
That will create a Scrapy project under the ``myproject`` directory and will
put a new ``scrapy-ctl.py`` inside that directory.
So, you go inside the new project directory::
cd myproject
And you're ready to use your project's ``scrapy-ctl.py``. For example, to
create a new spider::
python scrapy-ctl.py genspider mydomain mydomain.com
This is the same as using the global ``scrapy-ctl.py`` script and passing the
project settings module in the ``--settings`` argument::
scrapy-ctl.py --settings=myproject.settings genspider mydomain mydomain.com
You'll typically use the project-specific ``scrapy-ctl.py``, for convenience.
See all available commands
--------------------------
To see all available commands type::
scrapy-ctl.py -h
That will print a summary of all available Scrapy commands.
The first line will print the currently active project (if any).
Example (active project)::
Scrapy X.X.X - project: myproject
Usage
=====
...
Example (no active project)::
Scrapy X.X.X - no active project
Usage
=====
...
Get help for a particular command
---------------------------------
To get help about a particular command, including its description, usage and
available options type::
scrapy-ctl.py <command> -h
Example::
scrapy-ctl.py crawl -h
Using ``scrapy-ctl.py`` outside your project
============================================
Not all commands must be run from "inside" a Scrapy project. You can, for
example, use the ``fetch`` command to download a page (using Scrapy built-in
downloader) from outside a project. Other commands that can be used outside a
project are ``startproject`` (obviously) and ``shell``, to launch a
:ref:`Scrapy Shell <topics-shell>`.


@ -83,7 +83,7 @@ Here's its HTML code:
First, let's open the shell::
scrapy-ctl.py shell http://doc.scrapy.org/_static/selectors-sample1.html
scrapy shell http://doc.scrapy.org/_static/selectors-sample1.html
Then, after the shell loads, you'll have some selectors already instantiated and
ready to use.


@ -24,8 +24,7 @@ Designating the settings
When you use Scrapy, you have to tell it which settings you're using. You can
do this by using an environment variable, ``SCRAPY_SETTINGS_MODULE``, or the
``--settings`` argument of the :doc:`scrapy-ctl.py script
</topics/scrapy-ctl>`.
``--settings`` argument of the :doc:`scrapy command </topics/cmdline>`.
The value of ``SCRAPY_SETTINGS_MODULE`` should be in Python path syntax, e.g.
``myproject.settings``. Note that the settings module should be on the
@ -65,7 +64,7 @@ You can also override one (or more) settings from command line using the
Example::
scrapy-ctl.py crawl domain.com --set LOG_FILE=scrapy.log
scrapy crawl domain.com --set LOG_FILE=scrapy.log
2. Environment variables
------------------------
@ -74,7 +73,7 @@ You can populate settings using environment variables prefixed with
``SCRAPY_``. For example, to change the log file location on Unix systems::
$ export SCRAPY_LOG_FILE=scrapy.log
$ scrapy-ctl.py crawl example.com
$ scrapy crawl example.com
In Windows systems, you can change the environment variables from the Control
Panel following `these guidelines`_.
@ -90,7 +89,7 @@ It's where most of your custom settings will be populated.
4. Default settings per-command
-------------------------------
Each :doc:`/topics/scrapy-ctl` command can have its own default settings, which
Each :doc:`/topics/cmdline` command can have its own default settings, which
override the global default settings. Those custom command settings are
specified in the ``default_settings`` attribute of the command class.
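As an illustration of the per-command defaults described above, a minimal command class might look like the sketch below; the import path and chosen setting are assumptions, while ``default_settings``, ``syntax()`` and ``long_desc()`` are the hooks visible in the ``scrapy/cmdline.py`` changes further down:

# Hypothetical command sketch; module path and setting value are assumptions.
from scrapy.command import ScrapyCommand

class Command(ScrapyCommand):
    # Per-command defaults; execute() merges them into the global defaults via
    # settings.defaults.update(cmd.default_settings), as shown further down.
    default_settings = {'LOG_FILE': 'scrapy.log'}

    def syntax(self):
        return "[options]"

    def long_desc(self):
        return "Example command carrying its own default settings"

    def run(self, args, opts):  # run() signature assumed
        pass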
@ -224,7 +223,7 @@ project name). This will be used to construct the User-Agent by default, and
also for logging.
It's automatically populated with your project name when you create your
project with the :doc:`scrapy-ctl.py </topics/scrapy-ctl>` ``startproject``
project with the :doc:`scrapy </topics/cmdline>` ``startproject``
command.
.. setting:: BOT_VERSION
@ -998,7 +997,7 @@ TEMPLATES_DIR
Default: ``templates`` dir inside scrapy module
The directory where to look for templates when creating new projects with the
:doc:`scrapy-ctl.py startproject </topics/scrapy-ctl>` command.
:doc:`scrapy startproject </topics/cmdline>` command.
.. setting:: URLLENGTH_LIMIT


@ -33,7 +33,7 @@ Launch the shell
To launch the shell type::
scrapy-ctl.py shell <url>
scrapy shell <url>
Where ``<url>`` is the URL you want to scrape.
@ -106,7 +106,7 @@ shell works.
First, we launch the shell::
python scrapy-ctl.py shell http://scrapy.org --nolog
scrapy shell http://scrapy.org --nolog
Then, the shell fetches the URL (using the Scrapy downloader) and prints the
list of available objects and some help::


@ -2,7 +2,7 @@
#
# To create the first spider for your project use this command:
#
# scrapy-ctl.py genspider myspider myspider-domain.com
# scrapy genspider myspider myspider-domain.com
#
# For more info see:
# http://doc.scrapy.org/topics/spiders.html


@ -2,7 +2,7 @@
#
# To create the first spider for your project use this command:
#
# scrapy-ctl.py genspider myspider myspider-domain.com
# scrapy genspider myspider myspider-domain.com
#
# For more info see:
# http://doc.scrapy.org/topics/spiders.html

extras/scrapy.bat (new file)

@ -0,0 +1,10 @@
@echo off
rem Windows command-line tool for Scrapy
setlocal
rem Use a full path to Python (relative to this script) as the standard Python
rem install does not put python.exe on the PATH...
rem %~dp0 is the directory of this script
%~dp0..\python "%~dp0scrapy" %*
endlocal


@ -51,9 +51,9 @@ def _print_usage(inside_project):
print "Usage"
print "=====\n"
print "To run a command:"
print " scrapy-ctl.py <command> [options] [args]\n"
print " scrapy <command> [options] [args]\n"
print "To get help:"
print " scrapy-ctl.py <command> -h\n"
print " scrapy <command> -h\n"
print "Available commands"
print "==================\n"
cmds = _get_commands_dict()
@ -66,6 +66,10 @@ def _print_usage(inside_project):
def execute(argv=None):
if argv is None:
argv = sys.argv
if any('scrapy-ctl' in x for x in argv):
import warnings
warnings.warn("`scrapy-ctl.py` command-line tool is deprecated and will be removed in Scrapy 0.11, use `scrapy` instead",
DeprecationWarning, stacklevel=2)
cmds = _get_commands_dict()
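The ``scrapy-ctl`` check added above fires for legacy invocations because the old wrapper script's name ends up in ``sys.argv[0]``. A small self-contained illustration of the condition, with an assumed legacy argv:

# Standalone sketch of the deprecation check added in execute(); the argv
# value is an assumed example of a legacy scrapy-ctl.py invocation.
import warnings

argv = ['./scrapy-ctl.py', 'crawl', 'example.com']
if any('scrapy-ctl' in x for x in argv):
    warnings.warn("`scrapy-ctl.py` command-line tool is deprecated and will be "
                  "removed in Scrapy 0.11, use `scrapy` instead",
                  DeprecationWarning, stacklevel=2)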
@ -82,7 +86,7 @@ def execute(argv=None):
parser.usage = "%%prog %s %s" % (cmdname, cmd.syntax())
parser.description = cmd.long_desc()
if cmd.requires_project and not settings.settings_module:
print "Error running: scrapy-ctl.py %s\n" % cmdname
print "Error running: scrapy %s\n" % cmdname
print "Cannot find project settings module in python path: %s" % \
settings.settings_module_path
sys.exit(1)
@ -98,7 +102,7 @@ def execute(argv=None):
sys.exit(2)
else:
print "Unknown command: %s\n" % cmdname
print 'Use "scrapy-ctl.py -h" for help'
print 'Use "scrapy -h" for help'
sys.exit(2)
settings.defaults.update(cmd.default_settings)


@ -13,7 +13,6 @@ TEMPLATES_PATH = join(scrapy.__path__[0], 'templates', 'project')
TEMPLATES_TO_RENDER = (
('scrapy.cfg',),
('scrapy-ctl.py',),
('${project_name}', 'settings.py.tmpl'),
('${project_name}', 'items.py.tmpl'),
('${project_name}', 'pipelines.py.tmpl'),
@ -47,7 +46,6 @@ class Command(ScrapyCommand):
moduletpl = join(TEMPLATES_PATH, 'module')
copytree(moduletpl, join(project_name, project_name), ignore=IGNORE)
shutil.copy(join(TEMPLATES_PATH, 'scrapy.cfg'), project_name)
shutil.copy(join(TEMPLATES_PATH, 'scrapy-ctl.py'), project_name)
for paths in TEMPLATES_TO_RENDER:
path = join(*paths)
tplfile = join(project_name,


@ -2,7 +2,7 @@
#
# To create the first spider for your project use this command:
#
# scrapy-ctl.py genspider myspider myspider-domain.com
# scrapy genspider myspider myspider-domain.com
#
# For more info see:
# http://doc.scrapy.org/topics/spiders.html


@ -1,7 +0,0 @@
#!/usr/bin/env python
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', '${project_name}.settings')
from scrapy.cmdline import execute
execute()
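Projects generated before this commit keep a copy of this wrapper; it still works, since it only sets ``SCRAPY_SETTINGS_MODULE`` and defers to ``scrapy.cmdline.execute()``, but it now triggers the ``DeprecationWarning`` added above. A hedged sketch of driving the same machinery without the wrapper, relying only on ``execute()`` accepting an explicit argv list as shown in the ``scrapy/cmdline.py`` hunk (project name and command are illustrative):

# Hypothetical programmatic invocation; project and command names are examples.
import os
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', 'myproject.settings')

from scrapy.cmdline import execute
execute(['scrapy', 'crawl', 'example.com'])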


@ -35,7 +35,7 @@ class StartprojectTest(ProjectTest):
def test_startproject(self):
self.assertEqual(0, self.call('startproject', self.project_name))
assert exists(join(self.proj_path, 'scrapy-ctl.py'))
assert exists(join(self.proj_path, 'scrapy.cfg'))
assert exists(join(self.proj_path, 'testproject'))
assert exists(join(self.proj_mod_path, '__init__.py'))
assert exists(join(self.proj_mod_path, 'items.py'))


@ -74,6 +74,10 @@ if len(sys.argv) > 1 and sys.argv[1] == 'bdist_wininst':
for file_info in data_files:
file_info[0] = '\\PURELIB\\%s' % file_info[0]
scripts = ['bin/scrapy', 'bin/scrapy-ws.py', 'bin/scrapy-sqs.py']
if os.name == 'nt':
scripts.append('extras/scrapy.bat')
# Dynamically calculate the version based on scrapy.__version__
version = ".".join(map(str, __import__('scrapy').version_info[:2]))
@ -90,7 +94,7 @@ setup_args = {
'packages': packages,
'cmdclass': cmdclasses,
'data_files': data_files,
'scripts': ['bin/scrapy', 'bin/scrapy-ctl.py', 'bin/scrapy-ws.py', 'bin/scrapy-sqs.py'],
'scripts': scripts,
'classifiers': [
'Programming Language :: Python',
'Programming Language :: Python :: 2.5',