.. _topics-djangoitem:

==========
DjangoItem
==========
:class:`DjangoItem` is a class of item that gets its field definitions from a
Django model. You simply create a :class:`DjangoItem` and specify which Django
model it relates to.

Besides getting the model fields defined on your item, :class:`DjangoItem`
provides a method to create and populate a Django model instance with the item
data.

Using DjangoItem
================
:class:`DjangoItem` works much like ModelForms in Django: you create a subclass
and set its ``django_model`` attribute to a valid Django model. This gives you
an item with a field for each Django model field.

In addition, you can define fields that aren't present in the model, and even
override fields that are present in the model by defining them in the item.

Let's see some examples:

Django model for the examples::

    from django.db import models

    class Person(models.Model):
        name = models.CharField(max_length=255)
        age = models.IntegerField()

Defining a basic :class:`DjangoItem`::

    class PersonItem(DjangoItem):
        django_model = Person

:class:`DjangoItem` works just like :class:`~scrapy.item.Item`::

    p = PersonItem()
    p['name'] = 'John'
    p['age'] = '22'

To obtain the Django model from the item, we call the extra method
:meth:`~DjangoItem.save` of the :class:`DjangoItem`::

    >>> person = p.save()
    >>> person.name
    'John'
    >>> person.age
    '22'
    >>> person.id
    1

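
A natural place to call :meth:`~DjangoItem.save` is an item pipeline. The
following is a minimal sketch rather than something shipped with Scrapy: the
class name ``DjangoSavePipeline`` is hypothetical, and it assumes that every
item reaching the pipeline exposes a ``save()`` method, as a
:class:`DjangoItem` does::

    class DjangoSavePipeline(object):
        """Hypothetical pipeline that persists each item as a Django model."""

        def process_item(self, item, spider):
            # save() builds the model instance and writes it to the database.
            item.save()
            # Return the item so that later pipelines still receive it.
            return item

To take effect, such a class would have to be listed in the project's
``ITEM_PIPELINES`` setting.
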
As you can see, the model is already saved when we call
:meth:`~DjangoItem.save`. To obtain an unsaved model instead, call
:meth:`~DjangoItem.save` with ``commit=False``::

    >>> person = p.save(commit=False)
    >>> person.name
    'John'
    >>> person.age
    '22'
    >>> person.id
    None

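
One reason to ask for an unsaved model is to adjust it before it is written.
The sketch below assumes ``item`` is a ``PersonItem`` like the one above; the
helper name ``save_with_int_age`` is made up for this example::

    def save_with_int_age(item):
        """Hypothetical helper: coerce ``age`` to int before saving."""
        # Build the model instance without touching the database.
        person = item.save(commit=False)
        # Scraped values are usually strings; store the age as an integer.
        person.age = int(person.age)
        # Write the cleaned instance using the model's own save().
        person.save()
        return person
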
As mentioned before, we can add other fields to the item::

    class PersonItem(DjangoItem):
        django_model = Person
        sex = Field()

    p = PersonItem()
    p['name'] = 'John'
    p['age'] = '22'
    p['sex'] = 'M'

.. note:: Fields added to the item won't be taken into account when calling
   :meth:`~DjangoItem.save`.

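
The behaviour described in the note can be pictured with plain Python. This is
only an illustration, not Scrapy's code; ``model_kwargs`` and its arguments
are names invented for the sketch::

    def model_kwargs(item_values, model_field_names):
        """Keep only the item values that correspond to model fields."""
        return dict(
            (name, value)
            for name, value in item_values.items()
            if name in model_field_names
        )

    item_values = {'name': 'John', 'age': '22', 'sex': 'M'}
    kwargs = model_kwargs(item_values, ('name', 'age'))
    # 'sex' is not a Person field, so it is left out of the model arguments.
    print(sorted(kwargs))  # ['age', 'name']
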
We can also override the fields of the model with our own::

    class PersonItem(DjangoItem):
        django_model = Person
        name = Field(default='No Name')

This is useful to provide properties to the field, like a default or any other
property that your project uses.

DjangoItem caveats
==================
DjangoItem is a rather convenient way to integrate Scrapy projects with Django
models, but bear in mind that the Django ORM may not scale well if you scrape a
lot of items (i.e. millions) with Scrapy. This is because a relational backend
is often not a good choice for a write-intensive application (such as a web
crawler), especially if the database is highly normalized and has many indices.
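
One common way to reduce the write load is to buffer unsaved objects and write
them in batches. The sketch below is generic Python under stated assumptions:
``BatchWriter`` and ``flush_func`` are hypothetical names, and ``flush_func``
stands for whatever bulk-write call the project uses::

    class BatchWriter(object):
        """Hypothetical buffer that hands objects to ``flush_func`` in batches."""

        def __init__(self, flush_func, batch_size=1000):
            self.flush_func = flush_func
            self.batch_size = batch_size
            self.buffer = []

        def add(self, obj):
            self.buffer.append(obj)
            # Flush as soon as a full batch has been collected.
            if len(self.buffer) >= self.batch_size:
                self.flush()

        def flush(self):
            # Write whatever is pending; call once more when the crawl ends.
            if self.buffer:
                self.flush_func(self.buffer)
                self.buffer = []

With Django, ``flush_func`` could be ``Person.objects.bulk_create``, fed with
instances obtained from ``p.save(commit=False)``. Whether this helps enough
depends on the database and its schema.
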