Exploring Scrapy-DjangoItem: Combining Web Scraping with Django Models

Author: System | Date: September 3, 2024 | Categories: All, Databases | Word count: 1159

import scrapy
from scrapy_djangoitem import DjangoItem

# Assume we have a Django model named UserProfile.
# Django must already be configured (DJANGO_SETTINGS_MODULE set and
# django.setup() called) before this import runs.
from myapp.models import UserProfile

class UserProfileItem(DjangoItem):
    # The item's fields are generated from the model's fields.
    django_model = UserProfile

class MySpider(scrapy.Spider):
    name = 'user_profile'
    allowed_domains = ['example.com']
    start_urls = ['http://www.example.com/user/profiles']

    def parse(self, response):
        for profile in response.css('div.profile'):
            item = UserProfileItem()
            # .get(default='') returns the first match, or '' if nothing matched
            item['name'] = profile.css('div.name ::text').get(default='').strip()
            item['title'] = profile.css('div.title ::text').get(default='').strip()
            # Assume we want to keep the first avatar image link that appears
            item['avatar_url'] = profile.css('div.avatar img::attr(src)').get(default='')
            yield item

# Note: this example assumes the UserProfile model has name, title and avatar_url fields,
# and that the model's fields correspond to the fields of the Item object.

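In this example we define UserProfileItem, a subclass of DjangoItem that names the Django model through django_model, so the item's fields are derived from the model's fields. In the spider, CSS selectors extract each user's details from the page and fill a UserProfileItem instance. Finally, yield hands the instance to Scrapy's item pipelines. Note that scrapy-djangoitem itself does not ship a pipeline that writes to the database: the project needs an item pipeline that calls item.save(), which stores the data through the Django ORM.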
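The saving step can be sketched as a small item pipeline. This is only a minimal sketch: the class name DjangoSavePipeline is hypothetical, and it assumes every item passing through is a DjangoItem, whose save() method builds the model instance and writes it to the database.

```python
class DjangoSavePipeline:
    """Hypothetical pipeline that persists each DjangoItem via the Django ORM."""

    def process_item(self, item, spider):
        # DjangoItem.save() creates the model instance from the item's
        # fields and saves it (commit=True is the default).
        item.save()
        # Return the item so that any later pipelines still receive it.
        return item
```

To use it, the pipeline would be enabled in the Scrapy project's settings, for example ITEM_PIPELINES = {'myscraper.pipelines.DjangoSavePipeline': 300}, where the module path myscraper.pipelines is an assumed name. Django also has to be configured before the spider imports the model, typically by setting DJANGO_SETTINGS_MODULE and calling django.setup() in the Scrapy settings file.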
