[Crawler] Scraping Taobao Product Information with Python (Very Detailed)

Author: System  Date: August 11, 2024  Categories: All, Crawlers  Word count: 1201


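Below is a simplified example showing how to scrape Taobao product information with Python. In real-world use, comply with the applicable laws and regulations and follow the site's robots.txt rules, so that your crawler does not interfere with the site's normal service.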
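Before the main example, the robots.txt rule mentioned above can be checked programmatically with the standard library's urllib.robotparser. The following is only a minimal sketch: the helper name is_allowed, the sample rules, and the example.com URLs are illustrative assumptions and do not reflect Taobao's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url.

    is_allowed is a hypothetical helper written for this sketch.
    """
    parser = RobotFileParser()
    # parse() takes the lines of a robots.txt file that has already been downloaded
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up rules for illustration only
    sample_rules = "User-agent: *\nDisallow: /private/\n"
    print(is_allowed(sample_rules, "my-crawler", "https://example.com/private/page"))
    print(is_allowed(sample_rules, "my-crawler", "https://example.com/public/page"))
```

For a live site, RobotFileParser can also download the file itself: call set_url() with the address of the site's robots.txt, then read(), and then can_fetch() as above.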




import requests
from lxml import etree

def crawl_taobao_item(item_url):
    headers = {
        'User-Agent': 'your_user_agent',  # Replace with your own User-Agent string
    }
    try:
        # A timeout keeps the request from hanging indefinitely
        response = requests.get(item_url, headers=headers, timeout=10)
        response.raise_for_status()  # Raise an exception on 4xx/5xx responses
        response.encoding = response.apparent_encoding  # Use the encoding detected from the body
        return response.text
    except requests.RequestException as e:
        print(f"Error: {e}")
        return None

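def parse_item_info(html):
    tree = etree.HTML(html)

    def first_text(path):
        # xpath() returns a list; return None instead of raising IndexError when nothing matches
        nodes = tree.xpath(path)
        return nodes[0].strip() if nodes else None

    # These selectors depend on the page markup and may need adjusting
    return {
        'title': first_text('//div[@class="tb-detail-hd"]/h1/text()'),
        'price': first_text('//div[@class="tb-rmb"]/text()'),
    }
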
def main():
    item_url = 'https://item.taobao.com/item.htm?id=ITEM_ID'  # Replace ITEM_ID with a real item ID
    html = crawl_taobao_item(item_url)
    if html:
        item_info = parse_item_info(html)
        print(item_info)

if __name__ == "__main__":
    main()

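In this example, the crawl_taobao_item function sends the HTTP request and returns the page content, and the parse_item_info function parses that content to extract the product title and price. Make sure you supply a valid User-Agent and item ID.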
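To see the parsing step in isolation, without sending any request, the same XPath expressions can be run against a small hand-written HTML string. This is a sketch under stated assumptions: SAMPLE_HTML, first_text, and parse_sample are names invented for this illustration, and the sample markup simply mirrors the class names used in the article's selectors. It is not a copy of a real Taobao page, whose markup may differ.

```python
from lxml import etree

# Hand-written sample markup (an assumption for this sketch, not real page content)
SAMPLE_HTML = """
<html><body>
  <div class="tb-detail-hd"><h1>  Sample product title  </h1></div>
  <div class="tb-rmb">  99.00  </div>
</body></html>
"""


def first_text(tree, path):
    """Return the first text node matched by path, stripped, or None if nothing matches."""
    nodes = tree.xpath(path)
    return nodes[0].strip() if nodes else None


def parse_sample(html):
    """Extract the title and price with the same XPath expressions as the article."""
    tree = etree.HTML(html)
    return {
        "title": first_text(tree, '//div[@class="tb-detail-hd"]/h1/text()'),
        "price": first_text(tree, '//div[@class="tb-rmb"]/text()'),
        # A selector that matches nothing yields None rather than an IndexError
        "missing": first_text(tree, '//div[@class="no-such-class"]/text()'),
    }


if __name__ == "__main__":
    print(parse_sample(SAMPLE_HTML))
```

Testing selectors against a saved or hand-written page like this makes it easier to tell a parsing problem apart from a failed or blocked request.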
