Dynamic Crawler (Targeting Ctrip: Scraping Review Scores, Comments, and More)

Author: System | Date: August 23, 2024 | Categories: All, Crawlers | Word count: 1058


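The original code already works well as an example, so we can use it directly as the solution. A brief description of the key function and a code sample follow: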




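import requests
from lxml import etree

def get_comments(url):
    """
    Fetch review data for a travel destination.
    :param url: URL of the review page
    :return: list of dicts, each holding a score and a message
    """
    try:
        # A timeout keeps the call from hanging on an unresponsive server.
        response = requests.get(url, timeout=10)
    except requests.RequestException as exc:
        print(f"Request failed: {exc}")
        return []
    if response.status_code != 200:
        print(f"Request failed with status code {response.status_code}")
        return []

    html = etree.HTML(response.text)
    comments = html.xpath('//div[@class="comment_list"]/div[@class="day"]')
    result = []
    for comment in comments:
        # xpath() returns a list; skip incomplete entries instead of raising IndexError.
        scores = comment.xpath('.//div[@class="score"]/strong/text()')
        messages = comment.xpath('.//div[@class="text"]/span/text()')
        if not scores or not messages:
            continue
        result.append({'score': scores[0].strip(), 'message': messages[0].strip()})
    return result

# Usage example
url = 'https://you.ctrip.com/sight/3335.html#comments'
comments = get_comments(url)
for comment in comments:
    print(f'Score: {comment["score"]}, Comment: {comment["message"]}')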

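This code defines a get_comments function that takes a URL, sends an HTTP GET request, parses the returned page, extracts the score and message of each review, and returns them as a list of dicts. Replace the example URL with the page you want to scrape. Note that requests only sees the HTML the server returns, so reviews that the page loads later via JavaScript will not appear in the result.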
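The parsing step can be tried without any network access by splitting it out of the request. The sketch below is only an illustration: parse_comments and SAMPLE_HTML are hypothetical names not found in the original code, and the sample markup is invented to match the XPath expressions used in get_comments. The real Ctrip page structure may well differ.

```python
from lxml import etree

# Invented sample markup shaped to match the XPath expressions in get_comments.
# This is an assumption for illustration, not a copy of the real Ctrip page.
SAMPLE_HTML = """
<div class="comment_list">
  <div class="day">
    <div class="score"><strong>4.5</strong></div>
    <div class="text"><span> Great view, worth the trip. </span></div>
  </div>
  <div class="day">
    <div class="text"><span>No score given.</span></div>
  </div>
</div>
"""

def parse_comments(html_text):
    """Extract score/message pairs from an HTML string (hypothetical helper)."""
    root = etree.HTML(html_text)
    if root is None:
        # Nothing could be parsed from the input.
        return []
    results = []
    for node in root.xpath('//div[@class="comment_list"]/div[@class="day"]'):
        scores = node.xpath('.//div[@class="score"]/strong/text()')
        messages = node.xpath('.//div[@class="text"]/span/text()')
        results.append({
            # Missing fields become None / '' rather than raising IndexError.
            'score': scores[0].strip() if scores else None,
            'message': messages[0].strip() if messages else '',
        })
    return results

parsed = parse_comments(SAMPLE_HTML)
for item in parsed:
    print(item)
```

Keeping the parser separate from the HTTP call makes it easy to check the XPath expressions against a saved copy of the page before running the crawler against the live site.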
