Crawler in Practice: Exploring XPath Scraping Techniques with Trending News

Author: System | Date: August 16, 2024 | Categories: All, Crawlers | Word count: 655

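import requests
from lxml import etree

# Request URL
url = 'https://sina.com.cn/'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36',
    'Referer': 'https://sina.com.cn/'
}

# Send the request; fail fast on timeouts and HTTP errors
response = requests.get(url, headers=headers, timeout=10)
response.raise_for_status()
# Use the detected encoding so that Chinese text is not garbled
response.encoding = response.apparent_encoding

# Parse the HTML
html = etree.HTML(response.text)

# Extract the hot-news titles
# (the "news-list" class is site-specific and may not match the current page markup)
hot_news_titles = html.xpath('//ul[@class="news-list"]/li/a/text()')

# Print the results, skipping whitespace-only text nodes
for title in hot_news_titles:
    if title.strip():
        print(title.strip())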

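This code uses the requests library to send an HTTP request, parses the returned HTML with etree.HTML from lxml, and extracts the hot-news titles with an XPath query. The expression //ul[@class="news-list"]/li/a/text() selects the text of every <a> element that sits inside an <li> of a <ul> whose class is "news-list". It is a simple exercise that shows how XPath can be used to locate and extract information on a web page.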
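To experiment with the XPath part without depending on the network or on the live page layout, the same query can be run against an inline HTML string. The following is a minimal sketch, not the site's real markup: SAMPLE_HTML and the helper extract_news are hypothetical names introduced for illustration, and the "news-list" structure is assumed to match the query used in the script above. It also shows one possible refinement: selecting the <a> elements first and then reading the title and the href from each one.

```python
from lxml import etree

# Hypothetical markup standing in for a real page; the class name mirrors
# the XPath used in the script but is an assumption about the site.
SAMPLE_HTML = """
<html><body>
  <ul class="news-list">
    <li><a href="/news/1.html">  First headline </a></li>
    <li><a href="/news/2.html">Second headline</a></li>
    <li><span>No link here</span></li>
  </ul>
</body></html>
"""


def extract_news(html_text):
    """Return a list of (title, href) tuples found in the news list."""
    tree = etree.HTML(html_text)
    items = []
    # Select the <a> elements first, then query each one relatively.
    for link in tree.xpath('//ul[@class="news-list"]/li/a'):
        # normalize-space(.) trims and collapses whitespace in the link text.
        title = link.xpath('normalize-space(.)')
        href = link.get('href')
        if title:
            items.append((title, href))
    return items


if __name__ == '__main__':
    for title, href in extract_news(SAMPLE_HTML):
        print(title, href)
```

Working from the element rather than from text() keeps each title paired with its link, and an <li> without an <a> is simply skipped because it never matches the query.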
