[Python Crawler Basics] A First Look at Python Web Crawlers

Author: System | Date: August 14, 2024 | Categories: All, Crawlers | Word count: 1030


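import requests
from bs4 import BeautifulSoup

# Send an HTTP request and return the page's HTML, or None on failure
def get_html(url):
    try:
        # A timeout keeps the call from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
        print("Request failed, status code: " + str(response.status_code))
    except requests.RequestException as exc:
        print("An error occurred, could not fetch the page: " + str(exc))
    return None

# Parse the HTML and extract the post title and body text
def parse_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    # find() returns None when nothing matches, so check before calling get_text()
    title = soup.find('h1', class_='post-title')
    content = soup.find('div', class_='post-content')
    return {
        'title': title.get_text() if title else None,
        'content': content.get_text() if content else None
    }

# Main function: set the URL, then fetch and parse the page
def main():
    url = 'https://www.example.com/some-post'
    html = get_html(url)
    if html is None:
        # Nothing to parse if the request failed
        return
    parsed_data = parse_html(html)
    print(parsed_data)

if __name__ == '__main__':
    main()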

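This code shows how to use Python's requests library to send an HTTP request and the BeautifulSoup library to parse the returned HTML and extract data. It defines two functions, get_html and parse_html, which fetch the page content and parse it, respectively. Finally, main sets a URL and calls both functions to fetch the page and print the parsed data. Note that the selectors (an h1 with class post-title and a div with class post-content) depend on the target page's markup and need to be adjusted for other sites.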
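The extraction step can be tried without any network access by running it on an inline HTML string. The following is only a minimal sketch: `SAMPLE_HTML` and `extract_post` are hypothetical names introduced for illustration, and the sample markup is an assumption that simply mirrors the class names used in parse_html. It also shows one way to handle the case where `find()` matches nothing and returns None.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page; the class names mirror those used in parse_html.
SAMPLE_HTML = """
<html>
  <body>
    <h1 class="post-title">Hello, crawler</h1>
    <div class="post-content">First paragraph.</div>
  </body>
</html>
"""


def extract_post(html):
    """Return the title and content text, using None for any missing element."""
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1", class_="post-title")
    content = soup.find("div", class_="post-content")
    return {
        # strip=True trims the surrounding whitespace from the extracted text
        "title": title.get_text(strip=True) if title else None,
        "content": content.get_text(strip=True) if content else None,
    }


result = extract_post(SAMPLE_HTML)
missing = extract_post("<p>no matching elements here</p>")
print(result)
print(missing)
```

Testing the parser on a fixed string like this makes it easier to tell a selector problem apart from a request problem before pointing the crawler at a real site.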
