2.5 Study Notes on "Python3 网络爬虫开发实战" (Python3 Web Crawler Development in Practice): Hands-On Example 1

Author: System | Date: August 16, 2024 | Categories: All, Crawlers | Word count: 1005





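import requests
from bs4 import BeautifulSoup


# Fetch a page and return its HTML text, or None if the request fails
def get_html_content(url):
    try:
        # A timeout keeps the script from hanging on an unresponsive server
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
        print("Failed to fetch page, status code: " + str(response.status_code))
    except requests.exceptions.RequestException as exc:
        print("Request error: " + str(exc))
    return None


# Parse the HTML and extract the text of the <title> tag
def parse_html_extract_title(html_content):
    soup = BeautifulSoup(html_content, 'html.parser')
    title = soup.find('title')
    if title:
        return title.get_text(strip=True)
    return "Title not found"


# Main function
def main():
    url = "https://www.python.org"
    html_content = get_html_content(url)
    # Parse only when the fetch succeeded; an error message is not HTML
    if html_content is not None:
        print(parse_html_extract_title(html_content))


# Run main() only when this script is executed directly
if __name__ == "__main__":
    main()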

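This code imports the requests and BeautifulSoup libraries and defines two functions: get_html_content fetches the content of a web page, and parse_html_extract_title parses that content and extracts the page title. The main function calls the two in sequence to fetch the title of the Python homepage and print it. The final guard ensures that main runs only when the script is executed directly, not when it is imported as a module.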
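The parsing step can be checked without any network access by feeding it an inline HTML string. The sketch below is an illustration only, not the article's method: it swaps BeautifulSoup for the standard library's html.parser so it runs with no third-party packages, and the names TitleParser, extract_title, and sample are hypothetical. It also shows that a plain error message, which is not HTML, yields no title.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Hypothetical helper: collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False   # True while we are between <title> and </title>
        self.done = False       # True once the first <title> has been closed
        self.parts = []         # Text chunks seen inside the title

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self.done:
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self.in_title:
            self.in_title = False
            self.done = True

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_content):
    """Return the stripped title text, or None if no <title> text was found."""
    parser = TitleParser()
    parser.feed(html_content)
    parser.close()  # Flush any text still buffered by the parser
    title = "".join(parser.parts).strip()
    return title or None


# Assumed sample input, made up for this illustration
sample = "<html><head><title> Welcome to Python.org </title></head><body></body></html>"
print(extract_title(sample))           # Welcome to Python.org
print(extract_title("Request error"))  # None
```

Returning None instead of a message string lets the caller tell "no title" apart from a real title, which is one reason to keep error text out of the value passed to the parser.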


