A Power Tool for Text Scraping: Python and Beautiful Soup Let You Do More with Less Effort

Author: System | Date: August 16, 2024 | Categories: All, Web Scraping | Word count: 739





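```python
from bs4 import BeautifulSoup
import requests

# Fetch the HTML of the given URL; return None if the request fails
# (instead of returning an error string that would then be parsed as HTML)
def fetch_page_content(url, timeout=10):
    try:
        response = requests.get(url, timeout=timeout)  # timeout avoids hanging forever
        response.raise_for_status()  # raise an exception for 4xx/5xx responses
    except requests.RequestException as exc:
        print(f"Failed to fetch {url}: {exc}")
        return None
    return response.text

# Parse the HTML with Beautiful Soup and return every element with the given tag
def extract_content_with_beautiful_soup(html, tag):
    soup = BeautifulSoup(html, 'html.parser')
    return soup.find_all(tag)

# Example usage
url = 'http://example.com'
html_content = fetch_page_content(url)
if html_content is not None:
    headings = extract_content_with_beautiful_soup(html_content, 'h1')
    for heading in headings:
        print(heading.get_text(strip=True))
```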

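This code uses Python's requests library to download a web page, then uses Beautiful Soup to parse the HTML and extract the text of every element with a given tag (here, the h1 headings). It is a simple web-scraping example that shows beginners the two basic steps of a crawler: fetching a page and pulling structured data out of it.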
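To try the parsing step on its own without network access, a minimal sketch can feed Beautiful Soup an inline HTML string. SAMPLE_HTML, extract_texts and extract_links are hypothetical names introduced here for illustration; find_all, get_text(strip=True) and the href=True attribute filter are standard Beautiful Soup features.

```python
from bs4 import BeautifulSoup

# Hypothetical sample page, inlined so the example runs offline
SAMPLE_HTML = """
<html>
  <body>
    <h1> First heading </h1>
    <p>Intro text with a <a href="/about">link</a>.</p>
    <h1>Second heading</h1>
    <a href="https://example.com/docs">Docs</a>
    <a name="anchor-without-href">No href</a>
  </body>
</html>
"""

def extract_texts(html: str, tag: str) -> list[str]:
    """Return the whitespace-stripped text of every <tag> element, in document order."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.find_all(tag)]

def extract_links(html: str) -> list[str]:
    """Return the href value of every <a> element that has an href attribute."""
    soup = BeautifulSoup(html, "html.parser")
    # href=True matches only tags where the attribute is present
    return [a["href"] for a in soup.find_all("a", href=True)]

if __name__ == "__main__":
    print(extract_texts(SAMPLE_HTML, "h1"))  # ['First heading', 'Second heading']
    print(extract_links(SAMPLE_HTML))        # ['/about', 'https://example.com/docs']
```

Separating parsing from fetching this way makes the extraction logic easy to test against saved HTML before pointing it at a live site.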


