Using BS4's select and find_all to Find Common HTML Elements and Attributes

Author: System  Date: August 21, 2024  Categories: All, html  Word count: 869

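```python
from bs4 import BeautifulSoup
import requests

url = 'https://www.example.com'
# A timeout keeps the request from hanging indefinitely; raise_for_status() stops on HTTP errors
response = requests.get(url, timeout=10)
response.raise_for_status()
soup = BeautifulSoup(response.text, 'html.parser')

# Find all <div> tags
divs = soup.find_all('div')

# Find all <p> tags and get their text content
paragraphs = [p.get_text() for p in soup.find_all('p')]

# Find all <a> tags that have an href attribute and collect the href values
links = [a['href'] for a in soup.find_all('a', href=True)]

# Use a CSS selector to find all <div> tags with class 'class-name'
divs_with_class = soup.select('div.class-name')

# Find elements with id='id-name' (select always returns a list; select_one returns the first match or None)
element_with_id = soup.select('#id-name')

# Find all <h1> tags and get their text content
headings = [h.get_text() for h in soup.find_all('h1')]

# Print the results
print(divs)
print(paragraphs)
print(links)
print(divs_with_class)
print(element_with_id)
print(headings)
```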

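This code uses BeautifulSoup's find_all and select methods to locate common elements and attributes in an HTML document. It shows how to find tags by name, extract their text with get_text(), read an attribute such as href, and use CSS selectors to match elements by class (div.class-name) or by id (#id-name). find_all filters by tag name and attribute values, while select takes a CSS selector string, which makes combined conditions such as nesting or attribute patterns easy to write.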
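Because the script at the top of this post fetches a live page, its output depends on whatever https://www.example.com serves at the time. The minimal offline sketch below assumes a small hand-written HTML string (the markup is hypothetical; class-name and id-name reuse the original placeholders). It runs the same kinds of queries and shows where find_all and select are interchangeable:

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page; class-name and id-name
# reuse the placeholder names from the original script
html = """
<html><body>
  <h1>Title</h1>
  <div class="class-name" id="id-name">
    <p>First paragraph</p>
    <a href="/about">About</a>
  </div>
  <div class="other">
    <p>Second paragraph</p>
    <a href="https://example.org/">External</a>
    <a name="anchor-only">No href</a>
  </div>
</body></html>
"""
soup = BeautifulSoup(html, "html.parser")

# Tag name: both calls return the two <div> elements in document order
divs = soup.find_all("div")
divs_css = soup.select("div")

# Text content: get_text(strip=True) drops surrounding whitespace
paragraphs = [p.get_text(strip=True) for p in soup.find_all("p")]

# Attribute values: href=True skips the <a> that has no href
links = [a["href"] for a in soup.find_all("a", href=True)]
# The same filter written as a CSS attribute selector
links_css = [a["href"] for a in soup.select("a[href]")]

# Class: class_ has a trailing underscore because class is a Python keyword
by_class = soup.find_all("div", class_="class-name")
by_class_css = soup.select("div.class-name")

# id: find() and select_one() return a single Tag (or None), not a list
by_id = soup.find(id="id-name")
by_id_css = soup.select_one("#id-name")

headings = [h.get_text(strip=True) for h in soup.find_all("h1")]

print(paragraphs)  # ['First paragraph', 'Second paragraph']
print(links)       # ['/about', 'https://example.org/']
print(links == links_css, by_class == by_class_css, by_id == by_id_css)  # True True True
print(headings)    # ['Title']
```

As a rule of thumb, find_all takes Python keyword filters and select takes one CSS selector string. When only one element is expected, as with an id, find and select_one avoid having to index into a list.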

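The explanation mentions CSS selectors for more complex queries, but the example at the top of this post only uses one class selector and one id selector. The sketch below uses more hypothetical markup to show a few richer selectors that select accepts, with roughly equivalent find/find_all calls for two of them. BeautifulSoup hands CSS matching to the soupsieve package, which is installed as a dependency of beautifulsoup4.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical markup: a nav menu plus an article with mixed relative/absolute links
html = """
<nav id="menu">
  <ul>
    <li><a href="/">Home</a></li>
    <li class="active"><a href="/docs">Docs</a></li>
    <li><a href="https://github.com/">GitHub</a></li>
  </ul>
</nav>
<article>
  <p class="lead">Intro</p>
  <p>Body with <a href="https://example.org/a">one</a> and <a href="/b">two</a></p>
</article>
"""
soup = BeautifulSoup(html, "html.parser")

# Descendant combinator: every <a> anywhere inside the element with id="menu"
menu_links = [a.get_text() for a in soup.select("#menu a")]
# -> ['Home', 'Docs', 'GitHub']

# Child combinator plus class: the <a> directly under <li class="active">
active = soup.select_one("li.active > a")["href"]
# -> '/docs'

# Attribute prefix selector: only hrefs that start with "http"
external = [a["href"] for a in soup.select('a[href^="http"]')]
# -> ['https://github.com/', 'https://example.org/a']

# Structural pseudo-class: the first <p> among its siblings inside <article>
first_p = soup.select_one("article p:first-of-type").get_text()
# -> 'Intro'

# Roughly equivalent find/find_all chains for two of the queries above;
# a compiled regex as an attribute filter stands in for the ^= prefix match
menu_links_fa = [a.get_text() for a in soup.find(id="menu").find_all("a")]
external_fa = [a["href"] for a in soup.find_all("a", href=re.compile(r"^http"))]

print(menu_links, active, external, first_p)
```

Combinators and pseudo-classes such as :first-of-type are where select is most concise. find_all is handier when a condition is easier to express as a Python value, such as a compiled regular expression or a function.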