A Deep Dive into Python's lxml Library: A Powerful Tool for Efficient XML and HTML Processing
from lxml import etree
# Parse an XML string; fromstring() returns the root element
xml_data = """
<root>
<item id="1">First Item</item>
<item id="2">Second Item</item>
</root>
"""
xml_tree = etree.fromstring(xml_data)
print(xml_tree)  # prints the root Element object, e.g. <Element root at 0x...>
# Parse an HTML string; etree.HTML() uses lxml's lenient HTML parser and returns the root <html> element
html_data = """
<html>
<head><title>Sample Title</title></head>
<body>
<p id="first">This is the first paragraph.</p>
<p id="second">This is the second paragraph.</p>
</body>
</html>
"""
html_tree = etree.HTML(html_data)
print(html_tree)  # prints the root Element object, e.g. <Element html at 0x...>
# XPath query: select the text of the <p> element whose id is "first"
result = html_tree.xpath('//p[@id="first"]/text()')
print(result)  # xpath() returns a list: ['This is the first paragraph.']
# Serialize the XML tree back to a string; tostring() returns bytes, so decode before printing
xml_str = etree.tostring(xml_tree, pretty_print=True)
print(xml_str.decode('utf-8'))
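This code shows how to use the lxml library to parse XML and HTML strings and to query the resulting trees with XPath. It then serializes the parsed XML tree back to a string (with pretty_print=True) and prints it in readable form. Together, these steps cover the most common tasks when working with XML and HTML data.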
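The example above stops at printing query results. A natural next step is to read attributes, pull attribute values out with XPath, modify the tree, and control how it is serialized. The following is a minimal sketch, assuming lxml is installed. It reuses the same sample documents; the added `<item id="3">Third Item</item>` element and the unclosed `<p>` snippet are hypothetical data made up for illustration. `iter()`, `get()`, `xpath()`, `SubElement()`, and the `encoding`, `xml_declaration`, and `method` parameters of `tostring()` are standard `lxml.etree` APIs.

```python
from lxml import etree

# Same sample documents as in the example above.
xml_data = """
<root>
<item id="1">First Item</item>
<item id="2">Second Item</item>
</root>
"""

html_data = """
<html>
<head><title>Sample Title</title></head>
<body>
<p id="first">This is the first paragraph.</p>
<p id="second">This is the second paragraph.</p>
</body>
</html>
"""

root = etree.fromstring(xml_data)

# Walk every <item> element: .get() reads an attribute, .text the element's text.
items = [(item.get("id"), item.text) for item in root.iter("item")]
print(items)  # [('1', 'First Item'), ('2', 'Second Item')]

# XPath can return attribute values directly (as str subclasses).
ids = root.xpath("//item/@id")
print(ids)  # ['1', '2']

# Append a new child element (hypothetical data) with an attribute and text.
new_item = etree.SubElement(root, "item", id="3")
new_item.text = "Third Item"

# Serialize with an explicit encoding and an XML declaration; the result is bytes.
xml_bytes = etree.tostring(root, encoding="utf-8", xml_declaration=True)
print(xml_bytes.decode("utf-8"))

# The HTML parser produces the same kind of element tree, so XPath works the same way.
html_root = etree.HTML(html_data)
title = html_root.xpath("//title/text()")
print(title)  # ['Sample Title']
paragraphs = {p.get("id"): p.text for p in html_root.iter("p")}
print(paragraphs)

# The HTML parser is lenient: an unclosed tag (hypothetical input) is repaired
# and wrapped in <html><body>; method="html" serializes using HTML rules.
broken = etree.HTML("<p>Unclosed paragraph")
fixed_html = etree.tostring(broken, method="html")
print(fixed_html.decode("utf-8"))
```

Because `xpath()` expressions ending in `text()` or `@attribute` return a list, check for an empty result before indexing into it whenever a match is not guaranteed.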