An In-Depth Look at Python's lxml Library: A Powerful Tool for Efficiently Processing XML and HTML

Author: System | Date: August 12, 2024 | Categories: All, html | Word count: 794

from lxml import etree
 
# Parse an XML string
xml_data = """
<root>
    <item id="1">First Item</item>
    <item id="2">Second Item</item>
</root>
"""
xml_tree = etree.fromstring(xml_data)
print(xml_tree)
 
# Parse an HTML string
html_data = """
<html>
    <head><title>Sample Title</title></head>
    <body>
        <p id="first">This is the first paragraph.</p>
        <p id="second">This is the second paragraph.</p>
    </body>
</html>
"""
html_tree = etree.HTML(html_data)
print(html_tree)
 
# XPath query
result = html_tree.xpath('//p[@id="first"]/text()')
print(result)  # prints a list containing the matching paragraph's text
 
# Serialize the XML tree (tostring returns bytes, so decode before printing)
xml_str = etree.tostring(xml_tree, pretty_print=True)
print(xml_str.decode('utf-8'))

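This code shows how to use the lxml library to parse XML and HTML strings and to query the parsed HTML with XPath. It then serializes the parsed XML tree back to a string and prints it in a readable, indented form. Because etree.tostring returns bytes by default, the result is decoded as UTF-8 before printing. It is a practical starting point for working with XML and HTML data.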
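As a follow-up to the example above, here is a minimal sketch of three other common steps: reading attributes with XPath, walking child elements, and modifying the tree before serializing it. It reuses the same sample XML. The third item and the variable names (ids, items, new_item, serialized) are illustrative assumptions, not part of the original example.

```python
from lxml import etree

xml_data = """
<root>
    <item id="1">First Item</item>
    <item id="2">Second Item</item>
</root>
"""
root = etree.fromstring(xml_data)

# Collect every id attribute with an XPath attribute selector.
ids = root.xpath('//item/@id')

# Walk the direct <item> children and pair each id with its text.
items = {item.get('id'): item.text for item in root.findall('item')}

# Add a hypothetical third item, then serialize the tree.
# encoding='unicode' makes tostring return a str instead of bytes.
new_item = etree.SubElement(root, 'item', id='3')
new_item.text = 'Third Item'
serialized = etree.tostring(root, encoding='unicode')

print(ids)
print(items)
print(serialized)
```

Passing encoding='unicode' is one way to skip the separate decode step. The bytes-plus-decode approach used in the original listing works equally well.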
