A First Simple Crawler: Fetching a Page

Author: System  Date: August 16, 2024  Categories: All, Crawler


Below is an example of a simple web crawler written with Python's requests library. It fetches the content of a specified web page.




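import requests

def get_page_content(url):
    """Fetch url and return the page text, or an error message string."""
    try:
        # Without a timeout, requests.get can wait indefinitely on an unresponsive server
        response = requests.get(url, timeout=10)
        if response.status_code == 200:
            return response.text
        else:
            return "Failed to retrieve the webpage, status code: {}".format(response.status_code)
    except requests.exceptions.RequestException as e:
        # Base class for connection errors, timeouts, invalid URLs, etc.
        return "An error occurred: {}".format(e)

if __name__ == "__main__":
    url = "http://example.com"  # Replace with the address of the page you want to crawl
    print(get_page_content(url))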

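This code first imports the requests library, then defines a function, get_page_content, that takes a URL as its argument and fetches the page with requests.get. If the request succeeds (HTTP status code 200), the function returns the page's text; if the server responds with any other status code, or the request raises an exception such as a connection error, it returns an error message instead. Finally, replace the value of the url variable with the address of the page you want to crawl; the script prints whatever content it retrieves.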
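One drawback of get_page_content is that it returns error messages as ordinary strings, so a caller cannot reliably tell a failed request from a page whose text happens to look like one. The following is a minimal sketch of an alternative, assuming you would rather get None on failure. The names fetch_page and HEADERS, and the User-Agent value, are hypothetical choices for illustration, not part of the original example. It relies on the requests calls raise_for_status() (raises HTTPError for 4xx and 5xx responses), the timeout argument, and Response.apparent_encoding.

```python
import requests

# Hypothetical header value: some sites reject requests' default User-Agent.
HEADERS = {"User-Agent": "Mozilla/5.0 (simple-crawler-example)"}


def fetch_page(url, timeout=10):
    """Return the page's HTML text, or None if the request fails.

    Unlike get_page_content, failure is signalled with None rather than a
    message string, so an error cannot be mistaken for page content.
    """
    try:
        response = requests.get(url, headers=HEADERS, timeout=timeout)
        # Raises requests.exceptions.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
    except requests.exceptions.RequestException as e:
        # HTTPError, ConnectionError, Timeout, etc. all subclass RequestException
        print("Request failed: {}".format(e))
        return None

    # When Content-Type carries no charset, requests assumes ISO-8859-1 for
    # text/* responses, which garbles e.g. UTF-8 Chinese pages; guess from
    # the body instead in that case.
    if "charset" not in response.headers.get("Content-Type", "").lower():
        response.encoding = response.apparent_encoding
    return response.text
```

Usage: call `html = fetch_page("http://example.com")` and check `if html is not None:` before parsing the result. Returning None (or letting the exception propagate) keeps error handling separate from the page data, which matters once the crawler starts processing many pages.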
