A Search Crawler Using GET Requests

Author: System  Date: August 23, 2024  Categories: All, Crawler  Word count: 1041


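Many sites' search forms submit the query as a URL parameter (for example `?q=...`), so running a search amounts to sending a GET request to that URL. To build a crawler that uses such a search feature, you can send the HTTP GET request with Python's requests library and parse the returned HTML with the BeautifulSoup library. Below is a simple example that searches a hypothetical site (http://example.com).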




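import requests
from bs4 import BeautifulSoup
 
def search_on_site(query):
    # Search endpoint of the (hypothetical) site; the query goes in the "q" parameter
    url = 'http://example.com/search'
    
    # Send the GET request. params= URL-encodes the query (spaces, &, non-ASCII),
    # and timeout= stops the crawler from hanging on an unresponsive server.
    try:
        response = requests.get(url, params={'q': query}, timeout=10)
    except requests.RequestException as exc:
        print("Request failed:", exc)
        return
    
    # Check whether the request succeeded
    if response.status_code == 200:
        # Parse the returned HTML
        soup = BeautifulSoup(response.text, 'html.parser')
        
        # Extract the information you need, e.g. the title of each search result
        results = soup.find_all('div', {'class': 'search-result'})
        for result in results:
            title = result.find('h3', {'class': 'result-title'})
            if title:
                print(title.get_text(strip=True))
    else:
        print("Failed to retrieve search results, status code:", response.status_code)
 
# Run a search with the function
search_on_site('python')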
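To check the URL-building and selector logic without hitting a live server, the two steps can be exercised offline. This is only a sketch: `build_search_url` and `extract_titles` are hypothetical helper names, `SAMPLE_HTML` is made-up markup that reuses the `search-result` / `result-title` class names assumed above, and the `q` parameter is likewise an assumption about the target site. `requests.Request(...).prepare()` builds the final, percent-encoded URL without sending anything.

```python
from urllib.parse import parse_qs, urlsplit

import requests
from bs4 import BeautifulSoup

# Hypothetical sample page; the class names mirror the selectors assumed
# in the example above, a real site will use its own markup.
SAMPLE_HTML = """
<html><body>
  <div class="search-result"><h3 class="result-title">Python Tutorial</h3></div>
  <div class="search-result"><h3 class="result-title"> Learning Python </h3></div>
  <div class="search-result"><p>No title in this result</p></div>
</body></html>
"""


def build_search_url(base_url, query):
    """Return the URL requests would send for a GET with ?q=<query>."""
    # Preparing (not sending) the request builds and percent-encodes the query string.
    prepared = requests.Request("GET", base_url, params={"q": query}).prepare()
    return prepared.url


def extract_titles(html):
    """Collect the stripped text of every result title in the page."""
    soup = BeautifulSoup(html, "html.parser")
    titles = []
    for result in soup.find_all("div", class_="search-result"):
        title = result.find("h3", class_="result-title")
        if title is not None:  # results without a title are skipped
            titles.append(title.get_text(strip=True))
    return titles


url = build_search_url("http://example.com/search", "web scraping & python")
print(url)
# Decoding the query string recovers the original search terms.
print(parse_qs(urlsplit(url).query)["q"])
print(extract_titles(SAMPLE_HTML))
```

Passing the query through `params=` rather than string formatting matters: with `'...?q={}'.format(query)`, a query containing `&` would be split into two separate parameters, and spaces or non-ASCII characters would not be encoded.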

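Note that you will need to adapt the URL construction and the selectors used during parsing to the actual structure of the target site. Some sites also expect extra request data, such as a User-Agent header, cookies, or a specific authentication mechanism. In that case, pass them through the keyword arguments of requests.get(), for example headers=, cookies= and auth=.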
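A minimal sketch of passing these extras follows, assuming the site wants a browser-like User-Agent, a session cookie and HTTP Basic auth. All three values are placeholders, not anything the example site is known to require. `headers=`, `cookies=` and `auth=` are standard `requests.get()` keyword arguments; `fetch_search_page` and `preview_request` are hypothetical helper names, the latter preparing the same request without sending it so you can inspect what would go over the wire.

```python
import requests

# Placeholder values (hypothetical): substitute whatever the target site needs.
HEADERS = {"User-Agent": "Mozilla/5.0 (compatible; example-search-bot/0.1)"}
COOKIES = {"sessionid": "abc123"}
AUTH = ("user", "secret")  # HTTP Basic auth, only if the site actually uses it


def fetch_search_page(query, base_url="http://example.com/search"):
    """Send the search GET request with custom headers, cookies and Basic auth."""
    response = requests.get(
        base_url,
        params={"q": query},
        headers=HEADERS,
        cookies=COOKIES,
        auth=AUTH,
        timeout=10,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def preview_request(query, base_url="http://example.com/search"):
    """Build the same request without sending it, to inspect what goes out."""
    req = requests.Request(
        "GET",
        base_url,
        params={"q": query},
        headers=HEADERS,
        cookies=COOKIES,
        auth=AUTH,
    )
    return req.prepare()


prepared = preview_request("python")
print(prepared.url)
print(prepared.headers["User-Agent"])
print(prepared.headers["Cookie"])
print(prepared.headers["Authorization"])
```

If the site sets cookies during a login step, a `requests.Session` keeps them across subsequent requests automatically, which is usually simpler than copying cookie values by hand.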
