Python Crawler 6: High-Performance Asynchronous Crawler

Author: System. Date: August 19, 2024. Categories: All, Crawlers. Word count: 772.





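import asyncio
import aiohttp

async def fetch(session, url):
    # Request one URL and return the response body as text.
    async with session.get(url) as response:
        return await response.text()

async def main():
    # 200 URLs in total: each one responds after a 1 or 2 second delay.
    urls = ['http://httpbin.org/delay/1', 'http://httpbin.org/delay/2'] * 100
    # One session is shared by every request so connections can be reused.
    async with aiohttp.ClientSession() as session:
        # Schedule all requests up front, then wait for all of them together.
        tasks = [asyncio.create_task(fetch(session, url)) for url in urls]
        html_list = await asyncio.gather(*tasks)
        for html in html_list:
            print(len(html))

if __name__ == '__main__':
    asyncio.run(main())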

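This code uses the asyncio and aiohttp libraries to issue network requests asynchronously. The fetch function requests a single URL and returns the response text. The main function is the program's entry point: it creates one ClientSession object and uses it to fetch the contents of many URLs concurrently. With asyncio.gather, the tasks run concurrently and their results are collected, in the same order as the tasks were passed in, once all of them have finished. Because the program keeps working while each response is pending instead of blocking on it, this crawler model is significantly more efficient than sending requests one after another when there are many requests to make.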
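The efficiency claim can be checked without touching the network. The following is a minimal sketch, not part of the original article: fake_fetch is a hypothetical stand-in for session.get that only sleeps, and the URLs, the delay value, and the helper names (fetch_sequentially, fetch_concurrently, timed) are assumptions made for illustration. It runs the same simulated requests once sequentially and once through asyncio.gather, and times both.

```python
import asyncio
import time

async def fake_fetch(url, delay):
    # Hypothetical stand-in for session.get(url): sleeps instead of doing network I/O.
    await asyncio.sleep(delay)
    return f"body of {url}"

async def fetch_sequentially(urls, delay):
    # Each request starts only after the previous one has finished.
    return [await fake_fetch(url, delay) for url in urls]

async def fetch_concurrently(urls, delay):
    # All requests are pending at the same time; gather returns results in input order.
    tasks = [asyncio.create_task(fake_fetch(url, delay)) for url in urls]
    return await asyncio.gather(*tasks)

def timed(coro):
    # Run a coroutine to completion and return (result, elapsed seconds).
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start

if __name__ == "__main__":
    # Placeholder URLs; nothing is actually requested.
    urls = [f"http://example.com/page/{i}" for i in range(10)]
    seq_result, seq_time = timed(fetch_sequentially(urls, 0.05))
    con_result, con_time = timed(fetch_concurrently(urls, 0.05))
    print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Under these assumptions the sequential run should take roughly the sum of all the delays, while the concurrent run should take roughly as long as a single delay. Real requests will behave less cleanly, since servers and networks impose their own limits.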
