Python Crawlers: Asynchronous Crawling
```python
import asyncio

import aiohttp


async def fetch(session, url):
    # Request a URL and return the response body as text.
    async with session.get(url) as response:
        return await response.text()


async def main():
    urls = ['http://httpbin.org/delay/1', 'http://httpbin.org/delay/2']
    async with aiohttp.ClientSession() as session:
        # Schedule all requests concurrently and wait for every result.
        tasks = [fetch(session, url) for url in urls]
        results = await asyncio.gather(*tasks)
        for result in results:
            print(result)


# asyncio.run() creates and closes the event loop for us; the older
# get_event_loop()/run_until_complete() pattern is deprecated since Python 3.10.
asyncio.run(main())
```
This code uses the aiohttp library for asynchronous HTTP requests and the asyncio library to manage the coroutines. The fetch function retrieves the content of a given URL, while main is the entry-point coroutine: it creates a single ClientSession and then runs the fetch calls concurrently via asyncio.gather. Because the requests overlap instead of running one after another, this substantially improves crawling performance, especially for network-I/O-bound workloads.
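In practice a crawler should also cap how many requests run at once, so it does not overwhelm the target server or exhaust local sockets. The sketch below is a minimal illustration of that pattern using asyncio.Semaphore; here asyncio.sleep stands in for the aiohttp request, and the URLs, delay, and concurrency limit are made-up values for the example.

```python
import asyncio
import time


async def fetch_simulated(sem, url, delay):
    # The semaphore allows at most `max_concurrency` coroutines past
    # this point at the same time; the rest wait their turn.
    async with sem:
        # Stand-in for `await session.get(url)`; sleeps instead of
        # doing real network I/O so the example is self-contained.
        await asyncio.sleep(delay)
        return f"body of {url}"


async def crawl(urls, max_concurrency=2):
    sem = asyncio.Semaphore(max_concurrency)
    tasks = [fetch_simulated(sem, url, 0.1) for url in urls]
    return await asyncio.gather(*tasks)


start = time.perf_counter()
results = asyncio.run(crawl([f"http://example.com/{i}" for i in range(4)]))
elapsed = time.perf_counter() - start
print(results)
print(f"elapsed: {elapsed:.2f}s")
```

With four simulated 0.1 s requests and a limit of two, the tasks run in two overlapping batches, so the total time is roughly 0.2 s rather than the 0.4 s a sequential loop would take.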