Crawling Baidu Images: Downloading Many Images in Bulk
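To download a large number of images from Baidu Images, you can use Python's requests and BeautifulSoup libraries. The following simple example downloads multiple images for a single keyword.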
import os

import requests
from bs4 import BeautifulSoup


def download_image(image_url, file_path):
    """Stream one image to disk in 1 KB chunks."""
    # A timeout keeps the script from hanging on an unresponsive server.
    response = requests.get(image_url, stream=True, timeout=10)
    if response.status_code == 200:
        with open(file_path, 'wb') as file:
            for chunk in response.iter_content(chunk_size=1024):
                if chunk:
                    file.write(chunk)
        print(f"Image saved: {file_path}")
    else:
        print(f"Failed to download: {image_url}")


def crawl_baidu_images(keyword, max_images):
    download_dir = 'images/'
    os.makedirs(download_dir, exist_ok=True)
    base_url = 'https://image.baidu.com/search/flip'
    # Passing the query as params lets requests URL-encode the keyword.
    params = {'tn': 'baiduimage', 'ie': 'utf-8', 'word': keyword}
    response = requests.get(base_url, params=params, timeout=10)
    soup = BeautifulSoup(response.text, 'html.parser')
    # Collect the src of each <img class="main_img"> tag, skipping tags without one.
    candidates = soup.find_all('img', class_='main_img')
    image_urls = [img['src'] for img in candidates if img.get('src')][:max_images]
    for i, image_url in enumerate(image_urls):
        file_path = os.path.join(download_dir, f'{i+1}.jpg')
        download_image(image_url, file_path)


if __name__ == '__main__':
    keyword = 'tiananmen'  # replace with the keyword you want
    max_images = 20  # maximum number of images to download
    crawl_baidu_images(keyword, max_images)
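The script above saves every file as `N.jpg`, although search results may also include PNG or GIF files. Here is a minimal sketch of one way to choose the extension, assuming the server's `Content-Type` header or the URL path can be trusted. `pick_extension` and the `KNOWN_TYPES` table are hypothetical names introduced for illustration, not part of the original script.

```python
import os
from urllib.parse import urlparse

# Hypothetical lookup table: common image MIME types and their extensions.
KNOWN_TYPES = {
    "image/jpeg": ".jpg",
    "image/png": ".png",
    "image/gif": ".gif",
    "image/webp": ".webp",
    "image/bmp": ".bmp",
}


def pick_extension(image_url, content_type=None, default=".jpg"):
    """Return a file extension for an image, preferring the Content-Type header."""
    if content_type:
        # Drop parameters such as "; charset=..." and normalise case.
        mime = content_type.split(";")[0].strip().lower()
        if mime in KNOWN_TYPES:
            return KNOWN_TYPES[mime]
    # Fall back to the extension in the URL path, ignoring the query string.
    ext = os.path.splitext(urlparse(image_url).path)[1].lower()
    if ext == ".jpeg":
        return ".jpg"
    if ext in KNOWN_TYPES.values():
        return ext
    return default
```

Inside `download_image`, the header is available as `response.headers.get('Content-Type')` once the response arrives, so the file name could be built after the request rather than before it.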
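Note that this code is intended only for learning and demonstration. In real use, comply with applicable laws and regulations and with the site's crawling policy. For large-scale crawling, set a reasonable interval between requests, and consider using proxies and a session object to maintain the session.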
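The advice on request intervals and session objects could be sketched roughly as follows. This is a minimal sketch rather than a complete crawler: `fetch_politely`, `build_session`, the one-second default delay, and the User-Agent string are all illustrative assumptions.

```python
import time


def fetch_politely(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    fetch and sleep are injectable so the pacing can be exercised without a network.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay_seconds)  # wait before every request except the first
        try:
            results.append((url, fetch(url)))
        except Exception as exc:  # record the error and keep going
            results.append((url, exc))
    return results


def build_session(user_agent="Mozilla/5.0 (compatible; example-crawler)"):
    """Create a requests.Session that reuses connections and cookies."""
    import requests  # imported lazily so fetch_politely works without requests installed

    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session
```

With requests installed, the two pieces might be combined as `session = build_session()` followed by `fetch_politely(image_urls, lambda u: session.get(u, timeout=10), delay_seconds=2.0)`. If a proxy is needed, it can be set through the session's `proxies` mapping.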