Web Scraping: Train Timetable Data (Python)

Author: System | Date: August 23, 2024 | Categories: All, Web Scraping | Word count: 1284

This article was last modified 355 days ago, so some of its content may be out of date.

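Train timetable data can be scraped in Python by fetching the page with the requests library and then parsing the HTML with BeautifulSoup. The following is a simple example that extracts the data from a train timetable page.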




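import requests
from bs4 import BeautifulSoup

# URL of the train timetable page
url = 'http://www.12306.cn/index/trainlist-N-Q-1.html'

# Send the HTTP request; the timeout keeps the script from hanging indefinitely
response = requests.get(url, timeout=10)

# Check whether the request succeeded
if response.status_code == 200:
    # Parse the page content with BeautifulSoup
    soup = BeautifulSoup(response.text, 'html.parser')

    # Find the table that holds the timetable entries
    trains_table = soup.find('table', class_='train_list')

    if trains_table is None:
        # find() returns None when no matching table exists
        print("Timetable table not found; the page layout may have changed")
    else:
        # Iterate over every row, skipping the header
        for row in trains_table.find_all('tr')[1:]:
            # Extract the data in each column
            cells = row.find_all('td')
            if len(cells) < 7:
                continue  # skip rows that lack the expected seven columns
            train_number = cells[0].text.strip()  # train number
            start_station = cells[1].text.strip()  # departure station
            end_station = cells[2].text.strip()  # terminal station
            start_time = cells[3].text.strip()  # departure time
            duration = cells[4].text.strip()  # travel time
            frequency = cells[5].text.strip()  # frequency
            car_type = cells[6].text.strip()  # train type
            print(train_number, start_station, end_station, start_time, duration, frequency, car_type)
else:
    print("Failed to retrieve webpage")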

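Note that the actual timetable page may change its layout or add anti-scraping measures, such as JavaScript-rendered content or login verification. Frequent requests may also be throttled or blocked by the server. Comply with applicable laws and regulations, follow the site's robots.txt, keep the request rate reasonable, and where appropriate add request headers (such as User-Agent and Referer) so that the requests resemble those of a real browser.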
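
The advice above (respect robots.txt, space out requests, send browser-like headers) could be sketched roughly as follows. This is only an illustration under stated assumptions: `PoliteFetcher`, `load_robots`, the `USER_AGENT` string, the `Referer` value, and the 2-second delay are all hypothetical choices, not anything the target site prescribes. The `session` argument is assumed to be a `requests.Session` or any object with a compatible `get` method.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical values chosen for illustration; adjust them for a real crawler.
USER_AGENT = "Mozilla/5.0 (compatible; timetable-demo/0.1)"
HEADERS = {
    "User-Agent": USER_AGENT,
    "Referer": "http://www.12306.cn/",
}


def load_robots(robots_lines):
    """Build a robots.txt parser from the lines of an already downloaded file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


class PoliteFetcher:
    """Wrap a requests-style session with a robots.txt check and a minimum delay."""

    def __init__(self, session, robots, min_delay=2.0,
                 sleep=time.sleep, clock=time.monotonic):
        self.session = session      # assumed: requests.Session or a look-alike
        self.robots = robots        # RobotFileParser returned by load_robots()
        self.min_delay = min_delay  # assumed: seconds to leave between requests
        self.sleep = sleep
        self.clock = clock
        self.last_request = None

    def get(self, url):
        # Refuse URLs that robots.txt disallows for our user agent.
        if not self.robots.can_fetch(USER_AGENT, url):
            return None
        # Wait until at least min_delay seconds have passed since the last request.
        if self.last_request is not None:
            wait = self.min_delay - (self.clock() - self.last_request)
            if wait > 0:
                self.sleep(wait)
        self.last_request = self.clock()
        return self.session.get(url, headers=HEADERS, timeout=10)
```

With requests installed, this might be used as `fetcher = PoliteFetcher(requests.Session(), robots)` followed by `fetcher.get(url)`, where `robots` is built by passing the lines of the site's robots.txt to `load_robots`. A `None` result means the URL was skipped because robots.txt disallows it.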
