Articles in the python category

[Machine Learning in Practice] Data analysis and survival prediction for Titanic passengers with Python

2024-08-10




import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline
from sklearn.metrics import accuracy_score

# Load the data
titanic_data = pd.read_csv('titanic_data.csv')

# Separate the features from the target; the target column must not stay in X
X = titanic_data.drop(columns=['survived'])
y = titanic_data['survived']

# Feature engineering: one-hot encode the categorical variables
categorical_features = ['sex', 'n_siblings_spouses', 'parch', 'class', 'deck', 'embark_town', 'alone']

preprocessor = ColumnTransformer(
    transformers=[
        ('one_hot_encoder', OneHotEncoder(handle_unknown='ignore'), categorical_features)
    ],
    # Keep the remaining numeric columns instead of dropping them
    # (assumes these columns contain no missing values)
    remainder='passthrough')

# Build the pipeline: preprocessing followed by a random forest classifier
rf_classifier = make_pipeline(preprocessor, RandomForestClassifier())

# Split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train the model
rf_classifier.fit(X_train, y_train)

# Make predictions
y_pred = rf_classifier.predict(X_test)

# Evaluate model performance
accuracy = accuracy_score(y_test, y_pred)
print(f"Model Accuracy: {accuracy*100:.2f}%")

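This code uses the random forest classifier from the sklearn library to solve a classification problem. First, we load the Titanic dataset and split it into the features X and the target y. Then we use ColumnTransformer to one-hot encode the categorical features and build a pipeline that ends in a random forest classifier. Next, we split the dataset into training and test sets with train_test_split and train the model. Finally, we evaluate the model on the test set and print the accuracy.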
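To make the encoding step concrete, here is a minimal pure-Python sketch of what one-hot encoding with handle_unknown='ignore' does conceptually. The helper names fit_categories and one_hot and the sample town values are hypothetical and are not part of scikit-learn; OneHotEncoder itself works on whole columns and returns a sparse matrix by default.

```python
def fit_categories(values):
    """Return the sorted list of distinct categories seen during fitting."""
    return sorted(set(values))


def one_hot(value, categories):
    """Encode one value as a 0/1 list; unknown values become all zeros."""
    return [1 if value == category else 0 for category in categories]


# Hypothetical training values for an 'embark_town'-style column
train_towns = ["Southampton", "Cherbourg", "Queenstown", "Southampton"]
categories = fit_categories(train_towns)

encoded_known = one_hot("Queenstown", categories)
encoded_unknown = one_hot("Liverpool", categories)  # not seen during fitting

print(categories)       # ['Cherbourg', 'Queenstown', 'Southampton']
print(encoded_known)    # [0, 1, 0]
print(encoded_unknown)  # [0, 0, 0]
```

Encoding an unseen category as all zeros is what lets the pipeline predict on test rows whose category never appeared in the training split.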

[python] Python course project: weather forecast data analysis and visualization

2024-08-10




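import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Load the data
df = pd.read_csv('weatherAUS.csv')

# Preprocessing: parse the date and derive calendar features
# (the dates carry no time of day, so no hour feature is derived)
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')
df['year'] = df['Date'].dt.year
df['month'] = df['Date'].dt.month
df['day'] = df['Date'].dt.day
# Column names as given in the original; adjust them to match your CSV file
df = df[['year', 'month', 'day', 'Rainfall', 'Evapotranspiration', 'Solar']]

# Use a single month (February) for the demonstration
df = df[df['month'] == 2]

# Drop rows with missing values (LinearRegression cannot handle NaN)
# and sort by Solar so the fitted curve is drawn from left to right
df = df.dropna(subset=['Solar', 'Rainfall']).sort_values('Solar')

# Select the feature and the target
X = df[['Solar']]
y = df['Rainfall']

# Polynomial regression: create polynomial features, then fit a linear model
poly_reg = PolynomialFeatures(degree=2)
X_poly = poly_reg.fit_transform(X)
lin_reg = LinearRegression()
lin_reg.fit(X_poly, y)

# Visualize the polynomial regression result
# (predict with the same feature the model was fitted on)
plt.scatter(X['Solar'], y)
plt.plot(X['Solar'], lin_reg.predict(poly_reg.transform(X)), color='red')
plt.title('Solar vs Rainfall')
plt.xlabel('Solar')
plt.ylabel('Rainfall')
plt.show()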

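This code shows how to use polynomial regression to analyze the relationship between solar exposure and rainfall, and how to visualize the result. It uses PolynomialFeatures to create the polynomial features and LinearRegression to fit a linear model on them. Finally, it uses matplotlib.pyplot to draw the scatter plot and the fitted curve.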
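As an illustration of the feature-expansion step, the following pure-Python sketch generates every monomial of the inputs up to a given degree, bias term first. The function name polynomial_features is hypothetical; for two inputs [a, b] and degree 2 it yields [1, a, b, a², ab, b²], which is the expansion the scikit-learn documentation describes for PolynomialFeatures with default settings.

```python
from itertools import combinations_with_replacement
from math import prod


def polynomial_features(sample, degree=2):
    """Return all monomials of the inputs up to `degree`, bias term first."""
    features = []
    for d in range(degree + 1):
        # Each combination of input indices (with repetition) is one monomial
        for combo in combinations_with_replacement(range(len(sample)), d):
            features.append(prod(sample[i] for i in combo))
    return features


print(polynomial_features([5]))     # [1, 5, 25]
print(polynomial_features([2, 3]))  # [1, 2, 3, 4, 6, 9]
```

A linear model fitted on these expanded columns is still linear in its coefficients, which is why LinearRegression can fit a curve.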

Python-matplotlib: drawing plots with two (or more) y-axes

2024-08-10

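The basic steps for drawing a plot with two y-axes using the matplotlib library in Python are:

Use plt.subplots() to create a figure and one axes object (ax).
Create a second axes object in the same figure that shares the x-axis and places its y-axis on the right.
Plot the data on each of the two axes.

The following example code shows how to draw a dual y-axis plot: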




import matplotlib.pyplot as plt
import numpy as np

# Create the data
x = np.linspace(0, 10, 100)
y1 = np.sin(x)
y2 = np.cos(x)

# Create the figure and the first axes
fig, ax1 = plt.subplots()

# Plot the first data series on the first axes
ax1.plot(x, y1, label='sin(x)')
ax1.set_ylabel('sin(x)')

# Create the second axes, with its y-axis on the right
ax2 = ax1.twinx()

# Plot the second data series on the second axes
ax2.plot(x, y2, 'r', label='cos(x)')
ax2.set_ylabel('cos(x)')

# Add a legend for each axes
ax1.legend(loc='upper left')
ax2.legend(loc='upper right')

# Show the figure
plt.show()

This code produces a figure with two data series (sine and cosine), each drawn against its own y-axis while sharing the same x-axis.

2024-08-10

On Ubuntu, you can install mamba, a fast package manager that can be used in place of conda, with the following steps:




# Miniforge ships with mamba; the separate Mambaforge installer has been retired
wget https://github.com/conda-forge/miniforge/releases/latest/download/Miniforge3-Linux-x86_64.sh
bash Miniforge3-Linux-x86_64.sh

Once the installation has finished, you can use mamba to install the bob.learn library:




mamba install bob.learn

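The installer places everything in your home directory by default, so sudo is normally not needed; just make sure your user has write permission for the installation directory.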

[Python: Network Programming, Implementing the Ping Command]

2024-08-10

In Python, you can use the subprocess module to run the ping command and capture its output. Here is a simple example implementation:




import subprocess

def ping(host, count=4):
    """
    Ping the given host a specified number of times.
    :param host: The host to ping.
    :param count: The number of echo requests to send.
    :return: A list of ping response times in seconds.
    """
    response_times = []
    # '-c' sets the count on Linux and macOS; Windows uses '-n' instead
    cmd = ['ping', '-c', str(count), host]
    try:
        result = subprocess.run(cmd, check=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True)
        for line in result.stdout.splitlines():
            if 'time=' in line:
                # Reply lines look like "... icmp_seq=1 ttl=117 time=12.3 ms",
                # so split on 'time=' rather than on the first '='
                rtt_ms = float(line.split('time=')[1].split()[0])
                response_times.append(rtt_ms / 1000.0)
    except subprocess.CalledProcessError as e:
        print(f"Ping failed: {e}")
    return response_times

# Usage example
host = 'google.com'
response_times = ping(host)
print(f"Ping {host} results:")
for i, rtt in enumerate(response_times):
    print(f"{i + 1}. {rtt} seconds")

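This code defines a ping function that takes a host name and a count as parameters and returns a list with the response time of each ping. When calling subprocess.run(), we specify the number of pings with the -c option and capture the output through a pipe. We then parse the output, extract the time of each reply, convert it from milliseconds to seconds, and append it to the result list. If the ping command fails, the function prints an error message.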
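The parsing step can also be tested without touching the network. The sketch below pulls the round-trip times out of captured output with a regular expression; the function name parse_ping_times and the sample output are hypothetical, and Unix-style reply lines of the form "time=12.3 ms" are assumed.

```python
import re

# Matches e.g. "time=12.3 ms" or "time<1 ms"; Unix-style output is assumed
TIME_PATTERN = re.compile(r"time[=<]([\d.]+)\s*ms")


def parse_ping_times(output):
    """Return the round-trip times found in ping output, in seconds."""
    return [float(ms) / 1000.0 for ms in TIME_PATTERN.findall(output)]


# Hypothetical sample output; the address is from a documentation range
sample_output = """\
PING example.com (192.0.2.1) 56(84) bytes of data.
64 bytes from 192.0.2.1: icmp_seq=1 ttl=56 time=12.5 ms
64 bytes from 192.0.2.1: icmp_seq=2 ttl=56 time=11.0 ms

--- example.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1001ms
rtt min/avg/max/mdev = 11.0/11.75/12.5/0.75 ms
"""

print(parse_ping_times(sample_output))  # [0.0125, 0.011]
```

Because the pattern requires "=" or "<" directly after "time", the summary line ("time 1001ms") is not mistaken for a reply.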

Converting between Excel (XLS or XLSX) and TXT text formats in Python

2024-08-10




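import csv

import xlrd  # reads legacy .xls files
from openpyxl import Workbook, load_workbook  # reads and writes .xlsx files

# Convert an XLS/XLSX file to a tab-separated TXT file
def convert_xls_to_txt(input_file, output_file):
    if input_file.endswith('.xls'):
        sh = xlrd.open_workbook(input_file).sheet_by_index(0)
        rows = (sh.row_values(row_num) for row_num in range(sh.nrows))
    elif input_file.endswith('.xlsx'):
        # xlrd 2.0 and later no longer read .xlsx, so openpyxl is used here
        sh = load_workbook(input_file, read_only=True).worksheets[0]
        rows = sh.iter_rows(values_only=True)
    else:
        raise ValueError(f"Unsupported file type: {input_file}")

    with open(output_file, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f, delimiter='\t').writerows(rows)

# Convert a tab-separated TXT file to a real XLSX workbook
def convert_txt_to_xlsx(input_file, output_file):
    book = Workbook()
    sh = book.active
    with open(input_file, 'r', newline='', encoding='utf-8') as f:
        for row in csv.reader(f, delimiter='\t'):
            sh.append(row)
    book.save(output_file)

# Example usage
convert_xls_to_txt('example.xlsx', 'example.txt')
convert_txt_to_xlsx('example.txt', 'example.xlsx')

This code provides two functions, convert_xls_to_txt and convert_txt_to_xlsx, which convert the first sheet of an XLS or XLSX file to a tab-separated TXT file and a TXT file back to an XLSX file. It uses the xlrd library to read .xls files, openpyxl to read and write .xlsx files (current xlrd versions no longer read .xlsx, and writing CSV text under an .xlsx name does not produce a valid workbook), and Python's built-in csv module to read and write the text file. To use the functions, pass the correct file paths as arguments.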
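The text side of the conversion can be tried in memory, without any Excel file. This sketch, with made-up sample rows, writes rows as tab-separated text using the csv module and reads them back; it also shows why the csv module is preferable to joining fields with "\t" by hand, since a cell that itself contains a tab is quoted and survives the round trip.

```python
import csv
import io

# Hypothetical sample rows; the second row has a cell containing a tab
rows = [["Name", "Note"], ["Anna", "likes\ttabs"], ["Peter", "plain"]]

# Write the rows as tab-separated text into an in-memory buffer
buffer = io.StringIO(newline="")
csv.writer(buffer, delimiter="\t").writerows(rows)
text = buffer.getvalue()

# Read the text back with the same delimiter
restored = list(csv.reader(io.StringIO(text, newline=""), delimiter="\t"))

print(restored == rows)  # True
```

Note that everything read back from the text file is a string, so numeric cells end up as text after a TXT to XLSX conversion unless they are converted explicitly.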

A Tour of Cool Python Libraries: the third-party library Pandas (028)

2024-08-10




import pandas as pd

# Create a simple DataFrame
data = {'Name': ['John', 'Anna', 'Peter', 'Linda'],
        'Age': [28, 23, 34, 29],
        'City': ['New York', 'Paris', 'Berlin', 'London']}
df = pd.DataFrame(data)

# Print the DataFrame
print(df)

# Select a single column by name
age_column = df['Age']
print(age_column)

# Select a single column by its position
city_column = df.iloc[:, 2]
print(city_column)

# Select several columns by position
first_two_columns = df.iloc[:, 0:2]
print(first_two_columns)

# Select several columns by name
name_and_age = df[['Name', 'Age']]
print(name_and_age)

# Filter rows with a condition
adults = df[df['Age'] >= 21]
print(adults)

# Sort the rows by the values of a column
df_sorted = df.sort_values(by='Age')
print(df_sorted)

# Sort the columns by their names (axis=1 sorts the column labels)
df_sorted_columns = df.sort_index(axis=1)
print(df_sorted_columns)

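This code shows how to create a DataFrame with the Pandas library and perform various operations on it, including column selection, filtering, and sorting. It is a basic introductory example for learning Pandas.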
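Sorting rows by a column's values and sorting the columns by their labels are easy to confuse, so here is a small side-by-side sketch on a two-row DataFrame (the sample data is made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({"Name": ["John", "Anna"],
                   "Age": [28, 23],
                   "City": ["New York", "Paris"]})

# sort_values reorders the rows by the values in the given column
rows_by_name = df.sort_values(by="Name")

# sort_index(axis=1) reorders the columns by their labels
columns_sorted = df.sort_index(axis=1)

print(rows_by_name["Name"].tolist())    # ['Anna', 'John']
print(columns_sorted.columns.tolist())  # ['Age', 'City', 'Name']
```

Both calls return a new DataFrame and leave df unchanged.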

How to install the cv2 library in Python

2024-08-10

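To install the cv2 library (that is, OpenCV) in Python, you can use the pip package manager. The steps are:

Open a terminal (Command Prompt or PowerShell on Windows, Terminal on macOS or Linux).
Enter the following command to install OpenCV: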




pip install opencv-python

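If you need OpenCV's extra contrib modules, install opencv-contrib-python instead; it already includes the main modules, so do not install both packages in the same environment: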




pip install opencv-contrib-python

Once the installation has finished, you can verify that the cv2 library was installed successfully with the following Python code:




import cv2
print(cv2.__version__)

If no error is raised and a version number is printed, the cv2 library has been installed successfully.
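If you would rather check for the package without importing it (for example in a setup script that should not fail with an ImportError), a minimal sketch using the standard library is shown below. The helper name is_installed is hypothetical.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


print("cv2 installed:", is_installed("cv2"))
```

The check only reports whether the module can be located in the current environment; it does not confirm that the import itself would succeed.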

Plotting bar charts in Python and formatting them (color, borders and spacing, x and y axes)

2024-08-10




import matplotlib.pyplot as plt

# Data
values = [20, 35, 15, 40]
labels = ['A', 'B', 'C', 'D']

# Draw the bar chart; the label is needed so that the legend has an entry
plt.bar(labels, values, color='lightblue', edgecolor='k', linestyle='-', label='Value')

# Formatting
plt.title('Example bar chart')
plt.xlabel('Category')
plt.ylabel('Value')
plt.xticks(rotation=45)  # rotate the x-axis tick labels
plt.yticks(range(0, 50, 10))  # y-axis tick positions

# Show the legend
plt.legend()

# Show the chart
plt.show()

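This code uses the Matplotlib library to draw a simple bar chart and formats its fill color, bar edges, and x and y axes. It serves as a starting point for learning data visualization in Python.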