Classifying Benign and Malignant Lung Nodules with Radiomics and Machine Learning on the LIDC-IDRI Dataset (Full Python Code): An End-to-End Walkthrough
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from scipy.io import loadmat
# Load the data: X is the feature matrix, y the benign/malignant labels
data = loadmat('LIDC-IDRI_data.mat')
features = data['X']
labels = data['y'].ravel()  # flatten the label array to 1-D
# Split into training and test sets (80% / 20%)
train_features, test_features, train_labels, test_labels = train_test_split(
    features, labels, test_size=0.2, random_state=42)
# Scale the features: fit on the training set only, then apply to the test set
scaler = StandardScaler()
train_features_scaled = scaler.fit_transform(train_features)
test_features_scaled = scaler.transform(test_features)
# Create the random forest classifier
rf_classifier = RandomForestClassifier(n_estimators=100, random_state=42)
# Train the model
rf_classifier.fit(train_features_scaled, train_labels)
# Predict on the test set
predictions = rf_classifier.predict(test_features_scaled)
# Evaluate the model
accuracy = accuracy_score(test_labels, predictions)
print(f'Model Accuracy: {accuracy}')
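This code first loads the feature matrix (X) and labels (y) for the LIDC-IDRI lung nodule dataset from a .mat file, then uses train_test_split to hold out 20% of the samples as a test set. Next, StandardScaler standardizes the features; the scaler is fitted on the training set only and then applied to the test set. A random forest classifier with 100 trees is then created and trained on the training data. Finally, the model is evaluated on the test set and its accuracy is printed. This is a standard data-processing and model-evaluation workflow in machine learning.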
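Accuracy alone can be misleading when benign and malignant cases are not equally frequent, so it may help to report sensitivity, specificity, and ROC AUC as well. The following is a minimal sketch of the same split, scale, train, and evaluate workflow, not the original author's method. It makes several assumptions: the data is synthetic (generated with make_classification as a stand-in for the contents of LIDC-IDRI_data.mat, which is not available here), label 1 is treated as malignant, the split is stratified, and the scaler and classifier are wrapped in a pipeline.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic data standing in for the radiomics features (X) and
# labels (y); with the real file these would come from loadmat as shown above.
X, y = make_classification(
    n_samples=400, n_features=30, n_informative=8,
    weights=[0.7, 0.3], random_state=42)

# stratify=y keeps the class ratio similar in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# The pipeline fits the scaler on the training data only and reuses it
# automatically at prediction time.
model = make_pipeline(
    StandardScaler(),
    RandomForestClassifier(n_estimators=100, random_state=42))
model.fit(X_train, y_train)

pred = model.predict(X_test)
prob = model.predict_proba(X_test)[:, 1]  # predicted probability of class 1

# For binary labels 0/1, ravel() yields the counts in this order.
tn, fp, fn, tp = confusion_matrix(y_test, pred).ravel()
metrics = {
    "accuracy": accuracy_score(y_test, pred),
    "sensitivity": tp / (tp + fn),  # fraction of class 1 correctly detected
    "specificity": tn / (tn + fp),  # fraction of class 0 correctly detected
    "roc_auc": roc_auc_score(y_test, prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

The numbers printed here describe the synthetic data only and say nothing about performance on LIDC-IDRI. To try this on the real features, replace the make_classification call with the loadmat lines from the code above.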