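Image Classification with Deep Learning: The ResNet Model Explained, with a Code Implementation
Deep learning has become the dominant approach to image classification, and ResNet (Residual Network), a classic model that made very deep networks trainable, performs strongly across many vision tasks. This article combines theory and practice: it explains the principles and structural features of ResNet and provides Python code that implements ResNet from scratch, so you can get up to speed with the technique quickly.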
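1. Introduction to ResNet
1.1 What is ResNet?
ResNet was proposed by Kaiming He et al. in 2015. It targets the difficulty of training very deep neural networks, in particular the degradation problem, where adding more layers makes even the training accuracy worse; its shortcut connections also help gradients propagate through many layers. The core idea is the residual block: the network learns a residual instead of fitting the target output directly.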
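Residual learning formula:
\[ y = F(x) + x \]
where:
- \( F(x) \): the residual function (the output of the stacked network layers).
- \( x \): the input, passed through unchanged by the skip connection (shortcut connection).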
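To make the residual formulation concrete, here is a minimal sketch in plain Python (no PyTorch), treating the input as a list of floats. The names `residual_block`, `zero_residual`, and `small_residual` are illustrative assumptions and are not part of the implementation later in this article.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, residual_fn: Callable[[Vector], Vector]) -> Vector:
    """Compute y = F(x) + x element-wise, where residual_fn plays the role of F."""
    fx = residual_fn(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same shape as x for the addition")
    return [f + xi for f, xi in zip(fx, x)]


def zero_residual(x: Vector) -> Vector:
    """Hypothetical F that has learned nothing: it outputs all zeros."""
    return [0.0 for _ in x]


def small_residual(x: Vector) -> Vector:
    """Hypothetical F that applies a small correction to each element."""
    return [0.1 * xi for xi in x]


x = [1.0, -2.0, 3.0]
identity_out = residual_block(x, zero_residual)    # F(x) = 0, so y equals x
corrected_out = residual_block(x, small_residual)  # y = x plus a small correction
print(identity_out)
print(corrected_out)
```

The point of the sketch is that a block whose residual function outputs zero behaves as an identity mapping, so stacking such blocks should not make the network worse than its shallower counterpart.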
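1.2 Advantages of ResNet
- Addresses the degradation problem: the accuracy of plain deep networks tends to saturate and then degrade as depth grows; ResNet counters this with skip connections.
- Easier to optimize: identity shortcuts carry shallow-layer features directly to deeper layers, so each block only has to learn a correction to its input.
- Flexibility: applicable to image classification, object detection, and many other tasks.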
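The optimization benefit can be seen from the chain rule. The following is a short derivation, assuming a single block \( y = F(x) + x \) and a scalar loss \( \mathcal{L} \) (the symbol \( \mathcal{L} \) is introduced here only for illustration), and ignoring the ReLU that the implementation applies after the addition:

```latex
\[
\frac{\partial \mathcal{L}}{\partial x}
= \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial y}{\partial x}
= \frac{\partial \mathcal{L}}{\partial y}\left(I + \frac{\partial F}{\partial x}\right)
= \frac{\partial \mathcal{L}}{\partial y}
+ \frac{\partial \mathcal{L}}{\partial y}\,\frac{\partial F}{\partial x}
\]
```

The first term passes the gradient back unchanged through the shortcut, so the signal reaching earlier layers does not depend solely on \( \partial F / \partial x \) being well scaled.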
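2. ResNet Network Structure
ResNet is built by stacking residual blocks, and different versions have different depths:
- ResNet-18: 18 weighted layers (17 convolutional layers plus one fully connected layer).
- ResNet-34: 34 weighted layers (33 convolutional layers plus one fully connected layer).
- ResNet-50/101/152: deeper variants built from Bottleneck Blocks.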
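The depth in each model name can be reproduced with simple arithmetic. The sketch below counts the stem convolution, the convolutions inside the residual blocks, and the final fully connected layer. The block counts for ResNet-18 and ResNet-50 are the ones used later in this article; those for ResNet-34/101/152 come from the original ResNet paper rather than from this article. `RESNET_CONFIGS` and `weighted_depth` are illustrative names.

```python
from typing import Dict, List, Tuple

# (convolutions per block, blocks per stage) for each variant.
# A basic block has 2 convolutions; a bottleneck block has 3.
RESNET_CONFIGS: Dict[str, Tuple[int, List[int]]] = {
    "resnet18": (2, [2, 2, 2, 2]),
    "resnet34": (2, [3, 4, 6, 3]),
    "resnet50": (3, [3, 4, 6, 3]),
    "resnet101": (3, [3, 4, 23, 3]),
    "resnet152": (3, [3, 8, 36, 3]),
}


def weighted_depth(convs_per_block: int, blocks_per_stage: List[int]) -> int:
    """Stem convolution + convolutions inside the blocks + final FC layer."""
    stem_conv = 1
    final_fc = 1
    return stem_conv + convs_per_block * sum(blocks_per_stage) + final_fc


depths = {name: weighted_depth(*cfg) for name, cfg in RESNET_CONFIGS.items()}
for name, depth in depths.items():
    print(f"{name}: {depth} weighted layers")
```

By convention this count leaves out the 1×1 projection convolutions on the shortcuts as well as the BatchNorm layers.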
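2.1 Residual Block Structure
Basic residual block (ResNet-18/34)
\[ y = F(x) + x \]
where:
- \( F(x) \): two \( 3\times3 \) convolutional layers with BatchNorm and ReLU (the final ReLU is applied after the addition).
Bottleneck residual block (ResNet-50/101/152)
To reduce computation, the bottleneck structure uses \( 1\times1 \) convolutions to reduce the channel dimension before the \( 3\times3 \) convolution and to expand it again afterwards:
\[ F(x):\quad 1\times1 \ \text{conv (reduce)} \;\rightarrow\; 3\times3 \ \text{conv} \;\rightarrow\; 1\times1 \ \text{conv (expand)} \]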
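A quick weight count shows why the bottleneck design saves computation. The sketch below compares two 3×3 convolutions operating on 256 channels against a bottleneck that reduces 256 channels to 64 and expands back to 256. The channel sizes are assumed for illustration, and BatchNorm parameters are ignored.

```python
def conv_weights(in_channels: int, out_channels: int, kernel_size: int) -> int:
    """Number of weights in a bias-free 2D convolution."""
    return in_channels * out_channels * kernel_size * kernel_size


channels = 256           # channels entering and leaving the block (assumed)
reduced = channels // 4  # 64 channels inside the bottleneck

# Two 3x3 convolutions applied directly at full width.
plain_params = 2 * conv_weights(channels, channels, 3)

# 1x1 reduce -> 3x3 at reduced width -> 1x1 expand.
bottleneck_params = (
    conv_weights(channels, reduced, 1)
    + conv_weights(reduced, reduced, 3)
    + conv_weights(reduced, channels, 1)
)

print(f"two 3x3 convs at 256 channels: {plain_params:,} weights")
print(f"bottleneck 256 -> 64 -> 256:   {bottleneck_params:,} weights")
```

Under these assumptions the bottleneck uses 69,632 weights instead of 1,179,648, roughly a 17-fold reduction, while keeping the same input and output width.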
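3. Implementing ResNet in Code
The following code shows how to implement the ResNet model, from the basic residual blocks to the complete network.
3.1 Importing the required libraries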
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader
3.2 Implementing the residual blocks
Basic residual block:
class BasicBlock(nn.Module):
    expansion = 1  # output channel count stays the same

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BasicBlock, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # Optional projection so the shortcut matches the main path's shape
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out += identity  # y = F(x) + x
        out = F.relu(out)
        return out
Bottleneck residual block:
class BottleneckBlock(nn.Module):
    expansion = 4  # output channel count is expanded 4x

    def __init__(self, in_channels, out_channels, stride=1, downsample=None):
        super(BottleneckBlock, self).__init__()
        # 1x1 convolution reduces the channel count
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=1, bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        # 3x3 convolution works at the reduced width and carries the stride
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, stride=stride, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # 1x1 convolution expands the channel count again
        self.conv3 = nn.Conv2d(out_channels, out_channels * 4, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(out_channels * 4)
        self.downsample = downsample

    def forward(self, x):
        identity = x
        if self.downsample is not None:
            identity = self.downsample(x)
        out = self.conv1(x)
        out = self.bn1(out)
        out = F.relu(out)
        out = self.conv2(out)
        out = self.bn2(out)
        out = F.relu(out)
        out = self.conv3(out)
        out = self.bn3(out)
        out += identity  # y = F(x) + x
        out = F.relu(out)
        return out
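Both blocks accept an optional `downsample` module because the addition `out += identity` only works when the shortcut and the main path have the same shape. The sketch below restates, in plain Python, the rule that `_make_layer` in section 3.3 uses to decide whether the first block of a stage needs a projection shortcut. The helper names `needs_projection` and `first_block_projections` are illustrative assumptions.

```python
from typing import List


def needs_projection(in_channels: int, out_channels: int, expansion: int, stride: int) -> bool:
    """True when an identity shortcut would not match the main path's output shape."""
    return stride != 1 or in_channels != out_channels * expansion


def first_block_projections(expansion: int) -> List[bool]:
    """For each of the four stages, does the first block need a projection shortcut?"""
    in_channels = 64  # channels produced by the stem convolution
    stage_settings = [(64, 1), (128, 2), (256, 2), (512, 2)]  # (out_channels, stride)
    result = []
    for out_channels, stride in stage_settings:
        result.append(needs_projection(in_channels, out_channels, expansion, stride))
        in_channels = out_channels * expansion  # input width of the next stage
    return result


basic_projections = first_block_projections(expansion=1)       # BasicBlock
bottleneck_projections = first_block_projections(expansion=4)  # BottleneckBlock
print("BasicBlock:     ", basic_projections)
print("BottleneckBlock:", bottleneck_projections)
```

With basic blocks the first stage keeps a plain identity shortcut, since 64 channels go in and 64 come out at stride 1. With bottleneck blocks even the first stage needs a projection, because the block outputs 256 channels from a 64-channel input.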
3.3 Implementing the ResNet model
class ResNet(nn.Module):
    def __init__(self, block, layers, num_classes=1000):
        super(ResNet, self).__init__()
        self.in_channels = 64
        # Stem: 7x7 convolution followed by max pooling
        self.conv1 = nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        # Residual stages
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=2)
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2)
        self.layer4 = self._make_layer(block, 512, layers[3], stride=2)
        # Classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

    def _make_layer(self, block, out_channels, blocks, stride=1):
        downsample = None
        # Project the shortcut when the spatial size or channel count changes
        if stride != 1 or self.in_channels != out_channels * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.in_channels, out_channels * block.expansion, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(out_channels * block.expansion),
            )
        # Only the first block of a stage applies the stride and the projection
        layers = [block(self.in_channels, out_channels, stride, downsample)]
        self.in_channels = out_channels * block.expansion
        for _ in range(1, blocks):
            layers.append(block(self.in_channels, out_channels))
        return nn.Sequential(*layers)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.avgpool(x)
        x = torch.flatten(x, 1)
        x = self.fc(x)
        return x
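To see how the feature map shrinks through the network, the following sketch traces the spatial size stage by stage using the standard output-size formula for convolution and pooling layers, floor((n + 2p - k) / s) + 1. It is plain Python and only mirrors the kernel, stride, and padding values from the model above; `conv_output_size` and `trace_spatial_sizes` are illustrative names.

```python
from typing import Dict


def conv_output_size(size: int, kernel_size: int, stride: int, padding: int) -> int:
    """Output length along one spatial dimension for a conv or max-pool layer."""
    return (size + 2 * padding - kernel_size) // stride + 1


def trace_spatial_sizes(input_size: int = 224) -> Dict[str, int]:
    """Spatial size (height = width) after each part of the network."""
    sizes = {"input": input_size}
    size = conv_output_size(input_size, kernel_size=7, stride=2, padding=3)
    sizes["conv1"] = size
    size = conv_output_size(size, kernel_size=3, stride=2, padding=1)
    sizes["maxpool"] = size
    # Within a stage, only the strided 3x3 convolution of the first block
    # changes the spatial size; every other layer preserves it.
    for name, stride in [("layer1", 1), ("layer2", 2), ("layer3", 2), ("layer4", 2)]:
        size = conv_output_size(size, kernel_size=3, stride=stride, padding=1)
        sizes[name] = size
    sizes["avgpool"] = 1  # AdaptiveAvgPool2d((1, 1)) always outputs 1x1
    return sizes


sizes_224 = trace_spatial_sizes(224)
sizes_32 = trace_spatial_sizes(32)
print(sizes_224)
print(sizes_32)
```

For a 224×224 input the sizes go 112, 56, 56, 28, 14, 7 before global average pooling. A raw 32×32 input would already be down to 1×1 at `layer4`, which helps explain why section 4.1 resizes CIFAR-10 images to 224×224 for this architecture.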
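3.4 Creating ResNet instances
def resnet18(num_classes=1000):
    return ResNet(BasicBlock, [2, 2, 2, 2], num_classes=num_classes)

def resnet50(num_classes=1000):
    return ResNet(BottleneckBlock, [3, 4, 6, 3], num_classes=num_classes)

# Create the model; CIFAR-10, used in section 4, has 10 classes
model = resnet18(num_classes=10)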
4. Model Training and Evaluation
4.1 Data preprocessing
transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    # Per-channel mean and std commonly used for ImageNet-style inputs
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
])
train_dataset = datasets.CIFAR10(root='./data', train=True, transform=transform, download=True)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_dataset = datasets.CIFAR10(root='./data', train=False, transform=transform, download=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
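The normalization step applies (value - mean) / std to each channel after `ToTensor` has scaled pixel values to [0, 1]. The sketch below reproduces that arithmetic for a single RGB pixel in plain Python, using the mean and std values from the transform above. `normalize_pixel` is an illustrative helper, not a torchvision function.

```python
from typing import Sequence, Tuple

# Values copied from the transform above (commonly used ImageNet statistics).
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_pixel(
    rgb: Sequence[float],
    mean: Sequence[float] = MEAN,
    std: Sequence[float] = STD,
) -> Tuple[float, ...]:
    """Apply (value - mean) / std per channel to one RGB pixel in [0, 1]."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


mean_pixel = normalize_pixel(MEAN)              # maps to zero in every channel
white_pixel = normalize_pixel((1.0, 1.0, 1.0))  # positive in every channel
black_pixel = normalize_pixel((0.0, 0.0, 0.0))  # negative in every channel
print(mean_pixel)
print(white_pixel)
print(black_pixel)
```

After normalization the inputs are roughly centered around zero with comparable scale across channels, which tends to make optimization better behaved. Statistics computed on CIFAR-10 itself would be an equally reasonable choice.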
4.2 Training the model
import torch.optim as optim
# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# Training loop
num_epochs = 10
for epoch in range(num_epochs):
    model.train()
    for images, labels in train_loader:
        outputs = model(images)
        loss = criterion(outputs, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    # Reports the loss of the last batch in this epoch
    print(f"Epoch [{epoch+1}/{num_epochs}], Loss: {loss.item():.4f}")
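The loop above reports only the loss of the final batch of each epoch, which can be noisy. One common alternative is to track a sample-weighted running average over the whole epoch. The sketch below shows the bookkeeping in plain Python; `AverageMeter` is an illustrative helper and the batch losses are made-up stand-ins for `loss.item()` values.

```python
class AverageMeter:
    """Tracks a sample-weighted running average of a scalar such as the loss."""

    def __init__(self) -> None:
        self.total = 0.0
        self.count = 0

    def update(self, value: float, n: int = 1) -> None:
        """Add a batch-mean value that was computed over n samples."""
        self.total += value * n
        self.count += n

    @property
    def average(self) -> float:
        return self.total / self.count if self.count else 0.0


# Stand-ins for (loss.item(), batch_size) pairs from one epoch; values are made up.
fake_batches = [(2.0, 32), (1.0, 32), (0.5, 16)]

meter = AverageMeter()
for batch_loss, batch_size in fake_batches:
    meter.update(batch_loss, batch_size)

epoch_loss = meter.average
print(f"Average loss over the epoch: {epoch_loss:.4f}")
```

In the training loop, this would amount to calling `meter.update(loss.item(), labels.size(0))` after each step and printing `meter.average` at the end of the epoch. Weighting by batch size keeps a smaller final batch from being over-represented.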
4.3 Evaluating the model
model.eval()
correct = 0
total = 0
with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        # Index of the highest logit is the predicted class
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print(f"Accuracy: {100 * correct / total:.2f}%")
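Overall accuracy can hide classes that the model handles poorly. As a possible extension, the sketch below computes per-class accuracy from lists of predicted and true labels in plain Python. `per_class_accuracy` is an illustrative helper and the toy labels are made up; in practice the lists could be filled by collecting `predicted.tolist()` and `labels.tolist()` inside the evaluation loop.

```python
from collections import defaultdict
from typing import Dict, Sequence


def per_class_accuracy(predictions: Sequence[int], labels: Sequence[int]) -> Dict[int, float]:
    """Fraction of correctly classified samples for each true class."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, label in zip(predictions, labels):
        total[label] += 1
        if pred == label:
            correct[label] += 1
    return {label: correct[label] / total[label] for label in sorted(total)}


# Made-up predictions and labels for three classes.
toy_predictions = [0, 1, 1, 2, 2, 2]
toy_labels = [0, 1, 0, 2, 2, 1]

class_accuracy = per_class_accuracy(toy_predictions, toy_labels)
overall_accuracy = sum(p == t for p, t in zip(toy_predictions, toy_labels)) / len(toy_labels)

for label, acc in class_accuracy.items():
    print(f"class {label}: {100 * acc:.1f}%")
print(f"overall: {100 * overall_accuracy:.1f}%")
```

In this toy data, classes 0 and 1 are each right half of the time while class 2 is always right, a difference that the single overall figure does not show.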
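5. Summary
This article described the structure and principles of the ResNet model and showed, in Python, how to implement ResNet from scratch for an image classification task. The core of ResNet is the residual block, a design that eases the optimization of deep networks and substantially improves model performance.
With this foundation, you can use ResNet for image classification and extend it to other deep learning tasks.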