Posted 2025-02-25 · Updated 2025-02-25 · 16 minutes read (about 2397 words)

Learning Convolutional Neural Networks from Scratch (Part 3): Advanced Worked Examples

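In the previous article in this series we learned how to train a neural network that recognizes images of digits. This article builds on that and covers some more advanced topics.

Introducing a non-linear algorithm

In the previous article our neural network used a linear algorithm. Here we convert it into a non-linear one. The relevant code is as follows: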

# Define the CNN model
class numCNN(nn.Module):
    def __init__(self, input_size=28*28, output_size=10, batch_size=64):
        super(numCNN, self).__init__()
        # Weight matrix of shape (784, 10), initialised with small random values
        self.weight = nn.Parameter(torch.randn(input_size, output_size) * 0.01)  # (784, 10)
        # Bias vector of length 10
        self.bias = nn.Parameter(torch.randn(output_size) * 0.01)
        self.batch_size = batch_size
        self.batch_size_range = range(batch_size)
        # Reusable all-zero template for the one-hot labels, shape (batch_size, 10)
        self.real_answer = torch.zeros(self.batch_size, 10, device=device)

    def forward(self, x):
        # (batch_size, 784) @ (784, 10) + (10,) -> (batch_size, 10)
        return torch.matmul(x, self.weight) + self.bias

    # Gradient computation
    def grad(self, inputs, outputs, label):
        # Build the ground-truth answer from the labels
        # (batch_size, 10)
        real_answer =  self.real_answer.clone()
        real_answer[self.batch_size_range, label] = 1  # one-hot encoding
        # Compute the error
        deltas = outputs - real_answer
        # Weight gradient (batched outer product)
        weight_deltas = torch.matmul(deltas.t(), inputs) / self.batch_size  # (10, batch_size) @ (batch_size, 784) -> (10, 784)
        # Bias gradient
        bias_deltas = deltas.sum(dim=[0]) / self.batch_size
        return weight_deltas, bias_deltas

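As the code above shows, the only change is the added bias vector (strictly speaking, a matrix product plus a bias is an affine map rather than a linear one), so the formula becomes: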

$$
(f * g)(n) = \sum_{m=-\infty}^{\infty} f(m)\, g(n - m) + B
$$
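To make the effect of the bias concrete, here is a minimal plain-Python sketch of the same forward computation (input times weight plus bias) on a tiny 2-input, 2-output layer. The values of W and B and the helper names are made up for illustration; they are not taken from the trained model. The sketch checks that additivity, f(a + b) = f(a) + f(b), holds without the bias and fails once the bias is added.

```python
# Minimal sketch: a tiny "layer" with 2 inputs and 2 outputs in plain Python.
# W, B and the input vectors are made-up illustration values, not trained weights.

def forward(x, W, B):
    """Compute x @ W + B for a single input vector x (len(x) == len(W))."""
    return [sum(x[i] * W[i][j] for i in range(len(x))) + B[j]
            for j in range(len(B))]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [p + q for p, q in zip(u, v)]

W = [[1.0, 2.0],
     [3.0, 4.0]]      # shape (inputs, outputs), like the (784, 10) matrix
B = [0.5, -0.5]       # one bias per output
ZERO_B = [0.0, 0.0]   # "no bias" for comparison

a, b = [1.0, 0.0], [0.0, 1.0]

# Without a bias the map is additive: f(a + b) == f(a) + f(b)
linear_holds = forward(add(a, b), W, ZERO_B) == add(forward(a, W, ZERO_B),
                                                    forward(b, W, ZERO_B))
# With a bias it is not: the bias is counted twice on the right-hand side
affine_holds = forward(add(a, b), W, B) == add(forward(a, W, B),
                                               forward(b, W, B))

print(linear_holds, affine_holds)  # True False
```

A second consequence is that forward([0.0, 0.0], W, B) returns B itself, so the layer no longer has to map an all-zero input to an all-zero output.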

The training code also needs some changes, as shown below:

# model, train_loader, softmax, alpha and device are defined as in the previous article
epochs = 100
total_time = 0
for x in range(epochs):
    start_time = time.time()
    total_loss = torch.tensor(0.0, device=device)
    for image, labels in train_loader:
        image, labels = image.to(device), labels.to(device)
        # Forward pass
        conv_output = model(image)
        # Apply the softmax activation to the outputs
        probs = softmax(conv_output)
        # Compute the loss
        loss = model.lossFunction(probs, labels)
        total_loss += loss
        # Compute the weight and bias gradients
        # (note: the raw outputs, not probs, are what gets passed to grad here)
        weight_deltas, bias_deltas = model.grad(image, conv_output, labels)
        # Update the weights and bias by hand
        with torch.no_grad():
            model.weight -= alpha * weight_deltas.t()  # (10, 784) -> (784, 10)
            model.bias -= alpha * bias_deltas
    end_time = time.time()
    total_time += end_time - start_time
    print(f"Epoch: {x}, loss: {total_loss.item()}, time: {end_time - start_time}")
print(f"Training finished, total time: {total_time}")

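Training for 100 epochs with batch_size=600 and alpha=0.001 gives the following evaluation result:

$ python3 train_image_pytorch.py test1
Evaluation result, success rate: 8579/10000

Compared with the success rate in the previous article this is slightly higher, but because the test set is small the improvement is hard to notice. Next, we keep optimizing the code.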
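The grad method above is built around "outputs minus one-hot label". For a softmax followed by a cross-entropy loss, the gradient with respect to the pre-softmax outputs is exactly probs minus one-hot. The sketch below checks that identity numerically in plain Python. The function names and the 3-class toy scores are illustrative assumptions, and the article's own lossFunction is not shown, so this is not necessarily the loss it uses.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(logits, label):
    """Negative log-probability assigned to the true class."""
    return -math.log(softmax(logits)[label])

def analytic_grad(logits, label):
    """Gradient of cross_entropy w.r.t. the logits: probs - one_hot."""
    probs = softmax(logits)
    return [p - (1.0 if k == label else 0.0) for k, p in enumerate(probs)]

def numeric_grad(logits, label, eps=1e-6):
    """Central finite differences, used only to check the formula."""
    grads = []
    for k in range(len(logits)):
        up = list(logits)
        up[k] += eps
        down = list(logits)
        down[k] -= eps
        grads.append((cross_entropy(up, label) - cross_entropy(down, label)) / (2 * eps))
    return grads

logits = [2.0, -1.0, 0.5]   # made-up scores for a 3-class toy problem
label = 0
ga = analytic_grad(logits, label)
gn = numeric_grad(logits, label)
max_diff = max(abs(x - y) for x, y in zip(ga, gn))
print(max_diff < 1e-6)
```

In the training loop above, grad receives conv_output (the pre-softmax outputs) rather than probs. "Outputs minus one-hot" on raw outputs is the gradient of a squared-error loss, so the hand-written update and the printed loss may not describe the same objective. The article does not say whether this is intentional.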

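Switching completely to PyTorch code

In the previous section we already moved the code from NumPy to PyTorch, but the loss and gradient calculations were still our own implementation. Now, without changing the algorithm, we replace them with what PyTorch provides.

First, the forward function. The computation it performs is what PyTorch calls a fully connected layer, and a fully connected layer computes the same thing as our forward function: output = input × weight matrix + bias.

The relevant code is as follows: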

>>> import torch.nn as nn
>>> fc = nn.Linear(28 * 28, 10)
>>> fc.weight
Parameter containing:
tensor([[-0.0138, -0.0058,  0.0323,  ...,  0.0240, -0.0188,  0.0172],
        [-0.0221,  0.0310,  0.0090,  ..., -0.0289, -0.0148,  0.0339],
        [ 0.0257, -0.0203, -0.0011,  ...,  0.0126, -0.0231, -0.0328],
        ...,
        [-0.0206, -0.0077, -0.0270,  ..., -0.0342, -0.0140, -0.0112],
        [ 0.0243,  0.0005,  0.0141,  ..., -0.0222,  0.0346,  0.0075],
        [ 0.0077, -0.0236,  0.0141,  ..., -0.0232,  0.0020,  0.0268]],
       requires_grad=True)
>>> fc.weight.shape
torch.Size([10, 784])
>>> fc.bias
Parameter containing:
tensor([-0.0242, -0.0105,  0.0205, -0.0097,  0.0026, -0.0125,  0.0080, -0.0204,
         0.0053,  0.0343], requires_grad=True)
>>> fc.bias.shape
torch.Size([10])
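Note the shape printed above: nn.Linear stores its weight as (out_features, in_features), here (10, 784), and computes y = x · Wᵀ + b. The hand-written model stored the weight as (784, 10) and used x · W + b. The plain-Python sketch below shows on a tiny 3-input, 2-output example that the two layouts give the same result. The helper names and numbers are made up for illustration.

```python
def linear(x, weight, bias):
    """y[j] = sum_i x[i] * weight[j][i] + bias[j], i.e. y = x @ weight.T + bias."""
    return [sum(xi * wji for xi, wji in zip(x, row)) + bj
            for row, bj in zip(weight, bias)]

def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]

def matmul_in_out(x, w_in_out, bias):
    """Hand-written layout: weight stored as (in_features, out_features)."""
    return [sum(x[i] * w_in_out[i][j] for i in range(len(x))) + bias[j]
            for j in range(len(bias))]

weight = [[1.0, 0.0, 2.0],
          [0.0, 3.0, 1.0]]   # (out_features=2, in_features=3), made-up values
bias = [0.5, -1.0]
x = [1.0, 2.0, 3.0]

y_linear = linear(x, weight, bias)
y_manual = matmul_in_out(x, transpose(weight), bias)
print(y_linear, y_manual)  # [7.5, 8.0] [7.5, 8.0]
```

This is also why the hand-written training loop transposed weight_deltas from (10, 784) to (784, 10) before applying it.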

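The modified neural network code is as follows:

# Define the CNN model
class numCNN(nn.Module):
    def __init__(self, input_size=28*28, output_size=10):
        super(numCNN, self).__init__()
        # A single fully connected layer
        self.fc = nn.Linear(input_size, output_size)

    def forward(self, x):
        return self.fc(x)
    
    def save(self):
        # saveFile is a module-level path defined elsewhere in the script
        torch.save(self.state_dict(), saveFile)
        print(f"Training result saved to '{saveFile}'")

Next, we use PyTorch's nn.CrossEntropyLoss() to compute the loss and optim.Adam to update the weights and the bias vector. The relevant code is as follows: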

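import torch.nn as nn
import torch.optim as optim

criterion = nn.CrossEntropyLoss()  # cross-entropy loss
# model.parameters() yields the learnable parameters of the network,
# which in this example are fc.weight and fc.bias
optimizer = optim.Adam(model.parameters(), lr=alpha)  # Adam optimizer

epochs = 100
for x in range(epochs):
    total_loss = torch.tensor(0.0, device=device)
    for image, labels in train_loader:
        image, labels = image.to(device), labels.to(device)
        # Clear the gradients left over from the previous batch
        optimizer.zero_grad()
        # Forward pass
        conv_output = model(image)
        # Compute the loss
        loss = criterion(conv_output, labels)
        # detach() keeps the running total from holding on to the autograd graph
        total_loss += loss.detach()
        # Compute the gradients, then update the parameters
        loss.backward()
        optimizer.step()

Looking at it this way, hasn't the code become much simpler? The complete training code is as follows: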

def train(new=0):
    batch_size = 60000 // 100
    model = numCNN().to(device)

    # Resume from the saved weights unless a fresh run is requested
    if not new:
        model.load_state_dict(torch.load(saveFile))

    train_dataset = MNISTDataset(train_datasets)
    train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, pin_memory=True)
    
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss
    optimizer = optim.Adam(model.parameters(), lr=alpha)  # Adam optimizer

    epochs = 100
    total_time = 0
    for x in range(epochs):
        start_time = time.time()
        total_loss = torch.tensor(0.0, device=device)
        for image, labels in train_loader:
            image, labels = image.to(device), labels.to(device)
            # Clear the previous gradients
            optimizer.zero_grad()
            # Forward pass
            conv_output = model(image) 
            # Compute the loss
            loss = criterion(conv_output, labels)
            # detach() keeps the running total from holding on to the autograd graph
            total_loss += loss.detach()
            # First compute the gradients of all parameters from the loss
            loss.backward()
            # Then update the fully connected layer's weights and bias from those gradients
            optimizer.step()
        end_time = time.time()
        total_time += end_time - start_time
        print(f"Epoch: {x}, loss: {total_loss.item()}, time: {end_time - start_time}")
    print(f"Training finished, total time: {total_time}")
    model.save()

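Because the Adam optimizer's algorithm is better than the gradient code I wrote myself earlier, after the changes above the success rate rises to: Evaluation result, success rate: 9210/10000.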
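One way Adam differs from the plain "parameter minus alpha times gradient" update is that it rescales each step by a running estimate of the gradient's magnitude. The sketch below implements the standard Adam rule for a single scalar parameter in plain Python, using the commonly cited defaults (beta1=0.9, beta2=0.999, eps=1e-8). The class and function names are illustrative, not PyTorch APIs. It shows only this one property and does not by itself explain the accuracy gain reported above.

```python
import math

def sgd_step(param, grad, lr):
    """Plain gradient descent: step size scales with the gradient."""
    return param - lr * grad

class AdamScalar:
    """Adam update rule for one scalar parameter (illustrative, not torch.optim.Adam)."""
    def __init__(self, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = 0.0   # running mean of gradients
        self.v = 0.0   # running mean of squared gradients
        self.t = 0     # step counter, used for bias correction

    def step(self, param, grad):
        self.t += 1
        self.m = self.beta1 * self.m + (1 - self.beta1) * grad
        self.v = self.beta2 * self.v + (1 - self.beta2) * grad * grad
        m_hat = self.m / (1 - self.beta1 ** self.t)
        v_hat = self.v / (1 - self.beta2 ** self.t)
        return param - self.lr * m_hat / (math.sqrt(v_hat) + self.eps)

lr = 0.001
sgd_moves, adam_moves = [], []
for grad in (0.0001, 1.0, 1000.0):
    sgd_moves.append(abs(sgd_step(0.0, grad, lr)))
    adam_moves.append(abs(AdamScalar(lr=lr).step(0.0, grad)))
    print(f"grad={grad:g}  sgd step={sgd_moves[-1]:.3g}  adam first step={adam_moves[-1]:.3g}")
```

The plain update moves in proportion to the gradient, so its step sizes here span seven orders of magnitude. Adam's first step is close to lr in every case.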

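Optimizing the algorithm further

Once the changes above are done, the ceiling on the success rate is essentially fixed. Changing alpha and batch_size only changes how quickly that ceiling is reached. To raise the ceiling itself, we have to keep improving the algorithm.

Remember the introduction to image convolution in the first article of this series? Convolving an image is essentially extracting its features, which suggests the following approach: first extract the image's features, then compute the probabilities of the 10 digits from those features. The modified code is as follows: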

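class numCNN(nn.Module):
    def __init__(self, input_size=28*28, output_size=10):
        super(numCNN, self).__init__()
        # A 2D convolution layer: 1 input channel, 1 output channel, 3x3 kernel
        self.conv1 = nn.Conv2d(1, 1, 3, padding=1)
        # A fully connected layer
        self.fc = nn.Linear(input_size, output_size)
        # Activation function: rectified linear unit, max(0, x)
        self.relu = nn.ReLU()

    def forward(self, x):
        # Convolution layer: x has shape (batch_size, 1, 28, 28); padding=1 keeps it 28x28
        c = self.conv1(x)
        # max(0, c)
        t = self.relu(c)
        # Flatten (batch_size, 1, 28, 28) to (batch_size, 28 * 28)
        t = t.view(-1, 28 * 28)
        # Fully connected layer: one score (logit) per digit; CrossEntropyLoss applies softmax internally
        return self.fc(t)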
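To see what a 3x3 convolution with padding=1 followed by ReLU does to a single-channel image, here is a plain-Python sketch on a 4x4 toy image. The kernel is a hand-picked vertical-edge detector rather than a learned one, and the function names are illustrative. As in PyTorch's Conv2d, the kernel is applied without flipping (cross-correlation).

```python
def conv2d_same(image, kernel, bias=0.0):
    """3x3 cross-correlation with zero padding of 1 and stride 1 (single channel)."""
    h, w = len(image), len(image[0])
    # Surround the image with a one-pixel border of zeros
    padded = ([[0.0] * (w + 2)]
              + [[0.0] + list(row) + [0.0] for row in image]
              + [[0.0] * (w + 2)])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            acc = bias
            for dr in range(3):
                for dc in range(3):
                    acc += padded[r + dr][c + dc] * kernel[dr][dc]
            out_row.append(acc)
        out.append(out_row)
    return out

def relu(feature_map):
    """Element-wise max(0, x)."""
    return [[max(0.0, v) for v in row] for row in feature_map]

# Toy image: dark left half, bright right half
image = [[0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0],
         [0.0, 0.0, 1.0, 1.0]]
kernel = [[-1.0, 0.0, 1.0],
          [-1.0, 0.0, 1.0],
          [-1.0, 0.0, 1.0]]   # responds to dark-to-bright transitions

features = relu(conv2d_same(image, kernel))
for row in features:
    print(row)
```

The output is still 4x4, which is why the flattened size in the model above stays 28 * 28. The feature map is non-zero only around the dark-to-bright edge, and ReLU zeroes the negative responses in the last column.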

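Understanding more complex algorithms

Having come this far, if you have understood everything above, the training code found online becomes easy to follow. For example, PyTorch implementations of a convolutional neural network for handwritten digit recognition can be found in many places online. The key code looks like this: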

import torch
import torch.nn as nn
import torch.optim as optim

# Define the CNN model
class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(1, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc1 = nn.Linear(64 * 7 * 7, 128)
        self.fc2 = nn.Linear(128, 10)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.pool(self.relu(self.conv1(x)))
        x = self.pool(self.relu(self.conv2(x)))
        x = x.view(-1, 64 * 7 * 7)
        x = self.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Initialise the model
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = CNN().to(device)

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

# Train the model (train_loader is the MNIST DataLoader built earlier)
epochs = 10
for epoch in range(epochs):
    model.train()
    running_loss = 0.0
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        
        optimizer.zero_grad()
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()
    print(f"Epoch [{epoch+1}/{epochs}], Loss: {running_loss/len(train_loader):.4f}")

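In the code above, the network uses two convolutional layers, one pooling layer (applied twice), and two fully connected layers.

Step 1: x = self.pool(self.relu(self.conv1(x)))

The input image is convolved to produce 32 feature maps, giving an output of shape (batch_size, 32, 28, 28). After ReLU, pooling halves the height and width, so x ends up with shape (batch_size, 32, 14, 14).

Step 2: x = self.pool(self.relu(self.conv2(x)))

The result of step 1 goes through a second convolution to extract features again, producing 64 feature maps of shape (batch_size, 64, 14, 14). After ReLU, pooling halves the height and width once more, so x ends up with shape (batch_size, 64, 7, 7).

Step 3: x = x.view(-1, 64 * 7 * 7)

x is flattened from (batch_size, 64, 7, 7) to (batch_size, 64 * 7 * 7).

Step 4: x = self.relu(self.fc1(x))

The output of step 3 passes through the first fully connected layer, which outputs 128 values, followed by ReLU.

Final step: x = self.fc2(x)

The 128 values from step 4 are passed to the second fully connected layer, which outputs the final 10 scores, one per digit. nn.CrossEntropyLoss turns these into probabilities internally.

With this algorithm, only 10 epochs of training are needed, and the success rate rises to: Evaluation result, success rate: 9905/10000.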
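The shapes in the walk-through above can be checked with the usual output-size formula, floor((size + 2 * padding - kernel) / stride) + 1, applied per spatial dimension. The sketch below traces the 28x28 input through the layers and, as an extra, counts the learnable parameters of each layer. The helper names are illustrative and not part of PyTorch.

```python
def conv_out(size, kernel, padding=0, stride=1):
    """Output size of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel, stride):
    """Output size of max pooling along one spatial dimension (no padding)."""
    return (size - kernel) // stride + 1

def conv_params(in_ch, out_ch, kernel):
    """Weights (out_ch * in_ch * k * k) plus one bias per output channel."""
    return out_ch * in_ch * kernel * kernel + out_ch

def linear_params(in_features, out_features):
    """Weights (in * out) plus one bias per output."""
    return in_features * out_features + out_features

size = 28
after_conv1 = conv_out(size, kernel=3, padding=1)          # 28
after_pool1 = pool_out(after_conv1, kernel=2, stride=2)    # 14
after_conv2 = conv_out(after_pool1, kernel=3, padding=1)   # 14
after_pool2 = pool_out(after_conv2, kernel=2, stride=2)    # 7
flat_features = 64 * after_pool2 * after_pool2             # 64 * 7 * 7 = 3136

total_params = (conv_params(1, 32, 3) + conv_params(32, 64, 3)
                + linear_params(flat_features, 128) + linear_params(128, 10))

print(after_conv1, after_pool1, after_conv2, after_pool2, flat_features, total_params)
# 28 14 14 7 3136 421642
```

Most of the 421642 parameters sit in fc1 (3136 * 128 + 128 = 401536), which suggests the first fully connected layer dominates the model's memory footprint.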

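Summary

That concludes this example. You may still have plenty of questions, for instance why the final algorithm uses two convolutional layers and two fully connected layers. I think that is best left to people who study algorithms professionally. As an outsider, I only need to know that this algorithm handles image recognition better, and which parameters I can change to improve speed, raise accuracy, or reduce GPU memory usage. Time and energy are limited, so I cannot cover everything.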

https://nobb.site/2025/02/25/0x91/

Author

Hcamael

#AI
