Fashion-MNIST Image Classification in Practice: Reaching 89.2% Accuracy with a LeNet-5 Architecture

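Among introductory computer vision projects, Fashion-MNIST has become the new "Hello World". The dataset contains 70,000 grayscale images of fashion items and keeps the simple format of the classic MNIST (28x28 pixels, 10 classes), but its images are visually richer and harder to tell apart, which makes it a more informative benchmark for convolutional neural networks. This article builds a LeNet-5 style network from scratch with TensorFlow/Keras and reaches about 89.2% test accuracy in only 10 training epochs.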

1. Environment Setup and Data Loading

Before building the model, we need a working environment and the dataset. Make sure Python 3.7+ and TensorFlow 2.x are installed. Fashion-MNIST ships with Keras, so loading it takes only a few lines:

    import tensorflow as tf
    from tensorflow import keras
    import numpy as np
    import matplotlib.pyplot as plt

    # Load the dataset
    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    # Class names, indexed by integer label
    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
                   'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

The dataset comes already split into 60,000 training images and 10,000 test images. Let us take a quick look at the array shapes:

    print(f"Training set shape: {train_images.shape}")  # (60000, 28, 28)
    print(f"Test set shape: {test_images.shape}")       # (10000, 28, 28)

Visualizing the data is an important step in understanding a dataset. The following code displays the first 25 training images with their labels:

    plt.figure(figsize=(10, 10))
    for i in range(25):
        plt.subplot(5, 5, i + 1)
        plt.xticks([])
        plt.yticks([])
        plt.grid(False)
        plt.imshow(train_images[i], cmap=plt.cm.binary)
        plt.xlabel(class_names[train_labels[i]])
    plt.show()

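2. Data Preprocessing and Normalization

The raw pixel values range from 0 to 255. We scale them to the range 0-1, which keeps the inputs on a scale that gradient-based training handles well:

    # Normalize pixel values to [0, 1]
    train_images = train_images / 255.0
    test_images = test_images / 255.0

A convolutional network also expects an explicit channel dimension. Fashion-MNIST is grayscale, so we add a channel axis of size 1:

    # Reshape for the CNN: (N, 28, 28) -> (N, 28, 28, 1)
    train_images = train_images.reshape((-1, 28, 28, 1))
    test_images = test_images.reshape((-1, 28, 28, 1))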
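The two preprocessing steps above can also be folded into one helper that does not depend on the number of images. The following is a minimal sketch rather than part of the original code: the name `preprocess` and the synthetic `fake_images` array are assumptions made for illustration, and the cast to float32 is an optional choice that halves memory compared with the float64 arrays that plain division produces.

```python
import numpy as np


def preprocess(images):
    """Scale uint8 pixels to [0, 1] and append a channel axis of size 1."""
    images = images.astype("float32") / 255.0
    return images[..., np.newaxis]  # (N, 28, 28) -> (N, 28, 28, 1)


# Synthetic stand-in for a batch of raw images (not real Fashion-MNIST data).
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

batch = preprocess(fake_images)
print(batch.shape, batch.dtype)
```

Applying the same helper to both the training and the test arrays keeps the two preprocessing paths from drifting apart.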

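3. Implementing the LeNet-5 Architecture

LeNet-5 is the classic CNN architecture published by Yann LeCun and colleagues in 1998. It is simple, but it still performs well on small-image classification. The implementation below is a lightly modernized variant rather than a faithful reproduction: the convolutional layers use ReLU, the first convolution uses 'same' padding so that 28x28 inputs keep their size (the original network took 32x32 inputs), and the output layer is a softmax.

    def build_lenet5(input_shape=(28, 28, 1), num_classes=10):
        model = keras.Sequential([
            # First convolutional layer: six 5x5 kernels with ReLU activation
            keras.layers.Conv2D(6, (5, 5), activation='relu',
                                input_shape=input_shape, padding='same'),
            # Average pooling layer
            keras.layers.AveragePooling2D((2, 2)),
            # Second convolutional layer: sixteen 5x5 kernels
            keras.layers.Conv2D(16, (5, 5), activation='relu'),
            keras.layers.AveragePooling2D((2, 2)),
            # Flatten layer
            keras.layers.Flatten(),
            # Fully connected layers
            keras.layers.Dense(120, activation='tanh'),
            keras.layers.Dense(84, activation='tanh'),
            # Output layer
            keras.layers.Dense(num_classes, activation='softmax')
        ])
        return model

    model = build_lenet5()

Let us look at the model summary:

    model.summary()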

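The summary lists the number of parameters in each layer; the total is 61,706. That is tiny compared with modern architectures, but it is enough for Fashion-MNIST.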
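The parameter total can be checked by hand without TensorFlow. The sketch below recomputes the output shape and parameter count of every layer in `build_lenet5()` using the standard formulas for convolution and dense layers; the helper names `conv2d`, `avg_pool`, and `dense` are hypothetical and exist only for this illustration.

```python
def conv2d(h, w, c_in, filters, k, same_padding):
    """Stride-1 convolution: return (output shape, parameter count)."""
    if not same_padding:
        h, w = h - k + 1, w - k + 1  # 'valid' padding trims k - 1 pixels per axis
    params = filters * k * k * c_in + filters  # kernel weights + one bias per filter
    return (h, w, filters), params


def avg_pool(h, w, c, size=2):
    """Non-overlapping pooling: shrinks height and width, adds no parameters."""
    return (h // size, w // size, c), 0


def dense(n_in, n_out):
    """Fully connected layer: return (output units, parameter count)."""
    return n_out, n_in * n_out + n_out


rows = []  # (layer name, output shape, parameter count)

shape, p = conv2d(28, 28, 1, filters=6, k=5, same_padding=True)
rows.append(("conv1", shape, p))
shape, p = avg_pool(*shape)
rows.append(("pool1", shape, p))
shape, p = conv2d(*shape, filters=16, k=5, same_padding=False)
rows.append(("conv2", shape, p))
shape, p = avg_pool(*shape)
rows.append(("pool2", shape, p))

units = shape[0] * shape[1] * shape[2]  # Flatten: 5 * 5 * 16 = 400
for name, n_out in [("dense1", 120), ("dense2", 84), ("output", 10)]:
    units, p = dense(units, n_out)
    rows.append((name, (units,), p))

total_params = sum(p for _, _, p in rows)
for name, out_shape, p in rows:
    print(f"{name:7s} {str(out_shape):14s} {p:6d}")
print("total", total_params)
```

Most of the parameters (48,120 of 61,706) sit in the first dense layer, which connects the 400 flattened features to 120 units.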

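4. Model Training and Hyperparameter Tuning

When compiling the model, we choose the Adam optimizer and the sparse categorical cross-entropy loss, which accepts the integer labels directly without one-hot encoding:

    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])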
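To make the choice of loss concrete, here is a small pure-Python sketch of what sparse categorical cross-entropy computes for one sample, and why it matches ordinary categorical cross-entropy on a one-hot target. The probability vector is made up for illustration, and the functions omit the numerical safeguards that a real implementation adds.

```python
import math


def sparse_categorical_crossentropy(label, probs):
    """Negative log of the probability assigned to the true class index."""
    return -math.log(probs[label])


def categorical_crossentropy(one_hot, probs):
    """Same loss, written for a one-hot encoded target."""
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs))


# Hypothetical softmax output for one image (the values sum to 1).
probs = [0.05, 0.05, 0.10, 0.05, 0.05, 0.05, 0.50, 0.05, 0.05, 0.05]
label = 6  # index of 'Shirt' in class_names
one_hot = [1.0 if i == label else 0.0 for i in range(len(probs))]

sparse_loss = sparse_categorical_crossentropy(label, probs)
dense_loss = categorical_crossentropy(one_hot, probs)
print(sparse_loss, dense_loss)
```

Both calls return the same value, so the sparse variant simply spares us the one-hot conversion of `train_labels`.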

Key training parameters:

Batch size: 256
Training epochs: 10
Validation split: 20% (held out from the training set)

    history = model.fit(train_images, train_labels,
                        epochs=10,
                        batch_size=256,
                        validation_split=0.2)
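The settings above determine how many images and gradient steps the model actually sees. The arithmetic below is a sketch under the stated settings; it relies on the documented Keras behavior that `validation_split` takes the validation samples from the end of the arrays before any shuffling.

```python
import math

num_images = 60_000      # size of the Fashion-MNIST training set
validation_percent = 20  # corresponds to validation_split=0.2
batch_size = 256
epochs = 10

num_val = num_images * validation_percent // 100    # images held out for validation
num_fit = num_images - num_val                      # images used for gradient updates
steps_per_epoch = math.ceil(num_fit / batch_size)   # the last batch may be smaller
last_batch = num_fit - (steps_per_epoch - 1) * batch_size
total_steps = steps_per_epoch * epochs

print(num_fit, num_val, steps_per_epoch, last_batch, total_steps)
```

So the network is fitted on 48,000 images in 188 steps per epoch, for 1,880 optimizer steps in total, and the reported validation accuracy is measured on the remaining 12,000 images.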

The history object returned by fit lets us plot how loss and accuracy evolved during training:

    def plot_training_history(history):
        plt.figure(figsize=(12, 4))

        # Accuracy curves
        plt.subplot(1, 2, 1)
        plt.plot(history.history['accuracy'], label='Training accuracy')
        plt.plot(history.history['val_accuracy'], label='Validation accuracy')
        plt.title('Training and validation accuracy')
        plt.xlabel('Epoch')
        plt.ylabel('Accuracy')
        plt.legend()

        # Loss curves
        plt.subplot(1, 2, 2)
        plt.plot(history.history['loss'], label='Training loss')
        plt.plot(history.history['val_loss'], label='Validation loss')
        plt.title('Training and validation loss')
        plt.xlabel('Epoch')
        plt.ylabel('Loss')
        plt.legend()

        plt.tight_layout()
        plt.show()

    plot_training_history(history)

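5. Model Evaluation and Result Analysis

Evaluate the model on the test set:

    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
    print(f"\nTest accuracy: {test_acc:.4f}")

Typical output (the exact value varies between runs, because weight initialization and batch shuffling are random):

    Test accuracy: 0.8924

A confusion matrix shows in more detail how the model performs on each class:

    from sklearn.metrics import confusion_matrix
    import seaborn as sns

    # Generate predictions
    predictions = model.predict(test_images)
    pred_labels = np.argmax(predictions, axis=1)

    # Plot the confusion matrix (rows: true labels, columns: predicted labels)
    cm = confusion_matrix(test_labels, pred_labels)
    plt.figure(figsize=(10, 8))
    sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
                xticklabels=class_names, yticklabels=class_names)
    plt.xlabel('Predicted label')
    plt.ylabel('True label')
    plt.title('Confusion matrix')
    plt.show()

The confusion matrix usually shows that the model does worst on the "Shirt" class, which it often confuses with "T-shirt/top" and "Pullover", because these classes really do look alike.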
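Reading the heatmap by eye can be complemented with per-class numbers. The sketch below derives recall, precision, and the most frequent wrong prediction from a confusion matrix laid out as in scikit-learn (rows are true classes, columns are predicted classes). The 3x3 matrix and its counts are invented purely for illustration and are not results from this model; the helper names are hypothetical.

```python
# Hypothetical 3-class confusion matrix: rows = true class, columns = predicted class.
names = ["T-shirt/top", "Pullover", "Shirt"]
cm = [
    [850,  20, 130],
    [ 15, 880, 105],
    [140, 110, 750],
]


def per_class_recall(cm):
    """Fraction of each true class that was predicted correctly (row-wise)."""
    return [row[i] / sum(row) for i, row in enumerate(cm)]


def per_class_precision(cm):
    """Fraction of each predicted class that was correct (column-wise)."""
    n = len(cm)
    col_sums = [sum(cm[r][c] for r in range(n)) for c in range(n)]
    return [cm[i][i] / col_sums[i] for i in range(n)]


def most_confused_with(cm, i):
    """Index of the wrong class that true class i is most often predicted as."""
    off_diagonal = [(count, j) for j, count in enumerate(cm[i]) if j != i]
    return max(off_diagonal)[1]


recall = per_class_recall(cm)
precision = per_class_precision(cm)
worst = min(range(len(names)), key=lambda i: recall[i])
print(names[worst], "is most often mistaken for", names[most_confused_with(cm, worst)])
```

With the real 10x10 matrix, the same functions can be applied to `cm.tolist()` from the evaluation code.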

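6. Performance Optimization Tips

To push accuracy higher, consider the following strategies.

6.1 Data Augmentation

Random rotations, shifts, and zooms increase the diversity of the training data. Note that ImageDataGenerator is deprecated in recent TensorFlow releases in favor of Keras preprocessing layers, although it still works in TensorFlow 2.x. The validation data below is held out from the training set rather than taken from the test set, so the test set remains an unbiased final check:

    from tensorflow.keras.preprocessing.image import ImageDataGenerator

    datagen = ImageDataGenerator(
        rotation_range=10,
        width_shift_range=0.1,
        height_shift_range=0.1,
        zoom_range=0.1
    )

    # Hold out the last 20% of the training set for validation
    x_train, y_train = train_images[:48000], train_labels[:48000]
    x_val, y_val = train_images[48000:], train_labels[48000:]

    # Train on augmented batches (this continues from the current weights;
    # rebuild and recompile the model first to start from scratch)
    model.fit(datagen.flow(x_train, y_train, batch_size=256),
              epochs=15,
              validation_data=(x_val, y_val))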
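To illustrate what a shift range of 0.1 means for a 28x28 image, here is a minimal NumPy sketch of a random translation. It is not how ImageDataGenerator is implemented: the function name `random_shift` is hypothetical, shifts are whole pixels, and the exposed border is filled with zeros, whereas ImageDataGenerator fills with the nearest pixel by default.

```python
import numpy as np


def random_shift(image, max_fraction=0.1, rng=None):
    """Shift an (H, W, 1) image by up to max_fraction of its size, zero-filling the gap."""
    rng = rng if rng is not None else np.random.default_rng()
    h, w = image.shape[:2]
    max_dy, max_dx = int(h * max_fraction), int(w * max_fraction)
    dy = int(rng.integers(-max_dy, max_dy + 1))  # vertical offset in pixels
    dx = int(rng.integers(-max_dx, max_dx + 1))  # horizontal offset in pixels

    shifted = np.zeros_like(image)
    # Source and destination windows that overlap after the shift.
    src_y, dst_y = slice(max(0, -dy), min(h, h - dy)), slice(max(0, dy), min(h, h + dy))
    src_x, dst_x = slice(max(0, -dx), min(w, w - dx)), slice(max(0, dx), min(w, w + dx))
    shifted[dst_y, dst_x] = image[src_y, src_x]
    return shifted


# Synthetic all-ones image, so the zero-filled border is easy to see.
image = np.ones((28, 28, 1), dtype=np.float32)
augmented = random_shift(image, rng=np.random.default_rng(0))
print(augmented.shape, int(augmented.sum()))
```

For a 28-pixel axis, a fraction of 0.1 allows offsets of at most 2 pixels in either direction, which is small enough that the garment stays fully inside the frame.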

6.2 Architecture Improvements

Common improvements in modern CNNs include:

Using ReLU instead of tanh in the fully connected layers as well
Adding Batch Normalization layers
Increasing network depth
Using Dropout to reduce overfitting

An example of an improved architecture:

    def build_improved_cnn():
        model = keras.Sequential([
            keras.layers.Conv2D(32, (3, 3), activation='relu',
                                input_shape=(28, 28, 1), padding='same'),
            keras.layers.BatchNormalization(),
            keras.layers.MaxPooling2D((2, 2)),
            keras.layers.Dropout(0.25),

            keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same'),
            keras.layers.BatchNormalization(),
            keras.layers.MaxPooling2D((2, 2)),
            keras.layers.Dropout(0.25),

            keras.layers.Flatten(),
            keras.layers.Dense(128, activation='relu'),
            keras.layers.BatchNormalization(),
            keras.layers.Dropout(0.5),
            keras.layers.Dense(10, activation='softmax')
        ])
        return model

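6.3 Learning Rate Scheduling

Adjusting the learning rate during training can improve the final result. The schedule below multiplies the learning rate by 0.9 after every 1,000 optimizer steps:

    initial_learning_rate = 0.001
    lr_schedule = keras.optimizers.schedules.ExponentialDecay(
        initial_learning_rate,
        decay_steps=1000,
        decay_rate=0.9,
        staircase=True)
    optimizer = keras.optimizers.Adam(learning_rate=lr_schedule)

    # Use the scheduled optimizer in place of the 'adam' string
    model.compile(optimizer=optimizer,
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])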
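It helps to check how often this schedule actually fires during a run. With `staircase=True`, the learning rate follows initial_rate * decay_rate ** floor(step / decay_steps). The pure-Python sketch below evaluates that formula; the function name is hypothetical, and the figure of 188 steps per epoch assumes 48,000 fitting images and a batch size of 256, as in the training section.

```python
def staircase_exponential_decay(step, initial_lr=0.001, decay_steps=1000, decay_rate=0.9):
    """Learning rate at a given optimizer step for a staircase exponential schedule."""
    return initial_lr * decay_rate ** (step // decay_steps)


steps_per_epoch = 188  # assumption: ceil(48000 / 256)
epochs = 10

# Learning rate at the first step of each epoch.
lr_per_epoch = [staircase_exponential_decay(epoch * steps_per_epoch) for epoch in range(epochs)]
for epoch, lr in enumerate(lr_per_epoch, start=1):
    print(f"epoch {epoch:2d}: lr = {lr:.6f}")

# Number of distinct learning rates used over the whole run.
num_levels = len({step // 1000 for step in range(epochs * steps_per_epoch)})
print("distinct learning rates:", num_levels)
```

Under these assumptions the run lasts 1,880 steps, so the rate drops only once, from 0.001 to 0.0009. Setting `decay_steps` to roughly the number of steps per epoch would give one decay per epoch instead.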

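7. Model Deployment and Application

Once training is finished, we can save the model for later use:

    model.save('fashion_mnist_cnn.h5')

Load the saved model and predict the class of a single image: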

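    def predict_single_image(model, image):
        """Predict the class index of a single 28x28 image."""
        image = np.asarray(image, dtype='float32')
        if image.max() > 1.0:  # raw 0-255 pixels: normalize once
            image = image / 255.0
        # Add batch and channel axes; works for (28, 28) and (28, 28, 1) inputs
        image = image.reshape(1, 28, 28, 1)
        prediction = model.predict(image)
        return np.argmax(prediction)

    # Example: reload the saved model and predict the first test image
    # (test_images was already normalized and reshaped above)
    loaded_model = keras.models.load_model('fashion_mnist_cnn.h5')
    sample_image = test_images[0]
    predicted_label = predict_single_image(loaded_model, sample_image)
    print(f"Predicted class: {class_names[predicted_label]}")
    print(f"True class: {class_names[test_labels[0]]}")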

In a real application, the model can be integrated into a web or mobile app to classify fashion items in real time. For mobile deployment, consider converting the model to the TensorFlow Lite format:

    converter = tf.lite.TFLiteConverter.from_keras_model(model)
    tflite_model = converter.convert()

    with open('fashion_mnist_cnn.tflite', 'wb') as f:
        f.write(tflite_model)