https://github.com/kuangliu/pytorch-cifar/blob/master/models/resnet.py
After reading
https://www.cs.toronto.edu/~kriz/cifar.html
my understanding of the code is:
self.conv1 = nn.Conv2d(3, 6, 5)    # 3 channels in, 6 channels out, kernel size of 5
self.conv2 = nn.Conv2d(6, 16, 5)   # 6 channels in, 16 channels out, kernel size of 5
self.fc1 = nn.Linear(16*5*5, 120)  # 16*5*5 in features, 120 out features
The resnet.py file contains the line:
self.fc1 = nn.Linear(16*5*5, 120)
From
http://cs231n.github.io/convolutional-networks/
the following is stated:

Summary. To summarize, the Conv Layer:
- Produces a volume of size W2 x H2 x D2, where:
  W2 = (W1 - F + 2P)/S + 1
  H2 = (H1 - F + 2P)/S + 1 (i.e. width and height are computed equally by symmetry)
  D2 = K
- With parameter sharing, it introduces F*F*D1 weights per filter, for a total of (F*F*D1)*K weights and K biases.
- In the output volume, the d-th depth slice (of size W2 x H2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by the d-th bias.
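As a sanity check, the output-size formula from that summary can be written as a small helper (a sketch; stride 1 and zero padding are the Conv2d defaults mentioned below):

```python
def conv2d_out(w, f, p=0, s=1):
    """Spatial output size of a conv layer: W2 = (W1 - F + 2P)/S + 1."""
    return (w - f + 2 * p) // s + 1

# conv1 on a 32x32 CIFAR image with a 5x5 kernel, padding 0, stride 1:
print(conv2d_out(32, 5))        # -> 28
# with "same" padding P=2, the size would be preserved:
print(conv2d_out(32, 5, p=2))   # -> 32
```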
From this, I am trying to understand how a 32x32 training image (1024 pixels) is transformed into the feature map of size 16*5*5 (-> 400) that feeds
nn.Linear(16*5*5, 120)
https://pytorch.org/docs/stable/nn.html#torch.nn.Conv2d
shows that the default stride is 1 and the default padding is 0.
What are the steps to get from a 32*32 image down to 16*5*5? Can 16*5*5 be derived from the formula above, given the spatial extent F?
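For reference, the shapes can be traced layer by layer with that formula (a sketch; the 2x2 max pooling in the forward pass halves each spatial dimension):

```python
def conv_out(w, f, p=0, s=1):
    # W2 = (W1 - F + 2P)/S + 1
    return (w - f + 2 * p) // s + 1

w = 32               # CIFAR-10 images are 32x32
w = conv_out(w, 5)   # conv1, 5x5 kernel, padding 0, stride 1 -> 28
w = w // 2           # 2x2 max pool                           -> 14
w = conv_out(w, 5)   # conv2, 5x5 kernel                      -> 10
w = w // 2           # 2x2 max pool                           -> 5
channels = 16        # conv2 out_channels
print(channels * w * w)  # 16*5*5 = 400, matching nn.Linear(16*5*5, 120)
```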
Update:
'''LeNet in PyTorch.'''
import torch.nn as nn
import torch.nn.functional as F


class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        out = out.view(out.size(0), -1)
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
Taken from
https://github.com/kuangliu/pytorch-cifar/blob/master/models/lenet.py
My understanding is that the convolution operation is applied to the image data once per kernel. So if 5 kernels are set, then 5 convolutions are applied to the data, producing an image representation 5 channels deep.
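Following that reading, the output depth equals the number of filters K, and the parameter count follows the (F*F*D1)*K rule from the cs231n summary. A quick check against conv1 above (a sketch using only the layer sizes given):

```python
# conv1 = nn.Conv2d(3, 6, 5): D1 = 3 input channels, K = 6 filters, F = 5
D1, K, F = 3, 6, 5

out_depth = K             # each filter produces one feature map (D2 = K)
weights = F * F * D1 * K  # 5*5*3 = 75 weights per filter, times 6 filters
biases = K                # one bias per filter
print(out_depth, weights, biases)  # 6 450 6
```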