Changing the tanh activation in LSTM to ReLU

pytorch lstm

Venkat · Tech Community · 7 years ago

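The default non-linear activation function in the LSTM class is tanh. I would like to use ReLU in my project. After browsing the documentation and other resources, I could not find a simple way to do this. The only approach I found is to define my own custom LSTMCell, but the author of an article I came across says that custom LSTMCells do not support GPU acceleration (or has that changed since the article was published?). I need CUDA to speed up my training. Any help would be appreciated.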

1 answer | last activity 7 years ago

Ioannis Nasios · 7 years ago

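"Custom LSTMCells don't support GPU acceleration capabilities": this statement probably means that GPU acceleration becomes limited if you use LSTMCells. You can certainly write your own LSTM implementation, but you will have to sacrifice runtime.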

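For example, I once implemented an LSTM (based on linear layers) as follows. It was typically slower than the built-in LSTM (provided in PyTorch) when used as part of a deep neural model.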

import torch
import torch.nn as nn


class LSTMCell(nn.Module):
    def __init__(self, input_size, hidden_size, nlayers, dropout):
        """Constructor of the class."""
        super(LSTMCell, self).__init__()

        self.nlayers = nlayers
        self.dropout = nn.Dropout(p=dropout)

        ih, hh = [], []
        for i in range(nlayers):
            # Layers above the first receive the previous layer's hidden state,
            # so their input width is hidden_size rather than input_size.
            in_features = input_size if i == 0 else hidden_size
            ih.append(nn.Linear(in_features, 4 * hidden_size))
            hh.append(nn.Linear(hidden_size, 4 * hidden_size))
        self.w_ih = nn.ModuleList(ih)
        self.w_hh = nn.ModuleList(hh)

    def forward(self, input, hidden):
        """Defines the forward computation of the LSTMCell for one time step."""
        hy, cy = [], []
        for i in range(self.nlayers):
            hx, cx = hidden[0][i], hidden[1][i]
            # One matmul per weight produces all four gates at once.
            gates = self.w_ih[i](input) + self.w_hh[i](hx)
            i_gate, f_gate, c_gate, o_gate = gates.chunk(4, 1)

            # torch.sigmoid / torch.tanh replace the deprecated F.sigmoid / F.tanh.
            i_gate = torch.sigmoid(i_gate)
            f_gate = torch.sigmoid(f_gate)
            c_gate = torch.tanh(c_gate)
            o_gate = torch.sigmoid(o_gate)

            ncx = (f_gate * cx) + (i_gate * c_gate)
            nhx = o_gate * torch.tanh(ncx)
            cy.append(ncx)
            hy.append(nhx)
            # The (dropped-out) hidden state feeds the next layer.
            input = self.dropout(nhx)

        hy, cy = torch.stack(hy, 0), torch.stack(cy, 0)
        return hy, cy
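
To connect this back to the question: in a cell like the one above, the two tanh calls are where a different activation would go. The following is only a minimal single-layer sketch of that idea, not the code I actually ran. The class name `ActLSTMCell` and its `activation` argument are hypothetical, and the sizes are arbitrary. Note that ReLU is unbounded, so the cell state can grow in a way tanh prevents; whether that trains well for a given task is something to check empirically.

```python
import torch
import torch.nn as nn


class ActLSTMCell(nn.Module):
    """Single-layer LSTM cell with a swappable activation (hypothetical name)."""

    def __init__(self, input_size, hidden_size, activation=torch.relu):
        super().__init__()
        self.activation = activation  # used in place of both tanh calls
        self.w_ih = nn.Linear(input_size, 4 * hidden_size)
        self.w_hh = nn.Linear(hidden_size, 4 * hidden_size)

    def forward(self, x, state):
        hx, cx = state
        gates = self.w_ih(x) + self.w_hh(hx)
        i_gate, f_gate, c_gate, o_gate = gates.chunk(4, 1)

        i_gate = torch.sigmoid(i_gate)
        f_gate = torch.sigmoid(f_gate)
        o_gate = torch.sigmoid(o_gate)
        c_gate = self.activation(c_gate)       # tanh in a standard LSTM

        cy = f_gate * cx + i_gate * c_gate
        hy = o_gate * self.activation(cy)      # tanh in a standard LSTM
        return hy, cy


# The cell is built from ordinary tensor ops, so it runs on CUDA once the
# module and its inputs are moved there; it just does not use the fused kernel.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
cell = ActLSTMCell(input_size=8, hidden_size=16).to(device)

seq = torch.randn(5, 3, 8, device=device)      # (time, batch, features)
h = torch.zeros(3, 16, device=device)
c = torch.zeros(3, 16, device=device)

outputs = []
for x_t in seq:                                # step through time in Python
    h, c = cell(x_t, (h, c))
    outputs.append(h)
outputs = torch.stack(outputs, 0)              # (time, batch, hidden)
```

Passing `activation=torch.tanh` gives back the usual LSTM equations, which makes it easy to compare the two variants on the same data.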

I would be happy to know if the runtime of a custom LSTM implementation can be improved!
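
For anyone who wants to measure the gap on their own machine, here is a rough timing sketch. It is not the benchmark behind my remark above: it compares the built-in `nn.LSTM` with a Python loop over the built-in `nn.LSTMCell`, and the sizes and the helper name `time_it` are arbitrary choices made for illustration. It runs on CPU as written; on a GPU you would also need to call `torch.cuda.synchronize()` before reading the clock, since CUDA kernels launch asynchronously.

```python
import time

import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, batch, n_in, n_hid = 50, 16, 32, 64
x = torch.randn(seq_len, batch, n_in)          # (time, batch, features)

fused = nn.LSTM(n_in, n_hid)                   # whole sequence in one call
stepwise = nn.LSTMCell(n_in, n_hid)            # one call per time step


def run_fused():
    out, _ = fused(x)
    return out


def run_stepwise():
    h = torch.zeros(batch, n_hid)
    c = torch.zeros(batch, n_hid)
    outs = []
    for x_t in x:                              # Python-level loop over time
        h, c = stepwise(x_t, (h, c))
        outs.append(h)
    return torch.stack(outs, 0)


def time_it(fn, repeats=10):
    """Average wall-clock seconds per call after one warm-up call."""
    fn()
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats


with torch.no_grad():
    t_fused = time_it(run_fused)
    t_step = time_it(run_stepwise)

print(f"nn.LSTM: {t_fused * 1e3:.2f} ms | LSTMCell loop: {t_step * 1e3:.2f} ms")
```

The numbers depend heavily on hardware, sequence length, and batch size, so treat any single run as indicative only.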
