run哥 walks you through RNNs by hand -- the evolution from base RNN => LSTM => GRU, and tips for using them

wuhu

RNN

A regular neural network

[Figure: the input layer, hidden layers, and output layer of a feedforward neural network]

The most basic neural network we picture propagates forward layer by layer, then computes a loss and runs backward.

why RNN?

Such a network takes no account of relationships over time, so it cannot (in my view, though that may be a bit absolute) handle time-series data: there is no operation that captures the connections between successive inputs.

But in tasks such as translation, we often need to relate earlier and later inputs to get a better answer.

So how does an RNN solve this problem?

If you were the one designing it, how would you do it?

We all know that what a neural network really learns and applies is W, the weights. If the signal produced for one input could also influence the computation for the next input, wouldn't that link them together?

The hidden layer

[Figure: an RNN with a recurrent hidden layer connecting successive inputs]

An RNN adds a hidden layer whose state is carried from one time step to the next; this is what ties the individual inputs together.

RNN, LSTM, GRU, and related methods differ precisely in the formula used to compute this hidden state.

How RNN, LSTM, and GRU differ in their hidden-state update formulas

RNN

[Figure: an unrolled RNN and its hidden-state update formula]

The most basic implementation

Normally a layer just computes XW + b and moves on, but here you can see that information from the previous hidden state is mixed in as well.

See the PyTorch source code for the exact implementation.
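As a rough sketch of that update (my own illustration with toy tensor names, not the actual nn.RNN source):

import torch

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # A plain feedforward layer would stop at: torch.tanh(x_t @ W_xh + b)
    # The RNN step also folds in the previous hidden state h_prev:
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b)

# Hypothetical sizes: input_dim=3, hidden_dim=5
W_xh = torch.randn(3, 5)
W_hh = torch.randn(5, 5)
b = torch.zeros(5)
h = torch.zeros(1, 5)
for x_t in torch.randn(10, 1, 3):   # a toy sequence of 10 steps
    h = rnn_step(x_t, h, W_xh, W_hh, b)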

LSTM

Interview question: **describe each gate in an LSTM** (forget gate, input gate, output gate)

The LSTM addresses the problem that a plain RNN cannot learn long-range patterns hidden in a sequence.

Let's break it down!!!

[Figure: the full LSTM cell]

Its core is the cell state shown below. As the diagram above suggests, it runs straight down the entire chain with only minor interactions along the way, which gives information the chance to flow through unchanged. Nothing complicated happens here (at least I don't find it complicated): there are just two regular operations, adding information to the cell state and removing information from it.

[Figure: the cell state running along the top of the LSTM cell]

Let's keep unpacking it.

Step 1:

[Figure: the forget gate]

You can see that this part is wired to the x above, and it consists of a sigmoid layer followed by a pointwise multiplication operation.

This part decides which information to throw away from the cell state.

[Figure: the forget-gate equation]
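Written out (the standard formulation, as in the Understanding LSTM Networks post listed in the references):

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)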

Step 2:

Decide which new information to store.

You can see that this part splits into two pieces: one goes straight through a sigmoid (called the input gate layer) and decides which values we will update; the other uses tanh to build a vector of new candidate values.

[Figure: the input-gate and candidate-value equations]
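In the same notation:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)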

Next we apply these updates to obtain the new cell state C_t.

[Figure: the cell-state update equation]
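That is, the old cell state is scaled by the forget gate and the gated candidates are added in (again the standard form):

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t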

Step 3:

Decide what to output. A sigmoid layer decides which parts of the cell state to output, and tanh (squashing to [-1, 1]) multiplied by that sigmoid gives the final output.

[Figure: the output-gate and hidden-state equations]
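In equations (standard form):

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)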


GRU
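Briefly (this is the standard GRU formulation from the same colah post, not a derivation of my own): the GRU merges the LSTM's forget and input gates into a single update gate and merges the cell state with the hidden state, which gives it fewer parameters than the LSTM:

z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t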

How to use them in PyTorch

RNN

import torch
import torch.nn as nn


class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        super(RNNModel, self).__init__()

        # Defining the number of layers and the nodes in each layer
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim

        # RNN layers
        self.rnn = nn.RNN(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initializing the hidden state for the first input with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Forward propagation by passing the input and hidden state into the model
        out, h0 = self.rnn(x, h0.detach())

        # out has shape (batch_size, seq_length, hidden_size);
        # keep only the last time step so it fits the fully connected layer
        out = out[:, -1, :]

        # Convert the final state to our desired output shape (batch_size, output_dim)
        out = self.fc(out)
        return out
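A quick usage sketch with made-up dimensions (note that nn.RNN only applies dropout between stacked layers, so it warns when layer_dim is 1):

model = RNNModel(input_dim=1, hidden_dim=64, layer_dim=2, output_dim=1, dropout_prob=0.2)
x = torch.randn(32, 10, 1)   # (batch_size, seq_length, input_dim), since batch_first=True
y = model(x)                 # (32, 1)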

GRU

class GRUModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        super(GRUModel, self).__init__()

        # Defining the number of layers and the nodes in each layer
        self.layer_dim = layer_dim
        self.hidden_dim = hidden_dim

        # GRU layers
        self.gru = nn.GRU(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )

        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initializing the hidden state for the first input with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Forward propagation by passing the input and hidden state into the model
        out, _ = self.gru(x, h0.detach())

        # out has shape (batch_size, seq_length, hidden_size);
        # keep only the last time step so it fits the fully connected layer
        out = out[:, -1, :]

        # Convert the final state to our desired output shape (batch_size, output_dim)
        out = self.fc(out)

        return out

LSTM

import torch
import torch.nn as nn


class LSTM_NN(nn.Module):
    """
    This is an implementation of "Long short-term memory neural network for traffic speed
    prediction using remote microwave sensor data"
    """
    def __init__(self):
        super(LSTM_NN, self).__init__()
        self.n_hid = 256
        self.lstm = nn.LSTM(input_size=1, hidden_size=self.n_hid, num_layers=1, batch_first=True)
        self.output = nn.Linear(self.n_hid, 1)

    def forward(self, x):
        _, n_time, n_node, n_feat = x.shape
        x = x.transpose(1, 2)              # [B, T, N, F] -> [B, N, T, F]
        x = x.reshape(-1, n_time, n_feat)  # [B, N, T, F] -> [B * N, T, F]
        _, (hn, _) = self.lstm(x)          # hn: [1, B * N, n_hid]

        hn = hn.reshape(1, -1, n_node, self.n_hid).transpose(0, 1)  # [B, 1, N, n_hid]
        return self.output(hn).contiguous()  # [B, 1, N, 1] -- one prediction per node
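This model expects a 4-D input; a hypothetical call with B=8 samples, T=12 time steps, N=207 sensors, and F=1 feature (input_size=1 in the LSTM, so F must be 1):

model = LSTM_NN()
x = torch.randn(8, 12, 207, 1)   # [B, T, N, F]
y = model(x)                     # [B, 1, N, 1]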

Bidirectional RNN
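The PyTorch recurrent layers all take a bidirectional flag; a minimal sketch with toy dimensions of my own choosing:

import torch
import torch.nn as nn

birnn = nn.RNN(input_size=1, hidden_size=64, num_layers=1, batch_first=True, bidirectional=True)
x = torch.randn(32, 10, 1)
out, hn = birnn(x)
# out: (32, 10, 128) -- forward and backward hidden states concatenated per time step
# hn:  (2, 32, 64)   -- final hidden state for each direction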

References

https://towardsdatascience.com/building-rnn-lstm-and-gru-for-time-series-using-pytorch-a46e5b094e7b

Understanding LSTM Networks – colah’s blog