run哥 walks you through RNNs by hand -- the evolution from base RNN => LSTM => GRU, and tips for using them

wuhu

RNN

A regular neural network

[Figure: the input layer, hidden layers, and output layer of a feedforward neural network]

The most basic neural network we picture propagates forward layer by layer, then computes a loss and runs backward.

why RNN?

Such a network takes no account of relationships over time, so it cannot (in my view, though that may be a bit absolute) handle time-series data: there is no operation that captures the connections between successive inputs.

But in tasks such as translation, we often need to relate earlier and later inputs to get a better answer.

So how does an RNN solve this problem?

If you were the one designing it, how would you do it?

We all know that what a neural network really learns and applies is W, the weights. If the signal produced for one input could also influence the computation for the next input, wouldn't that link them together?

The hidden layer

[Figure: an RNN with a recurrent hidden layer connecting successive inputs]

An RNN adds a hidden layer whose state is carried from one time step to the next; this is what ties the individual inputs together.

RNN, LSTM, GRU, and related methods differ precisely in the formula used to compute this hidden state.

How RNN, LSTM, and GRU differ in their hidden-state update formulas

RNN

[Figure: an unrolled RNN and its hidden-state update formula]

The most basic implementation

Normally a layer just computes XW + b and moves on, but here you can see that information from the previous hidden state is mixed in as well.

See the PyTorch source code for the exact implementation.
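As a rough sketch of that update (my own illustration with toy tensor names, not the actual nn.RNN source):

import torch

def rnn_step(x_t, h_prev, W_xh, W_hh, b):
    # A plain feedforward layer would stop at: torch.tanh(x_t @ W_xh + b)
    # The RNN step also folds in the previous hidden state h_prev:
    return torch.tanh(x_t @ W_xh + h_prev @ W_hh + b)

# Hypothetical sizes: input_dim=3, hidden_dim=5
W_xh = torch.randn(3, 5)
W_hh = torch.randn(5, 5)
b = torch.zeros(5)
h = torch.zeros(1, 5)
for x_t in torch.randn(10, 1, 3):   # a toy sequence of 10 steps
    h = rnn_step(x_t, h, W_xh, W_hh, b)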

LSTM

Interview question: **describe each gate in an LSTM** (forget gate, input gate, output gate)

The LSTM addresses the problem that a plain RNN cannot learn long-range patterns hidden in a sequence.

Let's break it down!!!

[Figure: the full LSTM cell]

Its core is the cell state shown below. As the diagram above suggests, it runs straight down the entire chain with only minor interactions along the way, which gives information the chance to flow through unchanged. Nothing complicated happens here (at least I don't find it complicated): there are just two regular operations, adding information to the cell state and removing information from it.

[Figure: the cell state running along the top of the LSTM cell]

Let's keep unpacking it.

Step 1:

[Figure: the forget gate]

You can see that this part is wired to the x above, and it consists of a sigmoid layer followed by a pointwise multiplication operation.

This part decides which information to throw away from the cell state.

[Figure: the forget-gate equation]
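Written out (the standard formulation, as in the Understanding LSTM Networks post listed in the references):

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)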

Step 2:

Decide which new information to store.

You can see that this part splits into two pieces: one goes straight through a sigmoid (called the input gate layer) and decides which values we will update; the other uses tanh to build a vector of new candidate values.

[Figure: the input-gate and candidate-value equations]
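In the same notation:

i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)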

Next we apply these updates to obtain the new cell state C_t.

[Figure: the cell-state update equation]
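That is, the old cell state is scaled by the forget gate and the gated candidates are added in (again the standard form):

C_t = f_t * C_{t-1} + i_t * \tilde{C}_t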

Step 3:

Decide what to output. A sigmoid layer decides which parts of the cell state to output, and tanh (squashing to [-1, 1]) multiplied by that sigmoid gives the final output.

[Figure: the output-gate and hidden-state equations]
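In equations (standard form):

o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)
h_t = o_t * \tanh(C_t)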


GRU
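Briefly (this is the standard GRU formulation from the same colah post, not a derivation of my own): the GRU merges the LSTM's forget and input gates into a single update gate and merges the cell state with the hidden state, which gives it fewer parameters than the LSTM:

z_t = \sigma(W_z \cdot [h_{t-1}, x_t])
r_t = \sigma(W_r \cdot [h_{t-1}, x_t])
\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])
h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t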

How to use them in PyTorch

RNN

import torch
import torch.nn as nn


class RNNModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        super(RNNModel, self).__init__()

        # Defining the number of layers and the nodes in each layer
        self.hidden_dim = hidden_dim
        self.layer_dim = layer_dim

        # RNN layers
        self.rnn = nn.RNN(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )
        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initializing the hidden state for the first input with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Forward propagation by passing the input and hidden state into the model
        out, h0 = self.rnn(x, h0.detach())

        # out has shape (batch_size, seq_length, hidden_size);
        # keep only the last time step so it fits the fully connected layer
        out = out[:, -1, :]

        # Convert the final state to our desired output shape (batch_size, output_dim)
        out = self.fc(out)
        return out
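A quick usage sketch with made-up dimensions (note that nn.RNN only applies dropout between stacked layers, so it warns when layer_dim is 1):

model = RNNModel(input_dim=1, hidden_dim=64, layer_dim=2, output_dim=1, dropout_prob=0.2)
x = torch.randn(32, 10, 1)   # (batch_size, seq_length, input_dim), since batch_first=True
y = model(x)                 # (32, 1)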

GRU

class GRUModel(nn.Module):
    def __init__(self, input_dim, hidden_dim, layer_dim, output_dim, dropout_prob):
        super(GRUModel, self).__init__()

        # Defining the number of layers and the nodes in each layer
        self.layer_dim = layer_dim
        self.hidden_dim = hidden_dim

        # GRU layers
        self.gru = nn.GRU(
            input_dim, hidden_dim, layer_dim, batch_first=True, dropout=dropout_prob
        )

        # Fully connected layer
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        # Initializing the hidden state for the first input with zeros
        h0 = torch.zeros(self.layer_dim, x.size(0), self.hidden_dim).requires_grad_()

        # Forward propagation by passing the input and hidden state into the model
        out, _ = self.gru(x, h0.detach())

        # out has shape (batch_size, seq_length, hidden_size);
        # keep only the last time step so it fits the fully connected layer
        out = out[:, -1, :]

        # Convert the final state to our desired output shape (batch_size, output_dim)
        out = self.fc(out)

        return out

LSTM

import torch
import torch.nn as nn


class LSTM_NN(nn.Module):
    """
    This is an implementation of "Long short-term memory neural network for traffic speed
    prediction using remote microwave sensor data"
    """
    def __init__(self):
        super(LSTM_NN, self).__init__()
        self.n_hid = 256
        self.lstm = nn.LSTM(input_size=1, hidden_size=self.n_hid, num_layers=1, batch_first=True)
        self.output = nn.Linear(self.n_hid, 1)

    def forward(self, x):
        _, n_time, n_node, n_feat = x.shape
        x = x.transpose(1, 2)              # [B, T, N, F] -> [B, N, T, F]
        x = x.reshape(-1, n_time, n_feat)  # [B, N, T, F] -> [B * N, T, F]
        _, (hn, _) = self.lstm(x)          # hn: [1, B * N, n_hid]

        hn = hn.reshape(1, -1, n_node, self.n_hid).transpose(0, 1)  # [B, 1, N, n_hid]
        return self.output(hn).contiguous()  # [B, 1, N, 1] -- one prediction per node
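This model expects a 4-D input; a hypothetical call with B=8 samples, T=12 time steps, N=207 sensors, and F=1 feature (input_size=1 in the LSTM, so F must be 1):

model = LSTM_NN()
x = torch.randn(8, 12, 207, 1)   # [B, T, N, F]
y = model(x)                     # [B, 1, N, 1]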

Bidirectional RNN
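The PyTorch recurrent layers all take a bidirectional flag; a minimal sketch with toy dimensions of my own choosing:

import torch
import torch.nn as nn

birnn = nn.RNN(input_size=1, hidden_size=64, num_layers=1, batch_first=True, bidirectional=True)
x = torch.randn(32, 10, 1)
out, hn = birnn(x)
# out: (32, 10, 128) -- forward and backward hidden states concatenated per time step
# hn:  (2, 32, 64)   -- final hidden state for each direction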

References

https://towardsdatascience.com/building-rnn-lstm-and-gru-for-time-series-using-pytorch-a46e5b094e7b

Understanding LSTM Networks – colah’s blog