Learned from StatQuest on YouTube. Really GREAT channel.
Sometimes we need a network that can take inputs of variable length.
For example, suppose we want to predict the stock price on day 10, and we have 9 days of data for the blue stock but only 4 days for the red one.
A normal neural network cannot handle this, because its input dimension has to be fixed. But we can use an RNN, structured like this:
There is only one input slot, and we feed the data in one by one, sequentially.
As an input traverses the RNN, it produces an intermediate result, which is fed back into the recurrent block; when the next input arrives at that block, it is summed with the fed-back result. This cycle keeps running until every input has passed through the network.
This feedback mechanism lets the network model the connections among the sequential data.
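To make this concrete, here is a minimal sketch of that loop in plain Python/numpy. The weights w1, w2, b and the tanh activation are my own toy assumptions, not the exact network from the video:

```python
import numpy as np

# Toy weights, not taken from the video: w1 scales the new input,
# w2 scales the fed-back value, b is a bias.
w1, w2, b = 0.5, 0.9, 0.0

def rnn_step(x, h_prev):
    """One pass through the single input slot: combine the new input
    with the value fed back from the previous step."""
    return np.tanh(w1 * x + w2 * h_prev + b)

def run_sequence(prices):
    h = 0.0                 # nothing has been fed back yet
    for x in prices:        # inputs enter one by one, in order
        h = rnn_step(x, h)  # the result loops back for the next input
    return h                # summary of the whole sequence

blue = [1.0, 0.9, 1.1, 1.2, 1.3, 1.2, 1.4, 1.5, 1.6]  # 9 days of data
red = [2.0, 1.8, 1.9, 2.1]                            # only 4 days

# The same cell handles both lengths; only the number of loop iterations changes.
print(run_sequence(blue), run_sequence(red))
```

The key point is that the same cell (the same w1, w2, b) is reused at every step, so the sequence length does not have to be fixed in advance.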
But because every data point goes through the same network, the process is repeated many times. In an RNN, if the recurrent weight w2 is in (0, 1), the gradient vanishes during backpropagation; if it is in (1, +inf), the gradient explodes.
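To see why, note that backpropagating through the unrolled network multiplies the gradient by w2 once per time step, so the gradient scales roughly like a power of w2. A quick check with made-up values (50 steps is my own choice):

```python
# Backpropagating through n time steps multiplies the gradient by the
# recurrent weight w2 roughly n times, i.e. it scales like w2 ** n.
steps = 50

for w2 in (0.5, 1.0, 1.5):          # inside (0, 1), exactly 1, above 1
    print(f"w2 = {w2}: w2**{steps} = {w2 ** steps:.3e}")

# w2 = 0.5: w2**50 = 8.882e-16   -> gradient vanishes
# w2 = 1.0: w2**50 = 1.000e+00
# w2 = 1.5: w2**50 = 6.376e+08   -> gradient explodes
```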
To address this problem, Long Short-Term Memory (LSTM) was proposed.
LSTM is an improved version of the RNN. We replace the network with the following structure, but it keeps the idea of feeding results back.
The blue line on top represents the long-term memory (C), and the blue line on the bottom is the short-term memory (h).
After one input has traversed the network, we get C(t) and h(t), which are fed back to the positions of C(t-1) and h(t-1). After feeding in all the data one by one, we read the final result off at the position of h(t).
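To tie the pieces together, here is a minimal sketch of one LSTM step and its feedback loop in the same toy Python style. The scalar weights are made up and the gates use the standard sigmoid/tanh activations; this is not the exact cell from the video:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Made-up scalar weights: each gate gets a pair (for the input x and for the
# fed-back short-term memory h) plus a bias, all set to toy values here.
W = {name: (0.5, 0.5, 0.0) for name in ("forget", "input", "cand", "output")}

def lstm_step(x, c_prev, h_prev):
    """One step: update the long-term memory C and the short-term memory h."""
    def gate(name, act):
        wx, wh, b = W[name]
        return act(wx * x + wh * h_prev + b)

    f = gate("forget", sigmoid)  # how much old long-term memory to keep
    i = gate("input", sigmoid)   # how much new information to store
    g = gate("cand", np.tanh)    # candidate information to store
    o = gate("output", sigmoid)  # how much memory to expose as output

    c = f * c_prev + i * g       # new long-term memory C(t)
    h = o * np.tanh(c)           # new short-term memory h(t)
    return c, h

prices = [1.0, 0.9, 1.1, 1.2]    # fed in one by one, just like the RNN
c, h = 0.0, 0.0                  # start with empty memories
for x in prices:
    c, h = lstm_step(x, c, h)    # C(t), h(t) loop back as C(t-1), h(t-1)

print(h)                         # the final result is read off at h(t)
```

Because the long-term memory C is updated mostly by addition and gating rather than by repeated multiplication with the same weight, the gradient along that path is much less prone to vanishing or exploding.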