
This is written by me and not generated by ChatGPT. The aim is to develop a conceptual understanding of the LSTM architecture.

LSTM, at a high level, produces two things:

  • A Cell state that represents long-term memory -> Information retention purpose
  • A Hidden state that represents working memory -> Problem solving and decision making purpose

LSTM produces them by doing three tasks (summarized in the equations after this list):

  • Suggest a new Candidate cell state
  • Update the existing Cell state
  • Produce a Hidden state for the next time step to use
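
Written out, these three tasks correspond to one equation each. This is the standard LSTM update, assuming the common parameterization where every layer sees the concatenation [h_{t-1}, x_t]; f_t, i_t, and o_t are the gate outputs described below, and ⊙ is element-wise multiplication:

$$\tilde{c}_t = \tanh(W_c \, [h_{t-1}, x_t] + b_c)$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

$$h_t = o_t \odot \tanh(c_t)$$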

LSTM does these tasks by using Gates and Cell states:

  • Gates use the sigmoid activation due to its (0, 1) range. This range, by design, allows Gates to act as filters: they decide which neurons in the paired state to attend to and which to ignore (see the toy example after this list).
    • Forget gate and Input gate maintain the Cell state
      • Forget gate is paired with the (previous) Cell state
      • Input gate is paired with the Candidate cell state
    • Output gate creates the new Hidden state for the next time step
      • Output gate is paired with the (updated) Cell state
  • Cell states use the tanh activation due to its (-1, 1) range. This range, by design, allows the Cell states to represent meaningful information that is retained for later use.
    • Cell state: Long term memory
    • Candidate cell state: Proposed update based on the new input
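
As a toy illustration (the numbers here are made up, not from any trained model), a gate is just an element-wise multiplication of its sigmoid output with the paired state. Entries near 1 pass through; entries near 0 are suppressed:

```python
import numpy as np

gate  = np.array([0.95, 0.02, 0.60])   # e.g. a Forget gate's sigmoid output
state = np.array([1.50, -0.80, 0.30])  # e.g. the previous Cell state
print(gate * state)                    # [ 1.425 -0.016  0.18 ] -> 2nd entry is mostly forgotten
```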

All gates take the previous hidden state and the current input x as input (a full single-step sketch follows this list):

  • Forget gate decides what to forget from the cell state
  • Input gate decides what to keep from the proposed candidate cell state
  • Output gate decides what information is needed for short-term problem solving, specifically in the next time step.
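
Putting it all together, below is a minimal single-step sketch in NumPy. The stacked weight layout, the names (lstm_step, W, b), and the sizes are my own assumptions for illustration; real libraries split and order the weights differently:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM time step. Assumes a single stacked weight matrix W of shape
    (4 * hidden, hidden + input) holding the Forget, Input, Output, and
    Candidate weights, in that order."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b       # every gate sees [h_prev, x]
    f = sigmoid(z[0 * hidden : 1 * hidden])       # Forget gate: what to drop from c_prev
    i = sigmoid(z[1 * hidden : 2 * hidden])       # Input gate: what to keep from the candidate
    o = sigmoid(z[2 * hidden : 3 * hidden])       # Output gate: what to expose as h
    c_cand = np.tanh(z[3 * hidden : 4 * hidden])  # Candidate cell state
    c = f * c_prev + i * c_cand                   # update long-term memory
    h = o * np.tanh(c)                            # working memory for the next time step
    return h, c

# Hypothetical sizes, just to show the shapes involved
rng = np.random.default_rng(0)
input_size, hidden_size = 3, 4
W = rng.standard_normal((4 * hidden_size, hidden_size + input_size))
b = np.zeros(4 * hidden_size)
h, c = lstm_step(rng.standard_normal(input_size),
                 np.zeros(hidden_size), np.zeros(hidden_size), W, b)
print(h.shape, c.shape)   # (4,) (4,)
```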

A few questions:

  1. Does the hidden state have too many responsibilities? It has to carry enough information for the Forget gate, Input gate, Output gate, and the Candidate cell state proposal layer to all do their jobs.
