Recurrent Neural Networks

Introduction

RNNs are a good architecture for working with sequential data of the form $x^{(1)}, \ldots, x^{(\tau)}$. They are better suited than perceptrons or feed-forward layers to long sequences and to sequences of varying length, because they can exploit the relationship between adjacent members of a sequence when making predictions. Their ability to retain information about previous states while processing long sequences makes RNNs a good option for time-series analysis and natural language processing.

RNNs process sequential data by defining a recurrence relation over timesteps, typically of the form:

$$ S_{k} = f(S_{k-1}\cdot W_{rec} + X_{k}\cdot W_{X}) $$

where $S_{k}$ is the state at time $k$, $X_{k}$ is an exogenous input at time $k$, and $W_{rec}$ and $W_{X}$ are weight parameters. The RNN can be viewed as a state model with a feedback loop: the state evolves over time according to the recurrence relation, and the output is fed back into the state with a delay of one timestep. This delayed feedback loop gives the model memory, because it can "remember" information between timesteps in its states.

The final output of the network at a certain timestep is typically computed from one or more states.

This structure lets us compute the next state $S_{k+1}$ from the current state $S_{k}$ and the next input $X_{k+1}$.

Source for information and image: How to implement an RNN (1/2) - Minimal example
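
To make the recurrence concrete, here is a minimal NumPy sketch in the spirit of that tutorial. It assumes scalar states and inputs and uses $\tanh$ as the nonlinearity $f$; the parameter values are illustrative, not trained.

```python
import numpy as np

def forward_states(X, W_rec, W_x, S_0=0.0):
    """Unroll S_k = tanh(S_{k-1} * W_rec + X_k * W_x) over a sequence X."""
    S = [S_0]
    for x_k in X:
        # Each new state mixes the previous state with the current input.
        S.append(np.tanh(S[-1] * W_rec + x_k * W_x))
    return np.array(S)

X = np.array([1.0, 0.5, -0.3, 0.8])            # example input sequence
states = forward_states(X, W_rec=0.9, W_x=0.5)
print(states)  # S_0 through S_4; the final state summarizes the whole sequence
```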

Use Cases

RNNs are well suited to tasks that require a many-to-many or many-to-one mapping, such as machine translation (many-to-many) or classifying sequences of letters into classes (many-to-one).
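
The difference between the two mappings shows up directly in the outputs of PyTorch's `nn.RNN`: the full output sequence supports many-to-many tasks, while the final hidden state supports many-to-one tasks. A small sketch with illustrative sizes (10 timesteps, 8 input features, 16 hidden units; these are assumptions, not values from this wiki):

```python
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=8, hidden_size=16)
x = torch.randn(10, 1, 8)  # (seq_len, batch, features)

output, h_n = rnn(x)
print(output.shape)  # torch.Size([10, 1, 16]): one state per timestep (many-to-many)
print(h_n.shape)     # torch.Size([1, 1, 16]): final state only (many-to-one)
```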

RNNs for NLP

For our demonstration, we will move to this Python notebook: pytorch_char_rnn_classification_tutorial.ipynb
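
As a preview, the following is a minimal many-to-one classifier along the lines of that notebook. The names and sizes (`CharRNN`, `N_LETTERS`, `N_CLASSES`) are illustrative assumptions, not the notebook's exact code:

```python
import torch
import torch.nn as nn

N_LETTERS = 57   # assumed one-hot vocabulary size
N_HIDDEN = 128
N_CLASSES = 18   # assumed number of categories (e.g., languages)

class CharRNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.rnn = nn.RNN(N_LETTERS, N_HIDDEN)      # recurrent layer
        self.out = nn.Linear(N_HIDDEN, N_CLASSES)   # classifier head

    def forward(self, seq):
        # seq: (seq_len, batch, N_LETTERS); h_n is the final hidden state
        _, h_n = self.rnn(seq)
        return self.out(h_n[0])  # classify from the last state only

model = CharRNN()
name = torch.zeros(6, 1, N_LETTERS)  # dummy one-hot encoding of a 6-character name
print(model(name).shape)             # torch.Size([1, 18]) -> class logits
```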
