This repo contains implementations of various forms of attention (their scoring functions are sketched below):
- Location-based attention
- Content-based dot-product attention
- Content-based concatenation attention
- Content-based general attention
- Pointer networks
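For reference, the content-based variants differ mainly in how the score between a decoder state and an encoder output is computed. Below is a minimal PyTorch sketch of the three scoring functions, using the Luong-style formulations these names usually refer to; the tensor shapes and variable names are assumptions for illustration, not the repo's actual code:

```python
import torch
import torch.nn as nn

# Assumed shapes: decoder state s is (batch, hidden),
# encoder outputs H are (batch, src_len, hidden).
hidden = 64

# Dot-product score: score(s, h_i) = s . h_i
def dot_score(s, H):
    return torch.bmm(H, s.unsqueeze(2)).squeeze(2)  # (batch, src_len)

# General score: score(s, h_i) = s^T W h_i
W = nn.Linear(hidden, hidden, bias=False)
def general_score(s, H):
    return torch.bmm(W(H), s.unsqueeze(2)).squeeze(2)

# Concatenation score: score(s, h_i) = v^T tanh(W_c [s; h_i])
W_c = nn.Linear(2 * hidden, hidden, bias=False)
v = nn.Linear(hidden, 1, bias=False)
def concat_score(s, H):
    s_exp = s.unsqueeze(1).expand(-1, H.size(1), -1)  # (batch, src_len, hidden)
    return v(torch.tanh(W_c(torch.cat([s_exp, H], dim=2)))).squeeze(2)

# Scores become attention weights via a softmax over source positions.
scores = dot_score(torch.randn(2, hidden), torch.randn(2, 5, hidden))
weights = torch.softmax(scores, dim=1)  # (batch, src_len), rows sum to 1
```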
Each of these sequence-to-sequence models is trained to sort a shuffled array of the numbers 1 to N. The code to generate this data is here.
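A hypothetical sketch of what generating such source/target pairs might look like (the function name and API here are illustrative, not the repo's actual generator):

```python
import numpy as np

def make_example(n, rng=np.random.default_rng()):
    """Return a shuffled array of 1..n (source) and its sorted version (target)."""
    source = rng.permutation(np.arange(1, n + 1))
    target = np.sort(source)  # equivalently np.arange(1, n + 1)
    return source, target

src, tgt = make_example(10)
# e.g. src = [3 7 1 ...], tgt = [1 2 3 ...]
```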
The attention-based models show a considerable improvement over the model without attention.
All the models and the data loader are defined in code/.
- Each model is defined in a separate file. The file containing a model also contains train and test functions, which are self-explanatory.
- Output logs are stored under training_outputs/.
- Attention weights can be visualized using the code in the notebook Visualizing attention (a minimal plotting sketch follows).
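Attention weights are typically visualized as a heatmap of source positions against decoding steps. A minimal sketch of such a plot, using made-up placeholder weights rather than the notebook's actual code:

```python
import numpy as np
import matplotlib.pyplot as plt

# weights[i, j]: attention paid to source position j at decoding step i.
# Placeholder matrix whose rows sum to 1, standing in for real model output.
weights = np.random.dirichlet(np.ones(10), size=10)

fig, ax = plt.subplots()
im = ax.imshow(weights, cmap="viridis")
ax.set_xlabel("Source position (shuffled input)")
ax.set_ylabel("Decoding step (sorted output)")
fig.colorbar(im, ax=ax, label="Attention weight")
plt.show()
```

For a model that has learned to sort, the heatmap should concentrate mass near the source position holding the next-smallest number at each decoding step.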