Deciding on a model architecture
  A standard architecture
    Activations
    Positional embeddings
  Frequently seen modifications
    Parallel feed-forward and attention
    Additional layer-norms