Sequence to Sequence Learning — Paper Explained

What the heck are the Encoder & Decoder in a seq2seq model?

Aseem Srivastava
Analytics Vidhya

--

This blog gives a high-level intuition of the Seq2Seq model, so you don’t need to be “the deep learning guy” to understand all of it; basic neural network knowledge is sufficient to follow along.

The research paper, Sequence to Sequence Learning with Neural Networks, is considered a breakthrough in the field of Natural Language Processing; Google released it at the Conference on Neural Information Processing Systems (NIPS) in 2014. After understanding Seq2Seq, one can move ahead to read about Transformers, attention mechanisms, and other recent breakthroughs in the field of NLP.

Starting right from the start, we already had the following kinds of RNNs:

  1. Vector to Vector (Vec2Vec) RNN

This model takes a single vector as input and produces a single vector as output.

An example could be word-to-word translation.

Vec2Vec model

2. Sequence to Vector (Seq2Vec) RNN

In this RNN model, we give a sequence as input and the output comes out to be a single vector.
An example of this could be a language predictor:

我爱计算机科学 (“I love computer science”, the input sequence) gives Chinese (the output vector)

Seq2Vec model

3. Vector to Sequence (Vec2Seq) RNN

The RNN model where the input is a single vector and the output is a sequence. An example could be image captioning, where a single image vector produces a sequence of words.

Vec2Seq model

4. Sequence to Sequence (Seq2Seq) RNN

Yes! We had a Sequence to Sequence model earlier too, which takes a sequence as input and produces a sequence as output.
Note: It is completely different from the Encoder-Decoder architecture presented in the paper, which is explained a little later in this blog.

Traditional Seq2Seq RNN model.

If we already had this traditional seq2seq RNN model, how could this paper be a breakthrough?
In the figure above, we can see that the length of the output sequence is the same as that of the input sequence. A language-translation model cannot rely on that: if we are translating English to Hindi, the length of the English sequence is not necessarily the same as that of the Hindi sequence. This is where the Encoder and Decoder are useful.

Encoder and Decoder

Encoder Decoder Architecture of Seq2Seq

Shown above is the architecture proposed in the Seq2Seq encoder-decoder research paper. It is composed of 2 basic components:
1. Encoder
2. Decoder
Shown below is a well-differentiated diagrammatic representation of the encoder and the decoder. The key advantage of using an encoder-decoder model is that the length of the input and the length of the output can be different.

  • Each cell in this architecture is an RNN cell (it can be either an LSTM or a GRU cell).
  • The Encoder receives the input (A, B, C, <eos>), and each cell receives feedback from the previous hidden state (the links between the cells).
  • W is the context vector.
  • The Decoder is the part where the output is generated (X, Y, Z, <eos>); a minimal code sketch of this layout follows this list.
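
To make this layout concrete, here is a minimal sketch in PyTorch. The class and parameter names (Seq2Seq, hidden, src_emb, etc.), the layer sizes, and the choice of LSTM cells are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Minimal encoder-decoder: the encoder compresses the input sequence
    into the context vector W, and the decoder generates the output sequence."""
    def __init__(self, in_vocab, out_vocab, hidden=256):
        super().__init__()
        self.src_emb = nn.Embedding(in_vocab, hidden)   # input word vectors
        self.tgt_emb = nn.Embedding(out_vocab, hidden)  # output word vectors
        self.encoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.decoder = nn.LSTM(hidden, hidden, batch_first=True)
        self.out = nn.Linear(hidden, out_vocab)         # hidden state -> word scores

    def forward(self, src_ids, tgt_ids):
        # Encoder: read A, B, C, <eos>; keep only the final state (the context W)
        _, context = self.encoder(self.src_emb(src_ids))
        # Decoder: start from W and produce scores for X, Y, Z, <eos>
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), context)
        return self.out(dec_out)
```

During training, tgt_ids would typically hold the ground-truth output sequence (teacher forcing); at inference time a step-by-step loop like the one sketched later in this blog is used instead.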

Let’s try to understand each component in detail so that we can understand the whole functioning of the encoder and the decoder, and how information is passed from the encoder to the decoder.

Encoder

The Encoder is a recurrent network made up of RNN cells. Each cell receives a single input vector, the sequence ends with an end-of-statement token, <EOS>, which marks the end of the input, and the last cell generates the context vector, W.

A typical encoder inside an encoder-decoder model, showing the relationship between the RNN blocks, the inputs, and the context vector, W.

Each cell is responsible for taking its input and passing the information on to the next cell. This information being passed on is known as the feedback (hidden state) and is shown via the links between successive cells.

Hidden State (feedback) Calculation Formula

The formula mentioned here is used to calculate the hidden state’s feedback value, which contains all the information up to the current cell’s input. Here, Wʰʰ is the weight given to the previous hidden state at time stamp (t-1) and Wʰˣ is the weight given to the current input at time stamp t.
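
Written out, this is the standard RNN recurrence (tanh is the usual non-linearity here; the paper’s actual cells are LSTMs, so treat this as a simplified form):

```latex
h_t = \tanh\left( W^{hh} \, h_{t-1} + W^{hx} \, x_t \right)
```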

Context Vector — the linkage between Encoder and Decoder

The context vector contains all the information from the hidden states inside the Encoder, and it plays the role of the only input for the decoder part.

Decoder

The Decoder is responsible for generating the output. Each cell receives the output generated by the previous cell at time stamp (t-1) as the input of the current cell at time stamp t. W, the context vector generated by the last cell of the encoder, acts as the initial input for the RNN cells in the decoder.

A typical decoder inside an encoder-decoder model, showing the relationship between the RNN blocks, inputs, and outputs.

The key point to notice is that the output generated from the cell at time stamp t is used as the input for the cell at time stamp t+1.
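
In code, this feedback loop looks roughly like the following greedy decoding sketch, reusing the Seq2Seq sketch from earlier; sos_id, eos_id, and max_len are assumed, hypothetical names rather than anything from the paper:

```python
import torch

def greedy_decode(model, src_ids, sos_id, eos_id, max_len=50):
    """Feed the word produced at time stamp t back in as the input at t+1."""
    # Run the encoder once to obtain the context vector W (the LSTM state)
    _, state = model.encoder(model.src_emb(src_ids))
    token = torch.tensor([[sos_id]])              # decoding starts from a start token
    result = []
    for _ in range(max_len):
        dec_out, state = model.decoder(model.tgt_emb(token), state)
        token = model.out(dec_out).argmax(dim=-1)  # most likely next word
        if token.item() == eos_id:                 # <eos> ends the output sequence
            break
        result.append(token.item())
    return result
```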

These two formulae are used to calculate the hidden state feedback and the output from each cell.
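
In the same spirit as the encoder formula above, one common way to write these two, consistent with the description of the decoder given here (the exact symbols in the figure may differ slightly), is:

```latex
h_t = \tanh\left( W^{hh} \, h_{t-1} + W^{hx} \, y_{t-1} \right)  % previous output fed back in
y_t = \mathrm{softmax}\left( W^{S} \, h_t \right)                % word probabilities at step t
```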

That’s all the basics one needs to know about the functioning of the Encoder and Decoder. This paper became a breakthrough in the research and applied fields of Natural Language Processing (NLP) because the length of the output sequence can now be different from the length of the input sequence, making the approach robust and very useful for machine translation, question-answering systems, dialogue generation models, and many other applications.
