Sequence to Sequence Learning
Seq2Seq learning with Neural Networks, which made “End to End” translation a reality, emerged around 2014. A simple Encoder-Decoder architecture for Language Translation showed us the possibility of Neural Machine Translation (NMT) over Statistical Machine Translation (SMT).
The initial approach used two deep LSTM networks: the first acts as an Encoder, mapping the input sequence to a fixed-dimension vector, and the second as a Decoder, mapping that fixed vector to the output sequence.
The encoder network can be built with an RNN, GRU, or LSTM. We feed in the input words one at a time, and after ingesting the whole sequence the network produces a vector that represents the input sentence. Then the decoder network comes into play: it takes the encoding produced by the encoder as input and is trained to output the translation one word at a time, until it eventually emits an end-of-sequence (or end-of-sentence) token, at which point it stops. As when synthesizing text with a language model, each generated token is fed back in as the input for the next step.
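To make this loop concrete, here is a minimal sketch of the encoder-decoder idea in PyTorch. The class names, dimensions, and the greedy decoding helper (`greedy_translate`, `sos_id`, `eos_id`) are illustrative assumptions, not a reference implementation of the original Seq2Seq paper.

```python
# Minimal encoder-decoder sketch: the encoder compresses the source
# sentence into a fixed-size state; the decoder generates one token
# at a time, feeding each prediction back in until it emits EOS.
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):                      # src: (batch, src_len)
        _, (h, c) = self.lstm(self.embed(src))   # keep only the final states
        return h, c                              # the "fixed-dimension vector"

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, token, state):             # token: (batch, 1)
        output, state = self.lstm(self.embed(token), state)
        return self.out(output), state           # logits over the target vocab

def greedy_translate(encoder, decoder, src, sos_id, eos_id, max_len=50):
    """Generate tokens one at a time, feeding each prediction back in."""
    state = encoder(src)
    token = torch.full((src.size(0), 1), sos_id, dtype=torch.long)
    result = []
    for _ in range(max_len):
        logits, state = decoder(token, state)
        token = logits.argmax(dim=-1)            # pick the most likely next word
        result.append(token)
        if (token == eos_id).all():              # stop at end-of-sequence
            break
    return torch.cat(result, dim=1)
```

In practice the model would be trained with teacher forcing and decoded with beam search rather than this greedy loop, but the sketch captures the one-word-at-a-time generation described above.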
A similar architecture also works for image captioning: given an image, a caption such as ‘cat sitting on a chair’ can be generated automatically. For instance, a pre-trained network can serve as the encoder for the image, giving us an n-dimensional vector that represents it. We then feed this vector to an RNN/LSTM/GRU, whose job is to generate the caption one word at a time.
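Here is a hedged sketch of that captioning setup, assuming a torchvision ResNet-18 as the pre-trained image encoder (an illustrative choice, not the one used in any particular paper); its pooled feature vector initializes an LSTM that emits the caption one word at a time.

```python
# Image captioning sketch: pre-trained CNN encoder -> image vector ->
# LSTM decoder that produces caption logits position by position.
import torch
import torch.nn as nn
from torchvision import models

class CaptionModel(nn.Module):
    def __init__(self, vocab_size, emb_dim=256, hid_dim=512):
        super().__init__()
        cnn = models.resnet18(weights="IMAGENET1K_V1")       # assumed encoder choice
        self.cnn = nn.Sequential(*list(cnn.children())[:-1]) # drop the classifier head
        self.img_proj = nn.Linear(512, hid_dim)   # image vector -> initial hidden state
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, images, captions):          # teacher forcing during training
        feats = self.cnn(images).flatten(1)       # (batch, 512) image representation
        h0 = torch.tanh(self.img_proj(feats)).unsqueeze(0)  # (1, batch, hid_dim)
        c0 = torch.zeros_like(h0)
        outputs, _ = self.lstm(self.embed(captions), (h0, c0))
        return self.out(outputs)                  # logits for each caption position
```

At inference time the caption would be generated token by token, exactly as in the translation sketch above, starting from a start-of-sentence token and stopping at end-of-sequence.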
This was a sneak peek into Sequence to Sequence models from me.
Team Blogs:
1. Gated Recurrent Unit — An Introduction
2. LSTM - Solution for vanishing gradient
3. RNNs are dead and their renewed relevance