Building a Decoder-Only Transformer Model

Understanding the Foundation of ChatGPT Through Practical Coding

In this post, we will explore the Decoder-Only Transformer, the foundation of ChatGPT, through a simple code example. For the code, I referred to Josh Starmer’s video, Coding a ChatGPT Like Transformer From Scratch in PyTorch. I highly recommend watching the video if you’re unfamiliar with the concept of Decoder-Only... [Read More]
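To give a flavor of what the full post walks through, here is a minimal sketch of a single decoder block in PyTorch. The class name `DecoderBlock` and the hyperparameters are illustrative assumptions for this teaser, not the exact code from the post or Josh Starmer's video; the key idea is the causal mask, which keeps each token from attending to tokens that come after it.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One masked self-attention block: the core unit of a decoder-only transformer.
    (Illustrative sketch; sizes and names are assumptions, not the post's code.)"""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: True entries are blocked, so each position
        # attends only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)    # residual connection + layer norm
        x = self.norm2(x + self.ff(x))  # position-wise feed-forward
        return x

# Quick shape check: a batch of 2 sequences, 5 tokens each, embedding size 64.
x = torch.randn(2, 5, 64)
print(DecoderBlock()(x).shape)  # torch.Size([2, 5, 64])
```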

Attention Mechanism Simplified

Attention in Sequence Models

In the Seq2Seq model that we explored, the encoder processes the input data and produces a context vector: a single vector that encapsulates the entire input sequence. This vector is then passed to the decoder, which uses it to generate the output sequence. However, it’s important to consider the challenge of... [Read More]
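A short sketch makes the bottleneck concrete. Below, a toy GRU encoder (the vocabulary and layer sizes are made-up values for illustration, not from the post) compresses an input sentence into one fixed-size hidden state; that single vector is all the decoder would receive, regardless of how long the input is.

```python
import torch
import torch.nn as nn

# Toy encoder: the final hidden state is the "context vector" that must
# summarize the whole input sequence (sizes here are illustrative).
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=32)
encoder = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

tokens = torch.randint(0, 1000, (1, 7))       # one input sentence of 7 token ids
outputs, context = encoder(embedding(tokens))  # context has shape (1, 1, 64)

# However long the input gets, the decoder only ever sees this single
# fixed-size vector -- the bottleneck that attention was designed to relieve.
print(context.shape)  # torch.Size([1, 1, 64])
```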

Decoding Seq2Seq Models

How Machines Learn to Translate

In the previous post, we explored Long Short-Term Memory networks (LSTMs), a variant of Recurrent Neural Networks (RNNs) designed to handle long sequences more effectively. Now, let’s consider a practical application: translating sentences from one language to another. For instance, translating the Spanish sentence “Te quiero” into the... [Read More]
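For readers who want a concrete starting point, here is a minimal sketch of running a tokenized source sentence through a PyTorch LSTM. The token ids and layer sizes are placeholder assumptions for this teaser, not the post's actual setup; the point is that the LSTM carries both a hidden state and a cell state as it reads the sequence.

```python
import torch
import torch.nn as nn

# Minimal sketch: an LSTM reads an embedded source sentence step by step,
# maintaining a hidden state and a cell state (all sizes are illustrative).
embedding = nn.Embedding(num_embeddings=500, embedding_dim=16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

# Placeholder ids standing in for the two tokens of "Te quiero".
src = torch.tensor([[3, 7]])
outputs, (hidden, cell) = lstm(embedding(src))

print(outputs.shape)  # torch.Size([1, 2, 32]) -- one output per input token
print(hidden.shape)   # torch.Size([1, 1, 32]) -- final hidden state
```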