Building a Decoder-Only Transformer Model

Understanding the Foundation of ChatGPT Through Practical Coding

In this post, we will explore the Decoder-Only Transformer, the foundation of ChatGPT, through a simple code example. For the code, I referred to Josh Starmer’s video, Coding a ChatGPT Like Transformer From Scratch in PyTorch. I highly recommend watching the video if you’re unfamiliar with the concept of Decoder-Only... [Read More]
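To give a flavor of what the full post walks through, here is a minimal sketch of a single decoder block in PyTorch. The class name `DecoderBlock` and the hyperparameters are illustrative assumptions for this teaser, not the exact code from the post or Josh Starmer's video; the key idea is the causal mask, which keeps each token from attending to tokens that come after it.

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """One masked self-attention block: the core unit of a decoder-only transformer.
    (Illustrative sketch; sizes and names are assumptions, not the post's code.)"""
    def __init__(self, d_model=64, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm1 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model),
            nn.ReLU(),
            nn.Linear(4 * d_model, d_model),
        )
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        # Causal mask: True entries are blocked, so each position
        # attends only to itself and earlier positions.
        seq_len = x.size(1)
        mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
        attn_out, _ = self.attn(x, x, x, attn_mask=mask)
        x = self.norm1(x + attn_out)    # residual connection + layer norm
        x = self.norm2(x + self.ff(x))  # position-wise feed-forward
        return x

# Quick shape check: a batch of 2 sequences, 5 tokens each, embedding size 64.
x = torch.randn(2, 5, 64)
print(DecoderBlock()(x).shape)  # torch.Size([2, 5, 64])
```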

Attention Mechanism Simplified

Attention in Sequence Models

In the Seq2Seq model that we explored, the encoder processes the input data and produces a context vector: a single vector that encapsulates the entire input sequence. This vector is then passed to the decoder, which uses it to generate the output sequence. However, it’s important to consider the challenge of... [Read More]
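A short sketch makes the bottleneck concrete. Below, a toy GRU encoder (the vocabulary and layer sizes are made-up values for illustration, not from the post) compresses an input sentence into one fixed-size hidden state; that single vector is all the decoder would receive, regardless of how long the input is.

```python
import torch
import torch.nn as nn

# Toy encoder: the final hidden state is the "context vector" that must
# summarize the whole input sequence (sizes here are illustrative).
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=32)
encoder = nn.GRU(input_size=32, hidden_size=64, batch_first=True)

tokens = torch.randint(0, 1000, (1, 7))       # one input sentence of 7 token ids
outputs, context = encoder(embedding(tokens))  # context has shape (1, 1, 64)

# However long the input gets, the decoder only ever sees this single
# fixed-size vector -- the bottleneck that attention was designed to relieve.
print(context.shape)  # torch.Size([1, 1, 64])
```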

Decoding Seq2Seq Models

How Machines Learn to Translate

In the previous post, we explored Long Short-Term Memory networks (LSTMs), a variant of Recurrent Neural Networks (RNNs) designed to handle long sequences more effectively. Now, let’s consider a practical application: translating sentences from one language to another. For instance, translating the Spanish sentence “Te quiero” into the... [Read More]
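For readers who want a concrete starting point, here is a minimal sketch of running a tokenized source sentence through a PyTorch LSTM. The token ids and layer sizes are placeholder assumptions for this teaser, not the post's actual setup; the point is that the LSTM carries both a hidden state and a cell state as it reads the sequence.

```python
import torch
import torch.nn as nn

# Minimal sketch: an LSTM reads an embedded source sentence step by step,
# maintaining a hidden state and a cell state (all sizes are illustrative).
embedding = nn.Embedding(num_embeddings=500, embedding_dim=16)
lstm = nn.LSTM(input_size=16, hidden_size=32, batch_first=True)

# Placeholder ids standing in for the two tokens of "Te quiero".
src = torch.tensor([[3, 7]])
outputs, (hidden, cell) = lstm(embedding(src))

print(outputs.shape)  # torch.Size([1, 2, 32]) -- one output per input token
print(hidden.shape)   # torch.Size([1, 1, 32]) -- final hidden state
```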