Abstract
This work presents the development of a basic large language model (LLM), with a primary focus on the pre-training process and model architecture. A simplified transformer-based design is implemented to demonstrate core LLM principles, incorporating reinforcement learning techniques. Key components such as tokenization and training objectives are discussed to provide a foundational understanding of LLM construction. Additionally, an overview of several established models, including GPT-2, LLaMA 3.1, and DeepSeek, is provided to contextualize current advancements in the field. Through this comparative and explanatory approach, the essential building blocks of large-scale language models are explored in a clear and accessible manner.