AI Portal Gun
Tech Deep Dive

Deep Dive into Transformers, Neural Network & Backprop

Dive deep into Transformers, Neural Networks, and backpropagation. Explore cutting-edge AI, uncover insights, and empower your understanding of these transformative technologies.

backprop & Transformers Meme


  • Neural Network course by 3Blue1Brown: explores neural networks through captivating animations and intuitive explanations, building a deep, visual understanding of this fundamental ML concept.

  • Neural Networks / Deep Learning by StatQuest with Josh Starmer: covers everything you need to know about neural networks, from the basics all the way to image classification with Convolutional Neural Networks.

  • Hugging Face Transformers: covers the core concepts of the Transformers library, how models operate, fine-tuning models from the Hugging Face Hub, and sharing results. It then proceeds to Datasets and Tokenizers for NLP tasks. The final section covers speech processing and computer vision tasks, emphasizing model optimization and production-ready demos.


  • Software 2.0: In this 2017 post, Andrej Karpathy argued that neural networks represent a powerful new way of programming computers, not just another tool. The rapid advances in LLMs have since supported this perspective, which offers an insightful framework for anticipating how the AI market may evolve.

  • Introduction to Neural Networks (Part 1) & Part 2 by Harsha Bommana: explores the fundamental components of a neural network, including neurons, their mathematical operations, and the role of activation functions in handling non-linear problems. It also covers various neural network types, with examples of their applications in ML and deep learning.

  • How Transformers Work: explains the mechanics of Transformers, the neural network architecture used by industry leaders like OpenAI and DeepMind, and surveys its applications in ML.

  • The Illustrated Transformer by Jay Alammar: provides an in-depth technical examination of the Transformer architecture, with detailed insights into its structure and functionality.

  • Yes you should understand backprop by Andrej Karpathy: an in-depth post on backpropagation that explains the intricacies of this crucial technique. For a deeper dive, consider exploring the Stanford CS231n lecture on backpropagation and neural networks.

  • The Annotated Transformer: This post presents an annotated, line-by-line implementation of the paper "Attention Is All You Need". It reorganizes and omits certain sections of the original paper while adding comments, resulting in a concise, fully functional implementation with the code available. Requires some knowledge of PyTorch.


  • Attention Is All You Need video: a visual journey through the concepts introduced in the original paper. If the written paper seems complex, this video provides an accessible way to grasp the key ideas behind the Transformer architecture.

  • Introduction to Transformers by Andrej Karpathy. Since their groundbreaking introduction in 2017, transformers have transformed NLP and expanded into various domains of Deep Learning, including computer vision (CV), reinforcement learning (RL), Generative Adversarial Networks (GANs), Speech, and even Biology. They played a pivotal role in the development of powerful language models like GPT-3 and were instrumental in DeepMind's remarkable AlphaFold2 project, addressing protein folding challenges.

  • A friendly introduction to Deep Learning and Neural Networks by Serrano.Academy, introduces deep learning and neural networks, explaining AI, ML, and deep learning basics. It showcases real-world applications like image recognition and natural language processing.

  • Watching Neural Networks Learn by Emergent Garden: demonstrates neural network learning, focusing on recognizing handwritten digits. As the network trains on a dataset, the video shows its accuracy improving and its internal representations evolving over time.

  • Why Neural Networks can learn (almost) anything by Emergent Garden, provides a fascinating insight into the power of neural networks and their ability to learn complex patterns.

  • The backpropagation algorithm by Geoffrey Hinton: covers the fundamentals of backpropagation, a key algorithm for training neural networks. Hinton explains the idea behind it, the role of hidden units, and the alternative of learning by perturbing weights.

  • Tensors for Neural Networks, Clearly Explained!!! by Josh Starmer. Tensors are fundamental data structures in machine learning: multi-dimensional arrays that store and manipulate information and underpin deep learning and neural network operations.
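
The backpropagation resources above describe two ways to obtain gradients: the chain rule (backprop) and perturbing each weight to see how the loss changes. The following is a minimal sketch, not taken from any of the linked resources, that compares the two on a hypothetical two-weight network. The network shape, the names `forward`, `backprop`, `perturb`, and `train`, and all numeric values are assumptions chosen for illustration.

```python
import math

def forward(w1, w2, x):
    """Tiny illustrative network: x -> tanh(w1 * x) -> w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def loss(w1, w2, x, target):
    """Squared-error loss on a single example."""
    _, y = forward(w1, w2, x)
    return 0.5 * (y - target) ** 2

def backprop(w1, w2, x, target):
    """Gradients of the loss w.r.t. w1 and w2 via the chain rule."""
    h, y = forward(w1, w2, x)
    dy = y - target              # dL/dy
    dw2 = dy * h                 # dL/dw2, since y = w2 * h
    dh = dy * w2                 # dL/dh
    dw1 = dh * (1 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2
    return dw1, dw2

def perturb(w1, w2, x, target, eps=1e-6):
    """Same gradients estimated by nudging each weight (central differences)."""
    dw1 = (loss(w1 + eps, w2, x, target) - loss(w1 - eps, w2, x, target)) / (2 * eps)
    dw2 = (loss(w1, w2 + eps, x, target) - loss(w1, w2 - eps, x, target)) / (2 * eps)
    return dw1, dw2

def train(w1, w2, x, target, lr=0.1, steps=200):
    """Plain gradient descent using the backprop gradients."""
    for _ in range(steps):
        dw1, dw2 = backprop(w1, w2, x, target)
        w1 -= lr * dw1
        w2 -= lr * dw2
    return w1, w2

if __name__ == "__main__":
    w1, w2, x, target = 0.5, -0.3, 1.0, 0.8
    print("backprop gradients:    ", backprop(w1, w2, x, target))
    print("perturbation gradients:", perturb(w1, w2, x, target))
    new_w1, new_w2 = train(w1, w2, x, target)
    print("loss before:", loss(w1, w2, x, target))
    print("loss after: ", loss(new_w1, new_w2, x, target))
```

Perturbation needs two extra forward passes per weight, while backprop gets every gradient from one forward and one backward pass, which is why the latter scales to large networks.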


  • Attention Is All You Need (2017): Introduced the Transformer, a neural architecture for sequence processing based entirely on attention. It outperformed earlier recurrent and convolutional models on machine translation, demonstrating its effectiveness at capturing contextual information from input sequences like sentences. (blog)

  • Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context (2019): Proposes Transformer-XL, a language model designed to handle longer text. It uses segment-level recurrence, reusing hidden states from previous segments to process extended sequences, and outperforms the original Transformer and other models on language modeling tasks.

  • Rethinking Attention with Performers (2020): Introduces Performers, Transformer architectures that estimate full-rank softmax attention with linear complexity, without relying on sparsity or low-rankness assumptions. Using Fast Attention Via positive Orthogonal Random features (FAVOR+), they efficiently approximate attention kernels, and the approach extends beyond softmax to other kernelizable attention mechanisms. Performers show competitive results across diverse tasks, from pixel prediction and text modeling to protein sequence modeling.

  • End-to-End Object Detection with Transformers (2020): Presents DETR, a detection method that treats the task as a direct set prediction problem, eliminating hand-designed components like non-maximum suppression and anchor generation. Using a set-based global loss and a transformer encoder-decoder architecture, DETR reasons about object relations and outputs its predictions in parallel. It performs on par with a well-optimized Faster R-CNN baseline on COCO and extends simply to panoptic segmentation.

  • Pretraining Data Mixtures Enable Narrow Model Selection Capabilities in Transformer Models (2023): The study examines transformers pretrained on mixtures of task families in a controlled setting, using sequences of (x, f(x)) pairs rather than natural language. The models can identify and learn in-context the task families that are well represented in their pretraining data, but their generalization degrades on tasks outside that data, even for simple extrapolation. The results suggest that in-context learning ability is closely tied to the coverage of the pretraining data mixture.
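
The papers above all build on the scaled dot-product attention introduced in "Attention Is All You Need", defined as softmax(QK^T / sqrt(d_k)) V. The following is a minimal pure-Python sketch of that formula for a single head, with no masking, batching, or learned projections. The function names `softmax` and `attention` and the example vectors are assumptions for illustration, not code from any of the papers.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    queries, keys, values are lists of equal-length float vectors;
    keys and values must have the same number of rows.
    Returns (outputs, weights), one row per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0]]
    outs, weights = attention([[10.0, 0.0]], keys, values)
    print("weights:", weights[0])
    print("output: ", outs[0])
```

Because every query is compared against every key, the cost grows quadratically with sequence length, which is the bottleneck that approaches such as Performers aim to reduce.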