- The Annotated Transformer - Harvard NLP's line-by-line, code-annotated guide to the Transformer, explaining the architecture and its components.
- The Unreasonable Effectiveness of Recurrent Neural Networks - Andrej Karpathy's post on the power and applications of RNNs.
- Understanding LSTMs - A detailed explanation of Long Short-Term Memory networks and their mechanisms.
- Seq2Seq Learning with Neural Networks - Paper by Ilya Sutskever et al. on sequence-to-sequence learning.
- Distributed Representations - Geoffrey Hinton's paper on distributed representations of concepts in neural networks.
- Distilling the Knowledge in a Neural Network - Paper on model compression techniques.
- ImageNet Classification with Deep Convolutional Neural Networks - The AlexNet paper by Krizhevsky, Sutskever, and Hinton on large-scale image classification with deep CNNs.
- Batch Normalization - Paper introducing the technique to accelerate deep network training by reducing internal covariate shift.
- BERT: Pre-training of Deep Bidirectional Transformers - Paper on BERT, a method for pre-training language representations.
- ResNet: Deep Residual Learning for Image Recognition - Paper on Residual Networks, whose skip connections make very deep networks trainable.
- Adam: A Method for Stochastic Optimization - Paper on the Adam optimization algorithm, widely used in training neural networks.
- Attention Is All You Need - The influential paper introducing the Transformer model.
- Transformers for Image Recognition at Scale - The Vision Transformer (ViT) paper, applying Transformer models directly to image recognition.
- Generative Adversarial Nets - The original paper on GANs by Goodfellow et al.
- Neural Machine Translation by Jointly Learning to Align and Translate - Paper on neural machine translation models that align and translate simultaneously.
- One Model to Learn Them All - Paper on a single model that learns tasks across multiple domains.
- WaveNet: A Generative Model for Raw Audio - Paper introducing WaveNet, a deep generative model for producing audio.
- Attention Is All You Need for Speech Recognition - Paper exploring the application of Transformer models to speech recognition.
- Auto-Encoding Variational Bayes - Paper on variational autoencoders (VAEs), a type of generative model.
- Deep Convolutional Generative Adversarial Networks - Paper on DCGANs, combining CNNs with GANs for generative tasks.
- Memory Networks - Paper on neural networks with a memory component for reasoning tasks.
- Graph Neural Networks - A comprehensive survey on graph neural networks (GNNs).
- Category Theory for Computing Science - Michael Barr and Charles Wells's textbook on category theory and its applications in computer science.
- Machine Super Intelligence - Shane Legg's doctoral thesis on defining and measuring machine intelligence.
- Kolmogorov Complexity and Algorithmic Randomness - A comprehensive book on Kolmogorov complexity and its applications.
- CS231n: Convolutional Neural Networks for Visual Recognition - Stanford's course materials on CNNs for visual recognition tasks.
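Several of the entries above (Attention Is All You Need, BERT, ViT, the Annotated Transformer) center on the Transformer's attention mechanism. As a minimal orientation before reading them, here is a sketch of scaled dot-product attention in NumPy; the function name and toy shapes are illustrative, not taken from any of the papers:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention, as described in "Attention Is All You Need".

    Q, K: (seq_len, d_k) query and key matrices; V: (seq_len, d_v) values.
    """
    d_k = Q.shape[-1]
    # Similarity of each query with every key, scaled by sqrt(d_k) to
    # keep the softmax from saturating for large d_k.
    scores = Q @ K.T / np.sqrt(d_k)
    # Row-wise softmax turns scores into attention weights that sum to 1.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each output row is a weighted average of the value vectors.
    return weights @ V

# Toy example with random queries, keys, and values.
rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (4, 8)
```

The full Transformer wraps this in multi-head projections, residual connections, and layer normalization, but this single operation is the core the papers build on.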
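The Adam paper listed above defines a short update rule that is easy to read alongside the paper's Algorithm 1. The following is a minimal sketch of one Adam step (not a reference implementation; the learning rate and toy objective are arbitrary choices for illustration):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba); returns new params and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad        # first moment: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # second moment: running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction for zero initialization
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(x) = x^2 starting from x = 5.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta                          # gradient of x^2
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)
```

The bias-correction terms are the part most easily missed on a first read: without them, the early steps are biased toward zero because `m` and `v` start at zero.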