
Building makemore Part 3: Activations & Gradients, BatchNorm

via Andrej Karpathy

We dive into some of the internals of MLPs with multiple layers and scrutinize the statistics of the forward-pass activations and backward-pass gradients, along with some of the pitfalls that arise when they are improperly scaled. We also look at the typical diagnostic tools and visualizations you'd want to use to understand the health of your deep network. We learn why training deep neural nets can be fragile and introduce the first modern innovation that made doing so much easier: Batch Normalization. Residual connections and the Adam optimizer remain notable to-dos for a later video.

Links:
- makemore on GitHub: https://github.com/karpathy/makemore
- Jupyter notebook built in this video: https://github.com/karpathy/nn-zero-to-hero/blob/master/lectures/makemore/makemore_part3_bn.ipynb
- Colab notebook: https://colab.research.google.com/drive/1H5CSy-OnisagUgDUXhHwo1ng2pjKHYSN?usp=sharing
- my website: https://karpathy.ai
- my twitter: https://twitter.com/karpathy
- Discord channel: https://discord.gg/3zy8
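To make the idea concrete, here is a minimal PyTorch sketch of batch-normalizing the pre-activations of one hidden layer and checking a basic activation-health diagnostic. The variable names (`hpreact`, `bngain`, `bnbias`) and the layer sizes are illustrative choices for this sketch, not necessarily the exact code from the video's notebook.

```python
import torch

torch.manual_seed(42)

n_embd, n_hidden, batch_size = 30, 200, 32
emb = torch.randn(batch_size, n_embd)                       # stand-in for embedded inputs
W1 = torch.randn(n_embd, n_hidden) * (5/3) / n_embd**0.5    # Kaiming-style init for tanh

# learnable scale and shift, plus running statistics for inference time
bngain = torch.ones(1, n_hidden)
bnbias = torch.zeros(1, n_hidden)
bnmean_running = torch.zeros(1, n_hidden)
bnstd_running = torch.ones(1, n_hidden)

hpreact = emb @ W1                                          # pre-activations of the hidden layer
bnmean = hpreact.mean(0, keepdim=True)                      # per-unit mean over the batch
bnstd = hpreact.std(0, keepdim=True)                        # per-unit std over the batch
hpreact = bngain * (hpreact - bnmean) / bnstd + bnbias      # normalize, then scale and shift

# keep exponential moving averages so single examples can be evaluated later
with torch.no_grad():
    bnmean_running = 0.999 * bnmean_running + 0.001 * bnmean
    bnstd_running = 0.999 * bnstd_running + 0.001 * bnstd

h = torch.tanh(hpreact)
# a simple diagnostic: fraction of near-saturated tanh units (gradients vanish there)
print((h.abs() > 0.97).float().mean().item())
```

At test time one would use `bnmean_running` and `bnstd_running` in place of the batch statistics, so the network no longer depends on being fed a batch of examples.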
