FlareStart
HomeNewsHow ToSources
FlareStart

Where developers start their day. All the tech news & tutorials that matter, in one place.

Quick Links

  • Home
  • News
  • Tutorials
  • Sources

Connect

© 2026 FlareStart. All rights reserved.

Back to articles
ArticleMachine Learning

Let's reproduce GPT-2 (124M)

via Andrej KarpathyAndrej Karpathy1y ago

We reproduce the GPT-2 (124M) from scratch. This video covers the whole process: First we build the GPT-2 network, then we optimize its training to be really fast, then we set up the training run following the GPT-2 and GPT-3 paper and their hyperparameters, then we hit run, and come back the next morning to see our results, and enjoy some amusing model generations. Keep in mind that in some places this video builds on the knowledge from earlier videos in the Zero to Hero Playlist (see my channel). You could also see this video as building my nanoGPT repo, which by the end is about 90% similar. Links: - build-nanogpt GitHub repo, with all the changes in this video as individual commits: https://github.com/karpathy/build-nanogpt - nanoGPT repo: https://github.com/karpathy/nanoGPT - llm.c repo: https://github.com/karpathy/llm.c - my website: https://karpathy.ai - my twitter: https://twitter.com/karpathy - our Discord channel: https://discord.gg/3zy8kqD9Cp Supplementary links: - Attention

Watch on Andrej Karpathy

Opens in a new tab

Watch on YouTube
1 views

Related Articles

Why Degrees Don’t Make Developers
Article

Why Degrees Don’t Make Developers

Continuously Delivered • 2w ago

When you write your tests TOO LATE... #softwareengineering
Article

When you write your tests TOO LATE... #softwareengineering

Continuously Delivered • 3w ago

"Hello police? I'd like to report a journalism."
Article

"Hello police? I'd like to report a journalism."

Benn Jordan • 1mo ago

Traditional X-Mas Stream
Article

Traditional X-Mas Stream

Yannic Kilcher • 1mo ago

I Tested Dozens of Python Libraries But These 9 Are Actually Worth Using
News

I Tested Dozens of Python Libraries But These 9 Are Actually Worth Using

Medium Programming • 36m ago

Discover More Articles