FlareStart
124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

via Dev.to Python · Ingero · 2d ago

TL;DR: PyTorch's DataLoader can be 50-124x slower than direct tensor indexing for in-memory GPU workloads. We reproduced a real PyTorch issue on an RTX 4090 and traced every CUDA API call and Linux kernel event to find the root cause. The GPU wasn't slow - it was starving. DataLoader workers generated 200,000 CPU context switches and 300,000 page allocations in 40 seconds, leaving the GPU waiting an average of 301 ms per data transfer that should take microseconds.

The Problem

A PyTorch user reported that DataLoader was 7-22x slower than direct tensor indexing for a simple MLP inference workload. Even with num_workers=12, pin_memory=True, and prefetch_factor=12, the gap remained massive. GPU utilization sat at 10-20%.

We reproduced it. The gap was even worse on our hardware:

| Method | Time | vs Direct |
| --- | --- | --- |
| Direct tensor indexing | 0.39s | 1x |
| DataLoader (shuffle=True) | 48.49s | 124x slower |
| DataLoader (optimized, 4 workers, pin_memory) | 43.29s | 111x slower |

The workload is trivial: 7M samples, 100 feature...
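For readers who want to see the two code paths being compared, here is a minimal sketch. It is not the benchmark script from the article: the model shape, the 100,000-sample dataset, the batch size of 1024, and the helper names (make_data, run_direct, run_dataloader, timed) are all assumptions chosen for illustration, and num_workers defaults to 0 so the script runs anywhere. Absolute timings will differ from the table above.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def make_data(n_samples=100_000, n_features=100, seed=0):
    """Build a random in-memory feature tensor (sizes are illustrative)."""
    gen = torch.Generator().manual_seed(seed)
    return torch.randn(n_samples, n_features, generator=gen)


def run_direct(model, data, batch_size, device):
    """Slice contiguous batches straight out of the tensor."""
    total = 0
    with torch.no_grad():
        for start in range(0, data.shape[0], batch_size):
            batch = data[start:start + batch_size].to(device)
            total += model(batch).shape[0]
    return total


def run_dataloader(model, data, batch_size, device, num_workers=0):
    """Feed the same tensor through a shuffling DataLoader."""
    loader = DataLoader(
        TensorDataset(data),
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
        pin_memory=(device.type == "cuda"),
    )
    total = 0
    with torch.no_grad():
        for (batch,) in loader:
            batch = batch.to(device, non_blocking=True)
            total += model(batch).shape[0]
    return total


def timed(fn, *args, **kwargs):
    """Return (result, seconds), waiting for queued GPU work before stopping the clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return result, time.perf_counter() - start


if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Hypothetical small MLP; the article's actual model is not shown in the excerpt.
    model = torch.nn.Sequential(
        torch.nn.Linear(100, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
    ).to(device).eval()
    data = make_data()
    n_direct, t_direct = timed(run_direct, model, data, 1024, device)
    n_loader, t_loader = timed(run_dataloader, model, data, 1024, device)
    print(f"direct:     {n_direct} samples in {t_direct:.3f}s")
    print(f"DataLoader: {n_loader} samples in {t_loader:.3f}s")
```

The direct path takes one slice per batch, while the DataLoader path fetches samples one index at a time and collates them into a batch. Raising num_workers moves that per-sample work into worker processes, which is the configuration the article traces.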

