From Stutter to Smooth: Our Performance Journey
The same MP4 file played buttery smooth in IINA but stuttered in our player. Here's the story of how we diagnosed the problem, optimized our AVFoundation pipeline, and ultimately adopted mpv for the best of both worlds.
The Problem
We started with a standard AVFoundation pipeline:
AVPlayer → AVPlayerItemVideoOutput → copyPixelBuffer (BGRA)
→ CIImage → CIFilter chain → CIContext → Metal texture → CAMetalLayer

It worked. But 4K content stuttered. 1080p at 60 fps dropped frames. Meanwhile, IINA played the same files without breaking a sweat.
Diagnosing the Bottlenecks
We profiled everything and found four major bottlenecks:
1. Forced BGRA Conversion (~1.9 GB/s wasted)
We requested kCVPixelFormatType_32BGRA from AVFoundation. But VideoToolbox decodes to NV12 natively. Every frame was being converted from YCbCr to BGRA on the CPU before we even touched it.
Fix: Switch to kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and handle YUV→RGB on the GPU.
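The request-side change is a one-line pixel buffer attribute. A minimal sketch of the output configuration (the keys are standard CoreVideo constants; the Metal-compatibility flag is our addition, since the buffers feed a Metal pipeline downstream):

```swift
import AVFoundation

// Ask for VideoToolbox's native biplanar YCbCr layout instead of BGRA,
// so no CPU-side color conversion happens before we touch the frame.
let attributes: [String: Any] = [
    kCVPixelBufferPixelFormatTypeKey as String:
        kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
    kCVPixelBufferMetalCompatibilityKey as String: true,
]
let output = AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
```

With this in place, `copyPixelBuffer` hands back NV12 buffers and the YUV→RGB step moves to the GPU.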
2. CIContext on Every Frame
Even for simple playback without filters, we were running every frame through CIContext — creating a CIImage, applying an identity transform, compositing onto a black background, and rendering to Metal.
Fix: Bypass CIContext entirely when no filters are active. Use CVMetalTextureCache for zero-copy texture creation and a Metal shader for NV12→RGB conversion.
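The zero-copy path wraps each NV12 plane as a Metal texture straight from the decoded pixel buffer. A sketch of the idea (the function name and return shape are illustrative, not our shipping code; `cache` is a `CVMetalTextureCache` created once per device):

```swift
import CoreVideo
import Metal

// Wrap both NV12 planes as Metal textures without copying pixel data.
func makeTextures(from pixelBuffer: CVPixelBuffer,
                  cache: CVMetalTextureCache) -> (luma: MTLTexture, chroma: MTLTexture)? {
    func plane(_ index: Int, _ format: MTLPixelFormat) -> MTLTexture? {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, index)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, index)
        var cvTexture: CVMetalTexture?
        CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, nil,
            format, width, height, index, &cvTexture)
        return cvTexture.flatMap(CVMetalTextureGetTexture)
    }
    // Plane 0: full-resolution Y (r8). Plane 1: half-resolution interleaved CbCr (rg8).
    guard let y = plane(0, .r8Unorm), let cbcr = plane(1, .rg8Unorm) else { return nil }
    return (y, cbcr)
}
```

The two textures then feed a small fragment shader that does the NV12→RGB matrix multiply in one pass, so no CIImage or CIContext is ever created on the no-filter path.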
3. Main Thread Rendering
All Metal rendering was happening on the main thread. This meant every command buffer encode and GPU submission competed with UI updates, gesture handling, and animation.
Fix: Move rendering to a dedicated DispatchQueue with .userInteractive QoS. Use a semaphore to prevent frame queue backlog.
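The threading fix can be sketched in a few lines. This is a simplified stand-in (class and method names are ours; `encode` represents command-buffer encoding, and in real code the semaphore is signaled from the command buffer's GPU completion handler rather than inline):

```swift
import Dispatch

// Dedicated render queue at .userInteractive QoS, plus a counting
// semaphore that caps frames in flight so the CPU never races ahead
// of the GPU and builds up a backlog.
final class RenderScheduler {
    private let queue = DispatchQueue(label: "player.render", qos: .userInteractive)
    private let inFlight: DispatchSemaphore
    private(set) var framesRendered = 0

    init(maxFramesInFlight: Int = 3) {
        inFlight = DispatchSemaphore(value: maxFramesInFlight)
    }

    func submitFrame(encode: @escaping () -> Void) {
        inFlight.wait()                 // blocks only if the queue is backlogged
        queue.async {
            encode()                    // command buffer encode + commit goes here
            self.framesRendered += 1
            self.inFlight.signal()      // real code: signal from the completion handler
        }
    }

    // Drain pending work (handy for teardown and tests).
    func flush() { queue.sync { } }
}
```

Because the main thread only ever enqueues work, UI updates, gestures, and animations no longer compete with GPU submission.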
4. The Fundamental Limitation
After all optimizations, our AVFoundation pipeline was significantly better. But IINA was still smoother. Why?
Because IINA uses mpv, which has a completely different architecture:
- FFmpeg demuxing with no intermediate copies
- VideoToolbox decoding to NV12
- GLSL shaders for YUV→RGB + scaling + filtering in a single pass
- Direct FBO output — no CIContext, no CIImage, no Metal texture cache
mpv's renderer is purpose-built for video. CIContext is general-purpose.
The Optimization Results
| Improvement | Impact |
|---|---|
| NV12 native format | CPU bandwidth reduced 50-70% |
| CIContext bypass (no filters) | Rendering 30-40% faster |
| CIContext caching | Filter path 10-15% faster |
| Render thread isolation | UI responsive, stable frame timing |
These optimizations transformed our AVFoundation backend from stuttery to smooth for most content.
The Final Move: mpv Integration
But "most content" wasn't good enough. We wanted IINA-level performance for all content, plus MKV/WebM support that AVFoundation simply cannot provide.
So we integrated libmpv — the same engine that powers IINA. The result:
- All formats supported — MKV, WebM, AVI, and everything else
- IINA-level smoothness — mpv's optimized render pipeline
- Hardware decoding — VideoToolbox for H.264, H.265, VP9, AV1
- Built-in subtitles — ASS/SSA with full styling support
We kept our optimized AVFoundation backend as a fallback for PiP and other Apple-specific features. The MediaDecoder protocol makes switching between backends seamless.
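The backend switch boils down to a protocol both decoders adopt. A minimal sketch, with illustrative names and a deliberately tiny surface (the real protocol also carries frame callbacks, seeking, and so on):

```swift
// Common surface both backends implement; the player talks only to this.
protocol MediaDecoder {
    var name: String { get }
    func canPlay(fileExtension: String) -> Bool
}

struct AVFoundationDecoder: MediaDecoder {
    let name = "AVFoundation"
    func canPlay(fileExtension: String) -> Bool {
        ["mp4", "mov", "m4v"].contains(fileExtension.lowercased())
    }
}

struct MPVDecoder: MediaDecoder {
    let name = "mpv"
    func canPlay(fileExtension: String) -> Bool { true } // FFmpeg demuxes nearly everything
}

// mpv is the default; fall back to AVFoundation only when an
// Apple-specific feature like PiP is needed and the format allows it.
func selectDecoder(for fileExtension: String, needsPiP: Bool) -> MediaDecoder {
    let native = AVFoundationDecoder()
    if needsPiP && native.canPlay(fileExtension: fileExtension) { return native }
    return MPVDecoder()
}
```

Keeping the selection logic in one function means the rest of the player never knows, or cares, which engine is behind the protocol.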
Lessons Learned
- Profile first — We assumed the bottleneck was in Metal rendering. It was actually in pixel format conversion
- Accept native formats — Don't fight the hardware decoder's output format
- Minimize intermediate steps — Every copy, every conversion, every context switch adds latency
- Dedicated render threads — Never block the main thread with GPU work
- Know when to adopt — Sometimes the best optimization is using a purpose-built engine