From Stutter to Smooth: Our Performance Journey
The same MP4 file played buttery smooth in IINA but stuttered in our player. Here's the story of how we diagnosed the problem, optimized our AVFoundation pipeline, and ultimately adopted mpv for the best of both worlds.
The Problem
We started with a standard AVFoundation pipeline:
AVPlayer → AVPlayerItemVideoOutput → copyPixelBuffer (BGRA)
→ CIImage → CIFilter chain → CIContext → Metal texture → CAMetalLayer

It worked. But 4K content stuttered. 1080p at 60 fps dropped frames. Meanwhile, IINA played the same files without breaking a sweat.
Diagnosing the Bottlenecks
We profiled everything and found four major bottlenecks:
1. Forced BGRA Conversion (~1.9 GB/s wasted)
We requested kCVPixelFormatType_32BGRA from AVFoundation. But VideoToolbox decodes to NV12 natively. Every frame was being converted from YCbCr to BGRA on the CPU before we even touched it.
Fix: Switch to kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange and handle YUV→RGB on the GPU.
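The request-side change is a one-line pixel buffer attribute. A minimal sketch of the output configuration (the keys are standard CoreVideo constants; the Metal-compatibility flag is our addition, since the buffers feed a Metal pipeline downstream):

```swift
import AVFoundation

// Ask for VideoToolbox's native biplanar YCbCr layout instead of BGRA,
// so no CPU-side color conversion happens before we touch the frame.
let attributes: [String: Any] = [
    kCVPixelBufferPixelFormatTypeKey as String:
        kCVPixelFormatType_420YpCbCr8BiPlanarVideoRange,
    kCVPixelBufferMetalCompatibilityKey as String: true,
]
let output = AVPlayerItemVideoOutput(pixelBufferAttributes: attributes)
```

With this in place, `copyPixelBuffer` hands back NV12 buffers and the YUV→RGB step moves to the GPU.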
2. CIContext on Every Frame
Even for simple playback without filters, we were running every frame through CIContext — creating a CIImage, applying an identity transform, compositing onto a black background, and rendering to Metal.
Fix: Bypass CIContext entirely when no filters are active. Use CVMetalTextureCache for zero-copy texture creation and a Metal shader for NV12→RGB conversion.
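The zero-copy path wraps each NV12 plane as a Metal texture straight from the decoded pixel buffer. A sketch of the idea (the function name and return shape are illustrative, not our shipping code; `cache` is a `CVMetalTextureCache` created once per device):

```swift
import CoreVideo
import Metal

// Wrap both NV12 planes as Metal textures without copying pixel data.
func makeTextures(from pixelBuffer: CVPixelBuffer,
                  cache: CVMetalTextureCache) -> (luma: MTLTexture, chroma: MTLTexture)? {
    func plane(_ index: Int, _ format: MTLPixelFormat) -> MTLTexture? {
        let width = CVPixelBufferGetWidthOfPlane(pixelBuffer, index)
        let height = CVPixelBufferGetHeightOfPlane(pixelBuffer, index)
        var cvTexture: CVMetalTexture?
        CVMetalTextureCacheCreateTextureFromImage(
            kCFAllocatorDefault, cache, pixelBuffer, nil,
            format, width, height, index, &cvTexture)
        return cvTexture.flatMap(CVMetalTextureGetTexture)
    }
    // Plane 0: full-resolution Y (r8). Plane 1: half-resolution interleaved CbCr (rg8).
    guard let y = plane(0, .r8Unorm), let cbcr = plane(1, .rg8Unorm) else { return nil }
    return (y, cbcr)
}
```

The two textures then feed a small fragment shader that does the NV12→RGB matrix multiply in one pass, so no CIImage or CIContext is ever created on the no-filter path.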
3. Main Thread Rendering
All Metal rendering was happening on the main thread. This meant every command buffer encode and GPU submission competed with UI updates, gesture handling, and animation.
Fix: Move rendering to a dedicated DispatchQueue with .userInteractive QoS. Use a semaphore to prevent frame queue backlog.
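The threading fix can be sketched in a few lines. This is a simplified stand-in (class and method names are ours; `encode` represents command-buffer encoding, and in real code the semaphore is signaled from the command buffer's GPU completion handler rather than inline):

```swift
import Dispatch

// Dedicated render queue at .userInteractive QoS, plus a counting
// semaphore that caps frames in flight so the CPU never races ahead
// of the GPU and builds up a backlog.
final class RenderScheduler {
    private let queue = DispatchQueue(label: "player.render", qos: .userInteractive)
    private let inFlight: DispatchSemaphore
    private(set) var framesRendered = 0

    init(maxFramesInFlight: Int = 3) {
        inFlight = DispatchSemaphore(value: maxFramesInFlight)
    }

    func submitFrame(encode: @escaping () -> Void) {
        inFlight.wait()                 // blocks only if the queue is backlogged
        queue.async {
            encode()                    // command buffer encode + commit goes here
            self.framesRendered += 1
            self.inFlight.signal()      // real code: signal from the completion handler
        }
    }

    // Drain pending work (handy for teardown and tests).
    func flush() { queue.sync { } }
}
```

Because the main thread only ever enqueues work, UI updates, gestures, and animations no longer compete with GPU submission.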
4. The Fundamental Limitation
After all optimizations, our AVFoundation pipeline was significantly better. But IINA was still smoother. Why?
Because IINA uses mpv, which has a completely different architecture:
- FFmpeg demuxing with no intermediate copies
- VideoToolbox decoding to NV12
- GLSL shaders for YUV→RGB + scaling + filtering in a single pass
- Direct FBO output — no CIContext, no CIImage, no Metal texture cache
mpv's renderer is purpose-built for video. CIContext is general-purpose.
The Optimization Results
| Improvement | Impact |
|---|---|
| NV12 native format | CPU bandwidth reduced 50-70% |
| CIContext bypass (no filters) | Rendering 30-40% faster |
| CIContext caching | Filter path 10-15% faster |
| Render thread isolation | UI responsive, stable frame timing |
These optimizations transformed our AVFoundation backend from stuttery to smooth for most content.
The Final Move: mpv Integration
But "most content" wasn't good enough. We wanted IINA-level performance for all content, plus MKV/WebM support that AVFoundation simply cannot provide.
So we integrated libmpv — the same engine that powers IINA. The result:
- All formats supported — MKV, WebM, AVI, and everything else
- IINA-level smoothness — mpv's optimized render pipeline
- Hardware decoding — VideoToolbox for H.264, H.265, VP9, AV1
- Built-in subtitles — ASS/SSA with full styling support
We kept our optimized AVFoundation backend as a fallback for PiP and other Apple-specific features. The MediaDecoder protocol makes switching between backends seamless.
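The backend switch boils down to a protocol both decoders adopt. A minimal sketch, with illustrative names and a deliberately tiny surface (the real protocol also carries frame callbacks, seeking, and so on):

```swift
// Common surface both backends implement; the player talks only to this.
protocol MediaDecoder {
    var name: String { get }
    func canPlay(fileExtension: String) -> Bool
}

struct AVFoundationDecoder: MediaDecoder {
    let name = "AVFoundation"
    func canPlay(fileExtension: String) -> Bool {
        ["mp4", "mov", "m4v"].contains(fileExtension.lowercased())
    }
}

struct MPVDecoder: MediaDecoder {
    let name = "mpv"
    func canPlay(fileExtension: String) -> Bool { true } // FFmpeg demuxes nearly everything
}

// mpv is the default; fall back to AVFoundation only when an
// Apple-specific feature like PiP is needed and the format allows it.
func selectDecoder(for fileExtension: String, needsPiP: Bool) -> MediaDecoder {
    let native = AVFoundationDecoder()
    if needsPiP && native.canPlay(fileExtension: fileExtension) { return native }
    return MPVDecoder()
}
```

Keeping the selection logic in one function means the rest of the player never knows, or cares, which engine is behind the protocol.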
Lessons Learned
- Profile first — We assumed the bottleneck was in Metal rendering. It was actually in pixel format conversion
- Accept native formats — Don't fight the hardware decoder's output format
- Minimize intermediate steps — Every copy, every conversion, every context switch adds latency
- Dedicated render threads — Never block the main thread with GPU work
- Know when to adopt — Sometimes the best optimization is using a purpose-built engine