I recently bought an HP EliteDesk 800 G4 off eBay. With an i5-8500T and a 35W TDP, it should be a reasonably low-power solution for the more resource-intensive applications my Raspberry Pi 4 was struggling with - such as Immich.

I decided to try Proxmox (8.1) for this box - rather than my usual go-to of Fedora - and also ventured into LXC for the first time. Knowing that Immich now supports hardware acceleration for transcoding, I wanted to make sure this was working too, as the i5-8500T integrates Quick Sync (Intel’s hardware video encoding/decoding solution).
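
Before any container work, it’s worth confirming the host sees the iGPU at all. On the Proxmox host (device names can vary, but this is typical for a single-GPU machine):

```
ls -l /dev/dri
# Expect card0 and renderD128, both character devices with major
# number 226 - a number that comes up again in the LXC config below.
```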

There are a bunch of guides and issues across the web documenting the struggles people have had giving Immich access to a GPU while running inside an unprivileged LXC container - but this Reddit post for Jellyfin documents the approach that worked for me, after a number of failed attempts using other guides.

Specifically:

  1. I used the TurnKey Debian LXC template, with Docker installed from Docker’s own apt repository rather than Debian’s (one-liner below). Alpine had worked initially, but Docker broke sporadically for no obvious reason, and Podman fared no better.
  2. Per the linked post, I shared the whole of /dev/dri, rather than just /dev/dri/renderD128 as other guides suggest (see the config sketch after this list).
  3. Similarly, I ensured that both the video and render groups were mapped from the host.
  4. …and that the container’s root user was added to both of those groups.
  5. I verified the GPU was accessible from the container by running vainfo (example below) - the most convenient way I found to test access.
  6. Finally, I used the quicksync option in the Immich configuration as documented (snippet after this list).
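
For step 1, Docker’s docs cover adding their apt repository by hand; the official convenience script does the same thing and is what I’d reach for on a fresh container (it adds Docker’s repo and installs from it):

```
# Inside the container - installs Docker from Docker's apt repository
curl -fsSL https://get.docker.com | sh
```

Steps 2 through 4 all live in the container’s config on the host. The sketch below assumes CT ID 200 (a placeholder) and GIDs of 44 for video and 104 for render on both host and container - common Debian defaults, but check yours with `getent group video render` on each side and adjust the mappings to match:

```
# /etc/pve/lxc/200.conf on the Proxmox host

# Step 2: allow the DRM character devices (major 226) and bind-mount
# all of /dev/dri into the container, not just renderD128
lxc.cgroup2.devices.allow: c 226:0 rwm
lxc.cgroup2.devices.allow: c 226:128 rwm
lxc.mount.entry: /dev/dri dev/dri none bind,optional,create=dir

# Step 3: identity-map the video (44) and render (104) groups through
# to the host, leaving every other ID shifted as usual
lxc.idmap: u 0 100000 65536
lxc.idmap: g 0 100000 44
lxc.idmap: g 44 44 1
lxc.idmap: g 45 100045 59
lxc.idmap: g 104 104 1
lxc.idmap: g 105 100105 65431
```

The host also has to permit root to delegate those two GIDs, and the container’s root user then joins the groups (step 4):

```
# On the host: allow the two extra mappings above
echo "root:44:1" >> /etc/subgid
echo "root:104:1" >> /etc/subgid

# Inside the container:
usermod -aG video,render root
```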

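For the vainfo check (step 5), note that on Debian the tool needs a VA-API driver alongside it - for this generation of iGPU that’s intel-media-va-driver (or the non-free variant, which unlocks a few extra codecs):

```
# Inside the container
apt install -y vainfo intel-media-va-driver
vainfo
# Success: the iHD driver loads and a list of VAProfile/VAEntrypoint
# pairs is printed. Failure is usually a permission error opening
# /dev/dri/renderD128, which points back at the group mappings above.
```

And step 6 is a small addition to Immich’s docker-compose.yml, per Immich’s hardware transcoding docs (depending on your Immich release, the extends block belongs on immich-server or immich-microservices):

```
services:
  immich-server:
    extends:
      file: hwaccel.transcoding.yml
      service: quicksync
```

After restarting the stack, the acceleration API also needs switching from its default of disabled to Quick Sync in the admin UI’s video transcoding settings.
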
I then added about 85k photos and videos from my NAS and let it get to work importing them all.

On the host, running intel_gpu_top confirmed that ffmpeg was busy converting one video at a time - but using only a tiny fraction of the GPU. Bumping the number of concurrent transcodes in the Immich admin UI from 1 to 25 soon fixed that, and the GPU was hitting around 75% of its peak frequency. Not bad, given I’d also limited the container to only half of the 6 CPUs available.
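
intel_gpu_top ships in the intel-gpu-tools package on the host, if you want to watch this yourself:

```
apt install -y intel-gpu-tools
intel_gpu_top
# The "Video" engine row is the fixed-function encode/decode block
# that Quick Sync uses; "Render/3D" is the general-purpose shader array.
```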

This setup has been running for a few days now and got through the backlog pretty quickly - definitely faster than the Pi managed the same task - and Immich is noticeably more responsive.