If you don’t care about the technical background and want to use Nix-packaged CUDA applications on a non-NixOS system, scroll down to Solutions.
Dynamic linking outside Nix
Suppose that we have a CUDA application like llama.cpp outside Nix. How does it get its library dependencies, such as the required CUDA libraries, on Linux? ELF binaries contain a dynamic section with information for the dynamic linker. It encodes, among other things, the required dynamic libraries. For instance, we can use patchelf or readelf to list the CUDA libraries that are used:
$ patchelf --print-needed llama-cli | grep cu
libcuda.so.1
libcublas.so.12
libcudart.so.12
So the llama.cpp CLI uses the CUDA runtime (libcudart.so.12), cuBLAS (libcublas.so.12), and the CUDA driver library (libcuda.so.1). The CUDA driver library differs from the other libraries in that it is tightly coupled to the NVIDIA driver: it does not ship with CUDA itself, but with the NVIDIA driver.
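The same information can be read directly from the dynamic section with readelf; the output below is illustrative and its exact formatting may differ:
$ readelf -d llama-cli | grep NEEDED | grep cu
 0x0000000000000001 (NEEDED)             Shared library: [libcuda.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libcublas.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libcudart.so.12]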
Dynamic library dependencies are resolved by the dynamic linker. The dynamic linker uses a cache of known libraries. The directories that are cached can be configured using /etc/ld.so.conf. In addition to that, an ELF binary can specify additional library paths, the so-called runtime path or rpath. However, no rpath is set in our llama-cli binary:
$ patchelf --print-rpath llama-cli
# Nothingness
So every library is loaded from directories configured in ld.so.conf. If you want more details on dynamic linking and how libraries are looked up, you can use the LD_DEBUG environment variable to make the dynamic linker display its library search paths:
$ LD_DEBUG=libs ./llama-cli
4002: find library=libcuda.so.1 [0]; searching
4002: search cache=/etc/ld.so.cache
4002: trying file=/lib/x86_64-linux-gnu/libcuda.so.1
4002:
4002: find library=libcublas.so.12 [0]; searching
4002: search cache=/etc/ld.so.cache
4002: trying file=/usr/local/cuda/targets/x86_64-linux/lib/libcublas.so.12
[...]
Dynamic linking in Nix
The standard dynamic linking approach is not compatible with the objectives of Nix. Nix aims for full reproducibility, which is not possible with a global dynamic linker cache.
Suppose that we have two applications, both built against OpenBLAS (same library, same version), but with different OpenBLAS build configurations. With a global dynamic linker cache, we cannot distinguish between the two builds and ensure that each application is dynamically linked against the correct one. So, we cannot fully reproduce the intended configurations.
To resolve this issue, Nix avoids using a global cache for dynamic linking. Instead, it embeds the paths of the library dependencies in the binary’s runtime path (rpath). We can observe this by e.g. building the llama.cpp derivation from the nixpkgs repository and inspecting the required libraries and the rpath:
$ export OUT=`nix-build -E '(import ./default.nix { config = { allowUnfree = true; cudaSupport = true; }; }).llama-cpp'`
$ patchelf --print-needed $OUT/lib/libggml.so | grep cu
libcudart.so.12
libcublas.so.12
libcublasLt.so.12
libcuda.so.1
$ patchelf --print-rpath $OUT/lib/libggml.so
/run/opengl-driver/lib:/nix/store/23j56hv7plgkgmhj8l2aj4mgjk32529h-cuda_cudart-12.2.140-lib/lib:/nix/store/9q0rrjr5y5ibqcxc9q1m34g1hb7z9yr8-cuda_cudart-12.2.140-stubs/lib:/nix/store/rnyc2acy5c45pi905ic9cb2iybn35crz-libcublas-12.2.5.6-lib/lib:/nix/store/0wydilnf1c9vznywsvxqnaing4wraaxp-glibc-2.39-52/lib:/nix/store/kgmfgzb90h658xg0i7mxh9wgyx0nrqac-gcc-13.3.0-lib/lib
As you can see, rather than just storing the required dynamic libraries and letting the dynamic linker resolve their full paths from its cache, a binary compiled with Nix embeds the full paths of its library dependencies in the Nix Store (/nix/store).
This solves the reproducibility issue, since each binary/library can fully specify the version it uses, and e.g. different build configurations of a binary will lead to different hashes in the output paths (/nix/store/<hash>-<name>-<version>-<output>).
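As a small illustration, building the same package with a different build configuration yields a different store path. The hashes and versions below are placeholders, and singleThreaded is just one example of an override argument that the openblas derivation accepts:
$ nix-build -E '(import ./default.nix {}).openblas'
/nix/store/<hash1>-openblas-<version>
$ nix-build -E '(import ./default.nix {}).openblas.override { singleThreaded = true; }'
/nix/store/<hash2>-openblas-<version>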
The glitch in the matrix: the CUDA driver library
There is one glitch/impurity that creeps in. Remember that the CUDA driver library (libcuda.so.1) is tightly coupled to the NVIDIA driver? For this particular library, we cannot dynamically link against an arbitrary version: a binary needs to link against the CUDA driver library that corresponds to the system’s NVIDIA driver.
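You can see this coupling on a system with the NVIDIA driver installed: the real driver library file carries the driver version in its name, and libcuda.so.1 points to it. The paths and versions below are illustrative, for an Ubuntu-style layout:
$ nvidia-smi --query-gpu=driver_version --format=csv,noheader
550.54.14
$ ls /usr/lib/x86_64-linux-gnu/libcuda.so.*
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.550.54.14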
NixOS solves this by allowing an impurity in the form of global state for this particular case. As can be seen in the rpath above, there is an entry /run/opengl-driver/lib. If the NVIDIA driver is configured on a NixOS system, NixOS guarantees that libcuda.so.1 is symlinked into this location. In this way, a binary will always use a CUDA driver library that is consistent with the system’s NVIDIA driver version.
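On a NixOS machine with the NVIDIA driver enabled, you can check this yourself; the resolved target below is schematic (actual hashes and versions differ):
$ readlink -f /run/opengl-driver/lib/libcuda.so.1
/nix/store/<hash>-nvidia-x11-<driver-version>/lib/libcuda.so.<driver-version>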
Sadly, this doesn’t work on non-NixOS systems, because they don’t have the /run/opengl-driver/lib directory. This brings us to some hacks to resolve this issue…
Solutions
Make /run/opengl-driver/lib and symlink the driver library
$ sudo mkdir -p /run/opengl-driver/lib
$ sudo find /usr/lib \
    -name 'libcuda.so*' \
    -exec ln -s {} /run/opengl-driver/lib/ \;
Since /run is not persisted across reboots, these steps need to be reapplied after every reboot. To make it permanent, you can use tmpfiles.d to automatically create the symlink on every boot. For instance, on Ubuntu:
$ echo "L /run/opengl-driver/lib/libcuda.so - - - - /usr/lib/x86_64-linux-gnu/libcuda.so" | \
sudo tee /etc/tmpfiles.d/cuda-driver-for-nix.conf
$ sudo systemd-tmpfiles --create
Other distributions may use a different path for libcuda.so.
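If you are unsure where your distribution installs the driver library, the dynamic linker cache usually knows; the output below is illustrative:
$ ldconfig -p | grep libcuda.so
	libcuda.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so.1
	libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so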
Preload the driver library
$ export LD_PRELOAD=/usr/lib/x86_64-linux-gnu/libcuda.so.1
Warning
LD_PRELOAD does not cover all cases. When a program/library uses runtime compilation (e.g. Triton), the Nix derivation will typically burn the path /run/opengl-driver/lib into the package as a linker path (i.e. -L/run/opengl-driver/lib). LD_PRELOAD does not override this and will fail in such cases.
Warning
Avoid using LD_LIBRARY_PATH unless the CUDA driver library is in a directory by itself. Using LD_LIBRARY_PATH with a directory that contains multiple libraries can also override other libraries. In the best case, this breaks reproducibility. In the worst case, it breaks the application.
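If you do want to go the LD_LIBRARY_PATH route, one way to satisfy that condition is to symlink the driver library into a directory of its own, so nothing else gets overridden. The directory name below is arbitrary, the driver path is the Ubuntu one, and $OUT is assumed to still point at the llama.cpp build from earlier:
$ mkdir -p "$HOME/cuda-driver"
$ ln -sf /usr/lib/x86_64-linux-gnu/libcuda.so.1 "$HOME/cuda-driver/"
$ LD_LIBRARY_PATH="$HOME/cuda-driver" "$OUT/bin/llama-cli"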
nixGL
nixGL can wrap a program to resolve the CUDA driver library.
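A typical invocation looks roughly like the following; the exact command and attribute (e.g. the NVIDIA-specific wrapper) depend on your setup, so check the nixGL documentation:
$ nix run --impure github:nix-community/nixGL -- "$OUT/bin/llama-cli"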