Building for the Milk-V Duo
I use a Nix-based cross-compilation environment. The upstream SDK V2 images use a statically-compiled system (no glibc available), so to build for it you need to build static binaries. You can do this by setting up cross-compilation for the riscv64-unknown-linux-musl target and then passing -static to gcc/g++ for static compilation
Vector extension support
Compilation
The SG2002 CPU used by the Milk-V uses the T-Head Vector extensions, which are a proprietary extension of the pre-standard RISC-V Vector (RVV) extension 0.7.1. Yes, this is as bad an idea as it sounds, but I guess it’s hard to complain for a 20 Euro board.
If you would normally compile using -march=rv64gcv for RVV 1.0, you should compile with -march=rv64gc_xtheadvector for T-Head CPUs like the SG2002.
Once you go beyond the basics of RVV, you will find that the T-Head Vector extensions has some missing functionality/idiosynchrasies.
Inverse square root
One thing that is missing is support for the vfrsqrt7.v instruction for reciprocal square root estimations . Unfortunately, the regular square root instruction vfsqrt.v is pretty slow, so square root + division is out for some applications. Luckily, the good old fast inverse square root estimation still works:
// Newton-Raphson iteration for inverse square root:
// y' = y * (1.5 - 0.5 * a * y * y)
inline vfloat32m8_t rsqrt_newton_raphson(vfloat32m8_t y, vfloat32m8_t a, size_t vl) {
auto tmp = __riscv_vfmul_vv_f32m8(y, y, vl);
tmp = __riscv_vfmul_vv_f32m8(a, tmp, vl);
tmp = __riscv_vfmul_vf_f32m8(tmp, 0.5f, vl);
tmp = __riscv_vfrsub_vf_f32m8(tmp, 1.5f, vl);
return __riscv_vfmul_vv_f32m8(y, tmp, vl);
}
// https://en.wikipedia.org/wiki/Fast_inverse_square_root
inline vfloat32m8_t fast_rsqrt(vfloat32m8_t x, size_t vl) {
auto i = __riscv_vreinterpret_v_f32m8_u32m8(x);
i = __riscv_vsrl_vx_u32m8(i, 1, vl);
i = __riscv_vrsub_vx_u32m8(i, 0x5f3759df, vl);
auto rsqrt = __riscv_vreinterpret_v_u32m8_f32m8(i);
// Adjust number of Newton-Raphson iterations depending on the
// required accuracy.
rsqrt = rsqrt_newton_raphson(rsqrt, x, vl);
rsqrt = rsqrt_newton_raphson(rsqrt, x, vl);
return rsqrt;
}