Model building blocks Dish Activation Attention Mechanisms Logit Softcapping Multi-GPU Model Parallelism: Tensor Parallelism Quantization Quantizer Notation GPTQ Checkpoint Format