Writing My First CUDA Matrix Multiplication Kernel: Why I Chose CUDA (C++) Over OpenAI's Triton

I just completed my first custom CUDA matrix multiplication kernel for a 1024 × 1024 matrix ...
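A first CUDA matmul kernel is often the "naive" design, in which each GPU thread computes exactly one element of the output matrix. The sketch below assumes that design; it is not the author's kernel, which is not shown here. So that it runs without a GPU, it is plain C++ that emulates the CUDA launch on the CPU. The names `Dim2`, `matmul_thread`, `launch_matmul`, `matmul_reference` and `max_abs_diff` are hypothetical. `blockIdx`, `blockDim` and `threadIdx` are passed in as parameters here, whereas a real `__global__` kernel reads them as built-in variables.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for CUDA's built-in dim3/uint3 (only x and y are used).
struct Dim2 {
    unsigned x, y;
};

// Body of a naive matmul kernel written as an ordinary C++ function.
// In real CUDA this would be `__global__ void matmul(...)`, and blockIdx,
// blockDim and threadIdx would be built-ins instead of parameters.
// Each "thread" computes C[row][col] = sum_k A[row][k] * B[k][col]
// for row-major N x N matrices.
void matmul_thread(Dim2 blockIdx, Dim2 blockDim, Dim2 threadIdx,
                   const float* A, const float* B, float* C, int N) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= N || col >= N) return;  // threads past the matrix edge do nothing
    float acc = 0.0f;
    for (int k = 0; k < N; ++k) {
        acc += A[row * N + k] * B[k * N + col];
    }
    C[row * N + col] = acc;
}

// Emulates `matmul<<<grid, block>>>(A, B, C, N)` by running every
// (block, thread) pair one after another on the CPU.
void launch_matmul(const std::vector<float>& A, const std::vector<float>& B,
                   std::vector<float>& C, int N, unsigned tile = 16) {
    Dim2 block{tile, tile};
    // Ceiling division so partially filled blocks at the edges are still launched.
    Dim2 grid{(N + tile - 1) / tile, (N + tile - 1) / tile};
    for (unsigned by = 0; by < grid.y; ++by)
        for (unsigned bx = 0; bx < grid.x; ++bx)
            for (unsigned ty = 0; ty < block.y; ++ty)
                for (unsigned tx = 0; tx < block.x; ++tx)
                    matmul_thread(Dim2{bx, by}, block, Dim2{tx, ty},
                                  A.data(), B.data(), C.data(), N);
}

// Plain triple loop used as a reference to check the emulated kernel.
std::vector<float> matmul_reference(const std::vector<float>& A,
                                    const std::vector<float>& B, int N) {
    std::vector<float> C(static_cast<std::size_t>(N) * N, 0.0f);
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < N; ++k) acc += A[i * N + k] * B[k * N + j];
            C[i * N + j] = acc;
        }
    return C;
}

// Largest element-wise absolute difference between two equally sized matrices.
float max_abs_diff(const std::vector<float>& X, const std::vector<float>& Y) {
    float worst = 0.0f;
    for (std::size_t i = 0; i < X.size(); ++i)
        worst = std::fmax(worst, std::fabs(X[i] - Y[i]));
    return worst;
}
```

With `N = 1024` and 16 × 16 blocks, the grid is 64 × 64 blocks, which gives one thread for each of the 1,048,576 output elements. On a real GPU the equivalent launch would be `dim3 block(16, 16); dim3 grid((N + 15) / 16, (N + 15) / 16); matmul<<<grid, block>>>(dA, dB, dC, N);` on device pointers. The bounds check only matters when `N` is not a multiple of the block size.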