Autoresearch for GPU kernels. Give it any PyTorch model, go to sleep, wake up to optimized Triton kernels.
Updated Mar 19, 2026 - Python
Production-Grade Autoresearch. Ideal for GPU kernels, ML model development, feature engineering, prompt engineering, and other optimizable code.
Extended TileLang as a unified DSL to enable high-performance kernel development for Near-Memory Computing, Distributed Memory AI Accelerators, and Networked Accelerators.
CUDA Kernel Library for LLM Inference: FlashAttention, HGEMM, and Tensor Core GEMM, with pybind11 Python bindings.
Skill pack for custom PyTorch MPS kernels on Apple Silicon (examples, tests, and optimization patterns).
Autonomously optimizes PyTorch GPU kernels by profiling, extracting, and improving their Triton or CUDA C++ code.
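The profile-and-improve loop the description above refers to can be sketched with stdlib Python alone: benchmark several candidate implementations of the same computation and keep the fastest. The candidate "kernels" here are hypothetical stand-ins, not code from any of the listed repositories.

```python
import time

def bench(fn, arg, repeats=50):
    # Time fn(arg) over several repeats; report the best (minimum) runtime,
    # which is the usual way to reduce timer noise when benchmarking kernels.
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        fn(arg)
        best = min(best, time.perf_counter() - t0)
    return best

def pick_fastest(candidates, arg):
    # Profile each candidate variant and return the name of the fastest one.
    timings = {name: bench(fn, arg) for name, fn in candidates.items()}
    return min(timings, key=timings.get)

# Two hypothetical variants computing the same sum of squares.
def naive(xs):
    return sum(x * x for x in xs)

def fused(xs):
    return sum(map(lambda x: x * x, xs))

best = pick_fastest({"naive": naive, "fused": fused}, list(range(1000)))
```

A real autoresearch system replaces the candidates with generated Triton or CUDA kernels and uses GPU-side timing (e.g. CUDA events) instead of wall-clock timers, but the select-by-measurement loop is the same.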