RFC: Cythonize cuda.core while keeping it CUDA-agnostic #866
Closed
7 of 8 issues completed
Labels: P0 (High priority - Must do!), RFC (Plans and announcements), cuda.core (Everything related to the cuda.core module), enhancement (Any code-related improvements), packaging (Anything related to wheels or Conda packages)
Today the majority of `cuda.core` is implemented in pure Python. As a result, we've been dealing with microsecond-level overhead in the past few months, if not weeks (e.g. #739, #658). As much as I think it is premature optimization at this stage, I do hear the desire to keep the performance competitive while staying productive.

This RFC outlines one such solution to address the performance concerns. Below are the critical requirements:

1. `cuda.core` continues to support multiple CUDA major versions
2. `pip install cuda-core` stays unchanged
3. Local development does not require multiple CUDA versions to be installed

The critical question to answer is how we'll lower to Cython while having to build against `cuda.bindings` 12.x & 13.x. Here are the steps, following the great work @dalcinl did for mpi4py v4.1.0 (to support both Open MPI and MPICH):

1. Rename `.py` to `.pyx` and update the build system
2. Keep a single `.pyx`, with the others being `.pxi` and literal-included, similar to the `mpi4py.MPI` module
3. Build `cuda-core` twice, once against CUDA & `cuda-bindings` 12.x, and then 13.x
4. Update `cuda/core/experimental/__init__.py` to decide which extension module to load, based on the installed `cuda-bindings` major version

It is worth noting that Steps 2 and 3 only happen in the public CI, so as to meet Requirement 3 (for local development, neither internal nor external developers should need to have multiple CUDA versions installed).
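To make the runtime dispatch concrete, here is a minimal sketch of how `cuda/core/experimental/__init__.py` might pick an extension module from the installed `cuda-bindings` major version. The submodule names (`_cu12`, `_cu13`) and the helper functions are hypothetical, chosen only for illustration; the RFC does not specify the actual layout.

```python
import importlib
from importlib import metadata

# Assumption: one compiled extension per supported CUDA major version.
SUPPORTED_MAJORS = (12, 13)


def extension_module_name(bindings_version: str) -> str:
    """Map a cuda-bindings version string to a hypothetical extension module name."""
    major = int(bindings_version.split(".")[0])
    if major not in SUPPORTED_MAJORS:
        raise ImportError(f"unsupported cuda-bindings major version: {major}")
    # Hypothetical naming scheme, not taken from the RFC.
    return f"cuda.core.experimental._cu{major}"


def load_extension():
    """Import the extension that matches the installed cuda-bindings."""
    # metadata.version raises PackageNotFoundError if cuda-bindings is absent.
    version = metadata.version("cuda-bindings")
    return importlib.import_module(extension_module_name(version))
```

With this shape, the selection happens once at import time, and a wheel that ships both extensions stays variant-free from the user's point of view.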
Another note is that this RFC only applies to keeping our Python wheels variant-free (no `-cu12`/`-cu13`); for conda packages, it is trivial to build variant packages without changing the UX (`conda install cuda-core`), so no extra work is needed.

This RFC also mirrors our plan for `cuda-cccl` (NVIDIA/cccl#2555).