refactor(pathfinder): descriptor-driven library discovery and loading by cpcloud · Pull Request #1685 · NVIDIA/cuda-python

cpcloud · 2026-02-24T18:08:33Z

Summary

Library metadata was scattered across parallel hand-maintained tables, loader internals, and module-level constants. This made it easy for policy to drift between search and load paths, hard to review changes to individual libraries, and fragile to extend. This PR consolidates all of that into a single descriptor model and restructures the search/load pipeline around it.

Why

Drift risk: library behavior was assembled from multiple overlapping sources (supported_nvidia_libs tables, inline constants, per-platform branching). Changing one without the others caused subtle bugs.
Review friction: understanding what a library needs at load time required reading across several modules. Now it's one catalog entry.
Canary fragility: CTK-root canary fallback (critical for nvvm on non-standard installs) was wired through globals that the refactor initially dropped. This PR restores it as descriptor-configured behavior.
Weak test coverage: some catalog tests were tautological (comparing data to itself). Replaced with structural invariants that catch real errors.

Change progression

Unify library policy in descriptors — introduced LibDescriptor and an authored catalog as the single metadata source, then derived runtime and legacy compatibility tables from it.
Decouple orchestration from platform specifics — refactored discovery/loading into composable search steps with explicit seams (SearchPlatform, PlatformLoader) so OS-specific behavior is localized and the cascade is auditable.
Stabilize during migration — threaded descriptors through loader internals, fixed edge cases in search/load wiring, cleaned up dead code.
Restore CTK canary fallback — reintroduced canary-based CTK-root discovery in the refactored flow, with anchor policy on the descriptor and probing subprocess-isolated.
Harden correctness guardrails — replaced tautological catalog tests with structural invariants; added focused fallback coverage.

Functional outcome

Public API unchanged (load_nvidia_dynamic_lib).
Driver libraries remain system-search-only.
CTK canary fallback preserved, now descriptor-configured.
Adding or modifying a library is now a single catalog entry change.

Made with Cursor

copy-pr-bot · 2026-02-24T18:08:36Z

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

cpcloud · 2026-02-24T18:09:00Z

Recommend going per-commit on this one.

cpcloud · 2026-02-24T18:09:07Z

/ok to test

github-actions · 2026-02-24T18:20:43Z

Doc Preview CI
🚀 View preview at https://nvidia.github.io/cuda-python/pr-preview/pr-1685/
https://nvidia.github.io/cuda-python/pr-preview/pr-1685/cuda-core/
https://nvidia.github.io/cuda-python/pr-preview/pr-1685/cuda-bindings/
https://nvidia.github.io/cuda-python/pr-preview/pr-1685/cuda-pathfinder/
Preview will be ready when the GitHub Pages deployment is complete.

rwgk · 2026-02-24T18:59:41Z

Library metadata was scattered across parallel hand-maintained tables, loader internals, and module-level constants.

That's not entirely fair: there is an existing very large degree of centralization of config-like information in supported_nvidia_libs.py.

Essentially, this PR is doing the equivalent of a matrix transposition (row-major vs column-major for the matrix case / libname-oriented (new) vs packaging/platform-oriented (old)). That has pros-and-cons either way.

My main high-level concern: SUPPORTED_LINUX_SONAMES_CTK and SUPPORTED_WINDOWS_DLLS_CTK are generated from scratch starting from extracted CTK distributions, with simple toolshed scripts. After this PR, that'll be a lot more difficult. I think it'd be fine if we essentially kept those two dicts as source of truth, and populate into the descriptors. The non-ctk libs are fine purely descriptor-based.

cpcloud · 2026-02-24T19:05:31Z

My main high-level concern: SUPPORTED_LINUX_SONAMES_CTK and SUPPORTED_WINDOWS_DLLS_CTK are generated from scratch starting from extracted CTK distributions, with simple toolshed scripts.

I'd rather keep a single source of truth. That is one of two main goals of this PR besides cleaning up all the comingled search and platform logic.

We can adapt the toolshed scripts to the new layout.

cpcloud · 2026-02-24T19:16:12Z

Just to note a few things:

The old layout is derived and still exported. Downstream consumers should require zero code changes.
Changing scripts is basically a one time cost.
Two sources of truth is worse than one, it's a desync surface and defeats the purpose of a big design aspect of this PR.
Descriptors are easier for the script to target.

I'll just adapt the scripts in this PR.

rwgk · 2026-02-24T20:19:46Z

I'll just adapt the scripts in this PR.

That works. I assumed it's difficult/messy.

Two sources of truth is worse than one

Totally, I was suggesting one source of truth, just with one level of indirection, so that we can still easily auto-populate the equivalent of SUPPORTED_LINUX_SONAMES_CTK and SUPPORTED_WINDOWS_DLLS_CTK. If you can achieve that without the indirection, even better.

The indirection I had in mind, roughly:

SUPPORTED_LINUX_SONAMES_CTK = ...
SUPPORTED_WINDOWS_DLLS_CTK = ...
...
    DescriptorSpec(
        name="cudart",
        strategy="ctk",
        linux_sonames=SUPPORTED_LINUX_SONAMES_CTK["cudart"],
        windows_dlls=SUPPORTED_WINDOWS_DLLS_CTK["cudart"],
        site_packages_linux=("nvidia/cu13/lib", "nvidia/cuda_runtime/lib"),
        site_packages_windows=("nvidia/cu13/bin/x86_64", "nvidia/cuda_runtime/bin"),
    ),

You could even hide that behind a DescriptorSpecCtk helper function.

BTW (because I'm copy-pasting that code): Could you please replace strategy with e.g. packaged_with or similar? That's really what it's about: packaged with ctk, or driver, or other. Another idea would be directory_layout, or directory_structure.

cpcloud · 2026-02-25T14:54:04Z

Could you please replace strategy with e.g. packaged_with or similar? That's really what it's about: packaged with ctk, or driver, or other. Another idea would be directory_layout, or directory_structure.

Yep. I like that much better. strategy is a bit too generic.

Add a per-library descriptor dataclass that consolidates all metadata (sonames, DLLs, site-packages paths, dependencies, loader flags) into a single frozen object. The registry is built at import time from the existing data tables -- zero behavioral change. 291 parametrized tests verify the registry is a faithful representation of the source dicts. Co-authored-by: Cursor <cursoragent@cursor.com>

Introduce SearchContext and FindStep to replace the monolithic finder class. Each search mechanism (site-packages, conda, CUDA_HOME) becomes a standalone step with a uniform (SearchContext) -> FindResult | None signature. Keep already-loaded handling and dependency loading as orchestration concerns. Delete the old find_nvidia_dynamic_lib module after migrating its logic. Co-authored-by: Cursor <cursoragent@cursor.com>

Add per-platform anchor-relative directory lists to LibDescriptor and use them for CUDA_HOME/conda anchor resolution. This removes special-case branching (e.g. nvvm) from the anchor-point search. Co-authored-by: Cursor <cursoragent@cursor.com>

Update the platform-specific loader code to consume LibDescriptor directly instead of consulting supported_nvidia_libs tables at runtime. This makes the loading path data-driven (desc.linux_sonames/windows_dlls, desc.requires_* flags, desc.dependencies). Co-authored-by: Cursor <cursoragent@cursor.com>

Introduce the platform loader boundary for dlopen calls and fold the immediate wrapper cleanup into the same change so loader dispatch stays straightforward while preserving behavior. Co-authored-by: Cursor <cursoragent@cursor.com>

Introduce search_platform.py, exporting a single PLATFORM instance that implements the per-OS filesystem search behavior. search_steps routes all platform differences through SearchContext.platform, removing OS branching from the search step implementations. Co-authored-by: Cursor <cursoragent@cursor.com>

Inline single-use lib-dir lookup helpers into the platform implementations to reduce helper surface area while keeping shared rel-dir scanning helpers. Co-authored-by: Cursor <cursoragent@cursor.com>

Drop the platform-dispatch convenience properties that became unused after introducing PlatformLoader/SearchPlatform. Co-authored-by: Cursor <cursoragent@cursor.com>

Add a canonical descriptor catalog module that contains one DescriptorSpec per supported dynamic library. Add exhaustive parity tests asserting the catalog matches the current LIB_DESCRIPTORS registry field-for-field before runtime wiring is flipped. Co-authored-by: Cursor <cursoragent@cursor.com>

Switch lib_descriptor.py from "assemble-from-supported tables" to "registry-from-authored catalog". Keep backward-compatible names (`LibDescriptor`, `Strategy`, and `LIB_DESCRIPTORS`) while making descriptor_catalog the canonical source. Co-authored-by: Cursor <cursoragent@cursor.com>

Replace the hand-authored supported_nvidia_libs tables with compatibility constants derived from DESCRIPTOR_CATALOG while preserving historical export names and behaviors. This makes descriptor data the single authored source and keeps supported_nvidia_libs as a derived-views shim for existing imports. Co-authored-by: Cursor <cursoragent@cursor.com>

Consolidate post-refactor fixes for driver-lib test alignment, platform search-path edge cases, and typing/import cleanup so behavior and diagnostics remain stable. Co-authored-by: Cursor <cursoragent@cursor.com>

…variants The previous tests compared catalog entries against LIB_DESCRIPTORS, which is built directly from the same catalog -- always passing by construction. Replace with parametrized checks that verify real properties: name uniqueness, valid identifiers, strategy values, dependency graph integrity, soname/dll format, and driver lib constraints. Co-authored-by: Cursor <cursoragent@cursor.com>

Simplify catalog entries by relying on DescriptorSpec defaults and fold companion import-order cleanup into the same readability-focused change. EOF && git cherry-pick -n 5d5547f a804f26c4 && git commit --trailer "Co-authored-by: Cursor <cursoragent@cursor.com>" -F - <<'EOF' refactor(pathfinder): split add-nv-library core flow from optional UI Keep add-nv-library lightweight by default with prompt/CLI-first behavior while moving Textual chrome behind explicit UI tasks and lockfile feature splits. Co-authored-by: Cursor <cursoragent@cursor.com>

Reinstate CTK-root canary discovery in the refactored loader path and define canary eligibility/anchors on per-library descriptors so fallback policy lives with the rest of library metadata. Co-authored-by: Cursor <cursoragent@cursor.com>

Co-authored-by: Cursor <cursoragent@cursor.com>

Provides helpers to import, update, and rewrite the descriptor catalog file in place. Each toolshed extraction script can merge its findings (sonames, DLLs, site-packages paths) into the authored catalog without touching fields it doesn't own. Co-authored-by: Cursor <cursoragent@cursor.com>

…alog.py in place Replace print-to-stdout workflow (manual copy-paste) with direct in-place catalog updates via the shared _catalog_writer module. - build_pathfinder_sonames.py: scans directories for .so files and extracts SONAMEs via readelf (replaces find_sonames.sh pipeline) - build_pathfinder_dlls.py: parses 7z listings, updates windows_dlls - make_site_packages_libdirs.py: parses collected paths, updates site_packages_linux or site_packages_windows All three scripts now derive the set of known library names from the catalog itself, eliminating the duplicated LIBNAMES_IN_SCOPE constant. Co-authored-by: Cursor <cursoragent@cursor.com>

Platform-aware dispatcher: runs sonames extraction on Linux, DLL listing parsing on Windows. Single command to update descriptor_catalog.py from CTK installations. Made-with: Cursor

Prevent import-time failures after the descriptor-catalog refactor by deriving availability from linux/windows descriptor fields instead of removed helpers. Remove an unused catalog-writer variable so pre-commit remains green. Made-with: Cursor

cpcloud · 2026-03-03T17:54:17Z

/ok to test

Use a more descriptive field name for library classification so descriptor semantics are explicit in runtime and toolshed code paths. This aligns terminology with review feedback while preserving behavior and compatibility. Made-with: Cursor

cpcloud · 2026-03-03T18:40:54Z

@rwgk Ping!

I addressed the strategy bit in the most recent commit (f3c96cd).

cpcloud · 2026-03-03T18:42:21Z

/ok to test

rwgk

Please give me a couple hours to make sure we'll be fine for the next pathfinder CTK update.

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/descriptor_catalog.py

Remove the `Strategy = PackagedWith` alias from the descriptor catalog and related re-exports to avoid confusing, unused terminology. Made-with: Cursor

rwgk · 2026-03-03T22:17:04Z

Please give me a couple hours to make sure we'll be fine for the next pathfinder CTK update.

Luckily, we don't need any updates here for the next CTK release; i.e. my timing concern isn't an issue.

I'm looking through this PR again now.

rwgk

I tried the new maintenance commands and looked at the PR carefully with the help of Cursor. The only issue is that AGENTS.md is out of date, but we can update it in a follow-on PR.

rwgk · 2026-03-04T01:02:52Z

toolshed/_catalog_writer.py

@@ -0,0 +1,182 @@
+#!/usr/bin/env python3


I was able to successfully use the new code.

First I did what I always did before, unpack all CTKs like this:

smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/extracted $ ll total 44K drwxr-xr-x 37 rgrossekunst dip 4.0K Feb 6 2025 12.0.1_525.85.12 drwxr-xr-x 37 rgrossekunst dip 4.0K Feb 6 2025 12.1.1_530.30.02 drwxr-xr-x 36 rgrossekunst dip 4.0K Feb 6 2025 12.2.2_535.104.05 drwxr-xr-x 36 rgrossekunst dip 4.0K Feb 6 2025 12.3.2_545.23.08 drwxr-xr-x 37 rgrossekunst dip 4.0K Feb 6 2025 12.4.1_550.54.15 drwxr-xr-x 37 rgrossekunst dip 4.0K Feb 6 2025 12.5.1_555.42.06 drwxr-xr-x 37 rgrossekunst dip 4.0K Nov 4 10:43 12.6.3_560.35.05 drwxr-xr-x 38 rgrossekunst dip 4.0K Apr 23 2025 12.8.1_570.124.06 drwxr-xr-x 38 rgrossekunst dip 4.0K Aug 4 2025 12.9.1_575.57.08 drwxr-xr-x 40 rgrossekunst dip 4.0K Nov 4 10:34 13.0.2_580.95.05 drwxr-xr-x 42 rgrossekunst dip 4.0K Jan 14 13:35 13.1.1_590.48.01

Then I ran:

/wrk/forked/cuda-python/TestVenv/bin/python /wrk/forked/cuda-python/toolshed/build_pathfinder_sonames.py $(find /wrk/extracted -maxdepth 1 -type d -name '*.*.*_*' | sort)

I verified that it works correctly by manually removing one lib and then rerunning the command.

Similarly for Windows:

smc120-0009.ipp2a2.colossus.nvidia.com:/wrk/exe_7z_l $ ll total 6.6M -rw-r--r-- 1 rgrossekunst dip 635K Apr 23 2025 cuda_12.0.1_528.33_windows.txt -rw-r--r-- 1 rgrossekunst dip 635K May 1 2025 cuda_12.1.1_531.14_windows.txt -rw-r--r-- 1 rgrossekunst dip 675K Apr 23 2025 cuda_12.2.2_537.13_windows.txt -rw-r--r-- 1 rgrossekunst dip 676K Apr 23 2025 cuda_12.3.2_546.12_windows.txt -rw-r--r-- 1 rgrossekunst dip 675K Apr 23 2025 cuda_12.4.1_551.78_windows.txt -rw-r--r-- 1 rgrossekunst dip 694K Apr 23 2025 cuda_12.5.1_555.85_windows.txt -rw-r--r-- 1 rgrossekunst dip 674K Nov 4 10:49 cuda_12.6.3_561.17_windows.txt -rw-r--r-- 1 rgrossekunst dip 632K Apr 23 2025 cuda_12.8.1_572.61_windows.txt -rw-r--r-- 1 rgrossekunst dip 650K Aug 4 2025 cuda_12.9.1_576.57_windows.txt -rw-r--r-- 1 rgrossekunst dip 365K Nov 4 10:31 cuda_13.0.2_windows.txt -rw-r--r-- 1 rgrossekunst dip 385K Jan 14 13:32 cuda_13.1.1_windows.txt

/wrk/forked/cuda-python/TestVenv/bin/python /wrk/forked/cuda-python/toolshed/build_pathfinder_dlls.py /wrk/exe_7z_l/*.txt

I'm putting that info here because we should work that into the AGENTS.md file, which seems out of date?

To add more information, since we're on track to document the full workflow, these are the Linux files from which the /wrk/extracted directory was created (e.g. ./cuda_12.0.1_525.85.12_linux.run --extract=/wrk/extracted/12.0.1_525.85.12):

rwgk-win11.localdomain:~/ctk_downloads $ ll cuda_1*linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.0G Jan 28 2023 cuda_12.0.1_525.85.12_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.1G Apr 17 2023 cuda_12.1.1_530.30.02_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.1G Aug 23 2023 cuda_12.2.2_535.104.05_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.1G Dec 12 2023 cuda_12.3.2_545.23.08_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.2G Mar 28 2024 cuda_12.4.1_550.54.15_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.1G Jun 25 2024 cuda_12.5.1_555.42.06_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.2G Nov 14 2024 cuda_12.6.3_560.35.05_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 5.1G Mar 3 2025 cuda_12.8.1_570.124.06_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 5.5G May 31 2025 cuda_12.9.1_575.57.08_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.1G Oct 3 00:31 cuda_13.0.2_580.95.05_linux.run -r-xr-xr-x 1 rgrossekunst rgrossekunst 4.1G Dec 23 08:47 cuda_13.1.1_590.48.01_linux.run

These are the Windows files from which the exe_7z_l/*.txt files were generated (e.g. 7z l cuda_12.0.1_528.33_windows.exe > /wrk/exe_7z_l/cuda_12.0.1_528.33_windows.exe.txt)

rwgk-win11.localdomain:~/ctk_downloads $ ll cuda*.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.5G Jan 28 2023 cuda_12.0.1_528.33_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.2G Apr 17 2023 cuda_12.1.1_531.14_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.1G Aug 22 2023 cuda_12.2.2_537.13_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.1G Nov 29 2023 cuda_12.3.2_546.12_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.0G Mar 28 2024 cuda_12.4.1_551.78_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.0G Jun 25 2024 cuda_12.5.1_555.85_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.1G Nov 14 2024 cuda_12.6.3_561.17_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.2G Mar 3 2025 cuda_12.8.1_572.61_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 3.4G May 31 2025 cuda_12.9.1_576.57_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 2.3G Oct 3 01:11 cuda_13.0.2_windows.exe -r--r--r-- 1 rgrossekunst rgrossekunst 2.4G Dec 23 09:10 cuda_13.1.1_windows.exe

rwgk · 2026-03-04T01:24:22Z

/ok to test a8680de

rwgk · 2026-03-04T02:43:54Z

@cpcloud do you want to take care of the AGENT.md update, or should I?

Removed preview folders for the following PRs: - PR #1519 - PR #1685

cpcloud · 2026-03-04T09:34:20Z

I can update it.

cpcloud requested a review from rwgk February 24, 2026 18:09

cpcloud and others added 20 commits March 3, 2026 11:45

refactor(pathfinder): inline SearchPlatform lib-dir lookup helpers

e46fc9f

Inline single-use lib-dir lookup helpers into the platform implementations to reduce helper surface area while keeping shared rel-dir scanning helpers. Co-authored-by: Cursor <cursoragent@cursor.com>

refactor(pathfinder): remove unused LibDescriptor properties

365a952

Drop the platform-dispatch convenience properties that became unused after introducing PlatformLoader/SearchPlatform. Co-authored-by: Cursor <cursoragent@cursor.com>

fix(pathfinder): tighten refactor follow-ups across search and tests

a97ce4b

Consolidate post-refactor fixes for driver-lib test alignment, platform search-path edge cases, and typing/import cleanup so behavior and diagnostics remain stable. Co-authored-by: Cursor <cursoragent@cursor.com>

style(pathfinder): fix pre-commit formatting and mypy return type

66b73a5

Co-authored-by: Cursor <cursoragent@cursor.com>

feat(toolshed): add unified update_catalog.py entry point

d06744a

Platform-aware dispatcher: runs sonames extraction on Linux, DLL listing parsing on Windows. Single command to update descriptor_catalog.py from CTK installations. Made-with: Cursor

cpcloud force-pushed the lib-descriptor-refactor branch from c989b32 to 450d366 Compare March 3, 2026 17:54

rwgk reviewed Mar 3, 2026

View reviewed changes

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/descriptor_catalog.py Outdated Show resolved Hide resolved

refactor(pathfinder): drop unused Strategy alias

5349fc7

Remove the `Strategy = PackagedWith` alias from the descriptor catalog and related re-exports to avoid confusing, unused terminology. Made-with: Cursor

rwgk approved these changes Mar 4, 2026

View reviewed changes

Merge branch 'main' into lib-descriptor-refactor

a8680de

rwgk enabled auto-merge (squash) March 4, 2026 01:24

rwgk merged commit 6121573 into NVIDIA:main Mar 4, 2026
86 checks passed

github-actions bot pushed a commit that referenced this pull request Mar 4, 2026

Clean up PR preview folders for 2 closed/merged PRs

d373421

Removed preview folders for the following PRs: - PR #1519 - PR #1685

cpcloud deleted the lib-descriptor-refactor branch March 4, 2026 09:34

rwgk mentioned this pull request Mar 25, 2026

Add pathfinder.locate_nvidia_header_directory support for mathdx, cutlass, cute #1816

Merged

Conversation

cpcloud commented Feb 24, 2026

Summary

Why

Change progression

Functional outcome

Uh oh!

copy-pr-bot bot commented Feb 24, 2026

Uh oh!

cpcloud commented Feb 24, 2026

Uh oh!

cpcloud commented Feb 24, 2026

Uh oh!

github-actions bot commented Feb 24, 2026

Preview will be ready when the GitHub Pages deployment is complete.

Uh oh!

rwgk commented Feb 24, 2026

Uh oh!

cpcloud commented Feb 24, 2026

Uh oh!

cpcloud commented Feb 24, 2026

Uh oh!

rwgk commented Feb 24, 2026

Uh oh!

cpcloud commented Feb 25, 2026

Uh oh!

cpcloud commented Mar 3, 2026

Uh oh!

cpcloud commented Mar 3, 2026

Uh oh!

cpcloud commented Mar 3, 2026

Uh oh!

rwgk left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rwgk commented Mar 3, 2026

Uh oh!

rwgk left a comment

Choose a reason for hiding this comment

Uh oh!

rwgk Mar 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rwgk commented Mar 4, 2026

Uh oh!

Uh oh!

rwgk commented Mar 4, 2026

Uh oh!

cpcloud commented Mar 4, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

rwgk Mar 4, 2026 •

edited

Loading