Claude/update pooling type bf nxj (#59)
Merged
bernardladenthin merged 6 commits into master on Apr 1, 2026
Conversation
…ests Passing

`--pooling unspecified` is not a valid llama.cpp CLI value and would prevent the model from applying its own default pooling strategy. Guard the parameter so that UNSPECIFIED is a true no-op.

Also expand unit coverage in ModelParametersTest (one test per PoolingType, including the UNSPECIFIED skip invariant) and add a new LlamaEmbeddingsTest integration suite that verifies the embed() dimension, the NaN/Inf/all-zero guards, and that CLS ≠ LAST for multi-token input.

RANK and NONE are intentionally skipped: RANK requires a reranker model, and NONE returns per-token embeddings, of which embed() silently returns only the first-token row, so a sentence-embedding assertion would be misleading.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
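The UNSPECIFIED guard described above can be sketched as follows. This is a minimal, hypothetical re-creation of the relevant shape of `PoolingType` and `ModelParameters.setPoolingType()`, not the library's actual source; the CLI strings ("none", "mean", "cls", "last", "rank") are the values llama.cpp's `--pooling` argument accepts.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the guard: UNSPECIFIED never reaches the CLI map,
// so llama.cpp is free to apply the model's own default pooling.
enum PoolingType {
    UNSPECIFIED(null), NONE("none"), MEAN("mean"), CLS("cls"), LAST("last"), RANK("rank");

    private final String argValue;
    PoolingType(String argValue) { this.argValue = argValue; }
    String getArgValue() { return argValue; }
}

class ModelParameters {
    static final String ARG_POOLING = "--pooling";
    final Map<String, String> args = new HashMap<>();

    ModelParameters setPoolingType(PoolingType type) {
        // The skip invariant: UNSPECIFIED is a true no-op.
        if (type != PoolingType.UNSPECIFIED) {
            args.put(ARG_POOLING, type.getArgValue());
        }
        return this;
    }
}

public class PoolingGuardSketch {
    public static void main(String[] args) {
        ModelParameters unspecified = new ModelParameters().setPoolingType(PoolingType.UNSPECIFIED);
        ModelParameters mean = new ModelParameters().setPoolingType(PoolingType.MEAN);
        System.out.println(unspecified.args.containsKey(ModelParameters.ARG_POOLING)); // false
        System.out.println(mean.args.get(ModelParameters.ARG_POOLING));                // mean
    }
}
```

The per-PoolingType unit tests then reduce to exactly these two checks: the argument map contains the expected CLI string for every real pooling type, and contains nothing at all for UNSPECIFIED.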
…getArgValue()

Each PoolingType constant now carries Javadoc that names its matching LLAMA_POOLING_TYPE_* enum value (include/llama.h) and CLI string (common/arg.cpp, `--pooling`), both anchored to the pinned b8609 tag.

Also reverts the accidental use of type.name().toLowerCase() introduced in the previous commit back to the explicit type.getArgValue() constant, so that a future rename of an enum constant cannot silently break the CLI argument.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
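The rename hazard the commit guards against can be shown with a small hypothetical example (the constant `AVERAGE` below is invented for illustration; the Javadoc wording is a sketch of the style described, not the library's actual text):

```java
// Hypothetical sketch: suppose a constant were renamed MEAN -> AVERAGE.
// getArgValue() still yields the CLI string llama.cpp expects, while
// name().toLowerCase() silently tracks the Java identifier instead.
enum PoolingType {
    /**
     * Mean pooling over all token embeddings.
     * Maps to LLAMA_POOLING_TYPE_MEAN in include/llama.h and the CLI
     * value "mean" for --pooling in common/arg.cpp (pinned tag b8609).
     */
    AVERAGE("mean");

    private final String argValue;
    PoolingType(String argValue) { this.argValue = argValue; }
    String getArgValue() { return argValue; }
}

public class ArgValueSketch {
    public static void main(String[] args) {
        System.out.println(PoolingType.AVERAGE.name().toLowerCase()); // "average": no longer a valid CLI value
        System.out.println(PoolingType.AVERAGE.getArgValue());        // "mean": the CLI contract survives the rename
    }
}
```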
…gValue()

Introduce ModelParameters.ARG_POOLING = "--pooling" so the CLI key appears in exactly one place and is never duplicated as a raw string. Update setPoolingType() to use ARG_POOLING, and update all pooling assertions in ModelParametersTest to use ModelParameters.ARG_POOLING and PoolingType.&lt;CONSTANT&gt;.getArgValue() instead of inline literals.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
Two root causes:

1. OOM on macOS (exit 137): with no setCtxSize() call, llama.cpp allocated the model's full default KV cache (~8 GB). Added .setCtxSize(128) to openModel(), matching LlamaModelTest.
2. SIGABRT on Ubuntu (exit 134): CLS pooling on a decoder-only model (LLaMA/CodeLlama has no CLS token) triggers a native assertion failure in llama.cpp. Removed testEmbedClsPooling and testClsAndLastPoolingDiffer; the latter is replaced by testMeanAndLastPoolingDiffer, which uses only pooling types that are safe for decoder-only models. The CLS limitation is also documented on PoolingType.CLS.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
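The ~8 GB figure for the default KV cache is consistent with a back-of-envelope calculation, assuming CodeLlama-7B-like shapes (32 layers, 4096 hidden dimension, full-width KV heads, f16 cache, 16384 default training context). These shapes are assumptions for illustration, not values taken from this PR:

```java
// Rough KV-cache sizing: K and V each store n_embd f16 values per
// layer per token, so per-token bytes = 2 * layers * n_embd * 2.
public class KvCacheSizeSketch {
    public static void main(String[] args) {
        long layers = 32, nEmbd = 4096, bytesPerElem = 2;   // assumed 7B shapes, f16 cache
        long perToken = 2 * layers * nEmbd * bytesPerElem;  // K + V, all layers
        long defaultCtx = 16384, smallCtx = 128;            // assumed default vs. the fix
        System.out.println(perToken * defaultCtx / (1L << 30)); // GiB at the default context
        System.out.println(perToken * smallCtx / (1L << 20));   // MiB with setCtxSize(128)
    }
}
```

Under these assumptions the default context costs 8 GiB while .setCtxSize(128) needs only 64 MiB, which explains the exit-137 kill on a memory-constrained macOS runner.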
The fitting step (--fit on, the default) causes a native abort in llama.cpp when running an embedding model on CPU-only CI. LlamaModelTest already uses setFit(false) for the same reason. Match that pattern. https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
CodeLlama's GGUF metadata has pooling_type = -1 (no built-in default). When --pooling is omitted (UNSPECIFIED), llama.cpp keeps -1 and the JNI layer returns the BOS-token vector (first embedding row), which differs significantly from MEAN pooling (~-0.09 vs ~-0.0006 at index 0). The assertion was factually incorrect for this model. https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
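Why the first-row (BOS-token) vector need not resemble the MEAN-pooled vector can be illustrated numerically. The per-token embeddings below are made-up values, not output of any real model; the point is only that row 0 and the column-wise mean can differ by orders of magnitude, exactly as the ~-0.09 vs ~-0.0006 observation describes:

```java
// Toy per-token embedding matrix (3 tokens x 2 dims, invented values).
// With pooling_type = -1, the JNI layer returns rows[0] (the BOS row);
// MEAN pooling instead averages over all token rows.
public class PoolingRowsSketch {
    public static void main(String[] args) {
        double[][] rows = { { -0.09, 0.20 }, { 0.05, -0.10 }, { 0.04, 0.15 } };
        double[] bosRow = rows[0];
        double[] mean = new double[rows[0].length];
        for (double[] row : rows)
            for (int i = 0; i < row.length; i++)
                mean[i] += row[i] / rows.length;
        // bosRow[0] is -0.09 while mean[0] is ~0, so an assertion that
        // UNSPECIFIED equals MEAN would fail for such a model:
        System.out.println(Math.abs(bosRow[0] - mean[0]) > 0.05);
    }
}
```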
No description provided.