
Claude/update pooling type bf nxj #59

Merged
bernardladenthin merged 6 commits into master from
claude/update-pooling-type-bfNxj
Apr 1, 2026

Conversation

@bernardladenthin
Owner

No description provided.

claude added 6 commits April 1, 2026 14:12
…ests

`unspecified` is not a valid value for the llama.cpp `--pooling` CLI flag,
and passing it would prevent the model from applying its own default
pooling strategy. Guard the parameter so UNSPECIFIED is a true no-op.

Also expand unit coverage in ModelParametersTest (one test per
PoolingType, including the UNSPECIFIED skip invariant) and add a new
LlamaEmbeddingsTest integration suite that verifies embed() dimension,
NaN/Inf/all-zero guards, and that CLS ≠ LAST for multi-token input.
RANK and NONE are intentionally skipped: RANK requires a reranker model;
NONE returns per-token embeddings and embed() silently returns only the
first-token row, making a sentence-embedding assertion misleading.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
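The UNSPECIFIED no-op guard described above can be sketched as follows. This is a minimal standalone sketch, not the library's actual code: the enum arg values mirror llama.cpp's `--pooling` CLI strings, but the class shape, map-backed storage, and method names are assumptions for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical mapping of pooling constants to llama.cpp CLI strings.
// UNSPECIFIED carries no CLI value at all.
enum PoolingType {
    UNSPECIFIED(null), NONE("none"), MEAN("mean"), CLS("cls"), LAST("last"), RANK("rank");

    private final String argValue;
    PoolingType(String argValue) { this.argValue = argValue; }
    String getArgValue() { return argValue; }
}

public class ModelParametersSketch {
    private final Map<String, String> args = new LinkedHashMap<>();

    public ModelParametersSketch setPoolingType(PoolingType type) {
        // UNSPECIFIED must be a true no-op: llama.cpp has no "unspecified"
        // CLI value, and omitting --pooling entirely lets the model apply
        // its own default pooling strategy.
        if (type != PoolingType.UNSPECIFIED) {
            args.put("--pooling", type.getArgValue());
        }
        return this;
    }

    public String get(String key) { return args.get(key); }

    public static void main(String[] argv) {
        ModelParametersSketch p = new ModelParametersSketch();
        p.setPoolingType(PoolingType.UNSPECIFIED);
        System.out.println(p.get("--pooling")); // prints "null": no flag was emitted
        p.setPoolingType(PoolingType.MEAN);
        System.out.println(p.get("--pooling")); // prints "mean"
    }
}
```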
…getArgValue()

Each PoolingType constant now carries Javadoc that names its matching
LLAMA_POOLING_TYPE_* enum value (include/llama.h) and CLI string
(common/arg.cpp --pooling), both anchored to the pinned b8609 tag.

Also reverts the accidental use of type.name().toLowerCase() introduced
in the previous commit back to the explicit type.getArgValue() constant,
so a future rename of an enum constant cannot silently break the CLI arg.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
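The rename-safety argument above can be made concrete with a tiny sketch (constant names and the `getArgValue()` accessor here are illustrative assumptions): an explicit arg value survives an enum-constant rename, while `name().toLowerCase()` would silently start emitting an invalid CLI string.

```java
// Sketch: explicit CLI string vs. deriving it from the constant name.
enum PoolingTypeSketch {
    MEAN("mean"),
    CLS("cls"); // would map to LLAMA_POOLING_TYPE_CLS and CLI string "cls"

    private final String argValue;
    PoolingTypeSketch(String argValue) { this.argValue = argValue; }
    String getArgValue() { return argValue; }
}

public class ArgValueDemo {
    public static void main(String[] args) {
        // Today both happen to agree. But if MEAN were ever renamed (say,
        // to AVERAGE), name().toLowerCase() would silently emit "average",
        // which llama.cpp rejects, while getArgValue() still emits "mean".
        System.out.println(PoolingTypeSketch.MEAN.name().toLowerCase()); // mean
        System.out.println(PoolingTypeSketch.MEAN.getArgValue());        // mean
    }
}
```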
…gValue()

Introduce ModelParameters.ARG_POOLING = "--pooling" so the CLI key
appears in exactly one place and is never duplicated as a raw string.
Update setPoolingType() to use ARG_POOLING and update all pooling
assertions in ModelParametersTest to use ModelParameters.ARG_POOLING
and PoolingType.<CONSTANT>.getArgValue() instead of inline literals.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
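The single-source-of-truth idea can be sketched like this (class and field names are assumptions mirroring the commit description, not the library's exact code): the setter and any test assertion reference the same constant, so the raw string `"--pooling"` appears exactly once.

```java
import java.util.HashMap;
import java.util.Map;

public class ArgPoolingSketch {
    // The one and only place the CLI key is spelled out.
    public static final String ARG_POOLING = "--pooling";

    private final Map<String, String> args = new HashMap<>();

    ArgPoolingSketch setPoolingType(String argValue) {
        args.put(ARG_POOLING, argValue); // never a duplicated raw literal
        return this;
    }

    String get(String key) { return args.get(key); }

    public static void main(String[] argv) {
        ArgPoolingSketch p = new ArgPoolingSketch().setPoolingType("mean");
        // A test would assert through the same constant:
        System.out.println(p.get(ArgPoolingSketch.ARG_POOLING)); // prints "mean"
    }
}
```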
Two root causes:

1. OOM on macOS (exit 137): no setCtxSize() call meant llama.cpp
   allocated the model's full default KV cache (~8 GB). Added
   .setCtxSize(128) to openModel(), matching LlamaModelTest.

2. SIGABRT on Ubuntu (exit 134): CLS pooling on a decoder-only model
   (LLaMA/CodeLlama has no CLS token) causes a native assertion failure
   in llama.cpp. Removed testEmbedClsPooling and testClsAndLastPoolingDiffer;
   replaced the latter with testMeanAndLastPoolingDiffer which only uses
   pooling types safe for decoder-only models.

Also document the CLS limitation on PoolingType.CLS.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
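The "NaN/Inf/all-zero guards" mentioned in the first commit can be expressed as a small predicate over the returned embedding vector. This is an illustrative standalone helper (the method name is an assumption), not the test suite's actual code:

```java
public class EmbeddingGuards {
    // Returns true only for a usable sentence embedding: non-empty,
    // free of NaN/Infinity, and not identically zero.
    static boolean isSaneEmbedding(float[] v) {
        if (v == null || v.length == 0) return false;
        boolean anyNonZero = false;
        for (float x : v) {
            if (Float.isNaN(x) || Float.isInfinite(x)) return false;
            if (x != 0.0f) anyNonZero = true;
        }
        return anyNonZero;
    }

    public static void main(String[] args) {
        System.out.println(isSaneEmbedding(new float[] {0.1f, -0.2f})); // true
        System.out.println(isSaneEmbedding(new float[] {0f, 0f}));      // false: all-zero
        System.out.println(isSaneEmbedding(new float[] {Float.NaN}));   // false: NaN
    }
}
```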
The fitting step (--fit on, the default) causes a native abort in
llama.cpp when running an embedding model on CPU-only CI. LlamaModelTest
already uses setFit(false) for the same reason. Match that pattern.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
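Taken together, the two CI fixes above amount to a deliberately constrained model configuration. A rough sketch, with an assumed builder shape standing in for the real `ModelParameters` (`setCtxSize` and `setFit` are the methods the commits name; the flag spellings here are assumptions):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CiSafeModelConfig {
    private final Map<String, String> args = new LinkedHashMap<>();

    CiSafeModelConfig setCtxSize(int n) {
        args.put("--ctx-size", String.valueOf(n));
        return this;
    }

    CiSafeModelConfig setFit(boolean fit) {
        args.put("--fit", fit ? "on" : "off");
        return this;
    }

    String get(String key) { return args.get(key); }

    public static void main(String[] argv) {
        CiSafeModelConfig p = new CiSafeModelConfig()
            .setCtxSize(128)  // avoid the model's ~8 GB default KV cache (macOS OOM, exit 137)
            .setFit(false);   // the fitting step aborts on CPU-only CI (as in LlamaModelTest)
        System.out.println(p.args); // prints {--ctx-size=128, --fit=off}
    }
}
```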
CodeLlama's GGUF metadata has pooling_type = -1 (no built-in default).
When --pooling is omitted (UNSPECIFIED), llama.cpp keeps -1 and the
JNI layer returns the BOS-token vector (first embedding row), which
differs significantly from MEAN pooling (~-0.09 vs ~-0.0006 at index 0).
The assertion was factually incorrect for this model.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
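Why the assertion was wrong is easy to see numerically: MEAN pooling averages every token row of the embedding matrix, while the `pooling_type = -1` fallback returns only row 0 (the BOS token). A self-contained sketch with toy numbers (the matrix values are illustrative, not from the model):

```java
public class PoolingComparison {
    // Fallback behavior described above: return the first embedding row.
    static float[] firstRow(float[][] tokenEmbeddings) {
        return tokenEmbeddings[0];
    }

    // MEAN pooling: element-wise average over all token rows.
    static float[] meanPool(float[][] tokenEmbeddings) {
        int dim = tokenEmbeddings[0].length;
        float[] mean = new float[dim];
        for (float[] row : tokenEmbeddings)
            for (int i = 0; i < dim; i++) mean[i] += row[i];
        for (int i = 0; i < dim; i++) mean[i] /= tokenEmbeddings.length;
        return mean;
    }

    public static void main(String[] args) {
        // Three token rows, two dimensions; index 0 of row 0 is large,
        // the others are small, so the two results diverge sharply.
        float[][] rows = { { -0.09f, 0.5f }, { 0.03f, -0.1f }, { 0.04f, 0.2f } };
        System.out.println(firstRow(rows)[0]); // prints -0.09
        System.out.println(meanPool(rows)[0]); // mean of -0.09, 0.03, 0.04 (near zero)
    }
}
```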
@bernardladenthin bernardladenthin merged commit 03cef58 into master Apr 1, 2026
14 checks passed
@bernardladenthin bernardladenthin deleted the claude/update-pooling-type-bfNxj branch April 1, 2026 15:08