Claude/update pooling type bf nxj (#59)
Merged
bernardladenthin merged 6 commits into master on Apr 1, 2026
Conversation
…ests Passing

`--pooling unspecified` is not a valid llama.cpp CLI value and would prevent the model from applying its own default pooling strategy. Guard the parameter so that UNSPECIFIED is a true no-op.

Also expand unit coverage in ModelParametersTest (one test per PoolingType, including the UNSPECIFIED skip invariant) and add a new LlamaEmbeddingsTest integration suite that verifies the embed() dimension, the NaN/Inf/all-zero guards, and that CLS ≠ LAST for multi-token input.

RANK and NONE are intentionally skipped: RANK requires a reranker model, and NONE returns per-token embeddings, of which embed() silently returns only the first-token row, so a sentence-embedding assertion would be misleading.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
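The UNSPECIFIED guard described above can be sketched as follows. This is a minimal, hypothetical re-creation of the relevant shape of `PoolingType` and `ModelParameters.setPoolingType()`, not the library's actual source; the CLI strings ("none", "mean", "cls", "last", "rank") are the values llama.cpp's `--pooling` argument accepts.

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of the guard: UNSPECIFIED never reaches the CLI map,
// so llama.cpp is free to apply the model's own default pooling.
enum PoolingType {
    UNSPECIFIED(null), NONE("none"), MEAN("mean"), CLS("cls"), LAST("last"), RANK("rank");

    private final String argValue;
    PoolingType(String argValue) { this.argValue = argValue; }
    String getArgValue() { return argValue; }
}

class ModelParameters {
    static final String ARG_POOLING = "--pooling";
    final Map<String, String> args = new HashMap<>();

    ModelParameters setPoolingType(PoolingType type) {
        // The skip invariant: UNSPECIFIED is a true no-op.
        if (type != PoolingType.UNSPECIFIED) {
            args.put(ARG_POOLING, type.getArgValue());
        }
        return this;
    }
}

public class PoolingGuardSketch {
    public static void main(String[] args) {
        ModelParameters unspecified = new ModelParameters().setPoolingType(PoolingType.UNSPECIFIED);
        ModelParameters mean = new ModelParameters().setPoolingType(PoolingType.MEAN);
        System.out.println(unspecified.args.containsKey(ModelParameters.ARG_POOLING)); // false
        System.out.println(mean.args.get(ModelParameters.ARG_POOLING));                // mean
    }
}
```

The per-PoolingType unit tests then reduce to exactly these two checks: the argument map contains the expected CLI string for every real pooling type, and contains nothing at all for UNSPECIFIED.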
…getArgValue()

Each PoolingType constant now carries Javadoc that names its matching LLAMA_POOLING_TYPE_* enum value (include/llama.h) and CLI string (common/arg.cpp, `--pooling`), both anchored to the pinned b8609 tag.

Also reverts the accidental use of type.name().toLowerCase() introduced in the previous commit back to the explicit type.getArgValue() constant, so that a future rename of an enum constant cannot silently break the CLI argument.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
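The rename hazard the commit guards against can be shown with a small hypothetical example (the constant `AVERAGE` below is invented for illustration; the Javadoc wording is a sketch of the style described, not the library's actual text):

```java
// Hypothetical sketch: suppose a constant were renamed MEAN -> AVERAGE.
// getArgValue() still yields the CLI string llama.cpp expects, while
// name().toLowerCase() silently tracks the Java identifier instead.
enum PoolingType {
    /**
     * Mean pooling over all token embeddings.
     * Maps to LLAMA_POOLING_TYPE_MEAN in include/llama.h and the CLI
     * value "mean" for --pooling in common/arg.cpp (pinned tag b8609).
     */
    AVERAGE("mean");

    private final String argValue;
    PoolingType(String argValue) { this.argValue = argValue; }
    String getArgValue() { return argValue; }
}

public class ArgValueSketch {
    public static void main(String[] args) {
        System.out.println(PoolingType.AVERAGE.name().toLowerCase()); // "average": no longer a valid CLI value
        System.out.println(PoolingType.AVERAGE.getArgValue());        // "mean": the CLI contract survives the rename
    }
}
```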
…gValue()

Introduce ModelParameters.ARG_POOLING = "--pooling" so the CLI key appears in exactly one place and is never duplicated as a raw string. Update setPoolingType() to use ARG_POOLING, and update all pooling assertions in ModelParametersTest to use ModelParameters.ARG_POOLING and PoolingType.&lt;CONSTANT&gt;.getArgValue() instead of inline literals.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
Two root causes:

1. OOM on macOS (exit 137): with no setCtxSize() call, llama.cpp allocated the model's full default KV cache (~8 GB). Added .setCtxSize(128) to openModel(), matching LlamaModelTest.
2. SIGABRT on Ubuntu (exit 134): CLS pooling on a decoder-only model (LLaMA/CodeLlama has no CLS token) triggers a native assertion failure in llama.cpp. Removed testEmbedClsPooling and testClsAndLastPoolingDiffer; the latter is replaced by testMeanAndLastPoolingDiffer, which uses only pooling types that are safe for decoder-only models. The CLS limitation is also documented on PoolingType.CLS.

https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
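The ~8 GB figure for the default KV cache is consistent with a back-of-envelope calculation, assuming CodeLlama-7B-like shapes (32 layers, 4096 hidden dimension, full-width KV heads, f16 cache, 16384 default training context). These shapes are assumptions for illustration, not values taken from this PR:

```java
// Rough KV-cache sizing: K and V each store n_embd f16 values per
// layer per token, so per-token bytes = 2 * layers * n_embd * 2.
public class KvCacheSizeSketch {
    public static void main(String[] args) {
        long layers = 32, nEmbd = 4096, bytesPerElem = 2;   // assumed 7B shapes, f16 cache
        long perToken = 2 * layers * nEmbd * bytesPerElem;  // K + V, all layers
        long defaultCtx = 16384, smallCtx = 128;            // assumed default vs. the fix
        System.out.println(perToken * defaultCtx / (1L << 30)); // GiB at the default context
        System.out.println(perToken * smallCtx / (1L << 20));   // MiB with setCtxSize(128)
    }
}
```

Under these assumptions the default context costs 8 GiB while .setCtxSize(128) needs only 64 MiB, which explains the exit-137 kill on a memory-constrained macOS runner.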
The fitting step (--fit on, the default) causes a native abort in llama.cpp when running an embedding model on CPU-only CI. LlamaModelTest already uses setFit(false) for the same reason. Match that pattern. https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
CodeLlama's GGUF metadata has pooling_type = -1 (no built-in default). When --pooling is omitted (UNSPECIFIED), llama.cpp keeps -1 and the JNI layer returns the BOS-token vector (first embedding row), which differs significantly from MEAN pooling (~-0.09 vs ~-0.0006 at index 0). The assertion was factually incorrect for this model. https://claude.ai/code/session_01BtqgQVygSaKMi7CpKTg6yb
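Why the first-row (BOS-token) vector need not resemble the MEAN-pooled vector can be illustrated numerically. The per-token embeddings below are made-up values, not output of any real model; the point is only that row 0 and the column-wise mean can differ by orders of magnitude, exactly as the ~-0.09 vs ~-0.0006 observation describes:

```java
// Toy per-token embedding matrix (3 tokens x 2 dims, invented values).
// With pooling_type = -1, the JNI layer returns rows[0] (the BOS row);
// MEAN pooling instead averages over all token rows.
public class PoolingRowsSketch {
    public static void main(String[] args) {
        double[][] rows = { { -0.09, 0.20 }, { 0.05, -0.10 }, { 0.04, 0.15 } };
        double[] bosRow = rows[0];
        double[] mean = new double[rows[0].length];
        for (double[] row : rows)
            for (int i = 0; i < row.length; i++)
                mean[i] += row[i] / rows.length;
        // bosRow[0] is -0.09 while mean[0] is ~0, so an assertion that
        // UNSPECIFIED equals MEAN would fail for such a model:
        System.out.println(Math.abs(bosRow[0] - mean[0]) > 0.05);
    }
}
```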
No description provided.