gh-116: Fix SPLIT handling of empty delimiter.#117
gh-116: Fix SPLIT handling of empty delimiter.#117python-processing-unit merged 1 commit intogh-101from
Conversation
There was a problem hiding this comment.
Pull request overview
This PR updates the SPLIT builtin to reject an empty string delimiter (preventing undefined/incorrect split behavior) and adjusts the delimiter-based split path to preserve empty trailing fields.
Changes:
- Add a runtime error when
SPLITis called with an empty delimiter string. - Modify delimiter-based splitting to always append the final field (including empty trailing fields) instead of dropping it.
- Remove the special-case return of an empty tensor for delimiter-based splits that previously produced zero items.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| size_t len = strlen(cur); | ||
| char* piece = malloc(len + 1); | ||
| memcpy(piece, cur, len); | ||
| piece[len] = '\0'; | ||
| if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); } | ||
| items[count++] = value_str(piece); | ||
| free(piece); |
There was a problem hiding this comment.
The newly added last-piece handling allocates piece with malloc(len + 1) but never checks for allocation failure before memcpy, which can dereference NULL on OOM. Since value_str() already duplicates its input, you can avoid this allocation entirely by passing cur directly; otherwise add an explicit NULL check and clean up (free(copy), free(items)) before raising the runtime error.
| size_t len = strlen(cur); | |
| char* piece = malloc(len + 1); | |
| memcpy(piece, cur, len); | |
| piece[len] = '\0'; | |
| if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); } | |
| items[count++] = value_str(piece); | |
| free(piece); | |
| if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); } | |
| items[count++] = value_str(cur); |
| @@ -4984,16 +4987,15 @@ static Value builtin_split(Interpreter* interp, Value* args, int argc, Expr** ar | |||
| free(piece); | |||
| cur = found + seplen; | |||
| } | |||
| // last piece | |||
| if (*cur != '\0') { | |||
| if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); } | |||
| items[count++] = value_str(cur); | |||
| } | |||
| // last piece, including empty trailing fields | |||
| size_t len = strlen(cur); | |||
| char* piece = malloc(len + 1); | |||
There was a problem hiding this comment.
This change introduces new SPLIT behaviors that should be covered by the existing builtin test suite (tests/test2.pre): (1) empty delimiter now raises a runtime error, and (2) empty trailing fields are now preserved (e.g., splitting "a," by "," should include a final empty string). Adding explicit assertions will prevent regressions and clarify intended semantics (including the "" input case).
| // last piece, including empty trailing fields | ||
| size_t len = strlen(cur); | ||
| char* piece = malloc(len + 1); | ||
| memcpy(piece, cur, len); | ||
| piece[len] = '\0'; | ||
| if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); } | ||
| items[count++] = value_str(piece); | ||
| free(piece); |
There was a problem hiding this comment.
The PR title mentions fixing empty-delimiter handling, but this hunk also changes SPLIT semantics by always appending the final field (including empty trailing fields, and changing the result for an empty input string). If that broader behavior change is intentional, consider reflecting it in the PR description/title and/or documenting it in the SPLIT comment/spec so the change is discoverable for users.
No description provided.