Skip to content

gh-116: Fix SPLIT handling of empty delimiter.#117

Merged
python-processing-unit merged 1 commit intogh-101from
main
Apr 1, 2026
Merged

gh-116: Fix SPLIT handling of empty delimiter.#117
python-processing-unit merged 1 commit intogh-101from
main

Conversation

@python-processing-unit
Copy link
Copy Markdown
Owner

No description provided.

Copilot AI review requested due to automatic review settings April 1, 2026 23:05
@python-processing-unit python-processing-unit merged commit b86b599 into gh-101 Apr 1, 2026
6 of 7 checks passed
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates the SPLIT builtin to reject an empty string delimiter (preventing undefined/incorrect split behavior) and adjusts the delimiter-based split path to preserve empty trailing fields.

Changes:

  • Add a runtime error when SPLIT is called with an empty delimiter string.
  • Modify delimiter-based splitting to always append the final field (including empty trailing fields) instead of dropping it.
  • Remove the special-case return of an empty tensor for delimiter-based splits that previously produced zero items.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +4991 to +4997
size_t len = strlen(cur);
char* piece = malloc(len + 1);
memcpy(piece, cur, len);
piece[len] = '\0';
if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); }
items[count++] = value_str(piece);
free(piece);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The newly added last-piece handling allocates piece with malloc(len + 1) but never checks for allocation failure before memcpy, which can dereference NULL on OOM. Since value_str() already duplicates its input, you can avoid this allocation entirely by passing cur directly; otherwise add an explicit NULL check and clean up (free(copy), free(items)) before raising the runtime error.

Suggested change
size_t len = strlen(cur);
char* piece = malloc(len + 1);
memcpy(piece, cur, len);
piece[len] = '\0';
if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); }
items[count++] = value_str(piece);
free(piece);
if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); }
items[count++] = value_str(cur);

Copilot uses AI. Check for mistakes.
Comment on lines 4955 to +4992
@@ -4984,16 +4987,15 @@ static Value builtin_split(Interpreter* interp, Value* args, int argc, Expr** ar
free(piece);
cur = found + seplen;
}
// last piece
if (*cur != '\0') {
if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); }
items[count++] = value_str(cur);
}
// last piece, including empty trailing fields
size_t len = strlen(cur);
char* piece = malloc(len + 1);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change introduces new SPLIT behaviors that should be covered by the existing builtin test suite (tests/test2.pre): (1) empty delimiter now raises a runtime error, and (2) empty trailing fields are now preserved (e.g., splitting "a," by "," should include a final empty string). Adding explicit assertions will prevent regressions and clarify intended semantics (including the "" input case).

Copilot uses AI. Check for mistakes.
Comment on lines +4990 to +4997
// last piece, including empty trailing fields
size_t len = strlen(cur);
char* piece = malloc(len + 1);
memcpy(piece, cur, len);
piece[len] = '\0';
if (count + 1 > cap) { cap *= 2; items = realloc(items, sizeof(Value) * cap); }
items[count++] = value_str(piece);
free(piece);
Copy link

Copilot AI Apr 1, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title mentions fixing empty-delimiter handling, but this hunk also changes SPLIT semantics by always appending the final field (including empty trailing fields, and changing the result for an empty input string). If that broader behavior change is intentional, consider reflecting it in the PR description/title and/or documenting it in the SPLIT comment/spec so the change is discoverable for users.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants