gh-130273: Fix traceback color output with unicode characters #142529

Open
grayjk wants to merge 5 commits into python:main from grayjk:issue-130273

Conversation

@grayjk
Contributor

@grayjk grayjk commented Dec 10, 2025

Account for the display width of Unicode characters so that colors and underlining in traceback output are correct.

Closes #130273
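
For illustration, here is a minimal sketch of why code-point offsets misplace the underline. It assumes a simplified width rule based on `unicodedata.east_asian_width`; the names `display_width` and `caret_line` are hypothetical and are not the code in this PR. Zero-width and combining characters are ignored.

```python
import unicodedata

# East Asian Width categories that occupy two terminal cells.
WIDE = ("W", "F")


def display_width(text):
    """Approximate number of terminal cells used by text (simplified rule)."""
    return sum(2 if unicodedata.east_asian_width(ch) in WIDE else 1 for ch in text)


def caret_line(line, start, end):
    """Underline line[start:end], padding by display width, not by index."""
    pad = " " * display_width(line[:start])
    return pad + "^" * display_width(line[start:end])


line = 'x = "日本語" + 1'
start = line.index("+")
print(line)
print(caret_line(line, start, start + 1))
```

Padding with `" " * start` instead would leave the caret three cells to the left of the `+`, because each of the three wide characters takes two cells.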

@python-cla-bot

python-cla-bot bot commented Dec 10, 2025

All commit authors signed the Contributor License Agreement.

CLA signed

@vstinner
Member

@serhiy-storchaka: Here is a PR about text width and Unicode characters :-)

@grayjk
Contributor Author

grayjk commented Jan 28, 2026

updated to use @serhiy-storchaka's recently added unicodedata.iter_graphemes

@grayjk
Contributor Author

grayjk commented Feb 18, 2026

@pablogsal @hauntsaninja as recent reviewers of traceback.py, would you mind taking a look?

@StanFromIreland
Member

There are conflicts, please fix them.

@grayjk
Contributor Author

grayjk commented Feb 18, 2026

@StanFromIreland conflicts resolved

Lib/traceback.py Outdated
2 if unicodedata.east_asian_width(char) in _WIDE_CHAR_SPECIFIERS else 1
for char in line[:offset]
)
from _pyrepl.utils import wlen
Member

I would prefer to not depend on _pyrepl in the traceback module. I would prefer to move wlen() here, and modify _pyrepl.utils to get it from traceback.

Contributor Author

I've moved wlen/str_width in commit 467656e and made them private (prefixed with _) to avoid putting them in traceback.__all__ but mypy isn't happy about that. Should I make them public?

Contributor Author

are # type: ignore comments in this case okay?

Contributor Author

or alternatively I could move wlen to a new support file with a name prefixed with _

@StanFromIreland
Member

There are conflicts again I'm afraid, and mypy isn't happy either.

return 2


ANSI_ESCAPE_SEQUENCE = re.compile(r"\x1b\[[ -@]*[A-~]")
Member

It should also be private.
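
As a rough sketch of what a width helper of this kind has to do, the snippet below strips escape sequences with the pattern quoted above (given a private, underscore-prefixed name) before counting cells. The function name `visible_width` and the simplified width rule are assumptions for illustration, not the code moved in this PR.

```python
import re
import unicodedata

# Pattern quoted in the review above, renamed with a leading underscore.
_ANSI_ESCAPE_SEQUENCE = re.compile(r"\x1b\[[ -@]*[A-~]")


def visible_width(text):
    """Approximate terminal cells used by text, ignoring ANSI color codes."""
    stripped = _ANSI_ESCAPE_SEQUENCE.sub("", text)
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in stripped
    )


colored = "\x1b[1;31merror\x1b[0m"
print(len(colored), visible_width(colored))
```

The escape sequences add to `len()` but occupy no cells on screen, which is why they are removed before measuring.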

Comment on lines +985 to +987
    import unicodedata
    if ord(c) < 128:
        return 1
Member

There is no need to import unicodedata for ASCII characters:

Suggested change
-    import unicodedata
-    if ord(c) < 128:
-        return 1
+    if ord(c) < 128:
+        return 1
+    import unicodedata

Comment on lines +974 to +980
def _zip_display_width(line, carets):
    import unicodedata
    carets = iter(carets)
    for char in unicodedata.iter_graphemes(line):
        char = str(char)
        char_width = _display_width(char)
        yield char, "".join(itertools.islice(carets, char_width))
Member

Would it be possible to avoid the heavy unicodedata import for ASCII line?

Suggested change
-def _zip_display_width(line, carets):
-    import unicodedata
-    carets = iter(carets)
-    for char in unicodedata.iter_graphemes(line):
-        char = str(char)
-        char_width = _display_width(char)
-        yield char, "".join(itertools.islice(carets, char_width))
+def _zip_display_width(line, carets):
+    carets = iter(carets)
+    if line.isascii():
+        for char in line:
+            yield char, next(carets, "")
+    else:
+        import unicodedata
+        for char in unicodedata.iter_graphemes(line):
+            char = str(char)
+            char_width = _display_width(char)
+            yield char, "".join(itertools.islice(carets, char_width))

I'm not sure that my code is correct :-)
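
One way to gain some confidence in the ASCII fast path is to compare it against the general path on ASCII input. The sketch below is only an approximation: it iterates code points instead of calling `unicodedata.iter_graphemes`, and `display_width`, `zip_general` and `zip_ascii_fast` are hypothetical stand-ins for the PR's helpers.

```python
import itertools
import unicodedata


def display_width(ch):
    # Simplified stand-in for the PR's _display_width: ASCII is one cell,
    # East Asian wide/fullwidth characters are two, everything else one.
    if ord(ch) < 128:
        return 1
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def zip_general(line, carets):
    carets = iter(carets)
    for ch in line:  # code points here; the PR iterates grapheme clusters
        yield ch, "".join(itertools.islice(carets, display_width(ch)))


def zip_ascii_fast(line, carets):
    carets = iter(carets)
    for ch in line:
        yield ch, next(carets, "")


line = "print(undefined_name)"
carets = " " * 6 + "^" * 14 + " "
same = list(zip_general(line, carets)) == list(zip_ascii_fast(line, carets))
print(same)
```

Both paths consume exactly one caret per ASCII character and yield an empty string once the carets run out, so they agree in this simplified model. It does not cover grapheme clusters that span several ASCII code points, such as CR LF.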

@@ -0,0 +1 @@
Fix traceback color output with unicode characters
Member

Suggested change
-Fix traceback color output with unicode characters
+Fix traceback color output with Unicode characters.

Development

Successfully merging this pull request may close these issues.

Traceback colors are shifted when the line contains wide unicode characters

3 participants