-
Notifications
You must be signed in to change notification settings - Fork 213
Fix utf8 cursor col #397
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Fix utf8 cursor col #397
Conversation
When using Harbour with UTF-8 codepage (UTF8EX), the `col()`
function returns incorrect column positions for text containing
wide characters (e.g., CJK characters like 中文, Japanese
characters like 日, Korean characters, etc.).
This fixes cursor mis-alignment in UTF-8 terminals when output contains CJK, Emoji or other multi-column characters.
Key points:
New HB_GTI_WIDECHARWIDTH switch, disabled by default, zero run-time overhead.
Width logic is activated only after hb_gtInfo(HB_GTI_WIDECHARWIDTH, .T.).
Implementation based on public-domain mk_wcwidth; supports narrow(1), wide(2) and zero-width(0) characters.
No binary bloat, no breaking changes—existing applications compile and run unchanged.
Example:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH, .T.) && enable wide-char calculation
The cursor now advances by actual display columns, so subsequent prompts are correctly aligned.
|
I'm not able to look at it extensievly right now, but pulling in |
src/rtl/cdpapi.c
Outdated
| #include "hbapi.h" | ||
| #include "hbapierr.h" | ||
| #include "hbapicdp.h" | ||
| #include "hbgtcore.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Calling GT core api from here doesn't look like a good idea, respecting the setting should be done somehow still in terminal code.
d9bd0ae to
8331ff0
Compare
The fix add the `hb_cdpUTF8CharWidth` function in `cdpapi.c` to check
the GT driver's `fWideCharWidth` flag. When enabled, it uses the `mk_wcwidth`
function from the public domain implementation for accurate Unicode TR11
width calculation. When disabled, it returns the default width of 1 for
backward compatibility.
- Add HB_GTI_WIDECHARWIDTH switch, disabled by default for zero overhead
- Only activate width calculation when user calls
hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)
- Based on public-domain mk_wcwidth; supports narrow(1)/wide(2)/zero(0)
- 100% backward compatible, no binary bloat
- Fixes cursor mis-alignment in UTF-8 terminals with CJK/Emoji
Usage:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.) && enable
8331ff0 to
077d369
Compare
|
Thank you for your patient review and guidance—I’ve revised the code accordingly. I last used Clipper over thirty years ago and then left the IT industry. With the help of AI, I’m now trying to fix UTF-8 character-display issues in CJK terminals, and I hope the solution meets Harbour’s standards. |
src/codepage/mk_wcwidth.h
Outdated
|
|
||
| /* Type aliases: preserve original API */ | ||
| #define wchar_t HB_WCHAR | ||
| #define size_t HB_SIZE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering we need this code to be pulled in, I think the types should be replaced directly in the source, not using a macro. Also proper placing of header file is under include/mk_mcwidth.h plus it should be renamed to something like hbgtwide.h. In it's current form it more likely belongs to a terminal subsystem, but the char tables in final form could be fit in CODEPAGE modules.
All the macro definitions should moved to HB_ namespace, like COMBINING_WIDTH -> HB_COMBINING_WIDTH or even better a reverted scheme like HB_CHARWIDTH_COMBINING that better matches a larger project.
Or HB_WCWIDTH_* if someone else wants to jump what naming scheme fits the best?
Any exported functions also moved to hb_namespace too:
mk_wcwidth -> hb_wcwidth
Still thinking about all implications, but it's not the perfect timing for me to have the possibility to accommodate all the glyph knowledge.
Your original contribution also included support for emojis, where as far i have seen in the wild, width of an emoji character may be individual to the font face used. Therefore the static implementation may not be the best fit for all the needs, finally you will need additional hooks for some kind of dynamic tables.
New Files
include/hbgtwide.h – public header for wide-character width calculation
Renamed / Refactored
src/codepage/mk_wcwidth.c → src/codepage/hb_wcwidth.c (re-implemented with Harbour types, no macros)
Updated Makefiles & References
src/codepage/Makefile – source list changed from mk_wcwidth.c to hb_wcwidth.c
src/rtl/hbgtcore.c – include path switched to hbgtwide.h
src/rtl/Makefile – adjusted -I path (../codepage → ../include)
Review Comments Addressed
Source-level types instead of macros
Uses Harbour types: HB_WCHAR32, HB_SIZE, HB_WCHAR.
Header placed under include/ and renamed to hbgtwide.h
Public include now matches project convention.
HB_ namespace for all exported symbols
mk_wcwidth() → hb_wcwidth()
Helper tables/functions already prefixed (hb_wcswidth, hb_wcswidth_cjk).
All widths returned as literal 0 / 1 / 2; no #define used.
Character tables remain inside CODEPAGE modules
combining[] and wide[] static tables live in src/codepage/hb_wcwidth.c.
Deferred for Future Work
Emoji support
Static Unicode TR-11 rules are used; emoji widths can vary by font—dynamic hooks can be added later.
Dynamic width tables
Maintainer noted potential need for runtime-replaceable tables; current code keeps static arrays but leaves the function-pointer path open.
Full glyph knowledge
As reviewer stated, “perfect timing” for complete glyph handling is not now; this patch provides the minimal, working foundation.
Merge branch 'fix-utf8-cursor-col' of github.com:woodhead2019/harbour into fix-utf8-cursor-col
Signed-off-by: woodhead2019 <woodhead2019@users.noreply.github.com>
ede39d4 to
36eb8f2
Compare
|
Summary of Changes Renamed / Refactored Updated Makefiles & References Review Comments Addressed Header placed under include/ and renamed to hbgtwide.h HB_ namespace for all exported symbols Character tables remain inside CODEPAGE modules Deferred for Future Work Dynamic width tables Full glyph knowledge |
Fixes cursor mis-alignment in UTF-8 terminals with CJK/Emoji
gt: optional Unicode wide-char width calculation (HB_GTI_WIDECHARWIDTH)
hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)
Usage:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.) && enable