Skip to content

Conversation

@woodhead2019
Copy link

Fixes cursor mis-alignment in UTF-8 terminals with CJK/Emoji
gt: optional Unicode wide-char width calculation (HB_GTI_WIDECHARWIDTH)

  • Add HB_GTI_WIDECHARWIDTH switch, disabled by default for zero overhead
  • Only activate width calculation when user calls
    hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)
  • Based on public-domain mk_wcwidth; supports narrow(1)/wide(2)/zero(0)
  • 100% backward compatible, no binary bloat

Usage:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.) && enable

When using Harbour with UTF-8 codepage (UTF8EX), the `col()`
function returns incorrect column positions for text containing
wide characters (e.g., CJK characters like 中文, Japanese
characters like 日, Korean characters, etc.).

This fixes cursor mis-alignment in UTF-8 terminals when output contains CJK, Emoji or other multi-column characters.

Key points:
    New HB_GTI_WIDECHARWIDTH switch, disabled by default, zero run-time overhead.
    Width logic is activated only after hb_gtInfo(HB_GTI_WIDECHARWIDTH, .T.).
    Implementation based on public-domain mk_wcwidth; supports narrow(1), wide(2) and zero-width(0) characters.
    No binary bloat, no breaking changes—existing applications compile and run unchanged.

Example:
hb_cdpSelect("UTF8EX")
hb_gtInfo(HB_GTI_WIDECHARWIDTH, .T.)  && enable wide-char calculation
The cursor now advances by actual display columns, so subsequent prompts are correctly aligned.
@alcz
Copy link
Contributor

alcz commented Jan 5, 2026

I'm not able to look at it extensievly right now, but pulling in mk_mcwidth that is not ANSI C and would make compatibility problems. <wchar.h> and wchar_t are around C95, so the code would have to be ported.

src/rtl/cdpapi.c Outdated
#include "hbapi.h"
#include "hbapierr.h"
#include "hbapicdp.h"
#include "hbgtcore.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Calling GT core api from here doesn't look like a good idea, respecting the setting should be done somehow still in terminal code.

@woodhead2019 woodhead2019 force-pushed the fix-utf8-cursor-col branch 2 times, most recently from d9bd0ae to 8331ff0 Compare January 6, 2026 14:22
The fix add the `hb_cdpUTF8CharWidth` function in `cdpapi.c` to check
the GT driver's `fWideCharWidth` flag. When enabled, it uses the `mk_wcwidth`
function from the public domain implementation for accurate Unicode TR11
width calculation. When disabled, it returns the default width of 1 for
backward compatibility.

- Add HB_GTI_WIDECHARWIDTH switch, disabled by default for zero overhead
- Only activate width calculation when user calls
  hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)
- Based on public-domain mk_wcwidth; supports narrow(1)/wide(2)/zero(0)
- 100% backward compatible, no binary bloat
- Fixes cursor mis-alignment in UTF-8 terminals with CJK/Emoji

Usage:
  hb_cdpSelect("UTF8EX")
  hb_gtInfo(HB_GTI_WIDECHARWIDTH,.T.)     && enable
@woodhead2019
Copy link
Author

Thank you for your patient review and guidance—I’ve revised the code accordingly. I last used Clipper over thirty years ago and then left the IT industry. With the help of AI, I’m now trying to fix UTF-8 character-display issues in CJK terminals, and I hope the solution meets Harbour’s standards.

@woodhead2019 woodhead2019 requested a review from alcz January 13, 2026 12:52

/* Type aliases: preserve original API */
#define wchar_t HB_WCHAR
#define size_t HB_SIZE
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Considering we need this code to be pulled in, I think the types should be replaced directly in the source, not using a macro. Also proper placing of header file is under include/mk_mcwidth.h plus it should be renamed to something like hbgtwide.h. In it's current form it more likely belongs to a terminal subsystem, but the char tables in final form could be fit in CODEPAGE modules.

All the macro definitions should moved to HB_ namespace, like COMBINING_WIDTH -> HB_COMBINING_WIDTH or even better a reverted scheme like HB_CHARWIDTH_COMBINING that better matches a larger project.

Or HB_WCWIDTH_* if someone else wants to jump what naming scheme fits the best?

Any exported functions also moved to hb_namespace too:
mk_wcwidth -> hb_wcwidth

Still thinking about all implications, but it's not the perfect timing for me to have the possibility to accommodate all the glyph knowledge.

Your original contribution also included support for emojis, where as far i have seen in the wild, width of an emoji character may be individual to the font face used. Therefore the static implementation may not be the best fit for all the needs, finally you will need additional hooks for some kind of dynamic tables.

woodhead2019 and others added 3 commits January 18, 2026 17:34
New Files

    include/hbgtwide.h – public header for wide-character width calculation

Renamed / Refactored

    src/codepage/mk_wcwidth.c → src/codepage/hb_wcwidth.c (re-implemented with Harbour types, no macros)

Updated Makefiles & References

    src/codepage/Makefile – source list changed from mk_wcwidth.c to hb_wcwidth.c
    src/rtl/hbgtcore.c – include path switched to hbgtwide.h
    src/rtl/Makefile – adjusted -I path (../codepage → ../include)

Review Comments Addressed
 Source-level types instead of macros
    Uses Harbour types: HB_WCHAR32, HB_SIZE, HB_WCHAR.

 Header placed under include/ and renamed to hbgtwide.h

    Public include now matches project convention.

 HB_ namespace for all exported symbols

    mk_wcwidth() → hb_wcwidth()
    Helper tables/functions already prefixed (hb_wcswidth, hb_wcswidth_cjk).
    All widths returned as literal 0 / 1 / 2; no #define used.

 Character tables remain inside CODEPAGE modules

    combining[] and wide[] static tables live in src/codepage/hb_wcwidth.c.

Deferred for Future Work
Emoji support
Static Unicode TR-11 rules are used; emoji widths can vary by font—dynamic hooks can be added later.

Dynamic width tables
Maintainer noted potential need for runtime-replaceable tables; current code keeps static arrays but leaves the function-pointer path open.

Full glyph knowledge
As reviewer stated, “perfect timing” for complete glyph handling is not now; this patch provides the minimal, working foundation.

Merge branch 'fix-utf8-cursor-col' of github.com:woodhead2019/harbour into fix-utf8-cursor-col
Signed-off-by: woodhead2019 <woodhead2019@users.noreply.github.com>
@woodhead2019
Copy link
Author

woodhead2019 commented Jan 20, 2026

Summary of Changes
New Files
include/hbgtwide.h – public header for wide-character width calculation

Renamed / Refactored
src/codepage/mk_wcwidth.c → src/codepage/hb_wcwidth.c (re-implemented with Harbour types, no macros)

Updated Makefiles & References
src/codepage/Makefile – source list changed from mk_wcwidth.c to hb_wcwidth.c
src/rtl/hbgtcore.c – include path switched to hbgtwide.h
src/rtl/Makefile – adjusted -I path (../codepage → ../include)

Review Comments Addressed
Source-level types instead of macros
Uses Harbour types: HB_WCHAR32, HB_SIZE, HB_WCHAR.

Header placed under include/ and renamed to hbgtwide.h
Public include now matches project convention.

HB_ namespace for all exported symbols
mk_wcwidth() → hb_wcwidth()
Helper tables/functions already prefixed (hb_wcswidth, hb_wcswidth_cjk).
All widths returned as literal 0 / 1 / 2; no #define used.

Character tables remain inside CODEPAGE modules
combining[] and wide[] static tables live in src/codepage/hb_wcwidth.c.

Deferred for Future Work
Emoji support
Static Unicode TR-11 rules are used; emoji widths can vary by font—dynamic hooks can be added later.

Dynamic width tables
Maintainer noted potential need for runtime-replaceable tables; current code keeps static arrays but leaves the function-pointer path open.

Full glyph knowledge
As reviewer stated, “perfect timing” for complete glyph handling is not now; this patch provides the minimal, working foundation.

@woodhead2019 woodhead2019 requested a review from alcz January 22, 2026 10:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants