Execution character set on Windows - add `execution_windows_acp` and `execution_windows_ocp`? #45

On Windows, the question of "execution character set" (at least for narrow characters) is complicated by some additional factors:

1. There are _two_ execution character sets in play at runtime:
    - the `OEM Code Page` (`CP_OEMCP`), which the console uses by default
    - the `ANSI Code Page` (`CP_ACP`), which the GUI uses
2. The behavior of the `mbrto*` and `*rtomb` functions is, at least according to the documentation, _inconsistent_:
    - [`mbrtowc`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/mbrtowc?view=msvc-170) is documented as interpreting its input according to the "current locale" (and you can play games with the `.ACP` and `.OCP` locales, accordingly)
    - [`mbrtoc32`](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/mbrtoc16-mbrtoc323?view=msvc-170), on the other hand, is documented as treating its input as UTF-8 _unconditionally_.
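
To make point (1) concrete, here is a minimal sketch of how one byte can decode to two different code points depending on which code page is assumed. It assumes a US-English system where the ANSI code page is Windows-1252 and the OEM code page is IBM 437; other systems use other pairs. The names `legacy_page` and `decode_sample_byte` are hypothetical, and the tables are deliberately partial.

```cpp
#include <cassert>

// Hypothetical illustration only: neither type nor function exists in ztd.text or cuneicode.
enum class legacy_page { windows_1252, ibm_437 };

// Decodes one byte for the handful of values this sketch covers.
// Anything outside that subset yields U+FFFD (REPLACEMENT CHARACTER).
inline char32_t decode_sample_byte(legacy_page page, unsigned char byte) {
    if (byte < 0x80) {
        return static_cast<char32_t>(byte); // both code pages map this range like ASCII
    }
    if (page == legacy_page::windows_1252) {
        // In Windows-1252, bytes 0xA0..0xFF coincide with U+00A0..U+00FF.
        return byte >= 0xA0 ? static_cast<char32_t>(byte) : U'\uFFFD';
    }
    // IBM 437: only three sample entries from the Greek/math block.
    switch (byte) {
    case 0xE1: return U'\u00DF'; // LATIN SMALL LETTER SHARP S
    case 0xE9: return U'\u0398'; // GREEK CAPITAL LETTER THETA
    case 0xEA: return U'\u03A9'; // GREEK CAPITAL LETTER OMEGA
    default:   return U'\uFFFD';
    }
}

// Usage: the same byte 0xE9 is U+00E9 under one assumption and U+0398 under the other.
inline void check_same_byte_differs() {
    assert(decode_sample_byte(legacy_page::windows_1252, 0xE9) == U'\u00E9');
    assert(decode_sample_byte(legacy_page::ibm_437, 0xE9) == U'\u0398');
}
```

A single "execution" encoding cannot describe both interpretations at once, which is why the two code pages would need to be addressable separately.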

As a result, I'm not sure that there is _any_ way in which `ztd.text` currently handles the "execution character set" on Windows that provides the expected result under all circumstances:

- `<cuchar>`/`<uchar.h>` is affected by (2), as it uses `mbrtoc32`
- `iconv` is not available
- `cuneicode` only seems to have [three approaches](https://github.com/soasis/cuneicode/blob/bd843d3799320247af4fa056d4f57092eaee9245/source/include/ztd/cuneicode/detail/core_mcharn.hpp#L942-L1013):
    1. [`ztdc_is_execution_encoding_utf8()`](https://github.com/soasis/cuneicode/blob/bd843d3799320247af4fa056d4f57092eaee9245/source/include/ztd/cuneicode/detail/core_mcharn.hpp#L945-L952), which is false (or at least ought to be) unless the system code page has been set to `CP_UTF8`/`65001`
    2. [`mbrtoc32`](https://github.com/soasis/cuneicode/blob/bd843d3799320247af4fa056d4f57092eaee9245/source/include/ztd/cuneicode/detail/core_mcharn.hpp#L954-L1007), which is subject to (2) above
    3. [using `reinterpret_cast` to treat the input as UTF-8](https://github.com/soasis/cuneicode/blob/bd843d3799320247af4fa056d4f57092eaee9245/source/include/ztd/cuneicode/detail/core_mcharn.hpp#L1009-L1011), which is certainly not correct.
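
As a rough illustration of why the third approach cannot be right for `CP_ACP` data, the sketch below checks whether a byte sequence is well-formed UTF-8. The function name `is_well_formed_utf8` is hypothetical, and the Windows-1252 sample is an assumption about what a legacy file might contain. Text such as "café" stored in Windows-1252 ends in the lone byte `0xE9`, which is not well-formed UTF-8, so reinterpreting the buffer cannot yield the intended characters.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true when `bytes` is well-formed UTF-8
// (rejects overlong forms, surrogates, and values above U+10FFFF).
inline bool is_well_formed_utf8(const std::string& bytes) {
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        if (lead < 0x80) { ++i; continue; }       // single-byte (ASCII) sequence
        std::size_t extra = 0;                     // number of continuation bytes
        unsigned char lo = 0x80, hi = 0xBF;        // allowed range of the first one
        if (lead >= 0xC2 && lead <= 0xDF) {
            extra = 1;
        } else if (lead >= 0xE0 && lead <= 0xEF) {
            extra = 2;
            if (lead == 0xE0) lo = 0xA0;           // excludes overlong encodings
            if (lead == 0xED) hi = 0x9F;           // excludes surrogates
        } else if (lead >= 0xF0 && lead <= 0xF4) {
            extra = 3;
            if (lead == 0xF0) lo = 0x90;           // excludes overlong encodings
            if (lead == 0xF4) hi = 0x8F;           // excludes values above U+10FFFF
        } else {
            return false;                          // stray continuation or invalid lead
        }
        if (bytes.size() - i <= extra) return false; // sequence is truncated
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if (c < lo || c > hi) return false;
            lo = 0x80; hi = 0xBF;                  // later bytes use the default range
        }
        i += extra + 1;
    }
    return true;
}

// Usage: the same word in two encodings.
inline void check_cafe_samples() {
    assert(!is_well_formed_utf8("caf\xE9"));      // Windows-1252 bytes
    assert(is_well_formed_utf8("caf\xC3\xA9"));   // UTF-8 bytes
}
```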
    
I haven't verified at runtime that (2) actually occurs. One reason is that the linked documentation is for Visual Studio, whereas I'm using Embarcadero C++ Builder, whose standard library is sorely underdocumented and varies by version. The other is that issue (1) is the more pressing: we're in the midst of making the application UTF-8 native, but it still needs to read legacy files that were, due to oversights, written according to `CP_ACP`, and it also needs to emit data to the console in certain circumstances, so it has to interact with _both_ code pages.

Is there any chance that flavors of the execution character set could be added for `CP_ACP` and `CP_OEMCP`, or possibly for Windows `CP_*` values in general?
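
For reference, the kind of conversion such an encoding might wrap can be sketched with the Win32 `MultiByteToWideChar` function, which takes the code page as an explicit argument. This is only a sketch under stated assumptions: `decode_with_code_page` is a hypothetical name, not an existing `ztd.text` or `cuneicode` API, and the non-Windows branch is a stand-in that exists purely so the snippet compiles elsewhere.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#ifdef _WIN32
#include <windows.h>
#endif

// Hypothetical helper: decodes `bytes` from the given Windows code page to UTF-16.
// Pass 0 (CP_ACP) for the ANSI code page or 1 (CP_OEMCP) for the OEM code page.
inline std::wstring decode_with_code_page(unsigned int code_page, const std::string& bytes) {
    if (bytes.empty()) return std::wstring(); // the API rejects zero-length input
#ifdef _WIN32
    const int in_len = static_cast<int>(bytes.size());
    // First call: ask how many UTF-16 code units are needed.
    // Note: a few code pages require the flags argument to be 0 instead.
    const int out_len = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                            bytes.data(), in_len, nullptr, 0);
    if (out_len <= 0) throw std::runtime_error("input is not valid in the requested code page");
    std::wstring out(static_cast<std::size_t>(out_len), L'\0');
    // Second call: perform the conversion into the sized buffer.
    MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                        bytes.data(), in_len, &out[0], out_len);
    return out;
#else
    // Assumption: stand-in for non-Windows builds; handles ASCII only.
    (void)code_page;
    std::wstring out;
    for (unsigned char c : bytes) {
        if (c >= 0x80) throw std::runtime_error("non-ASCII input needs the Windows API");
        out.push_back(static_cast<wchar_t>(c));
    }
    return out;
#endif
}

// Usage: ASCII text decodes identically under the ANSI code page.
inline void check_ascii_round_trip() {
    assert(decode_with_code_page(0, "abc") == L"abc");
}
```

The console's active code page can differ from the OEM default, so code that writes to the console may prefer to query `GetConsoleOutputCP()` and pass that value instead.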
