Description
On Windows, the question of "execution character set" (at least for narrow characters) is complicated by some additional factors:
1. There are two execution character sets in play at runtime:
   - the OEM Code Page (`CP_OCP`), as is used for the console
   - the ANSI Code Page (`CP_ACP`), as is used for the GUI
2. The behavior of the `mbrto*` and `*rtomb` functions is, at least according to the documentation, inconsistent:
As a result, I'm not sure that ztd.text currently has any way of handling the "execution character set" on Windows that provides the expected result under all circumstances:
- `<cuchar>`/`<uchar.h>` is affected by (2), as it uses `mbrtoc32`
- `iconv` is not available
- `cuneicode` only seems to have three approaches:
  - `ztdc_is_execution_encoding_utf8()`, which is false (or at least ought to be) unless the system code page has been set to `CP_UTF8`/65001
  - `mbrtoc32`, which falls to (2) above
  - using `reinterpret_cast` to treat the input as UTF-8, which is certainly not correct.
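For reference, here is a minimal sketch of the `mbrtoc32` route mentioned above. The function name `decode_execution` is hypothetical and is not part of ztd.text or cuneicode. It decodes using whatever narrow encoding the current C locale selects, so there is only one "execution encoding" in effect at a time and no way to choose between the ANSI and OEM code pages per call.

```cpp
#include <cstddef>
#include <cuchar>
#include <cwchar>
#include <optional>
#include <string>

// Hypothetical helper: decode `input` from the current C locale's narrow
// encoding to UTF-32 via std::mbrtoc32.
// Returns std::nullopt on an invalid or truncated sequence.
std::optional<std::u32string> decode_execution(const std::string& input) {
    std::u32string out;
    std::mbstate_t state{};
    const char* p = input.data();
    std::size_t remaining = input.size();

    while (remaining > 0) {
        char32_t c32 = 0;
        std::size_t rc = std::mbrtoc32(&c32, p, remaining, &state);

        if (rc == static_cast<std::size_t>(-1) ||   // invalid sequence
            rc == static_cast<std::size_t>(-2)) {   // incomplete sequence
            return std::nullopt;
        }
        if (rc == static_cast<std::size_t>(-3)) {
            // A code point left over from earlier input; no bytes consumed.
            out.push_back(c32);
            continue;
        }
        if (rc == 0) {
            // A null character was decoded; assume it occupied one byte.
            rc = 1;
        }
        out.push_back(c32);
        p += rc;
        remaining -= rc;
    }
    return out;
}
```

Which bytes this accepts depends entirely on the locale installed with `setlocale`, and, per (2), possibly on the runtime library as well.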
I haven't verified at runtime that (2) actually presents itself. This is partly because this documentation is for Visual Studio, whereas I'm using Embarcadero C++ Builder (whose standard library is sorely underdocumented and varies by version), and partly because issue (1) is the more pressing. The application I'm working with needs to interact with both code pages: we're currently in the midst of making it UTF-8 native, but we need to retain the ability to interact with legacy files that, due to oversights, were written according to CP_ACP, and we also need to emit data to the console in certain circumstances.
As a result, is there any chance that flavors of the execution character set could be added for CP_ACP and CP_OCP - or possibly for Windows CP_* values in general?
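To make the request concrete, here is a rough sketch of the kind of conversion such an encoding could wrap, using the Win32 `MultiByteToWideChar` function with an explicit code page. The names `decode_code_page`, `ansi_code_page` and `oem_code_page` are hypothetical and not part of ztd.text. Note that the Win32 headers spell the OEM constant `CP_OEMCP`. The non-Windows branch is only a placeholder so the sketch builds elsewhere.

```cpp
#include <climits>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

#ifdef _WIN32
#define WIN32_LEAN_AND_MEAN
#include <windows.h>
#endif

// Hypothetical names; the values match Win32's CP_ACP and CP_OEMCP.
constexpr unsigned ansi_code_page = 0;  // CP_ACP
constexpr unsigned oem_code_page = 1;   // CP_OEMCP

// Decode `input` from the given Windows code page to a wide string.
// Returns std::nullopt if the bytes are not valid in that code page.
std::optional<std::wstring> decode_code_page(unsigned code_page,
                                             std::string_view input) {
    if (input.empty()) return std::wstring{};  // the API rejects zero lengths
    if (input.size() > static_cast<std::size_t>(INT_MAX)) return std::nullopt;
#ifdef _WIN32
    const int in_len = static_cast<int>(input.size());
    // First call: ask how many wide characters are needed.
    // Assumption: the code page accepts MB_ERR_INVALID_CHARS (some do not).
    const int out_len = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                            input.data(), in_len, nullptr, 0);
    if (out_len <= 0) return std::nullopt;
    std::wstring out(static_cast<std::size_t>(out_len), L'\0');
    // Second call: perform the conversion into the sized buffer.
    const int written = MultiByteToWideChar(code_page, MB_ERR_INVALID_CHARS,
                                            input.data(), in_len,
                                            out.data(), out_len);
    if (written != out_len) return std::nullopt;
    return out;
#else
    // Placeholder for non-Windows builds: accept ASCII only.
    (void)code_page;
    std::wstring out;
    for (unsigned char byte : input) {
        if (byte >= 0x80) return std::nullopt;
        out.push_back(static_cast<wchar_t>(byte));
    }
    return out;
#endif
}
```

With something along these lines, legacy files could be read with `decode_code_page(ansi_code_page, bytes)` while console data goes through `oem_code_page`, independently of the C locale.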