netcdf-c/ncdump/tst_cygutf8.c


Improve UTF8 Support On Windows

re: Issue https://github.com/Unidata/netcdf-c/issues/2190

The primary purpose of this PR is to improve the utf8 support for Windows. This is pursuant to a change in Windows that supports utf8 natively (almost). The "almost" means that it is still utf16 internally and the set of characters representable by utf8 is larger than those representable by utf16. This leaves open the question in the Issue about handling the Windows 1252 character set.

This required the following changes:
1. Test the Windows build and major version in order to see if native utf8 is supported.
2. If native utf8 is supported, modify dpathmgr.c to call the 8-bit version of the Windows fopen() and open() functions.
3. In support of this, programs that use XGetOpt (Windows versions) need to get the command line as utf8 and then parse it to argc+argv as utf8. This requires using a homegrown command line parser named XCommandLineToArgvA.
4. Add a utility program called "acpget" that prints out the current Windows code page and locale.

Additionally, some technical debt was cleaned up as follows:
1. Unify all the places which attempt to read all or a part of a file into the dutil.c#NC_readfile code.
2. Similarly, unify all the code that creates temp files into the dutil.c#NC_mktmp code.
3. Convert almost all remaining calls to fopen() and open() to NCfopen() and NCopen3(). This is to ensure that path management is used consistently. This touches a number of files.
4. extern->EXTERNL as needed to get it to work under Windows.
2022-02-09 11:53:30 +08:00
/*
https://www.cygwin.com/cygwin-ug-net/using-specialnames.html
https://docs.microsoft.com/en-us/archive/msdn-magazine/2016/september/c-unicode-encoding-conversions-with-stl-strings-and-win32-apis
*/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#ifdef _WIN32
#include <windows.h>
#endif
#include "netcdf.h"
#include "ncpathmgr.h"
static const unsigned char name1[] = {
'x','u','t','f','8','_',
'\xe6', '\xb5', '\xb7', /* U+6D77 CJK ideograph : 3-bytes utf8 */
'\0'
};
static unsigned char name2[] = {
'x','u','t','f','8','_',
0xCE, 0x9A, /* GREEK CAPITAL LETTER KAPPA : 2-bytes utf8 */
0xCE, 0xB1, /* GREEK SMALL LETTER ALPHA : 2-bytes utf8 */
0xCE, 0xBB, /* GREEK SMALL LETTER LAMBDA : 2-bytes utf8 */
0xCE, 0xB7, /* GREEK SMALL LETTER ETA : 2-bytes utf8 */
0xCE, 0xBC, /* GREEK SMALL LETTER MU : 2-bytes utf8 */
0xCE, 0xAD, /* GREEK SMALL LETTER EPSILON WITH TONOS
: 2-bytes utf8 */
0xCF, 0x81, /* GREEK SMALL LETTER RHO : 2-bytes utf8 */
0xCE, 0xB1, /* GREEK SMALL LETTER ALPHA : 2-bytes utf8 */
0x00
};
static char* name3 = "xutf8_사람/접는사람";
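/* This is CP_1252: byte 0xC5 is LATIN CAPITAL LETTER A WITH RING ABOVE
   in Windows-1252; on its own it is not a valid utf8 sequence */
//static char* name4 = "xutf8_Å";
static char name4[8] = {'x','u','t','f','8','_',(char)0xC5,0x00};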
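int
main(void)
{
    FILE* f;

    /* Create (and immediately close) a file for each test name;
       NCfopen applies the netcdf-c path management to the name. */
    f = NCfopen((char*)name1,"w");
    if (f) fclose(f);

    f = NCfopen((char*)name2,"w");
    if (f) fclose(f);

    f = NCfopen((char*)name3,"w");
    if (f) fclose(f);

    /* name4 holds a single CP-1252 byte after the prefix, so its length is 7 */
    printf("|name4|=%u\n",(unsigned)strlen(name4));
    f = NCfopen((char*)name4,"w");
    if (f) fclose(f);

    return 0;
}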