Update: it appears Liberal Crime Squad already has code to convert Code Page 437 characters to Unicode's UTF-8 encoding, as part of translateGraphicsChar() and related functions. However, right now translateGraphicsChar() is only used on special characters in the Curses movies (the new cable news anchor, the failed partial-birth abortion on TV, the show about lifestyles of the rich and famous, the video of a black man getting beaten by the LAPD just like Rodney King, etc.) as well as special characters in newspaper fonts and newspaper graphics (like the graphic about the wacky CEO who owns slaves in another country or doesn't know what his company does or whatever, or the graphic about people poisoned by genetically modified food, etc.). The code to output those in Unicode seems to already be fully implemented.
But code to output regular strings and characters in Unicode doesn't seem to be implemented that way. The solution, I think, is this: just as there is an addchar() function to replace addch() because of addch()'s mishandling of extended characters (which, now that I think of it, is probably because C++ usually uses signed chars in the -128 to 127 range instead of 0 to 255, and addch() takes an int as input, so extended characters end up sign-extended into the -128 to -1 range instead of the 128 to 255 range they ought to be in; instead of writing the addchar() function we COULD have just typecast the chars as "unsigned char"), we need wrappers to replace addstr() and mvaddstr() too, just like addchar() replaces addch() and mvaddchar() replaces mvaddch(). They would loop through each character in a string and output it using translateGraphicsChar(), and addchar() could be changed to also use translateGraphicsChar(). Then ALL text output in the entire program would be consistently filtered through translateGraphicsChar().
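The proposed wrapper amounts to a short loop. Here is a rough sketch of the idea; every name except the role of translateGraphicsChar() is my own placeholder, not the real LCS code, and the translation function is stubbed out to show only the structure:

```cpp
#include <string>

// Placeholder standing in for the real translateGraphicsChar(): maps one
// CP437 code point (0-255) to its UTF-8 byte sequence. Only plain ASCII
// is handled here; the real function would consult the full table.
std::string translate_cp437(unsigned char c)
{
    if (c < 128) return std::string(1, static_cast<char>(c)); // ASCII passes through
    return "?"; // real code would look up the extended character
}

// Sketch of the proposed addstring() wrapper: walk the string, forcing
// each char through the translation step as an unsigned value; the real
// version would hand the translated bytes to curses for display.
std::string addstring(const std::string& s)
{
    std::string out;
    for (char raw : s)
    {
        unsigned char c = static_cast<unsigned char>(raw); // avoid the -128..-1 trap
        out += translate_cp437(c);
    }
    return out;
}
```

The cast to unsigned char inside the loop is the same fix the prose describes for addch(): without it, extended characters arrive as negative values.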
The other thing to do is finish implementing the conversion tables, which have a lot of characters commented out. The UTF-8 conversion table actually has all the characters listed, but many are commented out. And it seems translateGraphicsChar() currently, for characters 0-31, outputs UTF-8 characters corresponding to those codes interpreted as control characters rather than as display characters, which is wrong; it should output everything as a display character. The comments say that the conversion tables ought to include corresponding standard ASCII characters for the ASCII hack too, for every character I uncomment and enable. So I'll need to enable all 256 characters in both conversion tables. The UTF-8 one already has proper info for all of them, but for the ASCII hack one, someone needs to come up with corresponding characters for each. That's pretty simple for a lot of extended characters: the standard ASCII equivalent of a letter with an acute accent, grave accent, umlaut, or circumflex over it is simply that letter WITHOUT the accent. Other characters, like various symbols, block characters, and line-drawing characters, don't really have corresponding ASCII characters; the standard ASCII-hack way to do line-drawing characters is "-" for a horizontal line, "|" for a vertical line, and "+" for anything else. And most Greek letters have regular letters they look like, the German double S looks similar to a capital B, the yen currency sign looks similar to a Y, etc., and those are all in Code Page 437. I plan on implementing the entire conversion tables, for both standard ASCII and UTF-8, and then getting translateGraphicsChar() used on all outputted characters and strings, since it is the function that takes care of this in the current code; it just isn't used widely enough.
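To make the plan concrete, here is a tiny illustrative subset of what the paired tables amount to. These few entries are my own picks from Code Page 437 (the real tables cover all 256 codes, and the real LCS data structures may look different):

```cpp
// A few CP437 codes with their UTF-8 bytes and the ASCII-hack stand-ins
// described above: accented letters lose their accents, line-drawing
// characters become "-", "|", or "+", and look-alikes cover the rest.
struct Cp437Entry { unsigned char code; const char* utf8; char ascii; };

static const Cp437Entry kSample[] = {
    { 0x81, "\xC3\xBC",     'u' }, // u with umlaut -> u
    { 0x82, "\xC3\xA9",     'e' }, // e with acute accent -> e
    { 0x9D, "\xC2\xA5",     'Y' }, // yen sign -> Y (look-alike)
    { 0xB3, "\xE2\x94\x82", '|' }, // vertical line-drawing character
    { 0xC4, "\xE2\x94\x80", '-' }, // horizontal line-drawing character
    { 0xC5, "\xE2\x94\xBC", '+' }, // crossing lines -> '+'
    { 0xE1, "\xC3\x9F",     'B' }, // German sharp s -> B (look-alike)
};

// ASCII-hack lookup over the sample table; '?' for anything not listed.
char asciiFallback(unsigned char code)
{
    for (const Cp437Entry& e : kSample)
        if (e.code == code) return e.ascii;
    return '?';
}
```

The same table drives both output modes: pick the utf8 field for Unicode terminals and the ascii field for the ASCII hack.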
Obviously this analysis is slightly simplistic and there might be some bumps along the way to fully implementing this, but I'm sure it's doable. Someone else recently did a fix to improve extended character display on Linux that involves translateGraphicsChar(); that fix actually deals with how ncurses and PDCurses display colors differently. Basically, PDCurses and ncurses have the colors red and blue swapped, and colors that contain red but not blue (or blue but not red) get swapped too: yellow (red+green) is swapped with cyan (blue+green). This is because one library numbers colors as RGB (red bit, green bit, blue bit) and the other as BGR (blue bit, green bit, red bit), conceptually much like the big-endian versus little-endian difference between CPU architectures. But that has been fixed now, which is very nice. I forget who fixed it, probably either nickdumas or blomkvist; I could check the changelogs to see, but it's not that important, since both of them have made plenty of good contributions lately regardless.
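The red/blue swap can be expressed in a couple of lines. This is my own sketch of the idea (a 3-bit color where one library reads the bits as RGB and the other as BGR), not the actual fix that went into the game:

```cpp
// Swap bits 0 and 2 of a 3-bit color value, leaving green (bit 1) alone.
// This converts RGB bit order to BGR and back: red <-> blue, and
// yellow (red+green) <-> cyan (blue+green), exactly as described above.
unsigned swapRedBlue(unsigned color)
{
    unsigned r = color & 1u;        // bit 0
    unsigned g = color & 2u;        // bit 1, unchanged
    unsigned b = (color >> 2) & 1u; // bit 2
    return (r << 2) | g | b;
}
```

Note the function is its own inverse: applying it twice gets the original color back, which is why a single translation layer suffices in either direction.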
So right now addchar() is a wrapper that replaces addch() calls with addstr() calls. Once I finish making translateGraphicsChar() get called on all characters, including ones in strings, things will be the other way around, and everything will use addch() (addch() takes an int per character, which can hold values beyond what an 8-bit char can, whereas addstr() takes chars, with each 8-bit char assumed to represent a single character). So instead of replacing addch() calls with addstr() calls, we'll replace addstr() calls with addch() calls, since addch() seems to support Unicode best. I was just having some trouble with addch() due to the typecasting thing: if you call it on a char, it'll treat extended characters as being in the -128 to -1 range instead of the 128 to 255 range, since you need to typecast to an unsigned type FIRST before calling addch(). This is not any problem with addch(); I just misunderestimated (to use a word George W. Bush coined) that function and thought something was wrong with it, when actually the only thing wrong was my failure to properly typecast its input as unsigned. Ah, and a SPECIAL oddity of C++: a regular char, without the signed or unsigned word before it, is typically a two's-complement signed integer between -128 and 127, but the standard only GUARANTEES the range -127 to 127 for signed char, because it permits implementations to represent signed integers in one's complement or sign-and-magnitude instead of two's complement (in practice, essentially every modern compiler uses two's complement). One's complement signed numbers are terrible; I don't know why they exist. Signed numbers should always be two's complement; whoever allowed one's complement in the specification was either some wacko with terrible ideas, or was forced to do it so C and C++ would remain implementable on ancient one's-complement hardware from the 1970s, back in the Dark Ages of computing.
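The typecasting trap is easy to demonstrate in isolation. A minimal sketch, independent of curses, assuming a platform where plain char is signed (typical on x86):

```cpp
// What an int parameter (like addch()'s) receives when handed a plain
// char versus a properly cast one. 0xB3 is CP437's vertical-line character.
int asIntNaive(char c) { return c; }                // sign-extends where char is signed
int asIntFixed(char c) { return (unsigned char)c; } // always lands in 0..255

// On a typical signed-char platform, asIntNaive('\xB3') yields -77,
// while asIntFixed('\xB3') yields 179, the value the output layer needs.
```

The whole bug the prose describes is the difference between those two functions: one cast through unsigned char before widening to int.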
I am thinking the reason for this absurdity is backwards compatibility with hardware and code from the 1970s, so old C code would still compile correctly in C++ without modifications. And on closer reading, this absurdity is NOT actually fixed in C++11 or even C++14: the fancy chart at http://en.cppreference.com/w/cpp/language/types gives signed char a guaranteed range of only -127 to 127 precisely because one's complement and sign-and-magnitude representations remain allowed. (Two's complement didn't become mandatory for signed integer types until C++20, much later.) They should've required two's complement in the original C++98 standard; I dunno what those dudes were smoking when they permitted that one's complement nonsense. One's complement is an epic fail compared to two's complement: more complicated to implement, you have one less number, and you have TWO bit patterns for zero (positive zero and negative zero). Utterly ridiculous. There can be only one zero, to think otherwise is madness... although apparently the IEEE 754 standard for floating-point numbers also has both positive and negative zero... how did THAT get approved? At least its behavior is nailed down: the expression (+0.0 == -0.0) is defined to return true, because the two zeroes are different representations of the same value, and both convert to false in a boolean context, so code like "int c = some number; if(c) {do this} else {do that}" still behaves correctly. The same actually holds for one's-complement integers: negative zero compares equal to zero, so == and integer-to-boolean conversion still work. The real costs of one's complement are the wasted bit pattern, the reduced range, and the extra hardware complexity, not broken comparisons. Still, consider that in C/C++, C-strings are "null"-terminated (actually terminated with a zero byte): on a one's-complement machine there would be TWO bit patterns representing that zero, and while any check for == 0 finds either one, code that inspects raw bit patterns (a memcmp()-style byte comparison, say) could disagree about where the string ends, which is exactly the kind of subtle mismatch that leads to reading past the end of a buffer, crashes, or undefined behavior.
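Since the question of whether (+0 == -0) returns true comes up above, IEEE 754's answer can actually be checked directly. A small self-contained demonstration, using std::signbit to peek at the sign bit:

```cpp
#include <cmath>

// IEEE 754 defines +0.0 and -0.0 as distinct bit patterns but equal values:
// they compare equal, both are "false" in a boolean context, and only an
// explicit sign-bit inspection can tell them apart.
bool zerosCompareEqual() { return 0.0 == -0.0; }                        // true
bool negZeroIsFalsy()    { double z = -0.0; return z ? false : true; }  // true
bool signBitsDiffer()    { return std::signbit(-0.0) && !std::signbit(0.0); } // true
```

So the two floating-point zeroes behave as one value everywhere that ordinary code can observe; only bit-level inspection reveals the difference.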
So things go from terrible to super-ultra-mega-terrible. The good news, if you can call it that, is that a char array or char* pointer without "unsigned" before it is usually implemented as a two's-complement signed type; the bad news is that some compilers implement plain char as unsigned instead, since the language standard doesn't specify which and leaves it up to the implementation to decide (another epic failure, inherited from C by the original C++98 language standard). So no wonder so many programs written in C/C++ work with one compiler on one platform but not another compiler on another platform, with such looseness in the standards themselves. That kinda stuff is why C++03 was necessary (it didn't change the actual language for programmers, just tightened the specification so compiler behavior would be more uniform). And this year's C++14 standard doesn't fix char's signedness either; that's still implementation-defined. Not that it would matter soon anyway: it'll be years before most C++ software uses C++14, since most C++ software still uses C++98/03 rather than C++11, including Liberal Crime Squad. Hardly any compiler implements C++14 fully yet (Clang is the furthest along), although at least with C++11, GNU's GCC compiler has implemented all of the core features, and Microsoft Visual C++ has implemented more than half of the core features in its latest version (kinda slow given that the standard is from 3 years ago; you'd think Microsoft, with its billions of dollars and thousands of talented programmers from top universities, would be able to get the full C++11 specification implemented faster in their compiler, but the reverse is true). Maybe Microsoft has some legitimate reason for it (I guess they like to keep their compiler very optimized and don't want bugs in it, and while they could probably easily implement the entire C++11 specification in a shoddy way, maybe they are taking the time to do it as well as possible, or something like that...
just kidding, I think they are just too lazy to implement it at a reasonable speed since they have other priorities, like fixing the things people don't like about Windows so that everyone will want to use Windows again, even on tablets, rather than iOS or Android, as well as diversifying what they make money on so they're less dependent on Windows).
But getting back on topic: curses functions that deal with single characters take integers rather than chars, which allows character codes beyond the 0-127 range to be passed cleanly and, with the right handling, allows support for Unicode, in which characters often take up more than one byte. This, combined with finishing the conversion tables used by translateGraphicsChar(), is the solution that will allow extended characters to work on Linux and Mac OS X. So I ought to be able to get this whole thing fixed up just by having translateGraphicsChar() do conversions on all 256 characters and having it used on every character the program outputs to the screen, through the use of wrapper functions. Those wrapper functions will also benefit the program by making it less curses-specific and easier to integrate with other APIs such as libtcod. Functions like enter_name() and setcolor() are already excellent wrapper functions to which support for non-curses APIs like libtcod would be easy to add. Then debug defines and such could switch between compiling for curses and compiling for libtcod, and the game would support both. The libtcod version would look a bit nicer, while the curses version would continue to retain compatibility with traditional consoles (including DOS, which can run Win32 versions of Liberal Crime Squad or other Win32 console applications using the HX DOS Extender; this DOS compatibility extends to DOS emulators like DOSBox, allowing the game to easily run on any platform DOSBox can run on, which includes DOSBox Turbo for Android, so you can play the Win32 version of Liberal Crime Squad, unmodified, on Android through DOSBox Turbo + HX DOS Extender). There's also a native Android version of Liberal Crime Squad, a port written in Java with a different user interface, but retaining curses console support is what allows the Windows version to work on DOS or DOSBox and be emulated on anything.
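The debug-define switching described here is a standard compile-time pattern. A minimal sketch (LCS_USE_LIBTCOD is a hypothetical flag name of my own invention, not something in the real codebase, and the function is a stand-in for wrappers like setcolor()):

```cpp
#include <string>

// One wrapper API, two backends, chosen at compile time. The real
// wrappers (setcolor(), enter_name(), the add*-style output functions)
// would each get a curses body and a libtcod body behind one signature.
#ifdef LCS_USE_LIBTCOD
std::string displayBackend() { return "libtcod"; } // hypothetical libtcod build
#else
std::string displayBackend() { return "curses"; }  // default build keeps curses
#endif
```

Building with -DLCS_USE_LIBTCOD would flip every wrapper to the libtcod implementation at once, while the default build stays pure curses.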
Or it can be emulated using Wine on other platforms, or directly compiled for them. It's good for people to have as many options as possible for how to compile, build, run, and port this game. On Linux or Mac OS X you could have the natively compiled POSIX-compliant version, the Win32 version running in Wine, and the Win32 version running in DOSBox (with help from HX DOS Extender) all running side by side, and see whether there are differences or bugs that appear in one but not the others; that's one example of how this is actually useful for programming and debugging the game, for instance in tracking down problems with the display of extended characters. And if a libtcod version is ever made, it could run side by side with the curses version to make sure they both work the same (other than a few minor libtcod-specific enhancements that would probably be implemented to make the game look nicer, which you wouldn't see in the curses version). It'd be sorta like how NetHack has multiple interfaces, both text and graphical (although obviously we can't use any NetHack code, since the NetHack GPL and the GNU GPL are incompatible in both directions).
But yeah, we can do this... already halfway there, with it implemented for curses movies, newspaper headlines, and newspaper graphics.