I am wondering why, with the vast tracts of codepoint space available up in the wilds of Plane 15 and 16, why does everyone (MUFI, SMuFL, ConScript, SIL, etc) want to crowd down into the tiny PUA island carved out in the BMP?
I (generally) get the historical argument, but PUA-A and -B have been around since Unicode v2.0 (23+ years?)
Is there a reason / rationale here? Is there a general carving-up of the PUA that I'm missing?
What causes well-tempered typographers to avoid saddling up the font tools and venturing to the wide open spaces of the higher Plains?
0
Comments
IIRC, MUFI’s PUA usage was to some extent shaped by the PUA in TITUS Cyberbit, and that’s been around since back when the Flintstones still peacefully coexisted with the dinosaurs.
See this earlier discussion.
Picks a second plot of code-point territory in the upper planes (the "upper PUA") in addition to your PUA range in the BMP (the "lower PUA") and multi-map your PUA glyphs onto both your Upper and Lower PUA ranges.
The Lower PUA still works and apps can migrate to using the Upper PUA over time. You can even incentivize the use of the Upper PUA by adding new code points in your range only in the Upper PUA.
Is this workable?? I'm likely missing some significant issues here ... but ...
The way is stands, the lower PUA is essentially the dreaded code-page scenario ... swapping contexts (fonts) to get the right set of characters that are overloading the code points.
“apps can migrate”: Urm, no. General-purpose apps like Word have no knowledge or understanding of these PUA assignments, nor should they. Nor even could they for the most part, since it varies by font. (Unless they hard-code codepoint meaning on a per-font basis, and that is so not going to ever happen!)
What would need to migrate is the encoding of existing documents, in a large variety of apps. That is hard to do.
As long as the “lower PUA” still works, even any specialty app that really understands this stuff has little incentive to change how it treats new docs, either. Say that the specialty app is (for example) a music composition app. Why should it even care that Klingon overlaps with its PUA usage? If the fonts double-encode stuff, what reason would that app have to change its usage of those code points? Not saying you couldn’t convince some apps, perhaps, out of “principle,” but the functional rationale for them is... slim.
However, as Jacob Casal asked ...
Is there a technical difference between the PUA and PUA-A and -B?
e.g. Are there significant current apps that can handle the PUA, but not PUA-A and -B? ... or other dastardly daunting scenarios pushing us back down into the BMP?
So, I agree 100% with Khaled, he has it right. For those who don’t know all the lingo, I am going to restate what he said"
The “BMP” is the Basic Multi-lingual Plane, the antique section of Unicode that can be represented as a single double-byte code point. It is all in the first 64K characters of Unicode. There are additional “planes” (64K sections), and apps have to be just a tiny bit smarter to deal with them. Not much smarter, but just a little.
Most of the stuff people actually use day-to-day is in the BMP, including the original PUA (“Private Use Area”).
BUT, there are an increasingly large quantity of emoji going outside the BMP. The reason is simple: almost everything new being added to Unicode goes outside the BMP, because the BMP is quite full. It is just that most of the new stuff being added is pretty obscure.
Emoji is the exception to that. They are outside the BMP! So suddenly apps have started caring about extending their Unicode support beyond the BMP.
This helps emoji work, but also helps enable a ton of other things! There are all sorts of things beyond the BMP: relatively new writing systems (Adlam, from west Africa); super obscure (Warang Citi, Mro, Duployan, Minoan Linear A, Phaistos Disc); extensions of rare, obsolete or historic characters for better known writing systems that are mostly in the BMP (such as for Arabic, Sinhala, Mongolian); or in a few cases, languages we have all heard of but hardly anybody actually uses (Egyptian Hieroglyphs, Cuneiform). So, anything that happens to be a real latecomer to Unicode.
I'll sprinkle some characters at the nosebleed end of PUA-B and try it with some common apps on Win7x64 to see what breaks ...
Most of the MS apps (Word, Excel, PowerPoint, WordPad) and Acrobat work as expected and inter-operate nicely. Windows Explorer shows TOFU, but handles cut and paste correctly.
Windows Character Map fails.
Corel Draw works internally, but does not inter-operate (cut/paste) between any other apps.
** The Details
The OS is Microsoft Windows 7 Pro 6.1.7601 SP1 x64
All Microsoft ("MS") Office apps are from the suite Office 365 MSO (16.0.11601.20130) 32-bit.
Versions of other applications (the circa dates are based on the copyright notice):
Windows Character Map fails.
MS Word works.
Adobe Acrobat works.
Corel Draw fails cut/paste between all MS Office apps, but works internally.
MS Excel works.
MS WordPad works.
MS PowerPoint works.