PUA Overcrowding

I am wondering why, with the vast tracts of codepoint space available up in the wilds of Planes 15 and 16, everyone (MUFI, SMuFL, ConScript, SIL, etc.) wants to crowd down into the tiny PUA island carved out in the BMP.

I (generally) get the historical argument, but PUA-A and -B have been around since Unicode v2.0 (23+ years?)

Is there a reason / rationale here? Is there a general carving-up of the PUA that I'm missing?

What causes well-tempered typographers to avoid saddling up the font tools and venturing to the wide open spaces of the higher Plains? 


  • André G. Isaak Posts: 324
    edited May 4
    I don’t know anything about ConScript or SMuFL, but SIL and MUFI have both been around for some time. I would assume that the historical argument was far more relevant when they first began, and relocating would cause significant hassles once they'd already carved out their own, often conflicting, space in the BMP PUA.

    IIRC, MUFI’s PUA usage was to some extent shaped by the PUA in TITUS Cyberbit, and that’s been around since back when the Flintstones still peacefully coexisted with the dinosaurs.
  • AbrahamLee Posts: 236
    edited May 4
    For those unaware, SMuFL stands for Standard Music Font Layout. It is an attempt to do for the music notation software community what Unicode did for the multilingual typesetting community, while still complying with Unicode, since most computer software expects that. However, since Unicode only defines a relatively small glyph set for music, SMuFL exists almost entirely in the PUA, containing several thousand unique music-related glyphs that can be reached by code point without needing an OpenType feature to access them.
  • Mark Simonson Posts: 1,074
    No one can stake a claim in the PUA. Anyone using it (or "carving it up"), even for non-Unicode standards, must do so with the understanding that it can be used by other fonts for other things.
  • Jacob Casal Posts: 47
    Is there a technical difference between the PUA and PUA-A and -B? I suppose other than when one changes fonts the supplementary PUA glyphs wouldn’t change to said other font’s PUA encodings?
  • Thomas Phinney Posts: 1,403
    In normal apps, the encodings stay the same, and you get whatever glyphs the font has at those codepoints.
  • Peter Baker Posts: 16
    André is exactly right about MUFI, which has been coordinated with TITUS from the beginning. The current version of the TITUS Cyberbit font is downloaded from a page dated 2009, with a header "Compliant with UNICODE 4.0," i.e. 2003-5. But TITUS is older. MUFI has been around since 2001--a time when I suspect application support for the upper range was poor.

    Between MUFI and TITUS, the PUA for medievalists/classicists/linguists is getting very crowded. It looks unlikely that MUFI will add much to its recommendation in the near future, but if there is ever a push to expand it, I'm sure it will push into the upper range.
  • ClintGoss Posts: 17
    edited May 5
    ... relocating would cause significant hassles ...
    ... except that maybe OT provides a straightforward (and pretty slick) upgrade path into the upper planes ... what if each PUA user:

    Picks a second plot of code-point territory in the upper planes (the "upper PUA") in addition to your PUA range in the BMP (the "lower PUA") and multi-map your PUA glyphs onto both your Upper and Lower PUA ranges.

    The Lower PUA still works and apps can migrate to using the Upper PUA over time. You can even incentivize the use of the Upper PUA by adding new code points in your range only in the Upper PUA.
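The double-mapping idea above can be sketched in a few lines. This is only an illustration under stated assumptions: the font's character map is modelled as a plain dict from code point to glyph name, and `double_map`, `UPPER_PUA_BASE` and the glyph names are hypothetical, not part of any font tool's API.

```python
# Hypothetical sketch: give every BMP PUA glyph a second code point in PUA-A.
BMP_PUA_START = 0xE000
BMP_PUA_END = 0xF8FF
UPPER_PUA_BASE = 0xF0000  # assumed start of the "upper PUA" plot (Plane 15)


def double_map(cmap, upper_base=UPPER_PUA_BASE):
    """Return a copy of cmap in which each BMP PUA entry is also mapped
    to the code point at the same offset from upper_base."""
    result = dict(cmap)
    for codepoint, glyph_name in cmap.items():
        if BMP_PUA_START <= codepoint <= BMP_PUA_END:
            upper = upper_base + (codepoint - BMP_PUA_START)
            # Do not overwrite an upper-PUA mapping that already exists.
            result.setdefault(upper, glyph_name)
    return result


# Toy character map: one ordinary letter and two "lower PUA" glyphs.
lower_only = {0x0041: "A", 0xE000: "uniE000", 0xE001: "uniE001"}
both = double_map(lower_only)
```

In a real font the upper entries would have to live in a character-map subtable that can hold code points above U+FFFF (OpenType `cmap` format 12, for instance), since the 16-bit format 4 subtable only covers the BMP.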

    Is this workable?? I'm likely missing some significant issues here ... but ...

    The way it stands, the lower PUA is essentially the dreaded code-page scenario ... swapping contexts (fonts) to get the right set of characters that are overloading the code points.
  • Thomas Phinney Posts: 1,403
    “the lower PUA is essentially the dreaded code-page scenario”: Yes, quite. Except for specialized bits of stuff, where any single user is unlikely to have a major conflict at any time.

    “apps can migrate”: Urm, no. General-purpose apps like Word have no knowledge or understanding of these PUA assignments, nor should they. Nor even could they for the most part, since it varies by font. (Unless they hard-code codepoint meaning on a per-font basis, and that is so not going to ever happen!)

    What would need to migrate is the encoding of existing documents, in a large variety of apps. That is hard to do.

    As long as the “lower PUA” still works, even any specialty app that really understands this stuff has little incentive to change how it treats new docs, either. Say that the specialty app is (for example) a music composition app. Why should it even care that Klingon overlaps with its PUA usage? If the fonts double-encode stuff, what reason would that app have to change its usage of those code points? Not saying you couldn’t convince some apps, perhaps, out of “principle,” but the functional rationale for them is... slim.
  • ClintGoss Posts: 17
    Ah, OK, Thomas ... I get it now. There's unlikely to be concurrent contention over the PUA (unless Klingons take a shine to FontAwesome icons ... and then only if they had not figured out how to switch fonts).

    However, as Jacob Casal asked ...  

    Is there a technical difference between the PUA and PUA-A and -B?

    e.g. Are there significant current apps that can handle the PUA, but not PUA-A and -B? ... or other dastardly daunting scenarios pushing us back down into the BMP?
  • Thomas Phinney Posts: 1,403
    ClintGoss said:

    However, as Jacob Casal asked ...  

    Is there a technical difference between the PUA and PUA-A and -B?

    e.g. Are there significant current apps that can handle the PUA, but not PUA-A and -B? ... or other dastardly daunting scenarios pushing us back down into the BMP?
    Yeah, I missed answering this properly.

    So, I agree 100% with Khaled, he has it right. For those who don’t know all the lingo, I am going to restate what he said:

    The “BMP” is the Basic Multilingual Plane, the antique section of Unicode in which every code point can be represented as a single 16-bit (double-byte) code unit. It is the first 64K code points of Unicode. There are additional “planes” (64K sections), and apps have to be just a tiny bit smarter to deal with them. Not much smarter, but just a little.
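To make the "tiny bit smarter" part concrete, here is a minimal sketch of how a code point turns into UTF-16 code units. The function name `utf16_code_units` is made up for this illustration; the arithmetic is the standard surrogate-pair calculation. Inputs are assumed to be valid code points outside the surrogate range U+D800..U+DFFF.

```python
def utf16_code_units(codepoint):
    """Return the UTF-16 code units for one code point as a list of ints.

    BMP code points need a single 16-bit unit; anything in Planes 1-16
    is split into a high and a low surrogate.
    """
    if codepoint < 0x10000:
        return [codepoint]
    offset = codepoint - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return [high, low]


bmp_pua = utf16_code_units(0xE000)      # one unit: the code point itself
high_pua = utf16_code_units(0x10FF00)   # two units: a surrogate pair
```

An app that assumes "one 16-bit unit equals one character" handles the first case and mangles the second, which is the whole difference between BMP-only and full Unicode support at this level.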

    Most of the stuff people actually use day-to-day is in the BMP, including the original PUA (“Private Use Area”).

    BUT, there is an increasingly large quantity of emoji going outside the BMP. The reason is simple: almost everything new being added to Unicode goes outside the BMP, because the BMP is quite full. It is just that most of the new stuff being added is pretty obscure.

    Emoji is the exception to that. They are outside the BMP! So suddenly apps have started caring about extending their Unicode support beyond the BMP.

    This helps emoji work, but also helps enable a ton of other things! There are all sorts of things beyond the BMP: relatively new writing systems (Adlam, from west Africa); super obscure ones (Warang Citi, Mro, Duployan, Minoan Linear A, Phaistos Disc); extensions of rare, obsolete or historic characters for better known writing systems that are mostly in the BMP (such as for Arabic, Sinhala, Mongolian); or in a few cases, scripts we have all heard of but hardly anybody actually uses (Egyptian Hieroglyphs, Cuneiform). So, anything that happens to be a real latecomer to Unicode.
  • ClintGoss Posts: 17
    Thank you all!  ... this has been really helpful ...

    I'll sprinkle some characters at the nosebleed end of PUA-B and try it with some common apps on Win7x64 to see what breaks ...
  • ClintGoss Posts: 17
    edited May 6
    I've done some very basic testing of four "High PUA" characters: U+10ff00 through U+10ff03 on a small flock of apps.
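For anyone wanting to repeat the test, a rough sketch of how the four characters could be generated and checked against the three private-use ranges follows. The ranges are the ones Unicode defines; `PUA_RANGES`, `pua_block` and the labels are names invented for this example.

```python
# The three private-use ranges defined by Unicode (inclusive bounds).
PUA_RANGES = {
    "PUA (BMP)": (0xE000, 0xF8FF),
    "PUA-A (Plane 15)": (0xF0000, 0xFFFFD),
    "PUA-B (Plane 16)": (0x100000, 0x10FFFD),
}


def pua_block(codepoint):
    """Return the label of the private-use range holding codepoint, or None."""
    for name, (start, end) in PUA_RANGES.items():
        if start <= codepoint <= end:
            return name
    return None


# The four "High PUA" test characters, U+10FF00 through U+10FF03.
test_string = "".join(chr(cp) for cp in range(0x10FF00, 0x10FF04))
blocks = [pua_block(ord(ch)) for ch in test_string]
```

Pasting `test_string` into an app, with a font that has glyphs at those code points, is one way to reproduce the checks described in this post.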

    Most of the MS apps (Word, Excel, PowerPoint, WordPad) and Acrobat work as expected and inter-operate nicely. Windows Explorer shows TOFU, but handles cut and paste correctly.

    Windows Character Map fails.

    Corel Draw works internally, but does not inter-operate (cut/paste) between any other apps.

    ** The Details

    The OS is Microsoft Windows 7 Pro 6.1.7601 SP1 x64

    All Microsoft ("MS") Office apps are from the suite Office 365 MSO (16.0.11601.20130) 32-bit.

    Versions of other applications (the circa dates are based on the copyright notice):
    • Corel Draw X8 v18.1.0.661 (circa 2016).
    • Adobe Acrobat 9 Pro version 9.0.0 (circa 2008).
    Windows Explorer mostly works.
    It shows TOFU for a file name with High PUA characters, but handles cut/paste operations to and from MS Word, correctly preserving the code points.

    Windows Character Map fails.
    Does not seem to show any characters above U+FFFD.

    MS Word works.
    Microsoft Word for Office 365 MSO (16.0.11601.20130) 32-bit correctly handles converting High PUA characters using alt-X (e.g. converting 10ff00 followed by alt-X to a U+10ff00 character). It then treats them like other characters. On Save-As/PDF, it writes a .pdf with embedded fonts, which looks OK in Acrobat.

    Adobe Acrobat works.
    It views the document written by MS Word. Cut from PDF and paste into MS Word works.

    Corel Draw fails cut/paste between all MS Office apps, but works internally.
    Corel Draw X8 v18.1.0.662
    Fails - shows .notdef for High PUA characters pasted from MS Word, but allows selection of High PUA characters from its internal Insert Character docker and treats those characters reasonably thereafter.

    MS Excel works.
    Microsoft Excel for Office 365 MSO (16.0.11601.20130) 32-bit 
    Handles cut-paste to/from other MS Office apps.
    Fails cut-paste to/from Corel Draw.

    MS WordPad works.
    Handles cut-paste to/from other MS Office apps.
    Saves/restores correctly to/from "Unicode" .txt (writing a BOM and UTF-16).
    Saves and restores to/from .rtf format.
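As a rough illustration of what a correct "Unicode" .txt round trip involves, the snippet below encodes two High PUA characters as UTF-16 with a BOM and decodes them again. It uses Python's built-in `utf-16` codec, which writes the BOM in the machine's native byte order; the byte order WordPad actually writes is not stated above, so none is assumed here.

```python
# Two of the High PUA test characters.
text = "\U0010FF00\U0010FF01"

# Python's "utf-16" codec prepends a byte order mark (native byte order).
data = text.encode("utf-16")

# The first two bytes are the BOM, in either little- or big-endian form.
has_bom = data[:2] in (b"\xff\xfe", b"\xfe\xff")

# Decoding reads the BOM, picks the byte order, and rejoins the surrogates.
round_tripped = data.decode("utf-16")
```

Each character takes four bytes (one surrogate pair), so the file body is eight bytes plus the two-byte BOM.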

    MS PowerPoint works.
    Handles cut-paste to/from other MS Office apps.
    Fails cut-paste to/from Corel Draw.
    It also has its own rules for line spacing, but that's another issue ...
