How would you redo Unicode?

Tofu Type Foundry
Tofu Type Foundry Posts: 50
edited May 6 in Technique and Theory
I’m not really sure how to word this question. If you could “redo” Unicode right now with no issues regarding compatibility in legacy software, what changes would you make? It seems that some of the decisions made decades ago have led to unexpected issues with modern digital typesetting. I’m curious what Unicode would look like in an “optimized” release with all language systems and languages accounted for from the start.
Comments

  • Nick Shinn
    Nick Shinn Posts: 2,265
    For Latin: Abolish quotesingle and quotedbl.
    That might prompt keyboard manufacturers to provide separate keys for all four “curly quotes.” 

    I doubt that having separate code points for quoteright and apostrophe would solve more problems than it would create.
  • John Hudson
    John Hudson Posts: 3,369
    Completely eliminate all decomposable diacritic characters and enforce use of letter + combining mark sequences in all languages.

    Fix Hebrew canonical combining class assignments.

    Consistently assign script properties based on the context in which characters are used, not on the script from which they historically derive (looking at you, not-Greek ᶿ).

    Avoid unification or compatibility decompositions for letter/symbol lookalikes, so e.g. separate codepoints for hooked-f and florin sign, and for lowercase mu and micro sign.

    Provide recommendations for encoding choices for similar and confusable characters, especially for digitally disadvantaged languages.
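To make the precomposed-versus-decomposed duality concrete, here is a minimal Python sketch using only the standard library's unicodedata module. It shows that the two encodings of "é" render identically yet compare unequal as raw code points, and that canonical normalization (NFD/NFC) is what unifies them:

```python
import unicodedata

# U+00E9 (precomposed e-acute) vs U+0065 U+0301 (e + combining acute accent)
precomposed = "\u00e9"
decomposed = "e\u0301"

# The two sequences display the same glyph but are different code point strings.
print(precomposed == decomposed)  # False

# Canonical normalization maps between the two representations:
# NFD fully decomposes, NFC recomposes where a precomposed form exists.
print(unicodedata.normalize("NFD", precomposed) == decomposed)   # True
print(unicodedata.normalize("NFC", decomposed) == precomposed)   # True
```

Eliminating decomposable characters, as proposed above, would collapse this duality to a single representation and make normalization largely unnecessary for comparison.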
  • Igor Freiberger
    Igor Freiberger Posts: 288
    Completely eliminate all decomposable diacritic characters and enforce use of letter + combining mark sequences in all languages.
    Yes, but we would need to define where to place the cedilla on letters like H, K, N, R, or turned V. So far I have found no reliable information about this. Did you learn anything new in recent years?
  • John Hudson
    John Hudson Posts: 3,369
    ...we would need to define where to place the cedilla in letters like H, K, N, R or turned V.
    That’s already the case, independent of how such things are encoded. Personally, I am fine with floating the cedilla under the middle of these letters, in the absence of any attested forms in actual use.
  • John Savard
    John Savard Posts: 1,163
    edited May 6
    Completely eliminate all decomposable diacritic characters and enforce use of letter + combining mark sequences in all languages.
    And, of course, if I were redoing Unicode, I would do exactly the opposite. I would provide the less popular languages (the languages of countries that entered the computer age later, or that are less economically powerful) with a full set of pre-composed characters, as are often found in the unofficial, unrecognized encodings people have used in those countries, where they are desired.
    Why?
    Because pre-composed characters make it simpler to process text in those languages: less processing power, simpler algorithms, and less sophisticated programs are required.
    But I have to admit, this isn't a no-brainer. It seems like a natural consequence of the current desire to provide all peoples with full equality.
    But the only reason a program that handles a given language can be simpler, thanks to the availability of pre-composed characters, is if it handles only the pre-composed versions of those characters. Otherwise, having two alternatives that both need to be handled just makes things more complicated. And that means those programs won't work properly: they won't be compatible with more sophisticated programs that do handle combining mark sequences correctly, which presumably would also exist and would likely be running on the same computers.
    So I do admit that what I would prefer is seriously flawed.
    Thus, perhaps what I would really want to see is instead for Unicode to be succeeded by two codes - one done the way John Hudson advocates, one done the way I propose, each of these codes being designed to serve a different purpose.
    His successor to Unicode would serve the purpose of being a logical standard for worldwide communications.
    My successor to Unicode would serve the purpose of being (or being closely related to) a computer code or codes well suited to simple and straightforward computation in each particular language.
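The compatibility problem described above can be illustrated with a short Python sketch. The accent-stripping table and function names are hypothetical, invented for this example; the point is that a program keyed only to pre-composed characters silently mishandles the decomposed spelling of the same text unless it normalizes first:

```python
import unicodedata

# Hypothetical accent-stripping table, keyed only by precomposed characters.
STRIP = {"\u00e9": "e", "\u00f1": "n"}

def strip_accents_naive(text):
    # Works only when the input happens to use precomposed characters.
    return "".join(STRIP.get(ch, ch) for ch in text)

def strip_accents_robust(text):
    # Normalizing to NFC first folds combining sequences into precomposed
    # forms where they exist, so both spellings are handled.
    return strip_accents_naive(unicodedata.normalize("NFC", text))

print(strip_accents_naive("caf\u00e9"))    # "cafe"
print(strip_accents_naive("cafe\u0301"))   # "cafe\u0301" -- combining mark slips through
print(strip_accents_robust("cafe\u0301"))  # "cafe"
```

This is the sense in which having two encodings of the same character makes every program either more complicated or subtly broken.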
  • Thomas Phinney
    Thomas Phinney Posts: 2,977
    The one thing that is 100% for sure worse than John Savard’s proposal is his additional proposal to have two encoding standards.

    Good grief, please, no. That way lies madness.
  • John Savard
    John Savard Posts: 1,163
    edited 2:15AM
    The one thing that is 100% for sure worse than John Savard’s proposal is his additional proposal to have two encoding standards.

    Good grief, please, no. That way lies madness.
    Sadly, we've already passed this point.
    The world of standards is already in the grip of that sort of madness.
    Of course, though, my goals can be achieved without having two standards. Add in all the desired precomposed characters for those who need them... but deprecate both them and the existing ones to point modern systems in the better direction.


  • Simon Cozens
    Simon Cozens Posts: 772
    Because pre-composed characters make it simpler to process text in those languages.
    Well, this just isn't true. But even if it were true, think of it the other way around: If even the majority languages had to deal with decomposed characters, software implementers would get them right.

    Doing the complex stuff by default makes things better for minority languages. Trying to turn the processing of minority languages into the same process used for majority languages is precisely the wrong direction, and the thing that got us into this mess in the first place.
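One concrete example of the "complex stuff" that implementers get right when forced to: when a letter carries multiple combining marks, canonical combining classes define a normalized mark order, so differently typed sequences still compare equal after normalization. A small Python sketch using the standard library:

```python
import unicodedata

# Combining ring below (U+0325, combining class 220) and combining acute
# (U+0301, combining class 230), attached to "e" in two different orders.
a = "e\u0301\u0325"  # acute typed first
b = "e\u0325\u0301"  # ring below typed first

# Raw code point comparison sees two different strings...
print(a == b)  # False

# ...but canonical normalization sorts the marks by combining class,
# so both spellings normalize to the same sequence.
print(unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b))  # True
```

Software that handles this machinery correctly for majority languages handles minority-language mark sequences essentially for free; this also connects to the Hebrew combining-class fixes mentioned earlier in the thread.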
  • How is waiting years for a precomposed accented character to be added, and to become usable on updated devices, a good approach?
  • How would you redo Unicode?

    a) do basic research and systematics about notation systems first
    b) define usable standards with regard to font technology – not only for combined characters, but also for variant characters and ligatures
    c) re-order code blocks
    d) straighten out terminology
    e) fix glyph bugs and annotation faults

    since all this will never happen, f):

    paint a picture in oil of a flat landscape at sunset (purple sky), with a timber barn on the left side (with open door), a white unicorn with golden hair on the right side, and a black horse with white figures painted on it in the middle.