Separate language codes for different Englishes

Nick Shinn
edited January 2018 in Font Technology
I would like to code fonts to behave differently for North American and British English.
(In particular, for punctuation.)
Is this possible?

Comments

  • When I made a font primarily for use in some journals that quoted late-18th- and early-19th-century French, I did proper French punctuation in a stylistic set. I was doing the layout, though, and did it to speed up both the import and any keyed corrections. I don't know if I would do so in a released font.

    But there isn't any reason I couldn't move or copy the chaining contexts into a locl. Maybe. But I think that would mess up French users keying or laying out their work.
  • John Hudson
    Examples of the kind of things you're wanting to handle in this way, Nick?

    Mike, are you talking just about the conventional French spacing of punctuation, or e.g. representing raised quote mark characters with guillemet glyphs?
  • John, 

    All the above and more.

    The text came to me keyed in by an English-speaking person using English conventions. So basically the font did the substitutions of both primes and typographic quotes, the spacing, etc. It also dealt with a set of medial /s forms according to what I could ferret out about early-19th-century French rules... it at least matched the manuscripts being quoted.
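    For comparison, the spacing part of this can also be done at the character level, before the text ever reaches the font. Below is a minimal Python sketch assuming one common French convention (a narrow no-break space, U+202F, before ; ! ? and a no-break space, U+00A0, before the colon and inside guillemets); house styles vary, and the function name french_spacing is hypothetical.

```python
import re

NNBSP = "\u202F"  # NARROW NO-BREAK SPACE
NBSP = "\u00A0"   # NO-BREAK SPACE


def french_spacing(text: str) -> str:
    """Apply one common French spacing convention to punctuation.

    Any existing whitespace before a mark (including earlier no-break
    spaces) is replaced, so running the function twice changes nothing.
    """
    text = re.sub(r"\s*([;!?])", NNBSP + r"\1", text)  # narrow no-break space before ; ! ?
    text = re.sub(r"\s*:", NBSP + ":", text)             # no-break space before :
    text = re.sub(r"«\s*", "«" + NBSP, text)             # after the opening guillemet
    text = re.sub(r"\s*»", NBSP + "»", text)             # before the closing guillemet
    return text


print(french_spacing("«Vraiment ?» dit-il: non!"))
```

    The colon rule is naive: it would also space times such as 10:30 and URLs, which a real implementation would have to skip.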
  • Nick Shinn
    edited January 2018
    Examples of the kind of things you're wanting to handle in this way, Nick?

    I would like to replace quotesingle with a right quote mark for North America, to remedy the “smartquote” fail that generates ‘18, rock ‘n’ roll, etc.

    (Still can’t understand why there aren’t separate Unicode points for right quote and apostrophe.)

    Also, for some typefaces, flipped left quotes.
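    To make the failure concrete: a purely positional rule, roughly what simple smart-quote implementations do, cannot tell an elision from an opening quote. The Python sketch below is hypothetical (not any particular application's code): a straight ' at the start of the text or after whitespace or an opening bracket becomes a left quote, and anything else becomes a right quote.

```python
LEFT_SINGLE = "\u2018"   # ‘ quoteleft
RIGHT_SINGLE = "\u2019"  # ’ quoteright, also the typographic apostrophe


def naive_smart_single_quotes(text: str) -> str:
    """Curl straight single quotes by position alone."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        # "Opening" position -> left quote; otherwise right quote/apostrophe.
        out.append(LEFT_SINGLE if prev.isspace() or prev in "([{" else RIGHT_SINGLE)
    return "".join(out)


for sample in ("Class of '18", "rock 'n' roll", "don't"):
    print(naive_smart_single_quotes(sample))
# Class of ‘18      <- should be ’18
# rock ‘n’ roll     <- should be ’n’
# don’t
```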
     
  • There is a separate Unicode for apostrophe; software is just still stuck on QWERTY overloading.
    Previously: http://typedrawers.com/discussion/comment/24738/#Comment_24738
  • André G. Isaak
    edited January 2018
    There is a separate Unicode for apostrophe
    U+02BC is named ‘apostrophe’, but this character is classified as a modifier letter rather than as punctuation. I think this was intended more for words like ’alif or O’odham where it represents a glottal stop. (Similarly U+02BC should be reserved for words like Hawai‘i).

    André
  • Kent Lew
    André — I think you mistyped. I believe you meant U+02BB for Hawaiʻian okina in your last sentence.
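    The distinction André and Kent are drawing is recorded in the Unicode Character Database itself. Python's standard unicodedata module reports the name and general category of each candidate: Lm (modifier letter) for U+02BC and U+02BB, versus Po, Pi and Pf (punctuation) for the ASCII mark and the curly quotes. The describe helper is just an illustrative name.

```python
import unicodedata

# Code points that turn up in place of (or as) an apostrophe in this thread.
CODE_POINTS = [0x0027, 0x2018, 0x2019, 0x02BC, 0x02BB]


def describe(cp: int) -> str:
    """Format a code point with its general category and Unicode name."""
    ch = chr(cp)
    return f"U+{cp:04X} {unicodedata.category(ch)} {unicodedata.name(ch)}"


for cp in CODE_POINTS:
    print(describe(cp))
# U+0027 Po APOSTROPHE
# U+2018 Pi LEFT SINGLE QUOTATION MARK
# U+2019 Pf RIGHT SINGLE QUOTATION MARK
# U+02BC Lm MODIFIER LETTER APOSTROPHE
# U+02BB Lm MODIFIER LETTER TURNED COMMA
```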

  • joeclark
    Examples of the kind of things you're wanting to handle in this way, Nick?

    The only material differences between en-CA/en-US (one set) and every other form (also one set):

    • Periods and commas inside (en-CA/en-US) or outside quotation marks
      • Flowchart required for utterances vs. other quoted text†
    • Double (en-CA/en-US) vs. single quotation marks at the outset

    The issue I have daggered above requires human intervention in every case, hence could not be automated even if you wanted to.
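    The first difference can be written as a rule, but only if the daggered judgement is supplied by a person. In this hypothetical Python helper (place_punct and its belongs_to_quote flag are illustrative names, not an existing API), en-US and en-CA always put a period or comma inside the closing mark; for other locales the flag decides, and nothing in the code tries to infer it.

```python
NORTH_AMERICAN = {"en-US", "en-CA"}


def place_punct(quoted: str, punct: str, close_mark: str, locale: str,
                belongs_to_quote: bool = False) -> str:
    """Join quoted text, a period or comma, and the closing quotation mark.

    belongs_to_quote stands in for the human decision (is the punctuation
    part of the quoted material?); en-US and en-CA ignore it.
    """
    if locale in NORTH_AMERICAN or belongs_to_quote:
        return quoted + punct + close_mark   # punctuation inside the mark
    return quoted + close_mark + punct       # punctuation outside the mark


print(place_punct("color", ",", "\u201D", "en-US"))   # color,”
print(place_punct("colour", ",", "\u2019", "en-GB"))  # colour’,
```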

    Are you also going to deal with adding thin spaces between adjoining quotation marks? How about British style in a quotation whose first word begins with an apostrophe?
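    The thin-space question, at least, is mechanical at the character level. A minimal Python sketch, assuming U+2009 THIN SPACE is wanted between any two adjoining curly quotation marks. U+2009 permits a line break, so U+202F NARROW NO-BREAK SPACE is the obvious substitute if the marks must stay together. The function name is hypothetical.

```python
import re

THIN_SPACE = "\u2009"
QUOTE_MARKS = "\u2018\u2019\u201C\u201D"  # ‘ ’ “ ”


def space_adjoining_quotes(text: str, space: str = THIN_SPACE) -> str:
    """Insert `space` at every position that sits between two quotation marks."""
    pattern = f"(?<=[{QUOTE_MARKS}])(?=[{QUOTE_MARKS}])"
    return re.sub(pattern, space, text)


print(space_adjoining_quotes("\u201CShe said \u2018no.\u2019\u201D"))
```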

    Further, in this comment I have chosen to use a hyphen instead of a nonbreaking hyphen. That choice is open to argument.

  • André — I think you mistyped.

    Indeed I did! Good catch.

    André
  • The ʻokina is for Hawaiian, but we should make an effort to reclaim U+02BC for proper use.
  • Nick Shinn
    edited January 2018

    Quote mark standards, opening and nested

    „Afrikaans, Dutch, Polish”
         ‚Afrikaans, Dutch, Polish’

    „Bulgarian, Czech, German, Icelandic, Lithuanian, Slovak, Serbian, Romanian“
         ‚Bulgarian, Czech, German, Icelandic, Lithuanian, Slovak, Serbian, Romanian‘

    »Danish, Croatian«
         ›Danish, Croatian‹

    «Greek, Spanish, Albanian, Switzerland, Turkish»
         ‹Greek, Spanish, Albanian, Switzerland, Turkish›

    ‘British’
         “British”

    “American English, Irish, Portuguese”
         ‘American English, Irish, Portuguese’

    ”Finnish, Swedish”
         ’Finnish, Swedish’

    «French»
         “French” or ‹French›

    «Norwegian»
         ‘Norwegian’

    ***
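    A few rows of the table above, transcribed as data, show how little logic is involved once the language is known: the outer pair is used at even nesting levels and the inner pair at odd ones. This is a Python sketch with a hypothetical quote() helper. The BCP 47 tags standing in for the language names are assumptions, French spacing inside guillemets is not modelled, and only one of the two French inner options is included.

```python
# (outer, inner) quotation-mark pairs, transcribed from a few rows of the list above.
QUOTE_STYLES = {
    "en-US": (("\u201C", "\u201D"), ("\u2018", "\u2019")),  # “ ”  ‘ ’
    "en-GB": (("\u2018", "\u2019"), ("\u201C", "\u201D")),  # ‘ ’  “ ”
    "de":    (("\u201E", "\u201C"), ("\u201A", "\u2018")),  # „ “  ‚ ‘
    "da":    (("\u00BB", "\u00AB"), ("\u203A", "\u2039")),  # » «  › ‹
    "sv":    (("\u201D", "\u201D"), ("\u2019", "\u2019")),  # ” ”  ’ ’
    "fr":    (("\u00AB", "\u00BB"), ("\u201C", "\u201D")),  # « »  “ ”
}


def quote(text: str, tag: str, level: int = 0) -> str:
    """Wrap text in the marks for a nesting level: 0 = outer, 1 = inner, 2 = outer again."""
    outer, inner = QUOTE_STYLES[tag]
    open_mark, close_mark = outer if level % 2 == 0 else inner
    return f"{open_mark}{text}{close_mark}"


for tag in QUOTE_STYLES:
    print(tag, quote("a " + quote("b", tag, level=1) + " c", tag))
```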

    My premise is that <quoteleft> is almost never used in North America, except in nested quotes in body text. However, in display, it appears in error with massive frequency, in lieu of the apostrophe, courtesy of “Smart Quote” algorithms.

    Therefore, I would like to replace the <quoteleft> glyph with one of apostrophic shape. But that wouldn’t work in the UK, so I would like to treat them differently—but the <locl> tag doesn’t differentiate Englishes. 
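    The limitation is in the tag registry itself: OpenType language system tags identify languages, not regional varieties, and English has a single tag, ENG. A shaping engine that receives a BCP 47 tag such as en-US or en-GB therefore selects the same locl lookups for both. The Python sketch below is a deliberately simplified, hypothetical mapping that consults only the primary subtag; real mappings (such as the one in HarfBuzz) are larger and handle region-specific cases for a few other languages, Chinese for example, but for English the outcome is the same.

```python
# Hypothetical excerpt of a BCP 47 -> OpenType language system tag mapping.
# OpenType tags are four characters, padded with spaces.
OT_LANGSYS = {"en": "ENG ", "fr": "FRA ", "de": "DEU ", "nl": "NLD ", "da": "DAN ", "sv": "SVE "}


def ot_langsys_for(bcp47: str) -> str:
    """Return the OpenType language system tag for a BCP 47 tag.

    Simplification: only the primary language subtag is consulted, so any
    region subtag (US, GB, CA, ...) is discarded.
    """
    primary = bcp47.split("-")[0].lower()
    return OT_LANGSYS.get(primary, "dflt")


for tag in ("en-US", "en-CA", "en-GB", "en-AU"):
    print(tag, "->", repr(ot_langsys_for(tag)))
# every line ends with 'ENG ': a locl lookup under ENG fires for all four
```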

    Perhaps I should just go ahead, and make different fonts for North America and elsewhere, clearly labelled.


    Joe:

    The issue I have daggered above requires human intervention in every case, hence could not be automated even if you wanted to.

    But it has been automated: that is what the aforementioned “Smart Quote” algorithm does. It was well-intentioned, meant to appease typographers who love their curly quotes, but it has an unfortunate side effect.

  • Nick Shinn
    edited January 2018
    In general, but in this case that is exactly what “Smart Quotes” attempts to do, so I see no reason not to try and bug-fix that with a less fail-prone kludge, if such a thing is possible. That is what I started this thread to find out, on the assumption that language coding might be useful.

    I’m also not convinced that reversed left quote marks are legitimate Unicode characters; they strike me more as an alternate glyph form that may be typeface-specific, for instance in certain historic usages such as movie title cards. And of course in many ATF typefaces of the early 20th century, in which they were the norm: a particularly American style.
  • notdef
    calt scan everything for glyph sequence "colour" and/or "collywobbles" and toss that left-leaning commie bastard overboard with the tea
  • Kent Lew
    In general, but in this case that is exactly what “Smart Quotes” attempts to do, so I see no reason not to try and bug-fix that
    No, “Smart Quotes” attempts to solve the character-level problems at the character level, not the glyph/font level. It just doesn’t do it well in all cases.

    What you need is a more complete (or competent) Smart Quotes algorithm that is coded in such a way as to be easily integrated into a range of text input environments and on various platforms.
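    A sketch of what "more complete" could mean for the apostrophe cases raised at the start of the thread, in Python and at the character level: when a straight ' sits in an opening position, look at the word that follows, and emit ’ rather than ‘ if that word starts with a digit ('18, '80s) or appears in a list of known elided forms. The elision list and function name are hypothetical, and the rule still guesses wrong when a genuine quotation opens with a number or with one of the listed words; that residue is why some cases will always need a human.

```python
import re

LEFT_SINGLE = "\u2018"   # ‘
RIGHT_SINGLE = "\u2019"  # ’

# Hypothetical list of words that commonly start with an elided letter.
LEADING_ELISIONS = {"n", "em", "tis", "twas", "til", "cause", "bout"}
WORD = re.compile(r"[A-Za-z0-9]+")


def smart_single_quotes(text: str) -> str:
    """Curl straight single quotes, treating leading elisions as apostrophes."""
    out = []
    for i, ch in enumerate(text):
        if ch != "'":
            out.append(ch)
            continue
        prev = text[i - 1] if i > 0 else " "
        if not (prev.isspace() or prev in "([{\"\u201C"):
            out.append(RIGHT_SINGLE)  # after a letter, digit or closing punctuation
            continue
        m = WORD.match(text, i + 1)  # the word right after the mark, if any
        word = m.group(0) if m else ""
        if word[:1].isdigit() or word.lower() in LEADING_ELISIONS:
            out.append(RIGHT_SINGLE)  # elided digits or letters: an apostrophe
        else:
            out.append(LEFT_SINGLE)   # a genuine opening quote
    return "".join(out)


for sample in ("Class of '18", "the '80s", "rock 'n' roll", "'Hello,' she said"):
    print(smart_single_quotes(sample))
```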

  • Hrant Հրանդ Փափազեան Papazian
    edited January 2018
    Don't try to solve character-level problems at the glyph level.
    When the barriers are too great, hack. AKA jugaad. Under-represented typography knows this well, like how Armenian became available to computer users many years before Unicode.
  • John Hudson
    When the barriers are too great, hack. AKA jugaad. Under-represented typography knows this well, like how Armenian became available to computer users many years before Unicode.

    Lots of scripts had varieties of standard, pseudo-standard, and non-standard character encoding schemes pre-Unicode. So? Mixing character space and glyph space was a bad idea then as now, as Adobe expert set fonts demonstrated.
  • Yeah it was really bad Armenians didn't just twiddle their thumbs.
    https://en.wikipedia.org/wiki/Jugaad
  • John Savard
    Yeah it was really bad Armenians didn't just twiddle their thumbs.
    No, that's not the point. Of course if doing things in the future-proof standards-compliant way is not an option, it's better to hack than to do without.

    But when you have the choice, it's better to do things the right way, and people do have that choice now, or at least, this is what he is claiming.
  • Quote mark standards, opening and nested

    I would classify the use of quotes in the above scheme for Dutch as old-fashioned; most current Dutch media would opt for the 'English' version, and both are correct. My point: these things are subject to fashion, so perhaps ill-suited to being coded into a typeface.

  • John Hudson
    edited January 2018
    Hrant, you seem to be wilfully ignoring the point, which is not about hacking vs twiddling thumbs, but about where to hack. Armenian text processing pre-Unicode involved using the same 8-bit codes as ANSI and assigning them to Armenian characters, just as was done for dozens of other writing systems. Sometimes that was done within the framework of official standards (e.g. the ISCII encoding standards in India), sometimes in an ad hoc way by specific communities (where community might be a user group of a particular computer platform), and sometimes on a font-by-font basis. Obviously, the more standardised the encoding, the better the chances of text interchange and platform interoperability, and it's no wonder that the standardised 8-bit encodings tended to become the model for migration to 16-bit codepages, which allowed for fairly easy migration of fonts and documents too. So even when hacking a character encoding solution, there were clearly better and worse ways to do it, better and worse places to apply the hack.

    Nick is describing a limitation within the algorithm that converts some character codes to different character codes in certain circumstances. The algorithms don't always produce the correct result (because the circumstances are more complex than the algorithms allow for: there are exceptions to simple rules that the algorithms don't anticipate). So that's the problem in need of a solution. Where should that solution reside?

    It seems obvious to me that the font-specific glyph processing level is not a very good place at which to try to solve that problem. It isn't an interoperable solution (to be interoperable, the same hack would have to be made in all fonts), and it masks rather than fixes the problem, because it leaves the incorrect, unwanted character in the text string.
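    A toy model makes the distinction concrete. In the hypothetical Python sketch below, a dict stands in for a font's cmap and another for a single GSUB substitution that draws quoteleft with the quoteright shape (the kind of hack discussed in this thread). The rendered glyph sequence looks right, but the string that search, spell-checking and copy-paste operate on still contains U+2018.

```python
# Toy model only: real shaping is far more involved.
CMAP = {"\u2018": "quoteleft", "\u2019": "quoteright", "1": "one", "8": "eight"}
GSUB_SINGLE = {"quoteleft": "quoteright"}  # the glyph-level "fix"


def shape(text: str) -> list[str]:
    """Map characters to glyph names, then apply the single substitution."""
    glyphs = [CMAP.get(ch, ".notdef") for ch in text]
    return [GSUB_SINGLE.get(g, g) for g in glyphs]


stored = "\u201818"  # what the smart-quote algorithm left in the text: ‘18
print(shape(stored))          # ['quoteright', 'one', 'eight']
print(stored == "\u201918")   # False: the text still holds the wrong character
```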

    I'm not opposed to a well-considered hack*, but I think the tendency of font makers to try to solve text processing problems in glyph space is that of the proverbial man with a hammer who sees everything as a nail.

    _____

    * cf. the custom normalisation schema that I developed with Biblical Hebrew scholars and text processing experts to bypass uncorrectable errors in the Unicode canonical combining class assignments for Hebrew marks [SBL Hebrew User Manual: Appendix B, p. 21]. That's a necessary hack. I was pleasantly surprised recently to discover that it's been adopted as an ad hoc standard in a range of software that needs to handle Hebrew text.
  • Hrant Հրանդ Փափազեան Papazian
    edited January 2018
    Except it doesn't seem that what Nick wants has an officially sanctioned solution. Not unrelated to how the apostrophe has been crippled by bureaucrats with muddled intentions.

    Things like the Adobe Expert Sets only seem a bad idea in retrospect; in fact they helped people feasibly implement better typography in their time.

    The purity some people seek can become a crutch.

    @Jasper de Waard Typefaces are even more subject to fashion than typesetting conventions. Plus they're easy to modify.

    Also subject to fashion: "standards".
  • Nick Shinn
    “Smart Quotes” attempts to solve the character-level problems at the character level, not the glyph/font level. It just doesn’t do it well in all cases.

    True, Kent: “Smart Quotes” isn’t a font-level hack. But it does the very thing we’re not supposed to do in fonts, which is to replace one character with another (not written in the text) whose glyph a third party (not the document’s author or typographer) deems more appropriate.

    What you need is a more complete (or competent) Smart Quotes algorithm that is coded in such a way as to be easily integrated into a range of text input environments and on various platforms.

    Yes, it should be upgraded to use grammar and dictionary intelligence, so that it works better. That would also require some kind of language specificity that distinguishes Englishes, to address the issues noted by Joe.
  • Let’s talk about expectations. If you’re typesetting a text, would you expect the quotation marks to change direction when you change font?

    Typesetters and designers who care about these things can change the character to get the result they want. Providing a weird and unexpected experience for everyone else doesn’t seem to justify any potential benefit.
  • Hrant Հրանդ Փափազեան Papazian
    edited January 2018
    You choose the typeface based on what it can do for your text, not merely for following lowest-common-denominator expectations. And you generally don't switch a typeface unless you chose the one at hand poorly, which means the new one should be given the benefit of the doubt.

    Would you expect the ampersand to change from its conventional shape to an "Et"? Does that make the "Et" form necessarily bad? And what if the numerals go from lining to OS?

    In the end, if a typeface designer believes that a convention is dysfunctional, following it anyway can become an act of hypocrisy.
  • Hrant Հրանդ Փափազեան Papazian
    edited January 2018
    To me the sine qua non is what the user sees, not how we made it so.
    And nothing lasts forever.
  • This whole discussion reminds me of my favourite bit of OpenType code:
    sub period space' space' by space;
    If you think fonts should be opinionated about linguistic conventions, you should probably include that one too.
  • John Hudson
    [Signing out, since Hrant has once again reduced himself to communicating in slogans. No dialogue to be had here.]
  • Nick Shinn
    edited January 2018
    This discussion, like many today, pits those who believe in correct principles vs. those who believe in correct outcomes, both as best practices.

    However, this thread was started to address the practicalities of two very specific situations, namely “Smart Quotes” and reversed left quote marks.

    I wondered if one way to deal with these might involve making a distinction between American and other Englishes.

    And I’m getting a lot of flak from the principled purists, of whom I would ask, do you have a better idea? Certainly it’s true that as Kent says, “What you need is [a better algorithm]”, but I already know that ain’t gonna happen, and it’s not something I can come up with—but a font hack is.

    I was interested to know if there is, technically, any standard that distinguishes different nationalities of English. Joe’s post of January 4th has been the most helpful so far.

    I have actually put the apostrophe glyph in the <quoteleft> character, for proprietary display fonts for North American companies that use the fonts in packaging, adverts and posters in the USA, where it is effective in preventing apostrophe boo-boos, and has no down side that I’m aware of. 

    And I’ve put reversed quote marks in some fonts, coded for English:
    language ENG exclude_dflt; # English
    sub [quoteleft quotedblleft] by [quoteleft.alt quotedblleft.alt];
    Strictly speaking, this is wrong, because it represents one Unicode character by another’s glyph, e.g. Double High-Reversed-9 Quotation Mark (U+201F).

    However, this is described as “has same semantic as 201C, but differs in appearance”, which is rather like the relationship between single- and double-storey /a, and those don’t have separate Unicode code points.
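    One practical difference from the /a case can be checked with Python's standard unicodedata module: U+201F has no decomposition, so even compatibility normalization (NFKC) leaves it distinct from U+201C, whereas a presentation form such as the fi ligature (U+FB01) folds back to its letters. Text set with U+201F will therefore not match U+201C in searches that rely on normalization. The helper name below is hypothetical.

```python
import unicodedata


def nfkc_equal(a: str, b: str) -> bool:
    """True if a and b are identical after NFKC (compatibility) normalization."""
    return unicodedata.normalize("NFKC", a) == unicodedata.normalize("NFKC", b)


reversed_dq, left_dq = "\u201F", "\u201C"
print(unicodedata.name(reversed_dq), unicodedata.category(reversed_dq))
# DOUBLE HIGH-REVERSED-9 QUOTATION MARK Pi
print(nfkc_equal(reversed_dq, left_dq))  # False: two distinct characters
print(nfkc_equal("\uFB01", "fi"))        # True: the ligature is a compatibility form
```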

    Will that be the house red, sir, or something special?