Escape accented characters in Font Info?

Hey all! I was looking for this info and can’t remember where I wrote it down. Do I remember correctly that there’s a way to escape accented characters so they appear correctly in the Font Info (metadata)? Can anyone tell me how to do this? Nothing particularly complicated, it’s on the order of an Ä. If I just put that in the string directly, the font doesn’t compile (makeotf in RoboFont says “Error bad string”, which I take somewhat personally because it happens to my last name on every other web form). I tried HTML escape sequences with &# etc but that just comes out the other end as a # number sequence.
Thanks for any pointers!

Comments

  • John Hudson
    John Hudson Posts: 3,230
    edited December 3
    The name table can have multiple platformID, encodingID, and languageID entries for each piece of metadata, a legacy from when different platforms used different 8-bit encodings. It is possible to use platformID 0, which is Unicode, with either encodingID 3 (Basic Multilingual Plane only) or 4 (all Unicode planes), which in theory means you can include diacritic characters or any other Unicode codepoint or text string.

    However, for legacy reasons, some name table IDs are restricted as to which characters they can contain (as well as restrictions on length in some cases). The name ID 6 PostScript name (and the related CID and variable font PostScript name IDs) is limited to a subset of ASCII.

    In which name fields are you trying to use diacritic characters?
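
For anyone wanting to check a candidate PostScript name against that restriction, here is a minimal sketch. It assumes the limits as stated in the OpenType name table documentation for name ID 6 (at most 63 characters, printable ASCII codes 33 to 126, minus ten reserved characters). The function and constant names are hypothetical, not part of any tool mentioned in this thread.

```python
# Characters excluded from a PostScript name (name ID 6), on top of the
# printable-ASCII restriction. FORBIDDEN is a hypothetical helper name.
FORBIDDEN = set("[](){}<>/%")


def is_valid_postscript_name(name: str) -> bool:
    """Return True if `name` fits the assumed name ID 6 restrictions."""
    if not 1 <= len(name) <= 63:
        return False
    # Codes 33..126 exclude the space (32) and every non-ASCII character.
    return all(33 <= ord(ch) <= 126 and ch not in FORBIDDEN for ch in name)


print(is_valid_postscript_name("MyFont-Bold"))         # True
print(is_valid_postscript_name("Stössinger-Regular"))  # False (ö is not ASCII)
```

The other name IDs are not covered by this check; it only illustrates why an Ä is fine in one name field and rejected in another.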
  • Nina Stössinger
    Nina Stössinger Posts: 155
    edited December 3
    Thank you John! I should have probably been more specific, although this already is quite informative, thank you. In this case, I am trying to see if I can use diacritics in the name of a stylistic set; this wouldn’t be a translation or an alternate form, though, it would be the canonical name of the stylistic set. Do you know if that’s possible?

    Unclear on process too — I just tried putting the string inline in the feature code when it broke. I guess I could also put a decoy there and edit the name table later via ttx, if an edited encoding ID is the way to go. 
  • Thomas Phinney
    Thomas Phinney Posts: 2,899
    Stylistic set names are stored in the name table, too! So they can be in any encoding supported in the name table, including Unicode, WinANSI and MacRoman (all three of which would support the character in question). And like other name table entries, they can even be in multiple language IDs: there is not necessarily a single canonical true name.

    I suspect this is really a question about the tooling you are using upstream of that, and what it supports. You mentioned makeotf in RoboFont, so I suspect this becomes a question about RoboFont and its UFO sources and what that ecosystem supports for stylistic sets.
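
To see that all three encodings can represent the character in question, here is a small sketch using Python's built-in codecs. It assumes `utf-16-be`, `cp1252`, and `mac_roman` as stand-ins for Unicode, WinANSI, and MacRoman; `encoded_hex` is a hypothetical helper.

```python
text = "Ä"

# Python codec names standing in for the three encodings mentioned above.
encodings = {
    "Unicode (UTF-16BE)": "utf-16-be",
    "WinANSI (cp1252)": "cp1252",
    "MacRoman": "mac_roman",
}


def encoded_hex(s: str, codec: str) -> str:
    """Encode `s` with `codec` and return the bytes as uppercase hex."""
    return s.encode(codec).hex().upper()


for label, codec in encodings.items():
    print(f"{label}: {encoded_hex(text, codec)}")
# Unicode (UTF-16BE): 00C4
# WinANSI (cp1252): C4
# MacRoman: 80
```

The same character ends up as different byte values depending on the platform and encoding IDs of the name record, which is why a tool has to know which encoding a string is meant for.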
  • Nina Stössinger
    Nina Stössinger Posts: 155
    edited December 4
    Right! Thank you Thomas. I just tried airlifting the diacritics directly into the name table via ttx, and this seems promising so far (though I have yet to do any sort of rigorous testing). 
  • John Hudson
    John Hudson Posts: 3,230
    I experimented briefly with some non-Latin characters in stylistic set names a while ago, and they didn’t get interpreted correctly. I didn’t have time to troubleshoot on that project but have been meaning to get back to it. I agree with Thomas that this is likely a tool or build library issue, and it can probably be addressed at the tool level, either by making encodingID explicit or by applying some heuristic during the build to set the encodingID automatically based on the string content.
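
One possible shape for such a heuristic, as a rough sketch only: pick the BMP-only encodingID unless the string contains a character beyond U+FFFF. The function name is hypothetical and this is not how any existing build tool is known to work. The Windows values (encodingID 1 for BMP, 10 for the full repertoire) are taken from the OpenType name table documentation rather than from this thread.

```python
def pick_encoding_ids(text: str) -> dict:
    """Suggest (platformID, encodingID) pairs based on the string content.

    Hypothetical helper: BMP-only strings get the BMP encodings, anything
    with a character above U+FFFF gets the full-repertoire encodings.
    """
    beyond_bmp = any(ord(ch) > 0xFFFF for ch in text)
    return {
        "unicode": (0, 4 if beyond_bmp else 3),
        "windows": (3, 10 if beyond_bmp else 1),
    }


print(pick_encoding_ids("some text with Ä"))
# {'unicode': (0, 3), 'windows': (3, 1)}
print(pick_encoding_ids("Fraktur \U0001D504"))
# {'unicode': (0, 4), 'windows': (3, 10)}
```

Whether applications actually read full-repertoire name records for stylistic set names is a separate question that would need testing.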
  • Denis Moyogo Jacquerye
    edited December 4
    According to the spec (8c and 9e), an OT feature file should take UTF-8 and convert it for you, so you shouldn't have to escape Ä:
    featureNames {
        # Windows, Unicode BMP, English (same as default)
        name 3 1 0x0409 "some text with Ä";
        # Unicode, Unicode BMP
        name 0 3 "some text with Ä";
    };

    But if that doesn't work you can use the character's UTF-16 value for those platform/encoding pairs:
    featureNames {
        # Windows, Unicode BMP, English (same as default)
        name 3 1 0x0409 "some text with \00C4";
        # Unicode, Unicode BMP
        name 0 3 "some text with \00C4";
    };
    If you use a non-Unicode encoding, then the escaped characters should use values from that encoding, for example \80 for Ä in Mac Roman.

    If using the character directly doesn't work, maybe your feature file is not UTF-8 or something is not following the specs.
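
If you need to produce those escapes for more than a character or two, a small script can do it. This is a sketch under assumptions: it uses four hex digits for the Unicode-based platforms and two hex digits of the Mac Roman byte value otherwise, as in the examples above; it also escapes the double quote and backslash to be safe; and it refuses characters beyond the BMP rather than guessing how a given compiler wants them. `fea_escape` is a hypothetical name.

```python
def fea_escape(text: str, platform: str = "windows") -> str:
    """Escape non-ASCII characters for a feature file name string.

    platform="windows": \\XXXX with the UTF-16 value (also used above for
    the Unicode platform); platform="mac": \\XX with the Mac Roman value.
    """
    if platform not in ("windows", "mac"):
        raise ValueError("platform must be 'windows' or 'mac'")
    out = []
    for ch in text:
        code = ord(ch)
        if 0x20 <= code <= 0x7E and ch not in '"\\':
            out.append(ch)  # plain printable ASCII passes through unchanged
        elif platform == "windows":
            if code > 0xFFFF:
                raise ValueError(f"U+{code:04X} is outside the BMP")
            out.append(f"\\{code:04X}")
        else:
            # Raises UnicodeEncodeError if Mac Roman cannot represent ch.
            out.append(f"\\{ch.encode('mac_roman')[0]:02X}")
    return "".join(out)


print(fea_escape("some text with Ä"))         # some text with \00C4
print(fea_escape("some text with Ä", "mac"))  # some text with \80
```

The output is meant to be pasted between the double quotes of a `name` statement. Mixing the two forms up, such as a four-digit value in a Mac Roman entry, will not give the intended character.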

  • Thomas Phinney
    Thomas Phinney Posts: 2,899
    Something else occurs to me… we do not know EXACTLY what Nina meant when she said “it broke.”

    It is sadly not unbelievable to me that one could encode non-ASCII text in a stylistic set name, and some specific app could have trouble with it afterwards or display it incorrectly. So I am curious about what the “breakage” was.
  • Nina Stössinger
    Nina Stössinger Posts: 155
    edited December 4
    Ha, sorry about the added mystery there. What I meant by “breaking” was, it didn’t get through the makeotf implementation in RoboFont — just didn’t compile. The traceback said “bad string.” 

    Thanks so much Denis for pointing out the encoding IDs in the Stylistic Set name block and linking to that exact bit of documentation which I had missed before. I had just been copy pasting those “0”s without really knowing what they mean :). Edited that and now it appears to work! Not that I’ve fully put this through testing yet; I am a little worried about what you say, Tom, about display breaking in *some* app. (Curious if this is the kind of thing you ran into, John?)
  • John Hudson
    John Hudson Posts: 3,230
    I’m afraid I don’t remember the details of where things broke when I tried this: it was one of those things where I tried something, it didn’t work, so I changed direction because I didn’t have time to troubleshoot the problem. Which also meant my mental cache cleared pretty much immediately except for a tiny note saying ‘Go back and look at this sometime’. Maybe that time is now.

    I’m not typically using Adobe fea code for this sort of thing, so Denis’ very helpful post doesn’t directly help me. I need to be able to add ssXX and cvXX feature parameters to GSUB and name tables in fonts that are built in a variety of ways, so the Tiro build tool allows for direct specification of these in the build configuration file. It’s possible that this is where things broke and need to be addressed, e.g. by adding platform and encoding options, or tweaking the builder to heuristically set these based on the featureparams name string.
  • I don’t recall handling feature parameters for ssXX or cvXX when writing fonttools voltlib as the VOLT format didn’t support it back then. It still doesn’t seem to support it. I guess VOLT is not being maintained or hasn’t been in a while.

    For Tirotools, it could be useful to be able to define feature parameters in other languages than English. Luckily the fontTools API can resolve the language tags and platform/encodings to the appropriate name table records so it wouldn’t be too complicated.

  • John Hudson
    John Hudson Posts: 3,230
    edited December 4
    “I don’t recall handling feature parameters for ssXX or cvXX when writing fonttools voltlib as the VOLT format didn’t support it back then. It still doesn’t seem to support it. I guess VOLT is not being maintained or hasn’t been in a while.”
    VOLT was written almost thirty years ago, in Visual Basic. It’s been occasionally extended and maintained about as far as was possible on that code base, but that hasn’t included adding support for feature parameters. So in the Tiro builder we have the option to add feature parameters for ssXX and cvXX after-the-fact, meaning we can compile the features from VOLT and then update the GSUB and name tables.

    “For Tirotools, it could be useful to be able to define feature parameters in other languages than English. Luckily the fontTools API can resolve the language tags and platform/encodings to the appropriate name table records so it wouldn’t be too complicated.”
    That’s what I am thinking. We already have the ability to target specific script and langsys entries in the GSUB table, so it would definitely be useful to be able to include non-Latin strings in the feature parameter entries in the name table. Will discuss with Khaled.
  • Igor Freiberger
    Igor Freiberger Posts: 280
    edited December 5
    I'm not able to compile the OT code in FontLab 8.4 when there is an escape in the SS names or a character outside ASCII. Compilation also fails if I use the platform ID 0 pointed out by Denis: the error message says that only 1 and 3 are valid system codes.

    But the names with ASCII-only characters work perfectly in Adobe InDesign (at least since late 2021), Affinity Publisher (version 2.x) and Apple TextEdit. I have no QuarkXPress to test. OT features are not available in Apple Pages. In MS Word for Mac, although there are drop-down menus to control them, no font makes them active.

    The last capture shows the code with escaped characters. German and Italian texts are from Google Translate, not sure about their correctness.


  • Denis Moyogo Jacquerye
    edited December 5
    @Igor Freiberger The Mac Roman encoding needs to use Mac Roman values, not Unicode UTF-16 values. So in your case ú is \00FA in UTF-16 and \9C in Mac Roman.

    I’m not sure anyone needs to use Mac encodings anymore. You could use Unicode/Unicode (name 0 3) instead. In that case, like with Windows/Unicode (name 3 1), you should be able to use ú directly.

    Those translations are horrible ("small hats" or "small helmets", "and" instead of "to", or parts not translated at all) and the two French translations are not the same: "Small caps et petite caps" and "Lowecase et petite caps".
  • John Hudson
    John Hudson Posts: 3,230
    “I’m not sure anyone needs to use Mac encodings anymore.”
    Agreed. I think some tools still automatically generate name tables with Mac Roman entries, but they’re not needed.


  • “I’m not sure anyone needs to use Mac encodings anymore. You could use Unicode/Unicode (name 0 3) instead. In that case, like with Windows/Unicode (name 3 1), you should be able to use ú directly.”

    Nowadays the Windows platform is all you need.
  • Igor Freiberger
    Igor Freiberger Posts: 280
    edited December 6
    Thank you very much, @Denis Moyogo Jacquerye. Shame on me, I mixed entries while copying and pasting between the ssXX.

    And @Erwin Denissen is right: if I use only the Windows code, the names are also language-aware in macOS. Thanks a lot.