U+00AF and U+02C9

Why do people put U+00AF (macron) and U+02C9 (modifier letter macron) in fonts? Why does U+02C9 appear in fonts when other modifier letters do not? Is this just a mistake that one person made and lots of other people have copied?

Comments

  • Thomas Phinney Posts: 1,141
    These days, I generally put in both combining diacritics (modifier letter whatsitsname) and standalone diacritics (whatsitsname). It isn't really any extra work, and at least for basic accents, they are part of some standard character sets, basically for legacy reasons.

    But I wouldn't be wagging my finger very hard at somebody who left out the standalone diacritics.

    Are you seeing this specific to the macron, though? With the same designer leaving out the other modifier letters they have standalone accents for? That is... odd.

    I can also imagine somebody retrofitting an existing font that has precomposed accents for western European characters, adding CE support and doing that through combining accents. But in such a case, I would expect to see the combining accents for other CE characters, such as the Hungarian double-acute-style umlaut (hungarumlaut).
  • James Puckett Posts: 1,592
    What’s the difference between the combining marks and the modifier letter marks? Are the modifier marks supposed to be zero width so they appear atop letters?

    I asked this because the modifier letter macron is in Underware’s Latin Plus, which means it’s turning up in new fonts based on Latin fonts. Given how many people think their work supports the Oneipot language, I assume that modifier letter macron is showing up in lots of new fonts.
  • Craig Eliason Posts: 752
    Related question: is there utility to including both spacing and non-spacing versions of Greek (e.g. tonos) and Cyrillic (e.g. Cyrillic breve) diacritics in fonts?
  • edited July 12
    U+00AF MACRON (= overline, APL overbar) is really an overline. Most fonts include it because it’s in common legacy encodings like Windows code page 1252 or Mac Roman. It’s mostly meant to be used like U+005F LOW LINE (= underscore), but above. These legacy encodings used it as a macron / overline above letters, as a spacing overline, or both, just as they used U+005F as a macron below / underline, as a spacing low line, or both. When alone, it usually has a wider shape than the macrons used on top of letters and may even connect when following or preceding itself.
    Edit: Apparently it’s called “high minus” in APL.

    U+02C9 MODIFIER LETTER MACRON was encoded to represent the spacing diacritic used for tone marking. It usually has the same shape as the macron used on top of letters. Many fonts use it as a component for letters with macron.

    U+0304 COMBINING MACRON is a non-spacing diacritic. It should be used as the macron on top of letters (whether they have composed characters encoded or are composed by the shaping engine with OT features).
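
    The difference between the three characters shows up in their Unicode properties. As a minimal sketch, Python's standard `unicodedata` module can report each one's name, general category and canonical combining class. The helper name `describe` is only illustrative, and the values come from whatever Unicode version the interpreter bundles.

```python
import unicodedata

# The three macron-like characters discussed above.
MACRONS = ["\u00AF", "\u02C9", "\u0304"]

def describe(ch):
    """Return (code point, name, general category, combining class) for ch."""
    return (
        f"U+{ord(ch):04X}",
        unicodedata.name(ch),
        unicodedata.category(ch),   # Sk = modifier symbol, Lm = modifier letter, Mn = non-spacing mark
        unicodedata.combining(ch),  # canonical combining class
    )

for ch in MACRONS:
    print(describe(ch))
```

    Only U+0304 is a non-spacing mark (Mn); the other two are spacing characters as far as Unicode is concerned.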


  • Max Phillips Posts: 461
    Combining marks should generally be zero-width […]

    @john hudson What happens if they're not? And does this apply to legacy standalone diacritics, too?

  • Vasil Stanev Posts: 227
    Combining marks should generally be zero-width […]

    @john hudson What happens if they're not? And does this apply to legacy standalone diacritics, too?

    I also would like to know.

  • John Hudson Posts: 1,489
    edited July 12
    What happens if they're not?
    Ah now, the answer to that depends on the platform, the specific character and script, whether the glyphs are categorised as marks in the GDEF table, and probably other factors that escape me at the moment. The most important consideration in this respect is the GDEF categorisation, because at least some layout engines — notably Microsoft's Uniscribe — will tend to enforce zero width on any glyph defined as a mark in GDEF* (except in a font with a monospace flag set, in which case you need to collapse the width of marks in a GPOS lookup prior to positioning). So my recommendation is that any time you categorise a glyph as a mark in GDEF you should ensure that it is zero-width, because it is better that you control this than run into situations where some shaping engines zero the width and some don't.

    *Not sure whether this is the case for all the shaping engines in Uniscribe, and whether it is true for Latin script.

    [Note that there are situations in some complex scripts where there are post-positioning spacing signs that you need to categorise as marks in GDEF so that you can put them into GSUB mark filter sets, so they are skipped in ligation between glyphs on either side of the sign. In that case, one spaces them in the glyf/CFF table as zero-width, since they're nominally marks, and then uses GPOS to add width to them.]
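
    The recommendation above (anything classed as a mark in GDEF should carry a zero advance) is easy to check mechanically. The following is a minimal sketch that works on plain dictionaries; the function name and the sample data are hypothetical. With fontTools, I would expect the two dictionaries to be obtainable from `font["GDEF"].table.GlyphClassDef.classDefs` and from the advance half of the pairs in `font["hmtx"].metrics`, but treat that as an assumption to verify against your installed version.

```python
MARK_CLASS = 3  # GDEF glyph class value for mark glyphs

def marks_with_width(glyph_classes, advances):
    """List GDEF mark glyphs whose advance width is not zero.

    glyph_classes: dict of glyph name -> GDEF class
                   (1 base, 2 ligature, 3 mark, 4 component)
    advances:      dict of glyph name -> advance width in font units
    """
    return sorted(
        name
        for name, cls in glyph_classes.items()
        if cls == MARK_CLASS and advances.get(name, 0) != 0
    )

# Hypothetical data standing in for a real font's tables.
glyph_classes = {"a": 1, "uni0304": 3, "uni0302": 3, "macron": 1}
advances = {"a": 520, "uni0304": 0, "uni0302": 300, "macron": 500}

print(marks_with_width(glyph_classes, advances))  # ['uni0302']
```

    Glyphs reported by such a check are the ones whose spacing may differ between engines that zero mark widths and engines that do not.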

    And does this apply to legacy standalone diacritics, too?
    No. The legacy standalone diacritics are spacing characters. If you look at e.g. U+0060 GRAVE ACCENT in Unicode, you'll see it is annotated 'this is a spacing character'. Some of the legacy spacing accents also have compatibility decompositions to the space character + combining diacritic, e.g. U+00B8 CEDILLA ≈ 0020 0327.
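
    These compatibility decompositions can be inspected directly. A small sketch using Python's standard `unicodedata` module follows; the list of accents and the helper name `compat_mapping` are just illustrative choices.

```python
import unicodedata

# A few legacy spacing accents from ASCII and Latin-1.
SPACING_ACCENTS = ["\u0060", "\u00A8", "\u00AF", "\u00B4", "\u00B8"]

def compat_mapping(ch):
    """Return the raw decomposition field for ch ('' if it has none)."""
    return unicodedata.decomposition(ch)

for ch in SPACING_ACCENTS:
    mapping = compat_mapping(ch) or "(none)"
    print(f"U+{ord(ch):04X} {unicodedata.name(ch):<15} {mapping}")
```

    The Latin-1 accents decompose to a space followed by the matching combining mark, while U+0060 GRAVE ACCENT has no decomposition at all.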

  • Michel Boyer Posts: 109
    edited July 12
    I always took for granted that I could rely on the FontForge metrics window to see how the glyphs are expected to be positioned. Here the combining diacritic is U+0302, the font is Source Sans Pro, mark is activated, and this is the 2012 version of FontForge. All that is changed in the clip is the width of uni0302.

    Was I mistaken?

  • Michel Boyer Posts: 109
    I just checked with TextEdit on the Macintosh with uni0302 quite wide, and the width had no effect on the rendering. :wink:
  • John Hudson Posts: 1,489
    I just checked with TextEdit on the Macintosh with uni0302 quite wide, and the width had no effect on the rendering.
    I'm not sure what you mean here. If the /uni0302/ glyph is as you have it in the FontForge screenshot, then it is zero-width, not 'quite wide'.

    If you made a version in which you gave the /uni0302/ glyph a non-zero advance width, and you still get the same positioning and spacing of adjacent letters in TextEdit as when the glyph was zero-width, that is an indication that Apple's text engine is zeroing the width of the glyph because it recognises it as a mark (presumably because it is being identified as such in the GDEF table).

  • Michel Boyer Posts: 109
    I just checked with TextEdit on the Macintosh with uni0302 quite wide, and the width had no effect on the rendering.
    I'm not sure what you mean here. [...]

    If you made a version in which you gave the /uni0302/ glyph a non-zero advance width, [...]

    Yes, that is exactly what I did.
  • Khaled Hosny Posts: 220
    edited July 13
    AFAIK, HarfBuzz and Uniscribe enforce zero width for mark glyphs (based either on GDEF glyph classes or on Unicode character properties, depending on the script IIRC), but Core Text does not enforce it, unless they changed this behavior.
  • Thomas Phinney Posts: 1,141
    I always took for granted that I could rely on the FontForge metrics window to see how the glyphs are expected to be positioned. Here the combining diacritic is U+0302, the font is Source Sans Pro, mark is activated, and this is the 2012 version of FontForge. All that is changed in the clip is the width of uni0302.

    Was I mistaken?

    You were mistaken.

    More clearly: as discussed in the thread, the results can be engine-dependent. If you build things “properly,” you will generally get the same result across all engines.

    But if you do wacky things like give a non-zero advance width to a character Unicode defines as a zero-width combining mark, the results will vary. Some engines will quite reasonably ignore the advance width. Others may respect it.
  • John Hudson Posts: 1,489
    If you actually want a nominal mark glyph to have an advance width, it is fairly reliable to do this via GPOS. This is something I do quite frequently in complex script fonts, especially South Indian scripts where it's necessary to kern off some marks, and hence it is easiest to be able to first give the marks a consistent left or right sidebearing amount. And as mentioned previously, there are some properly spacing signs that need to be classified as marks in order to be skipped in GSUB, and which then need to have their widths added back.
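
    To see why adding the width in GPOS is the more predictable route, here is a deliberately simplified model in Python. It assumes that an engine which zeroes mark widths does so before GPOS adjustments are added, which is not guaranteed for every engine or script; the function name, the build labels and the 120-unit value are all hypothetical.

```python
def final_advance(hmtx_advance, gpos_x_advance, engine_zeroes_marks):
    """Advance of a mark glyph after shaping, in a simplified model.

    hmtx_advance:        advance width stored with the glyph
    gpos_x_advance:      advance adjustment added by a GPOS positioning rule
    engine_zeroes_marks: whether the engine discards the stored advance of marks
    """
    base = 0 if engine_zeroes_marks else hmtx_advance
    return base + gpos_x_advance

# Two hypothetical ways to build a mark that should occupy 120 units.
builds = {
    "width in hmtx": {"hmtx_advance": 120, "gpos_x_advance": 0},
    "width via GPOS": {"hmtx_advance": 0, "gpos_x_advance": 120},
}

for label, build in builds.items():
    results = {
        zeroes: final_advance(engine_zeroes_marks=zeroes, **build)
        for zeroes in (True, False)
    }
    print(label, results)
```

    In this model the first build yields different advances depending on the engine, while the second yields 120 units either way, because there is no stored width left for an engine to discard.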
  • Michel Boyer Posts: 109
    edited July 13
    Thomas

    According to Unicode Technical Note #2, here is the way to get the bounding box of a character with combining diacritical marks:

       combination_bounding_rect = base_bounding_rect;
       display the base glyph at (0,0);
       while (more marks) {
           display the mark relative to combination_bounding_rect;
           increase combination_bounding_rect by
               the extent of mark_bounding_rect;
       }
       move horizontally by the width of the base glyph;
    
    That code assumes the diacritic may have a non-zero advance, and the examples given are with standard combining marks for Latin. Where do you find in the standard that the advance should be zero?

    By the way, if I understand correctly, what FontForge does not do correctly is the last line, i.e. move horizontally by the width of the base glyph.

    [1 hour after...] The code simply uses the bounding rectangle of the diacritic, not its advance (character width). The question concerning the width remains though, especially if my understanding of the last line is correct.
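
    For concreteness, the pseudocode above can be sketched in runnable Python. The `Rect` type, the function names, and the placement rule (each mark centred over the combination so far, with a fixed gap) are assumptions made for illustration; the technical note only says that marks are positioned relative to the combined bounding rectangle.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rect:
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def union(self, other):
        """Smallest rectangle containing both self and other."""
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))

def place_above(combo, mark, gap):
    """Move mark so it is centred over combo and sits gap units above it."""
    dx = (combo.xmin + combo.xmax) / 2 - (mark.xmin + mark.xmax) / 2
    dy = combo.ymax + gap - mark.ymin
    return Rect(mark.xmin + dx, mark.ymin + dy, mark.xmax + dx, mark.ymax + dy)

def layout_cluster(base_rect, base_advance, mark_rects, gap=50):
    """Stack marks over a base glyph drawn at (0, 0).

    Returns (combined bounding rect, placed mark rects, pen advance).
    Only bounding boxes are consulted; no mark advance width is used.
    """
    combo = base_rect
    placed = []
    for mark in mark_rects:
        positioned = place_above(combo, mark, gap)
        placed.append(positioned)
        combo = combo.union(positioned)
    return combo, placed, base_advance  # pen moves by the base width only

base = Rect(50, 0, 450, 500)
mark = Rect(0, 0, 200, 100)
combo, placed, advance = layout_cluster(base, 500, [mark, mark])
print(combo, advance)
```

    Note that `layout_cluster` never receives a mark advance at all, which matches the observation that the algorithm works from the bounding rectangle of the diacritic rather than its width.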
  • Michel Boyer Posts: 109
    [...]  And as mentioned previously, there are some properly spacing signs that need to be classified as marks in order to be skipped in GSUB, and which then need to have their widths added back.
    John, where is that documented?
  • John Hudson Posts: 1,489
    John, where is that documented?
    It isn't, so far as I know. It's in the 'things I needed to figure out' category. This is a strategy for making fonts, not a requirement of any standard. The circumstances in which one uses this method depend on the script, the design, and the approach taken to representing a particular combination of glyphs within a cluster.

    OpenType Layout lacks a move operator, i.e. there is no way in GSUB to explicitly move a glyph from one position in the glyph string to another. That means that if one wants to ligate two glyphs that are not adjacent in the glyph string, one needs to find some other method to get them next to each other for purposes of the ligature lookup.

    One method is to use a two-step contextual substitution to insert a duplicate of one of the glyphs in a desired location and then remove the initial instance of the glyph. This only works, though, so long as the complete context is perfectly and unambiguously definable.

    The other method is to skip the intervening glyphs, which is only possible by classifying them as marks. Since shaping engines may or may not zero the width of marks, this obliges one to set the advance width of such glyphs to zero at the glyf/CFF level, and then to use GPOS to manage the width after GSUB has been performed.

    A good example of this is in Telugu, where vowel signs that need to ligate with the base consonant in a cluster (green) might be separated from that glyph by a postscript form of a second consonant (red). That postscript glyph is not handled as a mark for positioning, since it is a spacing sign, but needs to be classified as a mark in GDEF so that it can be put into an appropriate mark filtering set for the ligature lookup.
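
    The skipping behaviour described above can be illustrated with a toy model in Python. This is not how any shaping engine implements GSUB; the function, the glyph names (`ka`, `ra_below`, `i_sign`, `ka_i`) and the way the ignored set is passed in directly are all hypothetical. In a real font the ignored glyphs would follow from the lookup flags and the GDEF classification.

```python
def ligate_skipping(glyphs, components, ligature, ignored):
    """Apply one ligature rule, ignoring glyphs in `ignored` while matching.

    glyphs:     list of glyph names (the glyph string)
    components: tuple of glyph names that form the ligature, in order
    ligature:   name of the replacement glyph
    ignored:    set of glyph names the lookup skips over
    """
    out = list(glyphs)
    i = 0
    while i < len(out):
        if out[i] == components[0]:
            matched = [i]
            j = i + 1
            while j < len(out) and len(matched) < len(components):
                if out[j] in ignored:
                    j += 1  # skipped glyph: stays in the string, not part of the match
                    continue
                if out[j] != components[len(matched)]:
                    break
                matched.append(j)
                j += 1
            if len(matched) == len(components):
                out[i] = ligature  # first component becomes the ligature
                for k in reversed(matched[1:]):
                    del out[k]     # remaining components are consumed
        i += 1
    return out

cluster = ["ka", "ra_below", "i_sign"]
rule = ("ka", "i_sign")

print(ligate_skipping(cluster, rule, "ka_i", ignored={"ra_below"}))
print(ligate_skipping(cluster, rule, "ka_i", ignored=set()))
```

    With the intervening glyph ignored, the base and vowel sign ligate and the skipped glyph follows the ligature; without it, the rule never matches because the two components are not adjacent.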


    There has been some talk over the years about introducing new mechanisms that would make this sort of stuff unnecessary. One idea I had was to make it possible to filter arbitrary glyphs, not just marks. Another, proposed by Martin Hosken, was to add an explicit move operator to OpenType Layout. The latter is probably the better idea, but either would involve a major overhaul of OTL at both spec and implementation levels, with all the attendant issues around staggered support on different platforms, and there doesn't seem to be a lot of enthusiasm for such disruption given that we have workarounds that get the job done with existing support.