Looking for help with Contextual Alternates

I'm working on a font that builds a glyph-based system from IPA input, using consonant-vowel pairs as well as vowel-consonant pairs.
Each "sound" is mapped to a glyph, and the two glyphs of each pair are overlaid on top of each other. If a consonant precedes a vowel, the vowel glyph is overlaid on top of the consonant. Conversely, if the vowel precedes the consonant, the consonant glyph is overlaid on top of the vowel glyph, but the vowel changes to an alternate form with a diacritic attached to the bottom.
My kerning is set up to reflect that, overlaying the following glyph on top of the preceding one; consonant-consonant and vowel-vowel pairs are left alone.
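As a rough illustration of the kerning-based overlay described above, here is a sketch in AFDKO feature syntax. It assumes hypothetical class and glyph names (`a.alt` and so on) and that every glyph is 600 units wide; none of these come from the actual font.

```fea
# Hypothetical classes; every glyph is assumed to be 600 units wide.
@CONSONANTS = [p t k n s l r];
@VOWELS     = [a e i o u];
@VOWELS_ALT = [a.alt e.alt i.alt o.alt u.alt];

feature kern {
    # Consonant + vowel: pull the vowel back over the consonant.
    pos @CONSONANTS @VOWELS -600;
    # Alternate vowel + consonant: pull the consonant back over the vowel.
    pos @VOWELS_ALT @CONSONANTS -600;
} kern;
```

Under these assumptions, a V.alt C V sequence matches both pairs at once (V.alt + C and C + V), which produces exactly the triple overlap described next.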
I have a rule that changes a vowel to its alternate glyph when it is followed by a consonant, and that works okay. Consonant-vowel pairs work correctly too. Everything is fine until a word contains a vowel-consonant-vowel sequence. There, the alternate vowel should combine with the consonant, but the font also sees the consonant and the second vowel as a valid pair, combines them, and overlays all three glyphs on top of each other, breaking the rest of the word from that point onward.
Is there any way to get the lookup to look for only one pair, combine it, and then start anew? Almost everything is working correctly apart from that hiccup. I'm not versed in scripting directly, and I'm unsure what to do. I have the contextual rules following one another in a set, but even if I separate them, it still behaves the same, so I don't think that matters. If anyone can help, it'd be much appreciated.

As a secondary question, and not entirely necessary to answer: I have each vowel's contextual alternate form as a separate glyph. Is it possible to replace all of these with an actual diacritic mark that gets placed beneath the vowel when a vowel-consonant pair is detected?

Comments

  • John Hudson
    edited March 2022
    Without visual examples, I can’t confirm that I correctly understand what you are describing, but this sounds like a classic OpenType Layout lookup ordering issue. In your vowel-consonant-vowel combination, you want your vowel-consonant behaviour to be processed first, and then your consonant-vowel behaviour to be ignored. The way to implement this is, unfortunately, going to be determined in part by the font tools you are using. I do all my OTL processing in VOLT, which makes it super easy to manage this kind of thing, because I could put the vowel-consonant and consonant-vowel substitutions in different sublookups, such that if a match is found in the first, the second is not processed. But AFDKO-based tools like Glyphs and FontLab still only allow sublookup structures to be used for the kern feature.  :/

    In an AFDKO implementation, you might need to add an IGNORE context to your consonant-vowel lookup, so that it will not be triggered if preceded by the vowel with which the consonant combines.
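    A minimal sketch of that ordering plus an IGNORE context in AFDKO feature syntax. It assumes hypothetical glyph names and, unlike the setup in this thread, that the consonant-vowel lookup actually substitutes a distinct form from `@VOWELS_CV`, an invented class used only for illustration:

    ```fea
    # Hypothetical classes; @VOWELS_CV is an invented set of pair forms.
    @CONSONANTS = [p t k n s l r];
    @VOWELS     = [a e i o u];
    @VOWELS_ALT = [a.alt e.alt i.alt o.alt u.alt];
    @VOWELS_CV  = [a.cv e.cv i.cv o.cv u.cv];

    feature calt {
        # 1. Vowel-consonant pairs are processed first.
        lookup VC_pairs {
            sub @VOWELS' @CONSONANTS by @VOWELS_ALT;
        } VC_pairs;

        # 2. Consonant-vowel pairs, skipped when the consonant already
        #    belongs to a preceding vowel-consonant pair (V C V).
        lookup CV_pairs {
            ignore sub @VOWELS_ALT @CONSONANTS @VOWELS';
            sub @CONSONANTS @VOWELS' by @VOWELS_CV;
        } CV_pairs;
    } calt;
    ```

    Because `VC_pairs` runs first, the first vowel of a V C V sequence is already an alternate when `CV_pairs` runs, so the `ignore` rule suppresses the second pairing. If the overlap is produced purely by kerning, though, a consonant-vowel kern pair would still apply to the unchanged consonant, so the consonant itself may also need to change.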

    As to your second question, yes, you could substitute your vowels to combining marks and then position those using mark-to-base anchor attachments.
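    A sketch of that approach in AFDKO feature syntax, assuming a hypothetical zero-width mark glyph `below.vc`, 600-unit vowels and illustrative anchor coordinates. A multiple substitution inserts the mark after the vowel only in vowel-consonant context, and a mark-to-base rule positions it:

    ```fea
    # Hypothetical: "below.vc" is a zero-width combining mark you would draw.
    @CONSONANTS = [p t k n s l r];

    # Multiple substitution: vowel -> vowel + mark.
    lookup AddBelowMark {
        sub a by a below.vc;
        sub e by e below.vc;
        sub o by o below.vc;
    } AddBelowMark;

    feature calt {
        # Insert the mark only when the vowel is followed by a consonant.
        sub [a e o]' lookup AddBelowMark @CONSONANTS;
    } calt;

    # Mark anchor at the top of the mark drawing.
    markClass below.vc <anchor 0 0> @MARK_BELOW;

    feature mark {
        # Base anchor at the bottom centre of a 600-unit vowel.
        pos base [a e o] <anchor 300 0> mark @MARK_BELOW;
    } mark;
    ```

    If the vowel-consonant overlap is still done by kerning, the kern lookup would likely need `lookupflag IgnoreMarks;` so that the mark sitting between the vowel and the consonant does not interrupt the pair.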
  • CalicoStonewolf
    edited March 2022
    Every time I've tried placing the vowel-consonant pairing first and ignoring the consonant-vowel pairings, the inverse happens, and it makes all vowels their alternate forms. I'm using FontCreator, and I'm unsure how to set up the order of operations correctly. 
    Here's an example of how it *should* work if implemented correctly, and how it's actually going.



  • I don't 100% understand this, and I don't 100% understand FontCreator either, but I would first try putting the V+C substitutions in one lookup and the C+V substitutions in another, with the expectation that once you've done the V+C substitutions, the output glyphs would not be input glyphs to the C+V substitution.
  • Well, having the C+V rule forces it to look for those pairs to begin with, but I'm not really substituting anything, and I think it's technically an error, seeing as I'm "substituting" the vowels with regular vowels, with consonants as the backtrack. The second rule has no backtrack; it replaces the vowels with an alternates class, followed by a lookahead of consonants.
    In the order I have it, C-V takes place first and V-C follows, but as I said before, a C-V pair that follows a V-C pair borrows the consonant from the previous pair. I'm unsure how to tell it *not* to use that consonant. If I could do that, it should then move on to the next V-C pair or whatever comes after, since single consonants or vowels can also stand alone, outside of pairings. Whether or not these two rules are in the same calt set doesn't seem to matter, at least currently. If I separate them, I don't know how to tell it to stop after it finds a pair and move on to the next two characters (if there are any). Is there any way to signify some sort of break?
  • Christian Thalmann
    edited March 2022
    I'm not sure I understand what you're doing either, but I think the problem is that your second lookup doesn't know whether a consonant has already been used by the first lookup. Maybe it would help to replace the consonants used by the first lookup with a copy of themselves (e.g. replace /t/ by /t.used/, which looks identical). Then have the second lookup react to plain consonants but not «used» ones.
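    Christian's idea might look like this in AFDKO feature syntax. This is a sketch assuming hypothetical `.used` consonant glyphs drawn identically to the originals, 600-unit glyphs and kerning-based overlays; it is essentially what the original poster reports implementing later in the thread.

    ```fea
    # Hypothetical classes; .used glyphs look identical to their base consonants.
    @CONSONANTS      = [p t k n s l r];
    @CONSONANTS_USED = [p.used t.used k.used n.used s.used l.used r.used];
    @VOWELS          = [a e i o u];
    @VOWELS_ALT      = [a.alt e.alt i.alt o.alt u.alt];

    feature calt {
        # 1. A vowel before a consonant takes its alternate form.
        lookup VC_pairs {
            sub @VOWELS' @CONSONANTS by @VOWELS_ALT;
        } VC_pairs;

        # 2. A consonant following an alternate vowel is marked as used.
        lookup Mark_Used {
            sub @VOWELS_ALT @CONSONANTS' by @CONSONANTS_USED;
        } Mark_Used;
    } calt;

    feature kern {
        # Only plain consonants take a following vowel on top of them.
        pos @CONSONANTS @VOWELS -600;
        # Alternate vowels take the (used) consonant on top of them.
        pos @VOWELS_ALT @CONSONANTS_USED -600;
    } kern;
    ```

    The consonant-vowel kern pair lists only plain consonants, so a consonant already claimed by a vowel-consonant pair no longer pulls the following vowel over itself.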
    BTW, are you a conlanger?
  • CalicoStonewolf
    edited March 2022
    Sort of? I've dabbled in a few things, nothing concrete. I love trying to design fonts and other things, though.
    Could you show me how that would look in the feature code? I think that might give me a general idea of what you mean. This is what mine currently says:
        sub @consonants @vowels' lookup Regular_Vowels;
        sub @vowels' lookup Vowel_First_CAlts @consonants;
    I think if I can visualize it better, it'll work better for me.
    I found this page talking about cycles of Contextual Alternates, but it's talking about replacing single letters with a cycle of alternates over and over again, and I'm not even sure if it's relevant, but if it is, I'm having a hard time seeing it: https://glyphsapp.com/learn/features-part-3-advanced-contextual-alternates

    Sorry if I'm asking too much; I'm still trying to learn, and tutorials never seem to go in depth into what can and can't be done. I get the feeling that this can be done and I'm just not seeing it correctly, and I'm *definitely* not doing it right at the moment. Since this appears to be the only snag I've hit so far, it's very frustrating.
  • Christian Thalmann
    edited March 2022
    I'm not familiar with your usage of «lookup», so I don't think I can help. I would expect something like this:
    (Forgot to close the lines with «;», sorry.)
    So the second lookup shouldn't bite when a consonant has already been replaced by its backspaced version.
    BTW, if your number of supported consonants and vowels is not absurdly large, you might consider auto-generating every possible combination as a pre-made ligature. That will avoid any of the trouble you might run into for drawing different glyphs on top of each other at the end user level. (I successfully did that for a constructed script with CVC syllables, which yielded some 3000 ligatures...)
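    A small sketch of the pre-made ligature route in AFDKO feature syntax, using hypothetical ASCII glyph names as stand-ins for the IPA glyphs (a real font would need one rule per supported pair):

    ```fea
    # Hypothetical ligature glyphs, one per supported pair.
    feature calt {
        lookup Pair_Ligatures {
            # Consonant-vowel pairs
            sub t o by t_o;
            sub p e by p_e;
            sub k a by k_a;
            # Vowel-consonant pairs
            sub a l by a_l;
            sub u r by u_r;
            sub i n by i_n;
        } Pair_Ligatures;
    } calt;
    ```

    Because a ligature substitution consumes both input glyphs, processing resumes after the pair, so `a l u r i n` would come out as `a_l u_r i_n` and `t o p e k a` as `t_o p_e k_a`. The consonant of a vowel-consonant pair cannot then be reused by a following consonant-vowel pair.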



  • The problem with that is that, because I'm using IPA input, certain characters can change to a different glyph depending on the consonants around them, a glyph that is also used by a different sound, because the sounds are so close together. I tried that earlier this morning, and it broke things a bit harder. There are 18 different vowel glyphs, and I think there are 20 different English IPA vowel sounds, if I remember correctly.
  • Again, this is hard to understand without an example.
    But you can presumably make more lookups that pre-bundle characters that belong together before VC/CV substitution...?
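    For example, a dedicated bundling lookup placed before the pairing lookups could merge a multi-character sound into one glyph first. This sketch in AFDKO feature syntax assumes a hypothetical `schwa_r` ligature for the rhotic /ər/ and hypothetical class contents:

    ```fea
    # Hypothetical classes; schwa_r is an invented ligature glyph.
    @CONSONANTS = [p t k n s l r d];
    @VOWELS     = [a e i o u schwa schwa_r];
    @VOWELS_ALT = [a.alt e.alt i.alt o.alt u.alt schwa.alt schwa_r.alt];

    feature calt {
        # 1. Bundle multi-character sounds first.
        lookup Bundle {
            sub schwa r by schwa_r;
        } Bundle;

        # 2. Then pair vowels and consonants as before.
        lookup VC_pairs {
            sub @VOWELS' @CONSONANTS by @VOWELS_ALT;
        } VC_pairs;
    } calt;
    ```

    With the bundle in place, `d ə r` at the end of ˈθʌndər becomes `d schwa_r`, so the vowel-consonant lookup no longer sees `ə` followed by the consonant `r`.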
  • Okay, well, that definitely worked. I made duplicate consonant glyphs as alternates and added a rule after the V-C rule, using the alternate vowels as the backtrack and substituting the consonants with the duplicates. That's stopped the problem, as it no longer considers them to be the same glyph. Thank you so much for your help!
  • I'm a bit perplexed about your example since you seem to want to give the word 'alluring' what would be a rather perverse syllable structure for any language. Normally, this word would be syllabified as ə.lʊ.ɹɨŋ (or simply as ə.lɚ.ɨŋ) rather than as əl.ʊɹ.ɨŋ.

    How would you expect your system to behave with a consonant initial word like 'Topeka' or 'Kansas' or with a word with a particularly evil syllable structure like 'sixths'?
  • CalicoStonewolf
    edited March 2022
    I'm unsure what you mean by 'sixths'?
    This uses standard English IPA for the sounds, which are then converted to a set of glyphs and combined into consonant-vowel pairs or vowel-consonant pairs. əˈlʊrɪŋ is how it's presented in standard American English phonetics. Because of how the pairs combine, ə & ˈl are combined, followed by ʊ & r, then ɪ & ŋ, giving three side-by-side glyphs.
    With words such as Topeka, or Kansas, the Consonant would have a Vowel overlaid on top of it, (T+o - p+e - ka), or with Kansas, consonants can be represented by themselves too (K+a - n - s+a - s).
    New problem that I'm curious if anyone has a solution to. Since I'm using the IPA, the Greek character lowercase Theta (θ) is not behaving like the rest of the set, and isn't kerning properly. Is there any way to force kerning on it? I've tried adding a Greek language set and kerning to that, but I'm getting nothing.
  • Actually, as written above it would prefer VC syllables and thus yield K-an-s-as and T-op-ek-a, which is rather weird. I would also prioritize CV over VC syllables to yield Ka-n-sa-s, To-pe-ka, and ə.lʊ.rɪ.ŋ.
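    One way to express that consonant-vowel preference in AFDKO feature syntax, assuming hypothetical classes and the kerning-based overlays used elsewhere in this thread: let the vowel-consonant rule fire only when the vowel is not already preceded by a consonant and the consonant is not followed by a vowel.

    ```fea
    # Hypothetical classes.
    @CONSONANTS = [p t k n s l r];
    @VOWELS     = [a e i o u];
    @VOWELS_ALT = [a.alt e.alt i.alt o.alt u.alt];

    feature calt {
        lookup VC_pairs {
            # The vowel already pairs with the consonant before it.
            ignore sub @CONSONANTS @VOWELS';
            # The consonant belongs to the consonant-vowel pair after it.
            ignore sub @VOWELS' @CONSONANTS @VOWELS;
            sub @VOWELS' @CONSONANTS by @VOWELS_ALT;
        } VC_pairs;
    } calt;
    ```

    Traced by hand, this leaves `k a n s a s` and `t o p e k a` without alternates, so only the consonant-vowel kerning applies (Ka-n-sa-s, To-pe-ka), while a word-initial vowel before a consonant cluster, as in `a n d`, still forms a vowel-consonant pair.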

  • Thomas Phinney

    If you want to kern the Greek Theta with the IPA characters… nope. I mean, you can put the data in the font, but most layout engines separate text runs by script (writing system). So they will treat a Theta in the midst of some IPA as a separate text run. OpenType layout can’t cross text runs.

    You should be able to kern Theta against itself just fine, though!  :p  (Ok, not much consolation, I know.)


  • I have prioritized C-V pairs over V-C. I'm using both. The font worked fine up until it hit a V-C-V pattern, in which it was seeing both a V-C and C-V pair, and subsequently laying them on top of each other. But, per Christian's suggestion, I made duplicates of the consonant glyphs, and whenever a V-C pattern is detected, it replaces the consonant with an identical copy of the glyph, so that it's no longer seeing it in the same way, yet seeing it as a V-Alt-C-Alt set instead.
  • CalicoStonewolf
    edited March 2022

    Hmm. I need it to act like the other Latin-based characters and allow the vowel to overlay it. Any ideas for a workaround?
    Edit: Thinking about it, do you think it would be possible to copy the 20 vowel glyphs I have to Greek character mappings, use contextual alternates to switch the Latin characters for Greek ones whenever a pair is encountered, and then kern those?
    For example, in the example of ˈθʌndər (thunder), instead of producing ˈθ ʌ(Latin alt)+n d+ər which it's doing now, I could substitute the ʌ for a Greek replacement instead, so it would force it to be ˈθ+ʌ(Greek alt) n d+ər?
  • I simply mean the word 'sixths' as in five sixths (= 5/6). Here you've only got one vowel but a hellish number of consonants /sɪksθs/ (though most speakers will drop some except in hypercorrect speech).
  • CalicoStonewolf
    edited March 2022
    In the case of "sixths", it would be s+ɪ k s θ s. θ is classified as a consonant, so it'd be fine, since consonants don't cross over each other as pairs, yet sit side-by-side.
    The problem I'm currently facing is Theta playing nice with other Latin characters, which doesn't seem to want to happen with a vowel next to it. I'm looking into possibly trying to make copies of the vowels as Greek characters, and substituting Latin characters for their Greek character copies, and hoping that will allow me to kern them.
    Frankly, I get characters being in language sets, but I think that it's a load that Latin-styled characters or similar can't be kerned together.

  • I don't think you need any kerning with my system of replacements. Just give the .backspaced glyphs a width of zero and move their content forward or backward by one glyph position. Your base glyphs are all the same width, right?
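    Christian's approach builds the offset into the glyph outlines themselves. For comparison only, roughly the same net effect expressed as a GPOS single adjustment might look like this, assuming hypothetical `.used` glyphs and a 600-unit advance; unlike zero-width glyphs, this still depends on GPOS support in the rendering environment.

    ```fea
    # Hypothetical: every base glyph has a 600-unit advance width.
    @CONSONANTS_USED = [p.used t.used k.used n.used s.used l.used r.used];

    feature kern {
        # Value record <xPlacement yPlacement xAdvance yAdvance>:
        # draw the glyph one position to the left and cancel its advance,
        # so it sits on top of the preceding alternate vowel.
        pos @CONSONANTS_USED <-600 0 -600 0>;
    } kern;
    ```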
  • They are all the same width, yes. Almost everything is working at the moment, apart from the Theta-grek symbol messing up. It's fine if it's at the end of a word, as that's the end of the pattern anyway. But when it's preceded or followed by a vowel character, there's no overlap. I can't kern Greek and Latin together, so I'm trying to figure out how to do that. Chaining context doesn't seem to want to replace a Latin vowel with a Greek substitute, and I'm not entirely sure why.
  • Theta is, I believe, the only greek character used in IPA which doesn't have its own codepoint for use as a phonetic character. Since your font doesn't actually display IPA characters you could always use þ in its place.
  • Maybe start CALT with a dedicated lookup that replaces all theta with þ?
    I still don't understand why you need kerning.
  • John Hudson
    The problem with theta is at the character level: it is itemised as a Greek character so during run segmentation it is put in a different glyph processing run than adjacent Latin characters. Substituting a different glyph at the GSUB level doesn’t change that: the theta character and whatever glyph represents it is still in a separate run determined by Unicode script property.
  • RichardW
    OpenType lookups work at the glyph level, except that the system may have some remembrance of the original encoding.  Replacing theta by thorn in the font would just result in the thorn not interacting.  A hack that should work would be to encode as thorn but give it the glyph of theta.

    I am surprised that THORN isn't listed as also Latin in the Unicode script extension property.
  • OK, that makes sense. Definitely use thorn for theta then...
    I still don't understand why kerning would be needed when spacing will do. Especially since reliance on kerning introduces an unnecessary failure mode for the font.
  • How would I go about doing this, then? I'm using Theta because I want to use direct IPA transliterations as input, which the font itself converts into the glyphs representing each sound, so Theta is important to have. Ideally, I just want to use Theta as an input; as far as I'm concerned, it can map to something else, as long as it ends up as whatever glyph represents Theta, so that vowels can overlap with it, or vice versa. I don't care how I get there, just that the input text is Theta.

  • Why does it *need* to be IPA? What you're creating doesn't bear any resemblance to actual IPA, so it isn't clear why the input should need to be in IPA as opposed to some sort of practical orthography which might employ a mixture of IPA and more easily typeable characters, all of which are drawn from Latin ranges. There is no way to overcome the fact that theta will be treated as part of a different script system than other IPA characters.
  • RichardW
    The hack is to have the cmap map U+00FE to the glyph you're currently mapping  U+03B8 to.  The problem is that Latin and Greek character codes are not mapped to a single glyph run.

    I presume you explicitly have sets of lookups for the latn and grek scripts. It's just conceivable that the kerning failure might go away if you didn't explicitly support those scripts, but put the lookups in the DFLT script. I don't know that it will work; the renderer might very well process the scripts separately regardless.

  • I've tried changing to a default script, and I'm still unable to change the kerning at all for Theta. I may have misunderstood what you were suggesting. Am I right to assume that any input of the Theta symbol is going to behave the same way?
    It's not just the kerning that's an issue, either. With the word ˈθʌndər (thunder), it should produce ˈθ+ʌ n d+ər. As it is, it's not treating Theta as a consonant, even though it's in a consonant group set, so it produces ˈθ ʌ+n d+ər, switching the ʌ to its alternate when it should be overlaid on top of Theta.
  • Thomas Phinney
    None of your OpenType layout processing will work across writing systems, unfortunately. Which is to say, Theta simply cannot interact, in terms of OpenType code, with IPA glyphs. (As I think you have discovered, Richard’s suggestion will not resolve the problem.)

    Your viable options include:
    - get rid of the Theta
    - encode all your characters in a private use area instead of using pre-existing codepoints