Hebrew: need for additional marks in Unicode - Sheva Na, Dagesh Hazaq

Yeshurun Kubi
Yeshurun Kubi Posts: 14
edited December 2018 in Font Technology
In Hebrew there is a need for additional marks in Unicode, but there are no formal code points for them.
Actually this is more of a stylistic need, because it is not a new form: it is the same form, altered to note how it is pronounced in a given context.
For example: the letter 'a' sounds different in the word 'car' and in the word 'cat'. So I have made a special 'a' to tell the reader that in the word 'cat' it sounds more like an 'e':

This is not a real example, just a way to understand the issue in Hebrew.
Unicode did this in Hebrew for the standard mark QAMATS [sounds like 'a'] by adding QAMATS QATAN [sounds like 'o'], and really it is just a stylistic difference and not a truly new mark form:

On the Unicode.org:
05B8 ◌ָ HEBREW POINT QAMATS
used generically or as qamats gadol in orthography which distinguishes that from qamats qatan
05C7 ◌ׇ HEBREW POINT QAMATS QATAN
But there are two other marks that have the same need:
For the formal SHEVA [U+05B0] there is a need for SHEVA NA, to note that it sounds like 'e'.

And I have found a letter to Unicode asking them to do so.
The second: for the formal DAGESH [U+05BC] there is a need for DAGESH HAZAQ, to note that it sounds 'stronger'.

The question:

My question is: what should font designers do in the meantime, as long as there is no formal Unicode code point for these marks?
It is not good to just put them in the Private Use Area, because there they will not get the RTL bidi direction.
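
The bidi concern can be checked against the Unicode Character Database. Here is a minimal sketch in Python, using only the standard `unicodedata` module. U+E000 is just an arbitrary Private Use code point picked for illustration, and `describe` is a helper name of my own:

```python
import unicodedata

def describe(cp):
    """Return (general category, bidi class, combining class) for a code point."""
    ch = chr(cp)
    return (unicodedata.category(ch),
            unicodedata.bidirectional(ch),
            unicodedata.combining(ch))

# The encoded Hebrew points are nonspacing marks (Mn) with bidi class NSM,
# so they take the direction of the base letter they attach to.
sheva = describe(0x05B0)
dagesh = describe(0x05BC)

# A Private Use code point (U+E000 picked arbitrarily) is category Co and
# defaults to the strong left-to-right class L, with combining class 0.
pua = describe(0xE000)

for label, props in (("SHEVA", sheva), ("DAGESH", dagesh), ("U+E000", pua)):
    print(label, props)
```

These are character-level properties, so they apply regardless of what a font declares for the glyph it maps to the Private Use code point.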

One highly respected Hebrew font designer made a 'liga' feature that replaces a sequence of two SHEVA marks with a SHEVA NA that he placed in the General Punctuation block [at U+200C, which formally is ZERO WIDTH NON-JOINER].
That does not look like the right way of doing it.

I thought the proper way would be to put it in the Private Use Area and apply a Swash [swsh] feature to it.

So... what do you say is the best way of doing it?


Comments

  • There is another nikkud (Hebrew for vocalization mark) that should be first on the list: the chataf kamatz katan ("reduced kamatz katan"), which is considered an essential “balancing” phoneme in liturgical Hebrew (see attached sample). Typographically speaking, the kamatz katan has little value if there is no chataf kamatz katan—a situation parallel to the regular (called gadol = greater) kamatz and the chataf kamatz, both of which are in the standard Hebrew Unicode.


    The kamatz katan, chataf kamatz katan, sh’va na, and dagesh hazak were not used in the Masoretic texts of the Hebrew Bible, which were written in what is referred to as the Tiberian System of vocalization (nikkudot = vowels, plural) and cantillation (taamim = chanting trope). The system was developed in Tiberias in the 9th-11th centuries C.E. These four marks are, rather, latter-day notions. No such typographic differentiations appeared before the 19th century, and only then in grammatical texts (Gesenius, et al.). The earliest grammatical works to mention them are the ones included in the 1546 collection Dikdukim by Elijah Levita, which includes six or so major grammatical treatises of the early 16th century. But from these you can come away with ideas that suggest there may be as many as 10 forms of the sh’va! It’s very esoteric and, for modern users, might make matters more obscure than they are already.

    In my work for the recent official prayerbooks for the Conservative and Reform congregations (worldwide, though mostly U.S.), we have included the chataf kamatz katan and, in the Conservative prayerbook for Sabbath and festivals, the sh’va na—however, not the dagesh hazak. Since all of the Hebrew fonts were made by me and are held privately, I found my own workarounds, using ccmp and GSUB with glyphs in the Private Area. Since the fonts are not for sale, I felt no need to petition the Unicode Consortium, though I did look into it. Because we use modern punctuation with Hebrew liturgical (not biblical) texts, I always add a Hebrew set that operates through a locl feature. I use special typing sequences to access the out-of-unicode glyphs via GSUB.

    To give credit where credit is due, I should say that I worked on some of these solutions with Ben Kiel.

    One more thing: readers of liturgical Hebrew outside of Israel require the use of the meteg (see sample above) to mark non-ultimate syllable stresses. Israeli type designers who include nikkud never include it, except in the rare instances of fonts with cantillation marks, in which the meteg has a different function (and is called a siluq). They should.


  • John Hudson
    John Hudson Posts: 3,227
    In the absence of an encoding solution, relying only on glyph processing options and possibly higher level text markup (e.g. XML), I would suggest using a Stylistic Set or Character Variant feature applied to the existing sheva and dagesh mark characters. As you say, these are stylistic variants of the vowel and gemination marks intended to indicate different pronunciation, so it makes sense to handle them in stylistic glyph features in the absence of a standardised encoding.
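
One way to picture this suggestion is as feature code in the AFDKO feature file syntax. The sketch below only generates the text of such features. The glyph names `sheva`, `sheva.na`, `dagesh`, `dagesh.hazaq` and the choice of `cv01`/`cv02` are hypothetical and not taken from any actual font:

```python
def single_sub_feature(tag, pairs):
    """Build AFDKO-style feature text with one single substitution per pair."""
    lines = [f"feature {tag} {{"]
    for default, variant in pairs:
        lines.append(f"    sub {default} by {variant};")
    lines.append(f"}} {tag};")
    return "\n".join(lines)

# Hypothetical glyph names: the default mark and its variant form.
fea = "\n\n".join([
    single_sub_feature("cv01", [("sheva", "sheva.na")]),
    single_sub_feature("cv02", [("dagesh", "dagesh.hazaq")]),
])
print(fea)
```

The underlying text keeps the standard U+05B0 and U+05BC characters. Only the runs to which the feature is applied would show the variant glyphs.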
  • Yeshurun and Scott-Martin: you can propose the addition of these characters directly to Unicode. The Consortium site has a page explaining how to do this. You can also contact organizations that work with Hebrew language to prepare a joint proposal.

    It is good to search existing proposals both to verify if the same characters are not already proposed and to see examples of successful proposals. Maybe a direct contact with Michael Everson —the researcher who has more proposals approved by Unicode— could provide additional information and guidance.

    And, while no Unicode addition is available, Hudson's idea about Stylistic Set or Character Variant is surely the best provisory solution.

  • As I am not a font designer, I find it hard to understand which feature to use:
    1. Why not use the 'swash' feature? [as the font designer might want several designs for it]
    2. Why not use the 'salt' [Stylistic Alternates] feature?
    3. Character Variant feature: did you mean 'cv01' - 'cv99'?
    4. I think not all typing software supports the Character Variant feature.
  • Thank you, John and Igor. The Character Variant feature is certainly an option, while the use of Stylistic Sets is not, as it does not offer a case-by-case option. All of the characters mentioned by Yeshurun and me coexist with the standard set, not “in place of” throughout a document or paragraph.

    Yeshurun, I have used the Swash feature in some fonts to engage alternative punctuation. It’s perfectly effective, though I don’t think it would work for alternative diacritics for the reason stated above regarding Stylistic Sets.

    There may be yet one more nikkud that should be included in the Unicode: the patach g’nuvah, which is a regular patach (05B7) that’s offset to the right when it appears under a hei or chet that is the final letter in a word. I have automated the placement through substitution strings (hei-patach-space, hei-patach-period, etc.). It may be a more elegant solution than having a separate Unicode designation. But then again, the Unicode for Hebrew contains many glyphs that could have been achieved by other means. I suspect that that was the result of the Unicode having come into existence before OpenType, from which it tries to remain independent.
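
The substitution strings described here amount to a contextual rule: a patach under a word-final hei or chet. Here is a small Python sketch of that rule over plain text, to show which positions a font would need to catch. Treating "end of word" as "not followed by another Hebrew letter or point" is my simplifying assumption, and a dot (mappiq) in the hei is allowed on either side of the patach:

```python
import re

HE, CHET = "\u05D4", "\u05D7"
PATACH, MAPIQ = "\u05B7", "\u05BC"   # U+05BC serves as both dagesh and mappiq

# hei or chet, optional dot, patach, optional dot,
# then no further Hebrew letter or mark (simplified "end of word")
FURTIVE = re.compile(
    f"[{HE}{CHET}]{MAPIQ}?({PATACH}){MAPIQ}?(?![\u0591-\u05C7\u05D0-\u05EA])"
)

def furtive_positions(text):
    """Return the index of each patach that sits under a word-final hei/chet."""
    return [m.start(1) for m in FURTIVE.finditer(text)]

# resh, vav, dagesh, chet, patach, then a period
ruach = "\u05E8\u05D5\u05BC\u05D7\u05B7."
# chet, patach, final mem: the patach is not word-final here
medial = "\u05D7\u05B7\u05DD"
# gimel, qamats, bet, holam, hei, patach, mappiq, then a space
gavoah = "\u05D2\u05B8\u05D1\u05B9\u05D4\u05B7\u05BC "

print(furtive_positions(ruach), furtive_positions(medial), furtive_positions(gavoah))
```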

  • John Hudson
    John Hudson Posts: 3,227
    The Character Variant feature is certainly an option, while the use of Stylistic Sets is not, as it does not offer a case-by-case option. All of the characters mentioned by Yeshurun and me coexist with the standard set, not “in place of” throughout a document or paragraph.
    In this regard, the Character Variant and Stylistic Set features would function in exactly the same way: you would have to selectively apply the feature only in those places where you wanted the variant form. This could be done either manually or, if phonological rules can be defined, using scripting.

    To be clear, once again, I am only suggesting this as a work around in the absence of distinct Unicode codepoints for these variant marks. Obviously the better way to handle this would be to write proposal documents to have these variant marks encoded.

  • John, there is an important aspect of working with type in large documents that type designers, OT engineers, and application designers seldom take into account: the ease of access of certain glyphs and features in the workflow. In InDesign, where I spend much of my time, there are three steps to get to a Stylistic Set: a pull-down, a secondary pull-down, then the selection of the desired Stylistic Set. It’s a lot faster to simply make a selection in the glyph palette, and easier still to add an extraneous glyph (an equals sign, for example) to each occurrence of a special glyph that will actuate an automatic substitution. Of course, one has to add this to the substitution table, but that needs to be done just once.

    Poor implementation of OT features in InDesign is a major problem for type designers. I remember that, some years ago, there was an organized effort to get Adobe to change their OT interfaces, but nothing came of it. I also remember that the panel they convened did not include a single person who one might consider to be an expert typesetter.

  • In my opinion the fastest way to put a sheva na or dagesh hazak is:
    sheva + sheva = sheva na
    dagesh + dagesh = dagesh hazak
    Both in the ccmp feature.
    This works in InDesign but not in other programs.
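
The doubling idea can be simulated outside a font. Below is a rough sketch of what such a rule does to a glyph sequence. The glyph names are made up, and a real font would express this as a ligature substitution in `ccmp` rather than in Python:

```python
import unicodedata

RULES = {
    ("sheva", "sheva"): "sheva.na",          # hypothetical glyph names
    ("dagesh", "dagesh"): "dagesh.hazaq",
}

def apply_doubling(glyphs):
    """Replace each doubled mark with its variant glyph, scanning left to right."""
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if pair in RULES:
            out.append(RULES[pair])
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out

shaped = apply_doubling(["bet", "sheva", "sheva", "resh", "dagesh", "dagesh"])
print(shaped)

# The typed text itself: bet followed by two sheva characters.
typed = "\u05D1\u05B0\u05B0"
unchanged_by_nfc = unicodedata.normalize("NFC", typed) == typed
print(unchanged_by_nfc)
```

The second check shows that the doubled marks at least survive NFC normalization, since identical marks share a combining class and are not reordered. It says nothing about whether a given application's shaping engine accepts the sequence.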
  • John Hudson
    John Hudson Posts: 3,227
    In InDesign, where I spend much of my time, there are three steps to get to a Stylistic Set: a pull-down, a secondary pull-down, then the selection of the desired Stylistic Set.

    In InDesign, you can assign a Stylistic Set feature to a Character Style, which gives you a single click from a high level menu to apply the style. I believe it is even possible to map a Character Style to a keyboard shortcut.
  • You can also just map all OpenType features to keyboard shortcuts and ignore going through the menu hierarchies.
  • Yeshurun Kubi
    Yeshurun Kubi Posts: 14
    edited December 2018
    In my opinion the fastest way to put a sheva na or dagesh hazak is:
    sheva + sheva = sheva na
    dagesh + dagesh = dagesh hazak
    Both in the ccmp feature.
    This works in InDesign but not in other programs.
    This is not a good way:
    1. It means tampering with the text just to get a certain glyph.
    2. Formally it is not a valid combination: two diacritics one after the other - check here.
  • Am I the only one who considers it a bad idea to assign meaning to subtly different stylistic elaborations of a mark? Why not choose something unambiguous? E.g., replace the dots with horizontal lines in sh'va and dagesh, and give the kamatz a second horizontal line at its bottom...
  • Yeshurun Kubi
    Yeshurun Kubi Posts: 14
    edited January 2019
    Am I the only one who considers it a bad idea to assign meaning to subtly different stylistic elaborations of a mark? Why not choose something unambiguous? E.g., replace the dots with horizontal lines in sh'va and dagesh, and give the kamatz a second horizontal line at its bottom...
    That is an option that was used in roughly that way in some prayer books years ago.
    But in the past 10 years it has changed, because the current approach is more comfortable for the 'reader'.
    So now it is a regular thing that most people are used to in prayer books and the Bible.
    This way people are more aware of the distinction, which is important in prayer and Bible reading but less important in speech.
  • This is not a good way:
    1. It means tampering with the text just to get a certain glyph.
    2. Formally it is not a valid combination: two diacritics one after the other - check here.
    As I said, this works in InDesign and does not work in other programs.
    Yes, this changes the texts, but it is easy to fix them using find/change.
    I can’t see a faster way to do this.
  • Yeshurun and Scott-Martin: you can propose the addition of these characters directly to Unicode. The Consortium site has a page explaining how to do this. You can also contact organizations that work with Hebrew language to prepare a joint proposal.

    It is good to search existing proposals both to verify if the same characters are not already proposed and to see examples of successful proposals. Maybe a direct contact with Michael Everson —the researcher who has more proposals approved by Unicode— could provide additional information and guidance.

    And, while no Unicode addition is available, Hudson's idea about Stylistic Set or Character Variant is surely the best provisory solution.

    Hi Igor,
    It's already done and already rejected:
    https://unicode.org/L2/L2016/16086-sheva-na.pdf
    Feliz ano novo para você e todos aqui.
    Happy New Year to all here.
  • Igor Freiberger
    Igor Freiberger Posts: 279
    edited January 2019
    This may be not as bad as it seems.

    The rejected proposal does not follow the format Unicode asks for. To be properly considered, a proposal needs to (1) present what, where, how and why something would be encoded, with as many usage references as possible; and (2) fill in a form with a set of information according to the internal Unicode terms and standards.

    Maybe a new proposal for this, following the Unicode conditions, could be accepted. Two samples, a simple one and a complex one, are attached to give a better idea of what they expect a proposal to look like.

    A great 2019 also for you and all fellows in TD! (And I hope the encoding idea of these Hebrew marks could advance and become fruitful.)
  • The style of that proposal "blew me away" :)
  • bdenckla
    bdenckla Posts: 12
    My personal "most wanted" code point for Hebrew would be ḥolam malei for alef. Without it, Hebrew fonts have to do inference trickery to synthesize it from context. This is rather similar to the bad old days before Unicode introduced ḥolam ḥaser for vav.

    I cover ḥolam malei for alef and some other topics relevant to this thread in the documentation for my just-released Taamey D font.

    In Taamey D I supply a ḥataf qamats qatan glyph via a stylistic set, an idea discussed in this thread. Unlike some of the other glyphs mentioned in this thread, a ḥataf qamats qatan glyph would typically be used for all ḥataf qamats code points in a document or not used at all. So it is, perhaps, a stylistic variant in the purest sense. It would typically be used for all ḥataf qamats code points in a document that uses the qamats qatan code point. I'm not implying that other uses of the OpenType stylistic sets mechanism are abuses; I just mean to point out that a ḥataf qamats qatan glyph is a particularly good/easy fit for that mechanism. Indeed, it is such a good fit, that I wonder whether a new code point could really be justified!
  • Ben, regarding the ḥolam malei for alef:

    There is a major phonemic difference between the circumstances of ḥolam ḥaser vav and your suggestion of a ḥolam malei alef glyph. The sound of ḥolam malei vav, which predated the inclusion of ḥolam ḥaser vav in the Unicode, is “o,” whereas the sound of the ḥolam ḥaser vav is “vo”, that is, with a voiced vav. Ḥolam ḥaser vav appears, for example, in the word “mitzvot” (commandments), which is amongst the key tenets and most frequently appearing words in the Hebrew Bible. It is, therefore, an orthographic necessity. A ḥolam malei for alef, on the other hand, would only make a differentiation of interpretive grammar, the need for which is entirely arguable, and I would suggest is unnecessary as a matter of typography. If one were to require such a thing for say, an educational text, it would be entirely appropriate to achieve it through manual intervention or a custom font. Moreover, the issue as you present it in your GitHub post is, in my view, more a matter of poor letterform shapes and, more directly, poor anchor positioning.

    In my view, this last point applies to a number of the issues you present on GitHub. Your Taamey D font is, more or less, an iteration of the Frank-Ruehl design. (For those of you not familiar with Hebrew typography, Frank-Ruehl is a design released 1908-1910 by the C.F. Rühl foundry in Leipzig, which was taken over in the next decade by Berthold.) The proportions of the letters do not lend themselves to good performance with the entire panoply of Hebrew diacritics used in Bible typesetting, or even in liturgy. That the font remained as popular for so long has much to do with an accident of history (the events of 1932-1948) that led to a kind of inertia. It’s time to move on and stop trying to make these letterforms accommodate features that they physically cannot. (Sefaria.org seems to get around some of the problems by tracking out or adding side bearing space to the letters—not so nice, but workable.)

    The need for ḥataf kamatz katan in the Unicode is unequivocal, for reasons that I pointed out some years ago, above. To include the kamatz katan, but not the ḥataf kamatz katan, is the equivalent of not including any of the ḥataf-vowels (all the others are in the Unicode since Day 1). The ḥataf kamatz katan is used in all of the prayerbooks I have produced for the Reform and Conservative congregations since 2008. As these now number nearly 1.8 million in regular use by more than 4 million users, it can be said that they reflect the majority position. Moreover, the ḥataf kamatz katan appears also in the Israeli Koren publications and others. Case closed. It’s a reality that’s beyond the need for a proposal. The same is true for sh’va na and probably for dagesh ḥazak, as well.

    NB: All crazy people write long posts, but not all long posts are written by crazy people.

  • bdenckla
    bdenckla Posts: 12
    edited March 2023
    @Scott-Martin Kosofsky thanks for engaging with my post here on this thread and my Taamey D documentation.

    I either don't agree with or don't understand your points about my idea of a need for a Unicode code point for a ḥolam malei dot for alef.

    Correct me if I'm wrong, but what you're saying seems to rely on the pervasive but incorrect idea that alef is always silent, or at least that it is never a "real" consonant. As you are well aware (but I will state for other readers) alef, like vav, is sometimes a (silent) mater lectionis and sometimes it represents a consonant sound. In the case of alef, that consonant sound just happens to be a glottal plosive. If we use the right half ring symbol for alef, the difference between "o" and "ʾo" is no less real than the difference between "o" and "vo". But you seem to privilege the "vo/o" difference as more real than the "ʾo/o" difference? Again, correct me if I have misunderstood you and therefore misrepresent you.

    Perhaps a glottal plosive alef is more subtle to hear than a voiced labiodental fricative vav. But it is no less real, either to linguists or chanters. It may seem less real by accident of the fact that we have no letter for it in the Latin alphabet. So, lacking a letter for alef, we have to resort to small, exotic marks like the right half ring. Or we resort to punctuation like the dash commonly used in "uh-oh".

    To summarize the situation:
    • We do have a code point for a ḥolam ḥaser dot on alef: it is just U-HOLAM, the way a ḥolam ḥaser dot is represented on any letter other than vav.
    • We don't have a code point for a ḥolam male dot on alef. Instead, we have to infer when a U-HOLAM on the letter preceding the alef is in fact intended to be a ḥolam male dot on the alef.
    If this is acceptable, why didn't vav get the same treatment, i.e., why did we need ḥolam ḥaser for vav? Why didn't the powers that be have U-HOLAM be, very nicely, a ḥolam ḥaser dot for any letter including vav? This was actually a proposal floating around in those bad old days: see, for example, "Holam Male as <HOLAM, VAV>" in Holam Male (gentlewisdom.org). But it was rejected for the same reason I see a problem with alef today: the inference is too tricky.
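
The inference problem can be made concrete by listing the places where it arises. Here is a hedged sketch that flags every U+05B9 sitting on a non-vav letter directly before an alef. It only finds candidates, and deciding which of them are meant as ḥolam male on the alef still needs knowledge the code does not have. The function name and the mark class are my own choices:

```python
import re

HOLAM = "\u05B9"
ALEF = "\u05D0"
# Hebrew letters except vav (a holam on vav is a separate question)
LETTER = "[\u05D0-\u05D4\u05D6-\u05EA]"
# other points and accents that may share the letter with the holam
MARK = "[\u0591-\u05B8\u05BA-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7]"

# letter, its marks with a holam among them, then an alef right after
CANDIDATE = re.compile(f"{LETTER}{MARK}*({HOLAM}){MARK}*(?={ALEF})")

def holam_before_alef(text):
    """Return the index of each holam encoded on a letter directly before alef."""
    return [m.start(1) for m in CANDIDATE.finditer(text)]

lamed_holam_alef = "\u05DC\u05B9\u05D0"   # lamed, holam, alef
bet_holam_resh = "\u05D1\u05B9\u05E8"     # bet, holam, resh: no alef follows
vav_holam_alef = "\u05D5\u05B9\u05D0"     # holam on vav: not flagged here

print(holam_before_alef(lamed_holam_alef),
      holam_before_alef(bet_holam_resh),
      holam_before_alef(vav_holam_alef))
```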
  • Ben, I do not disagree with you about the sometimes consonantal function of alef. Where I do disagree is whether it requires a formal typographic differentiation, especially as one is already available to you through the use of the OT mark positioning (mark) feature. All you would have to do, then, is add an anchor and change the typing order. It doesn’t matter which dot you use(!!), so long as it has the right anchors.

    You ask, again, “why is this different from vav cholam chaser?” Technically speaking, it isn’t, but historically speaking (I’m referring to the history of Unicode and OpenType), it is. All of us who work with accented scripts in OpenType suffer with the historical fact that many (most?) of the Unicode code sheets came into existence well before the advent and widespread implementation of OpenType features. Had these developments happened in reverse chronological order, it is likely that most precomposed accented glyphs would never have come into existence. In Hebrew, for example, there would never have been a need for the range of dagesh characters or the precomposed accented Yiddish glyphs, and so on. But by the time sufficiently featured OT fonts were possible, there was already a vast amount of standing editorial matter that was composed with older fonts, and applications that were slow to implement newer features. (Word for Mac still lacks adequate support for right-to-left languages.) Isn’t that what GSUB tables are for? Yes, for sure, but there are still residual issues from text typed in certain apps with certain fonts. Many of the most commonly used Hebrew fonts don’t work very well with their diacritics. In the tech world, the word “legacy” often means “old, burdensome junk that precludes better solutions.”

  • bdenckla
    bdenckla Posts: 12
    @Scott-Martin Kosofsky I don't understand your suggestion that to solve my problem, I could "add an anchor and change the typing order."

    I'll restate my problem without reference to fonts, since reference to fonts may have confused the issue. My problem is:

    I would like three different encodings for the following three different phenomena:
    1. ḥolam ḥaser dot on an alef
    2. ḥolam male dot on an alef
    3. ḥolam ḥaser dot on a letter preceding an alef
    But today's Unicode does not give me a way to distinguish #2 from #3.

    I think you feel that some (all?) of these distinctions are unneeded, or rarely needed. That is a valid debate to have with respect to the editorial policies of a particular publication or application. But ideally Unicode should enable all of those distinctions to be made, and then users of Unicode can decide which distinctions are appropriate for their publication or application.

    Although I am critical of BHS in many ways, in this case I think the fact that BHS makes these distinctions gives considerable weight to the idea that some publications or applications may reasonably want to make these distinctions.

    I conclude with the BHS version of four of the words I use in my Taamey D documentation, showing BHS's distinction between phenomena #2 and #3. Sorry for the blurriness of some of them.


  • Ben, just create an anchor set for where you want the cholam to attach and, voilà, you’ll have it. Why ask the Unicode to add a code point when you can get what you want with a simple tweak to the font? (When the vav cholam chaser was added there was no choice—John Hudson, please correct me about the chronology if I’m wrong about that.)

    By the way, the cholam chaser on the vet and resh, in the third and fourth examples, are too far to the right and therefore outside the typographic tradition, adding to your troubles. It’s the same elsewhere in this font. They should be fixed. Also, the shin dot is too large—which is likely why the cholam chaser is in the wrong place. It should be smaller than the cholam dots.

    This is too boring and esoteric for this forum. Get in touch with me offline if you wish.


  • John Hudson
    John Hudson Posts: 3,227
    edited March 2023
    There’s a lot to unpick here, and it is a long time since I worked on Hebrew, so apologies if I am misunderstanding anything.

    The ḥolam male versus vav ḥaluma distinction at the mark encoding level was necessary because in both cases the combining mark is applied after the vav letter, so there is no way to distinguish between the semantics of a single dot character on the basis of its position, because there was no way to distinguish its position at the font level without additional information. So Unicode, recognising that the existing ḥolam dot U+05B9 was already functioning as ḥolam ḥaser on most letters but as ḥolam male on vav, added the ḥolam ḥaser for vav character U+05BA. If one were starting Unicode encoding from scratch, it would make more sense of course to have separate ḥolam ḥaser and ḥolam male characters, and avoid the dual role of U+05B9 that later required disunification.

    The situation with alef is different. The U+05B9 character applied to alef functions as ḥolam ḥaser, as it does on all other letters except vav, so the same disunification did not apply. As Ben notes, this means that there is no ḥolam male dot in Unicode specifically for use on alef, rather it falls to the font to grab the ḥolam U+05B9 off a preceding letter in order to contextually position it on the right side of the alef when in a ḥolam male role. Is this ideal? Probably not, in terms of the complexity it requires in Hebrew fonts, but it works.

    I would like three different encodings for the following three different phenomena:
    1. ḥolam ḥaser dot on an alef
    2. ḥolam male dot on an alef
    3. ḥolam ḥaser dot on a letter preceding an alef
    But today's Unicode does not give me a way to distinguish #2 from #3.
    If you need #3 in a situation where the context rules would otherwise move the dot into a ḥolam male position on the alef, have you tried inserting a U+200C ZWNJ control character before the alef? That should break the context that would move the dot over the right side of following alef, so leave it on the first letter. A quick test of this in Pages on Mac using SBL Hebrew:

    Test string: לרֹ‍אי
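
    For anyone wanting to reproduce such a test, the sequence can be built explicitly so that the invisible control character is not lost in copy and paste. This is a minimal sketch: the letters follow the visible letters of the test string, and the ZWNJ placement follows the description (after the ḥolam, before the alef). The helper name is mine:

```python
import unicodedata

LAMED, RESH, ALEF, YOD = "\u05DC", "\u05E8", "\u05D0", "\u05D9"
HOLAM, ZWNJ = "\u05B9", "\u200C"

# Without and with the control character between the holam and the alef.
plain = LAMED + RESH + HOLAM + ALEF + YOD
blocked = LAMED + RESH + HOLAM + ZWNJ + ALEF + YOD

def code_points(text):
    """List the code points of a string in U+XXXX notation."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(blocked))
print(unicodedata.name(ZWNJ))

# NFC normalization leaves the control character where it was typed.
survives_nfc = unicodedata.normalize("NFC", blocked) == blocked
print(survives_nfc)
```

    Whether a given font's contextual lookups actually treat ZWNJ as a break depends on how those lookups are written, so this needs testing per font and per application.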

    Oh, and I think this discussion is 100% appropriately boring and esoteric for this forum. :)

  • bdenckla
    bdenckla Posts: 12
    @John Hudson thanks for engaging on this topic. I haven't found a case where my font's context rules fail, but the idea that they might fail keeps me up at night. So, your ZWNJ suggestion is helpful, if only to let me sleep a little better because at least now I have an idea of what workaround to suggest if my font does fail ;). Actually, I ran across the same ZWNJ suggestion recently in Peter Kirk's classic document, “Issues in the Representation of Pointed Hebrew in Unicode.” But it helps to hear you support the idea in a more modern context.

    I was also interested to see Peter Kirk's (rejected) proposal to use ZWNJ with plain old HOLAM to represent ḥolam ḥaser for vav.

    Along those lines, looking at that and other old proposals reminds me that there are ways to use ZWJ and/or ZWNJ to address this alef issue without adding a code point, which is good, because it seems astronomically unlikely to get a code point added for this.
  • OK—boring we shall be!

    Ben, here’s a more specific and simple solution for you: As the ḥolam ḥaser for vav (05BA) has only one usage, it would be simpler to give it an additional usage, with alef, than it would be to create a new Unicode code point or try to use the regular ḥolam (05B9) for a secondary purpose, as it is already used in the “normal” way with alef. See the picture, below.

    The second line shows the traditional typographic placement of the ḥolam, off the upper left corner of the character. Your Frank-Ruehl designs do not do this and I believe they are examples of an inferior practice. (Note that I didn’t say “incorrect.”) The great Renaissance master Guillaume Le Bé, who cut more Hebrew types than anyone, ever, sometimes cast letters with both preceding and succeeding ḥolamim. The dots were not in the punches, but rather were added to the justifieds. It appears to me, having carefully examined his materials at the Plantin-Moretus Museum, that he used a small hand drill with a depth stop (what jewelers refer to as a “dori” drill) for these dots, as he did for shin and sin as well as the dagesh. Whereas in Bible typesetting the compositor was working on three lines simultaneously (upper nikkud and taamim; the main character; lower nikkud and taamim), other types of literature did not require the top line. Because the ḥolam dots were cast as overhanging kerns, the compositor could easily break off the one that was not required. This alleviated any problems in casting an imbalanced number of sorts and reduced the casting time.

    The digital font I use here is after Le Bé, which I made with the participation of Matthew Carter.


    (By the way, the glyph with two ḥolamim has no purpose in the digital world!)

    Most of the issues raised by Peter Kirk in 2003 have been—or could be—easily addressed. The (furtive) patach g’nuvah can be achieved as a series of substitutions and put into a stylistic set (or some other feature). I have done this successfully. It’s my opinion that issues of Qere/k’tiv should be addressed parenthetically in digital typesetting. If you try to accommodate it within the words, you are doing no favors for the reader. Keep in mind that, in best practices, the position of the nikkud must be stable, not shifting left or right when overburdened. You have to achieve a size/weight balance. Don't throw out the baby with the bath water—and don't knock your head against the wall! I won't go on . . . 

  • bdenckla
    bdenckla Posts: 12
    @Scott-Martin Kosofsky I suggest the same general idea as an "aside" in my Taamey D documentation:
    What if we just started interpreting U-HOLAM HASER FOR VAV as if it applied to alef as well? Then there would be no need for inference trickery on U-HOLAM when it appears before alef.
    Although I don't spell it out, when I wrote that I was thinking of something far more invasive than your suggestion. I was thinking of giving U-HOLAM & U-HHFV on alef the same meaning as they have on vav. To me, that is, in general, the thing that should have been done long ago: treat alef & vav the same way, regardless of what exact encoding is used, in terms of Hebrew-block code points or joiners or whatever. But too much water has passed under the bridge for such pretty but invasive proposals (pretty because they preserve symmetry between alef & vav).

    Your suggestion has the great advantage of being far less invasive, by not changing the meaning of any code point sequence whose meaning is currently defined: you are just defining the meaning of U-HHFV on alef, whose meaning is currently not defined.
  • John Hudson
    John Hudson Posts: 3,227
    edited March 2023
    As the ḥolam ḥaser for vav (05BA) has only one usage, it would be simpler to give it an additional usage, with alef, than it would be to create a new Unicode code point or try to use the regular ḥolam (05B9) for a secondary purpose, as it is already used in the “normal” way with alef.
    That is an idea that could be taken to Unicode (I advise against just going ahead and implementing it as a non-standardised behaviour, as it might not be correctly supported or result in the same outcome in all environments). It introduces a confusing ambiguity for U+05BA—ḥolam ḥaser for vav and ḥolam male for alef—but the biggest objection is likely to be that it introduces a significant encoding change. U+05BA for vav is an optional character for documents that want to make a distinction between ḥolam male versus vav ḥaluma instead of having U+05B9 serve for both in a single location. But U+05B9 on alef is always ḥolam ḥaser, and ḥolam male on alef never shares the same dot position as ḥolam ḥaser. So existing practices for handling ḥolam male on alef always encode it after the preceding letter, and switching to U+05BA to encode it would require a change in how documents are created and stored, with complex equivalency issues for searching, indexing and sorting, as well as security concerns.
  • Thanks, John, for these interesting considerations. To be clear about what I proposed and demonstrated: the typing order is alef first, then holam, as usual, only that the anchor points place the holam at the upper right. What would prevent that from working in any environment? I’ve always believed that the anchor positions were agnostic and I’ve always treated them accordingly, with great success. After all, there are plenty of examples in which similar things happen with the upper taamim (e.g., geresh muqdam, 059D), and the lower ones, too (e.g., yetiv, 059A). They certainly work in InDesign and also when I save texts in html. As you know, my anchor array is much, much simpler than yours, so there's no tripping over the shoelaces.

  • John Hudson
    John Hudson Posts: 3,227
    To be clear about what I proposed and demonstrated: the typing order is alef first, then holam, as usual, only that the anchor points place the holam at the upper right. What would prevent that from working in any environment?
    Potential shaping engine assumptions about the application of U+05BA only to vav, which at worst might result in insertion of a dotted circle between the alef and U+05BA to indicate that it is an invalid sequence. I am not saying that this will definitely happen in any particular shaping engine (one might test all the current ones and find out that this is not a problem), but the point remains that unless the character behaviour is specified in Unicode, there are no certainties about what a shaping engine should do if U+05BA is applied to any letter other than vav.