Hebrew needs some additional marks in Unicode, but there are no official code points for them.
Actually this is more of a stylistic need, because these are not new marks: each is the same mark as an existing one, drawn differently to indicate how it is pronounced in a given context.
For example, the letter 'a' sounds different in the word 'car' and in the word 'cat'. So suppose I made a special 'a' to tell the reader that in the word 'cat' it sounds more like an 'e'.
This is not a real example, just an analogy to help understand the situation in Hebrew.
Unicode already did this for Hebrew: alongside the standard mark QAMATS [sounds like 'a'] it added QAMATS QATAN [sounds like 'o'], and really that is just a stylistic difference and not a truly new mark form.
From the Unicode code chart:
05B8 ◌ָ HEBREW POINT QAMATS
    used generically or as qamats gadol in orthography which distinguishes that from qamats qatan
05C7 ◌ׇ HEBREW POINT QAMATS QATAN
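For anyone who wants to check the two chart entries above on their own machine, here is a minimal sketch using Python's standard `unicodedata` module. The helper name `describe` is just an illustration, and the output depends on the Unicode version bundled with your Python.

```python
import unicodedata

# The two code points quoted from the chart above.
CODE_POINTS = [0x05B8, 0x05C7]

def describe(cp):
    """Return (U+XXXX label, character name, general category) for a code point."""
    ch = chr(cp)
    return (f"U+{cp:04X}", unicodedata.name(ch), unicodedata.category(ch))

for cp in CODE_POINTS:
    print(*describe(cp))
```

Both come back as nonspacing marks (category `Mn`), which is what lets them behave as nikkud in text layout.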
But there are two other marks with the same need:
First, for the standard SHEVA [U+05B0] there is a need for SHEVA NA, to indicate that it sounds like 'e'.
I have found a letter to Unicode asking them to add it.
Second, for the standard DAGESH [U+05BC] there is a need for DAGESH HAZAQ, to indicate that it sounds 'stronger'.
My question is: what should font designers do in the meantime, as long as there are no official code points for these marks?
Simply putting them in the Private Use Area is not good, because they will not get the correct bidi behaviour there (Private Use code points default to left-to-right).
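To make the Private Use Area concern concrete, here is a small sketch that compares the default Unicode properties of the standard SHEVA with those of a Private Use code point. U+E000 is only a hypothetical slot chosen for illustration; nobody in this thread has said which PUA code point a font actually uses.

```python
import unicodedata

def mark_properties(cp):
    """Collect the properties that matter for a combining mark in bidi text."""
    ch = chr(cp)
    return {
        "category": unicodedata.category(ch),    # Mn = nonspacing mark, Co = private use
        "bidi": unicodedata.bidirectional(ch),   # NSM = nonspacing mark, L = left-to-right
        "combining": unicodedata.combining(ch),  # canonical combining class
    }

sheva = mark_properties(0x05B0)  # HEBREW POINT SHEVA
pua = mark_properties(0xE000)    # hypothetical PUA slot for a variant mark

print("sheva:", sheva)
print("PUA:  ", pua)
```

The PUA character is, by default, a left-to-right base character with combining class 0, so an application that does not know better will not treat it as a mark belonging to the preceding Hebrew letter.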
One highly regarded Hebrew font designer made a 'liga' feature that replaces a sequence of two SHEVA marks with a SHEVA NA glyph, which he placed in the General Punctuation block [at U+200C, which is officially ZERO WIDTH NON-JOINER].
That does not look like the right way to do it.
I thought the proper way would be to put the glyph in the Private Use Area and apply a Swash [swsh] feature to reach it.
So, what do you think is the best way to do it?
There is another nikkud (Hebrew for vocalization mark) that should be first on the list: the chataf kamatz katan ("reduced kamatz katan"), which is considered an essential “balancing” phoneme in liturgical Hebrew (see attached sample). Typographically speaking, the kamatz katan has little value if there is no chataf kamatz katan—a situation parallel to the regular (called gadol = greater) kamatz and the chataf kamatz, both of which are in the standard Hebrew Unicode.
The kamatz katan, chataf kamatz katan, sh’va na, and dagesh hazak were not used in the Masoretic texts of the Hebrew Bible, which were written in what is referred to as the Tiberian system of vocalization (nikkudot = vowels, plural) and cantillation (taamim = chanting tropes), developed in Tiberias in the 9th-11th centuries C.E. These distinctions are, rather, latter-day notions. No such typographic differentiations appeared before the 19th century, and only then in grammatical texts (Gesenius, et al.). The earliest grammatical works to mention them are those included in the 1546 collection Dikdukim by Elijah Levita, which includes six or so major grammatical treatises of the early 16th century. But from these you can come away with ideas that suggest there may be as many as 10 forms of the sh’va! It’s very esoteric and, for modern users, might make matters more obscure than they are already.
In my work for the recent official prayerbooks for the Conservative and Reform congregations (worldwide, though mostly U.S.), we have included the chataf kamatz katan and, in the Conservative prayerbook for Sabbath and festivals, the sh’va na—however, not the dagesh hazak. Since all of the Hebrew fonts were made by me and are held privately, I found my own workarounds, using ccmp and GSUB with glyphs in the Private Area. Since the fonts are not for sale, I felt no need to petition the Unicode Consortium, though I did look into it. Because we use modern punctuation with Hebrew liturgical (not biblical) texts, I always add a Hebrew set that operates through a locl feature. I use special typing sequences to access the out-of-unicode glyphs via GSUB.
To give credit where credit is due, I should say that I worked on some of these solutions with Ben Kiel.
One more thing: readers of liturgical Hebrew outside of Israel require the use of the meteg (see sample above) to mark non-ultimate syllable stresses. Israeli type designers who include nikkud never include it, except in the rare instances of fonts with cantillation marks, in which the meteg has a different function (and is called a siluq). They should.
It is a good idea to search the existing proposals, both to check that the same characters have not already been proposed and to see examples of successful proposals. Direct contact with Michael Everson, the researcher with the most proposals approved by Unicode, might provide additional information and guidance.
And, as long as no Unicode addition is available, Hudson's idea about a Stylistic Set or Character Variant is surely the best provisional solution.
Thank you, John and Igor. The Character Variant feature is certainly an option, while the use of Stylistic Sets is not, as it does not offer a case-by-case option. All of the characters mentioned by Yeshurun and me coexist with the standard set, not “in place of” it throughout a document or paragraph.
Yeshurun, I have used the Swash feature in some fonts to engage alternative punctuation. It’s perfectly effective, though I don’t think it would work for alternative diacritics for the reason stated above regarding Stylistic Sets.
There may be yet one more nikkud that should be included in the Unicode: the patach g’nuvah, which is a regular patach (05B7) that’s offset to the right when it appears under a hei or chet that is the final letter in a word. I have automated the placement through substitution strings (hei-patach-space, hei-patach-period[etc.]). It may be a more elegant solution than having a separate Unicode designation. But then again, the Unicode for Hebrew contains many glyphs that could have been achieved by other means. I suspect that that was the result of the Unicode having come into existence before OpenType, from which it tries to remain independent.
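As a rough illustration of the kind of substitution strings described above (hei-patach-space, hei-patach-period, and so on), here is a sketch that works on plain text rather than on glyphs. It is not the actual font code: the stand-in code point U+E001 and the function name are hypothetical, the punctuation list is only a sample, and it ignores complications such as a mappiq between the letter and the patach.

```python
import re

HEI, CHET, PATACH = "\u05D4", "\u05D7", "\u05B7"
FURTIVE_PATACH = "\uE001"  # hypothetical stand-in for the right-offset patach glyph

# hei or chet, then patach, followed by whitespace, sample punctuation, or end of text.
_WORD_FINAL = re.compile("([" + HEI + CHET + "])" + PATACH + r"(?=\s|[.,;:!?]|$)")

def offset_final_patach(text):
    """Swap a word-final patach under hei/chet for the offset variant."""
    return _WORD_FINAL.sub(lambda m: m.group(1) + FURTIVE_PATACH, text)

ruach = "\u05E8\u05D5\u05BC\u05D7\u05B7"  # resh, vav, dagesh, chet, patach
print([f"U+{ord(c):04X}" for c in offset_final_patach(ruach + ".")])
```

In a real font the same context would be expressed as a contextual substitution on glyphs, so the text itself keeps the regular patach (05B7).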
To be clear, once again, I am only suggesting this as a work around in the absence of distinct Unicode codepoints for these variant marks. Obviously the better way to handle this would be to write proposal documents to have these variant marks encoded.
John, there is an important aspect of working with type in large documents that type designers, OT engineers, and application designers seldom take into account: the ease of access of certain glyphs and features in the workflow. In InDesign, where I spend much of my time, there are three steps to get to a Stylistic Set: a pull-down, a secondary pull-down, then the selection of the desired Stylistic Set. It’s a lot faster to simply make a selection in the glyph palette, and easier still to add an extraneous glyph (an equals sign, for example) to each occurrence of a special glyph that will actuate an automatic substitution. Of course, one has to add this to the substitution table, but that needs to be done just once.
Poor implementation of OT features in InDesign is a major problem for type designers. I remember that, some years ago, there was an organized effort to get Adobe to change their OT interfaces, but nothing came of it. I also remember that the panel they convened did not include a single person who one might consider to be an expert typesetter.
sheva + sheva = sheva na
dagesh + dagesh = dagesh hazak
Both are in the ccmp feature.
This works in InDesign but not in other programs.
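The two rules above can be simulated on a code point sequence to see what the typed text looks like before and after. This is only a sketch: a real ccmp lookup substitutes glyphs inside the font and leaves the text unchanged, and the Private Use stand-ins U+E010 and U+E011 are hypothetical, not the code points used in any actual font.

```python
SHEVA, DAGESH = "\u05B0", "\u05BC"
SHEVA_NA = "\uE010"      # hypothetical stand-in for the sheva na glyph
DAGESH_HAZAK = "\uE011"  # hypothetical stand-in for the dagesh hazak glyph

def apply_doubling_rules(text):
    """Replace a doubled sheva or doubled dagesh with its variant stand-in."""
    return text.replace(SHEVA * 2, SHEVA_NA).replace(DAGESH * 2, DAGESH_HAZAK)

typed = "\u05D1" + SHEVA + SHEVA  # bet followed by two shevas
shaped = apply_doubling_rules(typed)
print([f"U+{ord(c):04X}" for c in shaped])
```

One consequence of this approach is that the stored text contains doubled marks, which is why the texts later need a find/change pass if proper code points are ever encoded.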
In InDesign, you can assign a Stylistic Set feature to a Character Style, which gives you a single click from a high level menu to apply the style. I believe it is even possible to map a Character Style to a keyboard shortcut.
But in the past 10 years this has changed, because it is more comfortable for the reader.
So now it is a regular thing that most people are used to in prayer books and Bibles.
This way people are more aware of the distinction, which is important in prayer and Bible reading but less important in speech.
Yes, this changes the texts, but it is easy to fix them using find/change.
I can’t see a faster way to do this.
It's already done and already rejected:
Happy New Year to you and to everyone here.
The rejected proposal does not follow the format Unicode asks for. To be properly considered, a proposal needs to (1) present what would be encoded, where, how, and why, with as many usage references as possible; and (2) include a completed form giving that information in Unicode's own terms and standards.
Maybe a new proposal that follows the Unicode requirements could be accepted. Two samples, one simple and one complex, are attached to give a better idea of what they expect a proposal to look like.
A great 2019 to you too and to everyone on TD! (And I hope the idea of encoding these Hebrew marks can advance and bear fruit.)
I cover ḥolam malei for alef and some other topics relevant to this thread in the documentation for my just-released Taamey D font.
In Taamey D I supply a ḥataf qamats qatan glyph via a stylistic set, an idea discussed in this thread. Unlike some of the other glyphs mentioned in this thread, a ḥataf qamats qatan glyph would typically be used for all ḥataf qamats code points in a document or not used at all. So it is, perhaps, a stylistic variant in the purest sense. It would typically be used for all ḥataf qamats code points in a document that uses the qamats qatan code point. I'm not implying that other uses of the OpenType stylistic sets mechanism are abuses; I just mean to point out that a ḥataf qamats qatan glyph is a particularly good/easy fit for that mechanism. Indeed, it is such a good fit, that I wonder whether a new code point could really be justified!
Ben, regarding the ḥolam malei for alef:
There is a major phonemic difference between the circumstances of ḥolam ḥaser vav and your suggestion of a ḥolam malei alef glyph. The sound of ḥolam malei vav, which predated the inclusion of ḥolam ḥaser vav in the Unicode, is “o,” whereas the sound of the holam haser vav is “vo”—that is, with a voiced vav. Holam ḥaser vav appears, for example, in the word “mitzvot” (commandments), which is amongst the key tenets and most frequently appearing words in the Hebrew Bible. It is, therefore, an orthographic necessity. A ḥolam malei for alef, on the other hand, would only make a differentiation of interpretive grammar, the need for which is entirely arguable, and I would suggest is unnecessary as a matter of typography. If one were to require such a thing for say, an educational text, it would be entirely appropriate to achieve it through manual intervention or a custom font. Moreover, the issue as you present it in your GitHub post is, in my view, more a matter of poor letterform shapes and, more directly, poor anchor positioning.
In my view, this last point applies to a number of the issues you present on GitHub. Your Taamey D font is, more or less, an iteration of the Frank-Ruehl design. (For those of you not familiar with Hebrew typography, Frank-Ruehl is a design released 1908-1910 by the C.F. Rühl foundry in Leipzig, which was taken over in the next decade by Berthold.) The proportions of the letters do not lend themselves to good performance with the entire panoply of Hebrew diacritics used in Bible typesetting, or even in liturgy. That the font remained as popular for so long has much to do with an accident of history (the events of 1932-1948) that led to a kind of inertia. It’s time to move on and stop trying to make these letterforms accommodate features that they physically cannot. (Sefaria.org seems to get around some of the problems by tracking out or adding side bearing space to the letters—not so nice, but workable.)
The need for ḥataf kamatz katan in the Unicode is unequivocal, for reasons that I pointed out some years ago, above. To include the kamatz katan, but not the ḥataf kamatz katan, is the equivalent of not including any of the ḥataf-vowels (all the others are in the Unicode since Day 1). The ḥataf kamatz katan is used in all of the prayerbooks I have produced for the Reform and Conservative congregations since 2008. As these now number nearly 1.8 million in regular use by more than 4 million users, it can be said that they reflect the majority position. Moreover, the ḥataf kamatz katan appears also in the Israeli Koren publications and others. Case closed. It’s a reality that’s beyond the need for a proposal. The same is true for sh’va na and probably for dagesh ḥazak, as well.
NB: All crazy people write long posts, but not all long posts are written by crazy people.
I either don't agree with or don't understand your points about my idea of a need for a Unicode code point for a ḥolam malei dot for alef.
Correct me if I'm wrong, but what you're saying seems to rely on the pervasive but incorrect idea that alef is always silent, or at least that it is never a "real" consonant. As you are well aware (but I will state for other readers) alef, like vav, is sometimes a (silent) mater lectionis and sometimes it represents a consonant sound. In the case of alef, that consonant sound just happens to be a glottal plosive. If we use the right half ring symbol for alef, the difference between "o" and "ʾo" is no less real than the difference between "o" and "vo". But you seem to privilege the "vo/o" difference as more real than the "ʾo/o" difference? Again, correct me if I have misunderstood you and therefore misrepresent you.
Perhaps a glottal plosive alef is more subtle to hear than a voiced labiodental fricative vav. But it is no less real, either to linguists or chanters. It may seem less real by accident of the fact that we have no letter for it in the Latin alphabet. So, lacking a letter for alef, we have to resort to small, exotic marks like the right half ring. Or we resort to punctuation like the dash commonly used in "uh-oh".
To summarize the situation:
- We do have a code point for a ḥolam ḥaser dot on alef: it is just U-HOLAM, the way a ḥolam ḥaser dot is represented on any letter other than vav.
- We don't have a code point for a ḥolam male dot on alef. Instead, we have to infer when a U-HOLAM on the letter preceding the alef is in fact intended to be a ḥolam male dot on the alef.
If this is acceptable, why didn't vav get the same treatment, i.e., why did we need ḥolam ḥaser for vav? Why didn't the powers that be have U-HOLAM be, very nicely, a ḥolam ḥaser dot for any letter including vav? This was actually a proposal floating around in those bad old days: see, for example, "Holam Male as <HOLAM, VAV>" in Holam Male (gentlewisdom.org). But it was rejected for the same reason I see a problem with alef today: the inference is too tricky.
Ben, I do not disagree with you about the sometimes consonantal function of alef. Where I do disagree is whether it requires a formal typographic differentiation, especially as one is already available to you through the use of the OT mark positioning (mark) feature. All you would have to do, then, is add an anchor and change the typing order. It doesn’t matter which dot you use(!!), so long as it has the right anchors.
You ask, again, “why is this different from vav cholam chaser?” Technically speaking, it isn’t, but historically speaking (I’m referring to the history of Unicode and OpenType), it is. All of us who work with accented scripts in OpenType suffer with the historical fact that many (most?) of the Unicode code sheets came into existence well before the advent and widespread implementation of OpenType features. Had these developments happened in reverse chronological order, it is likely that most precomposed accented glyphs would never have come into existence. In Hebrew, for example, there would never have been a need for the range of dagesh characters or the precomposed accented Yiddish glyphs, and so on. But by the time sufficiently featured OT fonts were possible, there was already a vast amount of standing editorial matter that was composed with older fonts, and applications that were slow to implement newer features. (Word for Mac still lacks adequate support for right-to-left languages.) Isn’t that what GSUB tables are for? Yes, for sure, but there are still residual issues from text typed in certain apps with certain fonts. Many of the most commonly used Hebrew fonts don’t work very well with their diacritics. In the tech world, the word “legacy” often means “old, burdensome junk that precludes better solutions.”
I'll restate my problem without reference to fonts, since reference to fonts may have confused the issue. My problem is:
I would like three different encodings for the following three different phenomena:
1. ḥolam ḥaser dot on an alef
2. ḥolam male dot on an alef
3. ḥolam ḥaser dot on a letter preceding an alef
But today's Unicode does not give me a way to distinguish #2 from #3.
I think you feel that some (all?) of these distinctions are unneeded, or rarely needed. That is a valid debate to have with respect to the editorial policies of a particular publication or application. But ideally Unicode should enable all of those distinctions to be made, and then users of Unicode can decide which distinctions are appropriate for their publication or application.
Although I am critical of BHS in many ways, in this case I think the fact that BHS makes these distinctions gives considerable weight to the idea that some publications or applications may reasonably want to make these distinctions.
I conclude with the BHS version of four of the words I use in my Taamey D documentation, showing BHS's distinction between phenomena #2 and #3. Sorry for the blurriness of some of them.
Ben, just create an anchor set for where you want the cholam to attach and, voilà, you’ll have it. Why ask the Unicode to add a code point when you can get what you want with a simple tweak to the font? (When the vav cholam chaser was added there was no choice—John Hudson, please correct me about the chronology if I’m wrong about that.)
By the way, the cholam chaser on the vet and resh, in the third and fourth examples, are too far to the right and therefore outside the typographic tradition, adding to your troubles. It’s the same elsewhere in this font. They should be fixed. Also, the shin dot is too large—which is likely why the cholam chaser is in the wrong place. It should be smaller than the cholam dots.
This is too boring and esoteric for this forum. Get in touch with me offline if you wish.
The ḥolam male versus vav ḥaluma distinction at the mark encoding level was necessary because in both cases the combining mark is applied after the vav letter, so a font had no way to tell, without additional information, which of the two roles a single dot character was playing and therefore where to position it. So Unicode, recognising that the existing ḥolam dot U+05B9 was already functioning as ḥolam ḥaser on most letters but as ḥolam male on vav, added the ḥolam ḥaser for vav character U+05BA. If one were starting Unicode encoding from scratch, it would of course make more sense to have separate ḥolam ḥaser and ḥolam male characters, and avoid the dual role of U+05B9 that later required disunification.
The situation with alef is different. The U+05B9 character applied to alef functions as ḥolam ḥaser, as it does on all other letters except vav, so the same disunification did not apply. As Ben notes, this means that there is no ḥolam male dot in Unicode specifically for use on alef, rather it falls to the font to grab the ḥolam U+05B9 off a preceding letter in order to contextually position it on the right side of the alef when in a ḥolam male role. Is this ideal? Probably not, in terms of the complexity it requires in Hebrew fonts, but it works.
Test string: לרֹאי
Oh, and I think this discussion is 100% appropriately boring and esoteric for this forum.
I was also interested to see Peter Kirk's (rejected) proposal to use ZWNJ with plain old HOLAM to represent ḥolam ḥaser for vav.
Along those lines, looking at that and other old proposals reminds me that there are ways to use ZWJ and/or ZWNJ to address this alef issue without adding a code point, which is good, because it seems astronomically unlikely to get a code point added for this.
OK—boring we shall be!
Ben, here’s a more specific and simple solution for you: As the ḥolam ḥaser for vav (05BA) has only one usage, it would be simpler to give it an additional usage, with alef, than it would be to create a new Unicode code point or try to use the regular ḥolam (05B9) for a secondary purpose, as it is already used in the “normal” way with alef. See the picture, below.
The second line shows the traditional typographic placement of the ḥolam: off the upper left corner of the character. Your Frank-Ruehl designs do not do this and I believe they are examples of an inferior practice. (Note that I didn’t say “incorrect.”) The great Renaissance master Guillaume Le Bé, who cut more Hebrew types than anyone, ever, sometimes cast letters with both preceding and succeeding ḥolamim. The dots were not in the punches, but rather were added to the justifieds. It appears to me, having carefully examined his materials at the Plantin-Moretus Museum, that he used a small hand drill with a depth stop (what jewelers refer to as a “dori” drill) for these dots, as he did for shin and sin as well as the dagesh. In Bible typesetting the compositor was working on three lines simultaneously (upper nikkud and taamim; the main character; lower nikkud and taamim), whereas other types of literature did not require the top line. Because the ḥolam dots were cast as overhanging kerns, the compositor could easily break off the one that was not required. This alleviated any problems in casting an imbalanced number of sorts and reduced the casting time.
The digital font I use here is after Le Bé, which I made with the participation of Matthew Carter.
(By the way, the glyph with two ḥolamim has no purpose in the digital world!)
Most of the issues raised by Peter Kirk in 2003 have been—or could be—easily addressed. The (furtive) patach g’nuvah can be achieved as a series of substitutions and put into a stylistic set (or some other feature). I have done this successfully. It’s my opinion that issues of Qere/k’tiv should be addressed parenthetically in digital typesetting. If you try to accommodate it within the words, you are doing no favors for the reader. Keep in mind that, in best practices, the position of the nikkud must be stable, not shifting left or right when overburdened. You have to achieve a size/weight balance. Don't throw out the baby with the bath water—and don't knock your head against the wall! I won't go on . . .
Your suggestion has the great advantage of being far less invasive, by not changing the meaning of any code point sequence whose meaning is currently defined: you are just defining the meaning of U-HHFV on alef, whose meaning is currently not defined.
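To spell out what that suggestion would mean at the encoding level, here is a small sketch of the three phenomena as code point sequences, assuming U+05BA is reused on alef as suggested. The choice of resh as the preceding letter and the dictionary labels are only for illustration; the second sequence has no defined meaning in Unicode today.

```python
import unicodedata

ALEF, RESH = "\u05D0", "\u05E8"
HOLAM = "\u05B9"                # HEBREW POINT HOLAM
HOLAM_HASER_FOR_VAV = "\u05BA"  # HEBREW POINT HOLAM HASER FOR VAV

# Assumed encodings under the suggestion above; only the first and third are in use today.
ENCODINGS = {
    "holam haser on alef": ALEF + HOLAM,
    "holam male on alef (suggested reuse of U+05BA)": ALEF + HOLAM_HASER_FOR_VAV,
    "holam haser on the letter before alef": RESH + HOLAM + ALEF,
}

def code_points(text):
    """Format a string as a space-separated list of U+XXXX labels."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)

for label, sequence in ENCODINGS.items():
    print(f"{label}: {code_points(sequence)}")
```

All three sequences are distinct, so phenomena #2 and #3 would no longer have to be told apart by inference; the font would only need an anchor on alef for the U+05BA dot.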
Thanks, John, for these interesting considerations. To be clear about what I proposed and demonstrated: the typing order is alef first, then holam, as usual, only that the anchor points place the holam at the upper right. What would prevent that from working in any environment? I’ve always believed that the anchor positions were agnostic and I’ve always treated them accordingly, with great success. After all, there are plenty of examples in which similar things happen with the upper taamim (e.g., geresh muqdam, 059D), and the lower ones, too (e.g., yetiv, 059A). They certainly work in InDesign and also when I save texts in html. As you know, my anchor array is much, much simpler than yours, so there's no tripping over the shoelaces.