Latin Extended-B Languages?

Hey there, 

I wanted to know which languages are part of the Latin Extended-B set. I can find the glyphs but not the languages that they support.

Are they very 'important'? Or should I focus on other languages like Cyrillic/Greek?

Up to now, no client has ever asked for Latin Extended-B. Nevertheless, a lot of 'pro' typefaces seem to have this character set covered.

Thank you

Comments

  • Thanks John and Igor, this really helps!
  • edited September 2015
    Yup. Don’t do the whole block just for the sake of doing the whole block, unless you’re aiming to cover all Latin characters that are in Unicode.

    There are two things you need to consider when looking at Latin Extended-B.

    As John says, this is a mix of characters from various sources. The way they are grouped is also problematic, in particular the 'Non-European and historic' group, which contains both characters used by millions and characters used only in historical documents or documents related to them.
    As a client, it would make more sense to look for specific characters rather than the whole block.

    The other thing is that some of the letters in Latin Extended-B have their uppercase or lowercase counterpart in a different Unicode block, and some letters are only used in orthographies that also use characters from other Unicode blocks: IPA Extensions, Latin Extended-C, Latin Extended-D, Combining Marks.
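
The case-pair point can be checked with a short script. This is only a sketch: it relies on Python's built-in simple case mappings, so the result depends on the Unicode version bundled with the interpreter, and the helper name is made up for illustration.

```python
import unicodedata

EXT_B = range(0x0180, 0x0250)  # Latin Extended-B: U+0180..U+024F


def case_partners_outside_ext_b():
    """Return (char, partner) pairs where a Latin Extended-B letter's
    upper/lowercase counterpart lives outside the block."""
    pairs = []
    for cp in EXT_B:
        ch = chr(cp)
        for partner in (ch.lower(), ch.upper()):
            # Skip caseless letters and multi-character special casings.
            if partner == ch or len(partner) != 1:
                continue
            if ord(partner) not in EXT_B:
                pairs.append((ch, partner))
    return pairs


for ch, partner in case_partners_outside_ext_b():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch, '?')} <-> "
          f"U+{ord(partner):04X} {unicodedata.name(partner, '?')}")
```

Each printed pair is a letter whose other case has to be drawn from another block, which is one reason to plan coverage per orthography rather than per block.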

  • ivan louetteivan louette Posts: 156
    Is there a list of the language tags corresponding to Latin Extended-A and Latin Extended-B? Otherwise, what must I do if, for example, I want all my small caps to work for any language?
  • ivan louetteivan louette Posts: 156
    Thanks ! That's a good idea. But for Roman language for example it seems that some characters are lacking in Latin Extended A and are included in Latin Extended B. Thus I should probably check which set of characters are needed for the language I would support.
  • ivan louetteivan louette Posts: 156
    edited June 19
    I added the scommaaccent and tcommaaccent (which are part of Extended B ) and now my small caps in Roman language work fine. Thanks again !
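
To illustrate why checking per language works better than checking per block, here is a minimal sketch that reports which block each non-ASCII Romanian letter sits in. The block ranges come from the Unicode block list; `block_of` and the letter list are my own illustration, not taken from any tool mentioned in this thread.

```python
import unicodedata

# A few Latin blocks, as (first, last, name).
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
]


def block_of(ch):
    """Return the name of the block containing ch, or 'other'."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


# Non-ASCII lowercase letters of the modern Romanian alphabet:
# a-breve, a-circumflex, i-circumflex, s-comma, t-comma.
ROMANIAN_EXTRA = "\u0103\u00e2\u00ee\u0219\u021b"

for ch in ROMANIAN_EXTRA:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: {block_of(ch)}")
```

The five letters are spread over three blocks, with the two comma-below letters in Latin Extended-B, so neither "all of Extended-A" nor "all of Extended-B" is the right unit for Romanian.
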
  • Mark SimonsonMark Simonson Posts: 947
    Do you mean "Romanian"?
  • ivan louetteivan louette Posts: 156
    @Mark Simonson Yes sorry :smile:
  • Vasil StanevVasil Stanev Posts: 204
    I use the OTM by URW++ to check which languages are covered by my font.
  • ivan louetteivan louette Posts: 156
    @Vasil Stanev & @Paul Miller  thanks a lot for these resources ! I am a newbie in this area and that's so valuable !
  • John HudsonJohn Hudson Posts: 1,442
    edited June 20
    OTM and several other tools/resources use the Unicode CLDR database. It's a good starting point for mapping characters and languages, but should be used with caution if trying to determine what characters are needed for a language. A good example is the legacy digraph characters that CLDR identifies as Croatian: these were inherited into Unicode from Yugoslav 8-bit encodings that were designed to enable one-to-one transcription between Cyrillic and Latin orthographies for Serbo-Croatian. So in a sense, yes, these are Croatian characters in that they represent sequences of letters used in the Croatian Latin orthography to write phonemes represented by a single letter in Cyrillic, but these legacy characters are not actually needed for Croatian, don't appear on Croatian keyboards and so forth.
  • ivan louetteivan louette Posts: 156
    Thanks again.

    And a related question (which could even open another topic) : is it possible to define several different language specific kerning sets into the same font ?
  • Paul MillerPaul Miller Posts: 126
    I have defined different kerning sets for different languages with Font Creator, I expect it is also possible with Fontographer.
  • John HudsonJohn Hudson Posts: 1,442
    You can define different kerning sets for different languages using OpenType GPOS kerning. In basic terms, you create separate lookups and assign them to different language system tags. Note, however, that application of language-specific kerning is dependent on a) text language being correctly tagged, and b) software recognising the tags and knowing to apply the specific OT language system instead of the default.
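
As a rough sketch of the "separate lookups per language system" idea, the snippet below generates feature-file text for a kern feature with a Romanian-specific pair. The glyph names and kerning values are hypothetical, and `kern_feature` is just an illustrative helper; `ROM` is the OpenType language system tag for Romanian. How the compiled result behaves still depends on the compiler and on the shaping software, as noted above.

```python
def kern_feature(default_pairs, per_language):
    """Build feature-file text for a kern feature.

    default_pairs: list of (left, right, value) shared by all languages.
    per_language: dict mapping a language system tag to extra pairs.
    """
    lines = ["languagesystem DFLT dflt;", "languagesystem latn dflt;"]
    lines += [f"languagesystem latn {tag};" for tag in per_language]
    lines += ["", "feature kern {"]
    # Rules before the first script statement go to every language system.
    lines += [f"    pos {left} {right} {value};"
              for left, right, value in default_pairs]
    lines.append("    script latn;")
    for tag, pairs in per_language.items():
        # Rules after a language statement go to that language only.
        lines.append(f"    language {tag};")
        lines += [f"        pos {left} {right} {value};"
                  for left, right, value in pairs]
    lines.append("} kern;")
    return "\n".join(lines)


# Hypothetical glyph names and values, for illustration only.
print(kern_feature([("A", "V", -80)],
                   {"ROM": [("T", "scommaaccent", -40)]}))
```

If your editor also generates a kern feature from its own kerning data, check which of the two ends up in the compiled font.
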
  • Mark SimonsonMark Simonson Posts: 947
    ...Fontographer?
  • ivan louetteivan louette Posts: 156
    @John Hudson Thanks a lot ! That's exactly what I expected but I wasn't absolutely sure of that.
  • Thomas PhinneyThomas Phinney Posts: 1,092
    Fontographer certainly does not handle that. I think you could supplement/replace the usual kerning with manual coding in FontLab 5 or VI to get language-specific variations, if you wished, but you'd need to take care not to overwrite your special code with an auto-generated kern feature.
  • I use the OTM by URW++ […]

    For the record: DTL OTMaster (OTM) is a product of the Dutch Type Library (DTL). OTM is jointly developed with URW Type Foundry (formerly URW++). The programming is done at URW in Hamburg, Germany. DTL and URW have worked together since 1991.

  • Paul MillerPaul Miller Posts: 126
    ...Fontographer?
    ... or Font Lab or Font Studio or whatever it's called.  The other one that isn't Glyphs, you know the one I mean !!
  • ivan louetteivan louette Posts: 156
    @Thomas Phinney Thanks, at the moment I work with FontForge and it can do it (and I am astonished about what it can do). I had Fontographer in the past and I liked it a lot. But there was a long gap (before OTF) where it was unusable by Windows users… and after that time I went to Linux. For some time I was also a (very newbie) Fontlab IV user, and I installed it also successfully on Linux with Wine, but I was already fairly comfortable with FF on Linux and its very useful cut and paste capabilities with Inkscape.

    I tried but never used auto-generated kerning in any program even if I understand that it can be an effective first step.
  • Kent LewKent Lew Posts: 782
    I think what Thomas meant by the “auto-generated kern feature” comment was that most commercial tools compile the {kern} feature out of the source file’s kerning data at the time of generation, and if you’re manually writing a {kern} feature into your .fea file in order to implement some kind of language-specific kerning, you may need to take care to see that the compiler gives precedence to your manual {kern} feature and doesn’t replace it with whatever native kerning data you might have in the source file when compiling.
  • @John Hudson Where does the CLDR identify those legacy characters as Croatian? Maybe that used to be the case but has been fixed since? The default characters exemplar sets of Bosnian, Croatian and Latin Serbian are all [a b c č ć d {dž} đ e f g h i j k l {lj} m n {nj} o p r s š t u v z ž], and the auxiliary characters exemplar sets of Bosnian and Croatian are [q w x y] and that of Latin Serbian [å q w x y].


  • John HudsonJohn Hudson Posts: 1,442
    edited June 21
    I did see the digraphs recently reported as Croatian characters in tools that I believe use CLDR. It may be an issue of how those tools choose to interpret the {} inclusions.
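
To make the {} question concrete, here is a simplified sketch of reading an exemplar set so that a braced group stays a sequence of ordinary letters. It handles only space-separated items and braced sequences, not the full UnicodeSet syntax (ranges, escapes and so on), and `parse_exemplars` is a made-up name, not a CLDR API.

```python
import re
import unicodedata


def parse_exemplars(exemplar_set):
    """Split a simple CLDR-style exemplar set into its items."""
    text = unicodedata.normalize("NFC", exemplar_set).strip()
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]
    items = []
    # A braced group is one item (a multi-character sequence);
    # anything else is one item per non-space character.
    for match in re.finditer(r"\{([^}]*)\}|(\S)", text):
        braced, single = match.groups()
        items.append(braced if braced is not None else single)
    return items


CROATIAN = ("[a b c č ć d {dž} đ e f g h i j k l {lj} m n {nj} "
            "o p r s š t u v z ž]")

for item in parse_exemplars(CROATIAN):
    if len(item) > 1:
        codes = " ".join(f"U+{ord(c):04X}" for c in item)
        print(f"{item!r}: {len(item)} code points ({codes})")
```

Read this way, the set asks for d, ž, l, j and n as separate characters. A tool that instead maps each braced item to a single legacy digraph character would report U+01C4 to U+01CC as required for Croatian.
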
  • Michel BoyerMichel Boyer Posts: 107
    edited June 21
    What I understand from the Unicode collation chart for Croatian is that the characters between braces need to be taken as a whole in alphabetical ordering of Croatian words.

    I seem to be able to select individual characters (I mean select L and J individually) in that chart. On the other hand, if I copy the first line, paste it in a text editor and dump the Unicode content of that file, I get these characters:
      01C4  LATIN CAPITAL LETTER DZ WITH CARON
      01C5  LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON
      01C6  LATIN SMALL LETTER DZ WITH CARON
      01C7  LATIN CAPITAL LETTER LJ
      01C8  LATIN CAPITAL LETTER L WITH SMALL LETTER J
      01C9  LATIN SMALL LETTER LJ
      01CA  LATIN CAPITAL LETTER NJ
      01CB  LATIN CAPITAL LETTER N WITH SMALL LETTER J
      01CC  LATIN SMALL LETTER NJ
    
    I don't quite understand what is going on. I just checked and those characters appear to be hidden. They are not those that I could select as individual characters (LJ, lj etc).

    PS: If I select only the section between LJ and m, I get the following output from hidden text (with blanks removed)
      01C7  LATIN CAPITAL LETTER LJ
      006C  LATIN SMALL LETTER L
      0135  LATIN SMALL LETTER J WITH CIRCUMFLEX
      004C  LATIN CAPITAL LETTER L
      0135  LATIN SMALL LETTER J WITH CIRCUMFLEX
      004C  LATIN CAPITAL LETTER L
      0134  LATIN CAPITAL LETTER J WITH CIRCUMFLEX
      006C  LATIN SMALL LETTER L
      01F0  LATIN SMALL LETTER J WITH CARON
      004C  LATIN CAPITAL LETTER L
      01F0  LATIN SMALL LETTER J WITH CARON
    
    Three hours later: On a better screen at home I can now see those "hidden" characters (and much better with a large font; my eyesight is no longer what it once was).
  • Michel BoyerMichel Boyer Posts: 107
    edited June 22
    The MySQL 8.0 documentation clearly says
     Croatian collations are tailored for these Croatian letters: Č, Ć, Dž, Đ, Lj, Nj, Š, Ž.
    Three of those letters are digraphs, i.e. composed of two Unicode characters (implying, so it seems, that we need to distinguish not only glyphs from characters, but also characters from letters!). There is no guarantee that the corresponding single-code-point digraph characters would be handled properly by MySQL when sorting on fields containing Croatian text and, for interoperability, I would not use the Unicode characters in the 01C4..01CC range in input files.
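
One way to keep those characters out of input files is to normalize text before storing it. The sketch below uses NFKC from Python's standard `unicodedata` module, which maps each legacy digraph character to an ordinary two-letter sequence. Note that NFKC also rewrites other compatibility characters (ligatures, superscripts and so on), so whether it suits a given data set is an assumption to check; `fold_digraphs` is an illustrative name.

```python
import unicodedata


def fold_digraphs(text):
    """Apply NFKC, which replaces the legacy digraphs U+01C4..U+01CC
    (among other compatibility characters) with plain letter sequences."""
    return unicodedata.normalize("NFKC", text)


# Show what each legacy digraph character becomes.
for cp in range(0x01C4, 0x01CD):
    ch = chr(cp)
    codes = " ".join(f"{ord(c):04X}" for c in fold_digraphs(ch))
    print(f"{cp:04X}  {unicodedata.name(ch)}  ->  {codes}")
```

After this step the text contains only the two-character sequences that the collation rules quoted above are tailored for.
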
  • Thomas PhinneyThomas Phinney Posts: 1,092
    @Kent Lew Yes, exactly what I was trying to say. Thank you for putting it so clearly.
  • Igor FreibergerIgor Freiberger Posts: 117
    edited June 23
    Hudson's alert about CLDR is quite important. The CLDR has many errors. Some take it as their main reference, like Underware, which makes their Latin Pro inconsistent. Take Portuguese, for example:
     

     
    1. The ò is not part of the Portuguese alphabet. It can be called auxiliary to support older (pre-1973) orthographies, but no more than that.
    2. The ü is part of the alphabet for countries which have not yet adopted the 1990 reform, like Angola or Moçambique. Thus, listing it as auxiliary is wrong.
    3. The other characters (marked) are definitely not used.
     