please delete Adobe-Identity cidmap, or treat it specially. #3084
This is also the main cause of #3079 . |
I think only CID-keyed fonts embedded in PDFs should be treated as direct Unicode; standalone OpenType fonts with CFF outlines should use the cmap, and only the cmap, for coding purposes. |
One suggestion I might make would be to make it a user preference. Anyway, my workaround was to rename the file temporarily so that fontforge cannot find it. I noticed this because Ubuntu ships the cidmap separately as an extra, and its fontforge can convert the font somewhat more correctly than Fedora's, which ships fontforge complete as one package. See the entire May/June traffic ( http://lists.nongnu.org/archive/html/cjk-list/ ) . |
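For anyone who wants to script the rename workaround rather than do it by hand, here is a minimal sketch. It assumes you already know where your distribution installs the cidmap file; the helper name `temporarily_hidden` and the `.disabled` suffix are made up for illustration and are not part of fontforge.

```python
import contextlib
import os


@contextlib.contextmanager
def temporarily_hidden(path, suffix=".disabled"):
    """Rename `path` out of the way for the duration of the block, then restore it.

    `path` is whatever location your system uses for the Adobe-Identity
    cidmap; this sketch does not guess it.
    """
    hidden = path + suffix
    os.rename(path, hidden)
    try:
        # While the block runs, nothing exists at the original path.
        yield hidden
    finally:
        # Always put the file back, even if the conversion fails.
        os.rename(hidden, path)
```

Run the fontforge conversion inside a `with temporarily_hidden(cidmap_path):` block; the file is restored when the block exits. Renaming a file in a system directory will usually need write permission there.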
See also comment from @kenlunde http://typedrawers.com/discussion/comment/28483#Comment_28483
|
When processing a font that uses the Adobe-Identity-0 ROS, such as the open source Source Han Sans and Source Han Serif families, along with Kazuraki, it is prudent not to assume anything about its glyph set, and instead depend on the mappings in the 'cmap' table to derive Unicode mappings. In other words, Adobe-Identity-0 ROS OpenType/CFF fonts should be treated like typical TrueType fonts with regard to how their glyphs correspond to Unicode code points or sequences. |
@HinTak, if we fix the multiple encoding problem correctly, it would fix this too, right? |
No - Fontforge seems to merge Adobe-Identity as an extra cmap. Alternatively, I suppose the answer is yes: to fix the multiple encoding problem correctly in the practical sense (the common case of trying to edit Adobe San CJK), Fontforge needs to be able to ignore the Adobe-Identity cidmap somehow. So deleting or ignoring the Adobe-Identity cidmap is one of the steps towards fixing the multiple encoding problem in Adobe San CJK. |
It seems that I need to re-state what I wrote on June 14th:
This means that the Unicode mappings should be derived only from the font's 'cmap' table. |
I think that, in the past or even currently, Adobe Acrobat Reader and also the (x)dvipdfmx/XeTeX people treat(ed) the Adobe-Identity cidmap as a direct cid/gid to Unicode mapping, for the purpose of text extraction and whatnot.
Well, the Adobe Source CJK fonts and Google Noto CJK fonts definitely do not: in their usage, it is just a "because I said so" custom encoding and carries no meaning whatsoever in relation to Unicode.
Unfortunately fontforge takes the former interpretation, and this causes the corruption of the encoding vector during Re-encode ( #3080 ). Simply deleting the file forces fontforge not to treat Adobe-Identity as a direct cid->Unicode mapping, makes it treat the cmap properly, and fixes the problem seen in #3080 .
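To make the difference between the two interpretations concrete, here is a toy sketch in plain Python. It is not fontforge code: the function names are hypothetical, the CID numbers are invented, and the cmap is simplified to map code points straight to CIDs (in a real font it maps to glyph IDs, which the CFF charset then ties to CIDs).

```python
def unicode_by_identity(cids):
    """Former interpretation: read each CID as if it were a Unicode code point."""
    return {cid: cid for cid in cids}


def unicode_by_cmap(cmap):
    """cmap-only interpretation: a CID has exactly the code points the cmap gives it.

    `cmap` is a simplified dict of code point -> CID.
    """
    by_cid = {}
    for codepoint, cid in sorted(cmap.items()):
        by_cid.setdefault(cid, []).append(codepoint)
    return by_cid


# Invented example data: two encoded glyphs, plus CID 1700 with no cmap entry.
toy_cmap = {0x4E00: 1200, 0x3042: 1500}
toy_cids = [1200, 1500, 1700]

identity = unicode_by_identity(toy_cids)
from_cmap = unicode_by_cmap(toy_cmap)

# Under the identity reading, CID 1200 would be taken as U+04B0 (1200 decimal),
# whereas the cmap says it is U+4E00. CID 1700 is simply unencoded under the cmap.
```

The two readings disagree for every glyph here, which is the kind of mismatch that can end up in the encoding vector if both are merged.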
So I suggest either simply deleting the cidmap, or at least providing a scripting API to selectively disable its use. Otherwise you'll not be able to process Adobe Source CJK / Google Noto CJK properly.