please delete Adobe-Identity cidmap, or treat it specially. #3084

HinTak · 2017-06-09T13:24:26Z

I think that in the past, and perhaps even now, Adobe Acrobat Reader and the (x)dvipdfmx/XeTeX people treat(ed) the Adobe-Identity cidmap as a direct CID/GID-to-Unicode mapping, for text extraction and the like.

Well, the Adobe Source CJK fonts and Google Noto CJK fonts definitely do not: in their usage it is just a "because I said so" custom encoding that carries no meaning whatsoever in relation to Unicode.

Unfortunately fontforge takes the former interpretation, and this corrupts the encoding vector during Re-encode ( #3080 ). Simply deleting the file stops fontforge from treating Adobe-Identity as a direct CID-to-Unicode mapping, forces it to use the cmap properly, and fixes the problem seen in #3080 .

So I suggest either simply deleting the cidmap, or at least providing a scripting API to selectively disable its use. Otherwise you will not be able to process Adobe Source CJK / Google Noto CJK properly.

HinTak · 2017-06-09T13:35:56Z

This is also the main cause of #3079 .

HinTak · 2017-06-10T00:49:53Z

I think only CID-keyed fonts embedded in PDFs should be treated as direct Unicode; standalone OpenType fonts with CFF outlines should use the cmap, and only the cmap, for encoding purposes.

HinTak · 2017-06-12T17:31:54Z

One suggestion I might make would be to make it a user preference.

Anyway, my solution was to rename the file temporarily so that fontforge cannot find it.

I noticed this because Ubuntu ships the file separately as an extra, and so can convert the font somewhat more correctly than Fedora, which ships fontforge complete as one package. See the entire May/June traffic ( http://lists.nongnu.org/archive/html/cjk-list/ ) .
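The rename workaround described above could be scripted roughly as follows. This is only a sketch: the helper name `temporarily_hidden`, the `.disabled` suffix, and the example path in the comment are assumptions for illustration, since the install location of the cidmap varies by distribution.

```python
import contextlib
import os


@contextlib.contextmanager
def temporarily_hidden(path, suffix=".disabled"):
    """Rename `path` out of the way for the duration of the block, then restore it."""
    hidden = path + suffix
    if os.path.exists(hidden):
        # Refuse to overwrite a leftover from an earlier, interrupted run.
        raise FileExistsError(hidden)
    os.rename(path, hidden)
    try:
        yield hidden
    finally:
        # Always put the file back, even if the conversion fails.
        os.rename(hidden, path)


# Hypothetical usage; the path below is an assumption, not a documented location:
# with temporarily_hidden("/usr/share/fontforge/Adobe-Identity-0.cidmap"):
#     ...run the fontforge conversion here...
```

Restoring the file in a `finally` clause keeps the installation intact for other users of the cidmap even when the conversion step raises.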

HinTak · 2017-06-15T02:22:53Z

See also comment from @kenlunde http://typedrawers.com/discussion/comment/28483#Comment_28483

 The Identity-H encoding is used to refer to glyphs by their CIDs regardless of their ROS (Registry, Ordering, and Supplement). Per the PDF Language Reference Manual, it maps two-byte character codes ranging from 0 to 65,535 to the same two-byte CID value, interpreted high-order byte first. It has nothing to do with a mapping from Unicode. That mapping is handled via explicit Unicode mappings, or via a ToUnicode mapping table, which maps said Identity-H CIDs to meaningful Unicode values.

kenlunde · 2017-06-15T02:40:51Z

When processing a font that uses the Adobe-Identity-0 ROS, such as the open source Source Han Sans and Source Han Serif families, along with Kazuraki, it is prudent not to assume anything about its glyph set, and instead depend on the mappings in the 'cmap' table to derive Unicode mappings. In other words, Adobe-Identity-0 ROS OpenType/CFF fonts should be treated like typical TrueType fonts with regard to how their glyphs correspond to Unicode code points or sequences.
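As a rough sketch of what "depend on the mappings in the 'cmap' table" can look like in practice, the following uses the third-party fontTools package (not fontforge) to read a font's best Unicode cmap and invert it into a glyph-to-code-point table. The helper names are assumptions for illustration, and fontTools is imported lazily so that the pure `invert_cmap` helper works without it.

```python
from collections import defaultdict


def invert_cmap(cmap):
    """Map each glyph name to the sorted list of code points that reach it."""
    glyph_to_codes = defaultdict(list)
    for codepoint, glyph_name in cmap.items():
        glyph_to_codes[glyph_name].append(codepoint)
    return {name: sorted(codes) for name, codes in glyph_to_codes.items()}


def unicode_map_from_font(path):
    """Derive glyph -> code points from the font's 'cmap' table only."""
    # fontTools is a third-party dependency; import it only when a font is actually read.
    from fontTools.ttLib import TTFont

    font = TTFont(path)
    try:
        # getBestCmap() returns {codepoint: glyph name}, or None if no Unicode subtable exists.
        return invert_cmap(font.getBestCmap() or {})
    finally:
        font.close()
```

Glyphs that never appear in the result simply have no Unicode value of their own, which is the expected state for many glyphs in an Adobe-Identity-0 font.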

frank-trampe · 2017-11-29T16:27:09Z

@HinTak, if we fix the multiple encoding problem correctly, it would fix this too, right?

HinTak · 2017-12-03T22:47:16Z

No - Fontforge seems to merge Adobe-Identity as an extra cmap.

Alternatively, I suppose the answer is yes: to fix the multiple encoding problem correctly in the practical sense (the common case of trying to edit Adobe Source CJK), Fontforge needs to be able to ignore the Adobe-Identity cidmap somehow. So deleting/ignoring the Adobe-Identity cidmap is one of the steps towards fixing the multiple encoding problem in Adobe Source CJK.

kenlunde · 2017-12-04T02:34:36Z

It seems that I need to re-state what I wrote on June 14th:

In other words, Adobe-Identity-0 ROS OpenType/CFF fonts should be treated like typical TrueType fonts with regard to how their glyphs correspond to Unicode code points or sequences.

This means that the Unicode mappings should be derived only from the font's 'cmap' table.

This was referenced Jun 9, 2017

Reencode() silently mis-encodes type1 fonts with Unused/detached glyphs #3080 (Open)

Misleading type 1 /uni* glyph names, last part is glyph ids, not unicode code points. #3079 (Open)

dscorbett mentioned this issue Apr 18, 2022

Use 'cmap' for Adobe-Identity-0 CID fonts #4993 (Merged)

jtanx closed this as completed in #4993 Apr 24, 2022

ctrlcctrlv mentioned this issue Oct 21, 2022

Restore Adobe-Identity-0.cidmap for those who need it. #5131 (Open)
