Been struggling to assemble a set of test / specimen text samples that cover the set of Unicode scripts. Might be a useful resource for font developers.
I'm not looking for an exhaustive character set (which could be auto-generated) but something more user-friendly (for a specimen book) that can also serve as a lightweight initial test of a font with extended script coverage. Yes, there are issues of which language to use for a given script, and regional variations, but as a lightweight test, a set of available texts might be useful for font work.
Some options I've looked at:
- Article 1 of the UDHR (Universal Declaration of Human Rights - "All human beings are born free ...") available at https://github.com/unicode-org/udhr. Good length. However, the 500-ish translations cover only 43 of the 150 Unicode scripts (I'm still on Unicode v12.1).
- Genesis 11:1 ("Now the whole world had one language and a common speech"). A bit short. I've collected about 76 scripts (only 50%), and many of the "under-served" scripts have only images available, not Unicode text.
- Pangrams. Would need development for less-used scripts, and that is daunting.
- Representative Characters. A small set of characters that demonstrate the typographic attributes of that script. This might be useful, but is typically very short, does not represent body text, and does not give the 'feel' of the script from a user perspective.
- Character Strings. For scripts where I do not have any of the above, I have been falling back on a character string of the first hundred or so assigned characters, excluding combining diacritics and other oddballs, with some random spaces thrown in to approximate body text. Pretty poor substitute for body text, but that's all I've come up with ...
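For what it's worth, the character-string fallback in the last item can be sketched in a few lines of Python. This is only a rough sketch, not the exact procedure used above: the function name `fallback_sample`, the 100-character default, and the space probability are all assumptions, and the block range has to be supplied by hand because the standard library does not know script or block boundaries. Which code points count as "assigned" depends on the Unicode version bundled with your Python.

```python
import random
import unicodedata

def fallback_sample(start, end, count=100, space_prob=0.18, seed=0):
    """Pseudo body text from the first `count` usable characters in [start, end].

    `start`/`end` are code points of a block chosen by the caller (assumption).
    """
    chars = []
    for cp in range(start, end + 1):
        ch = chr(cp)
        major = unicodedata.category(ch)[0]
        # Skip marks (M*), unassigned/control/format (C*), and separators (Z*).
        if major in "MCZ":
            continue
        chars.append(ch)
        if len(chars) == count:
            break

    rng = random.Random(seed)  # fixed seed so the specimen is reproducible
    out = []
    for ch in chars:
        out.append(ch)
        if rng.random() < space_prob:  # sprinkle spaces to mimic word breaks
            out.append(" ")
    return "".join(out).strip()

# Example: Cherokee block, U+13A0..U+13FF
print(fallback_sample(0x13A0, 0x13FF))
```

Filtering on the general category is a blunt instrument: it also drops things like private-use characters, and it keeps digits and punctuation, so the "other oddballs" would still need a hand-made exclusion list per script.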
Any thoughts or suggestions in this area would be appreciated!
Comments
a few suggestions: ‘woman’ ‘man’ ‘earth’ ‘wind’ ‘fire’ ‘water’ ‘language’ ‘history’ ‘constitution’
Also, in Chrome settings you can set your default sans font to the font you are working on. Then you can just press "random article" on Wikipedia for new material :0
To transfer a Wikipedia article to a Word document, I copy and paste it into TextEdit, then use shift+command+T twice to remove all hyperlinks quickly. Then I use find and replace to find all instances of ' [ ' to remove all the notations. Then I fix the formatting of paragraphs if necessary.
Here's the one I'm currently using as a pdf (I guess copy paste out of it? I can't post Pages documents on TypeDrawers).
I also have a lot of sample texts for spacing special characters like ŋ and ѭ, which I can post as a pdf or make a Google doc or something if you want lol
Over 99%of all species on earth rotates about 29%of anaerobic and the number of earth today vary widely;most species on earth's surface is tilted with other fresh water, life may have arisen as an earth, earth's gravity interacts with other objects in the densest planet from the arctic ice pack.
Over many millions of the densest planet from the densest planet from the four rocky planets.
Over many millions of the sun and the sun and the moon causes tides, earth orbits around the third planet in space, which all species on earth's history have arisen as early as early as early as 4.
The proliferation of the first billion years.
7 billion years ago.
256 sidereal year has gone through long periods of the surface is the combination of earth today vary widely;most massive of earth's interior remains active with respect to affect earth's only natural satellite.
5 billion years ago.
Since then, a period known to harbor life on earth's only astronomical object known to its axis, aerobic organisms.
Earth and other fresh water, earth is land consisting of the sun in the majority of rotation is land consisting of expansion, later, occasionally punctuated by oceans but also lakes, that generates earth's polar regions are covered in the sun and the moon, later, rivers and thrive.
The world has 366.
Earth's surface over 99%of species that drives plate tectonics.
The third planet from the proliferation of the majority of continents and the densest planet in ice pack.
Since then, the remaining 71%is land consisting of earth is, physical properties and the remaining 71%of the only astronomical object known as 4.
According to radiometric dating and depend on earth, life may have arisen as an earth.
Earth sidereal year has around the surface, a convecting mantle that drives plate tectonics.
You can use word lists to create random (senseless) texts with some probability method, taking care that nearly all characters and the most frequent letter bigrams appear. Something like this is used to generate training data for OCR systems. Of course you may or may not need punctuation and numbers.
http://crubadan.org/ has wordlists with frequencies for 2,228 languages, max. 50,000 different words per language. But these cover only living languages; e.g. Latin and Ancient Greek are missing. Generally it will be hard to get texts for ancient scripts like Cuneiform.
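A minimal sketch of the word-list idea in Python, assuming the frequencies have already been loaded into a plain dict of word to count (the loading step and the file format are not shown and would depend on the source). The function name `random_text` and its parameters are made up for illustration. It only checks single-character coverage; covering the most frequent bigrams would need an extra pass.

```python
import random

def random_text(freqs, n_words=60, seed=0, max_tries=200):
    """Draw `n_words` words weighted by frequency, retrying for character coverage.

    freqs: dict mapping word -> count (hypothetical input format).
    Returns (text, missing), where `missing` is the set of characters
    from the word list that the best attempt still does not contain.
    """
    words = list(freqs)
    weights = [freqs[w] for w in words]
    alphabet = set("".join(words))  # every character seen in the word list

    rng = random.Random(seed)
    best, best_missing = None, None
    for _ in range(max_tries):
        sample = rng.choices(words, weights=weights, k=n_words)
        missing = alphabet - set("".join(sample))
        # Keep the attempt that leaves the fewest characters uncovered.
        if best is None or len(missing) < len(best_missing):
            best, best_missing = sample, missing
        if not missing:
            break
    return " ".join(best), best_missing

# Toy example with made-up counts
text, missing = random_text({"earth": 50, "water": 30, "wind": 20, "fire": 10}, n_words=12)
print(text)
print("uncovered characters:", missing)
```

With a real frequency list, rare characters tend to sit in rare words, so a pure weighted draw may never hit them; flattening the weights (e.g. taking a square root of the counts) is one way to trade realism for coverage.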
Another text available in many languages is the Lord's Prayer.
http://www.krassotkin.ru/sites/prayer.su/other/all-languages.html has ~370 versions.
https://wikisource.org/wiki/The_Lord%27s_Prayer has 108 versions.