I'm looking for a resource that shows letters and their most common immediate neighbours. Does anyone know of one for Latin-script languages? I know that the COD provides some pairs for diacritic glyphs, and that Latin+ has some info on 'leftish' and 'rightish' neighbours, meaning letters that merely appear to the left or right of a given letter.
Edit: I realised what I'm looking for are lists of bigrams by frequency for each letter.
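For anyone who wants to build such lists from their own corpus, here is a minimal Python sketch of the idea, not a reference implementation. The function name `bigrams_by_letter` and the sample string are made up for illustration; it assumes that only pairs of two letters should count, so bigrams never span spaces or punctuation.

```python
from collections import Counter, defaultdict

def bigrams_by_letter(text):
    """Group letter bigrams by their first letter and count occurrences."""
    table = defaultdict(Counter)
    for a, b in zip(text, text[1:]):
        # Assumption: count a pair only when both characters are letters,
        # so bigrams never span spaces or punctuation.
        if a.isalpha() and b.isalpha():
            table[a][a + b] += 1
    return table

sample = "the theme of the thesis"  # hypothetical sample text
table = bigrams_by_letter(sample)

# Bigrams starting with "t", most frequent first.
print(table["t"].most_common())
```

On a real corpus, `table[letter].most_common(10)` would give the ten most frequent bigrams that start with that letter.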
Comments
http://stackoverflow.com/questions/14168601/nltk-makes-it-easy-to-compute-bigrams-of-words-what-about-letters
http://www.indiana.edu/~clcl/Papers/LFE.pdf
http://practicalcryptography.com/media/cryptanalysis/files/english_bigrams_1.txt
Here's also something, in case you haven't found it already:
http://practicalcryptography.com/cryptanalysis/letter-frequencies-various-languages/
If this was completely misunderstood, then never mind me ¯\_(ツ)_/¯
Also:
https://web.archive.org/web/20050926082530/http://www.sudtipos.com.ar/test01.txt
https://web.archive.org/web/20050329080256/http://www.sudtipos.com.ar/test02.txt
https://web.archive.org/web/20050329081130/http://www.sudtipos.com.ar/test03.txt
https://web.archive.org/web/20101112040808/http://typophile.com/node/30960
And:
https://web.archive.org/web/20070510225322/http://just.letterror.com/ltrwiki/LetterFrequencyMeter
https://github.com/mooniak/textual-tools
Here is the link to the software “Wortgenerator” (freeware, with an English interface): www.sttmedia.de/wortgenerator-download
Interesting program. Via Google Translate, I understand the app comes with frequency lists of 'syllables'? I'm trying to compile data to see, for each letter, what its most common left and right neighbours are (case-sensitive). Does a digraph qualify for this? I thought a digraph was always a single phoneme, so two adjacent letters that belong to two different sounds would not qualify, e.g. em in housemeister?
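On the terminology: a digraph is a pair of letters that spells one sound, while a bigram (or digram) is any two adjacent letters, so em in housemeister counts as a bigram even though it is not a digraph. A rough Python sketch of the left/right neighbour count described here follows. The function name `neighbours` and the sample words are hypothetical, and it assumes case-sensitive counting with pairs limited to letters.

```python
from collections import Counter, defaultdict

def neighbours(text):
    """Count immediate left and right neighbours per letter, case-sensitively."""
    left = defaultdict(Counter)   # left[c][x]: x occurs immediately before c
    right = defaultdict(Counter)  # right[c][x]: x occurs immediately after c
    for a, b in zip(text, text[1:]):
        # Assumption: ignore pairs that involve spaces, digits or punctuation.
        if a.isalpha() and b.isalpha():
            right[a][b] += 1
            left[b][a] += 1
    return left, right

left, right = neighbours("Housemeister Haus Hemd")  # hypothetical sample

print("right of e:", right["e"].most_common())
print("left of e:", left["e"].most_common())
print("right of H:", right["H"].most_common())
```

Because nothing is lowercased, `H` and `h` are tallied separately, which is what a case-sensitive neighbour table needs.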
Your approach is very interesting. I have done the same, but studied the “European Convention on Human Rights” in different languages.
“Wortgenerator” has two functions: it can generate syllables/words, and it can “count” texts. For you, the second function, “Counter”, is the relevant one. Here you can load plain text files and count them by different criteria, for example the number of letters (letter frequency, with or without case distinction), 2-pairs (digrams), 3-pairs (trigrams) … real syllables, words.
When I copy the text of https://en.wikipedia.org/wiki/Typography and examine it, I get the following analysis (setting: digrams, occurrence > 1%), which you can save as CSV and continue working on in Excel. Is this what you are looking for?
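For comparison, the same kind of count (digrams above a share threshold, exported as CSV) can be approximated in a few lines of Python. This is only a sketch and makes no claim about how Wortgenerator computes its numbers. The names `digram_shares` and `write_csv`, the column headers and the sample text are all assumptions, and the share is taken relative to all letter-only digrams in the text.

```python
import csv
import io
from collections import Counter

def digram_shares(text, min_share=0.01, case_sensitive=True):
    """Return (digram, count, share) rows whose share exceeds min_share."""
    if not case_sensitive:
        text = text.lower()
    counts = Counter(
        a + b for a, b in zip(text, text[1:]) if a.isalpha() and b.isalpha()
    )
    total = sum(counts.values())
    if total == 0:
        return []
    # Assumption: share = count divided by the number of letter-only digrams.
    return [(d, n, n / total) for d, n in counts.most_common() if n / total > min_share]

def write_csv(rows, handle):
    """Write the rows to an open text handle as CSV (column names are made up)."""
    writer = csv.writer(handle)
    writer.writerow(["digram", "count", "share"])
    for digram, count, share in rows:
        writer.writerow([digram, count, f"{share:.4f}"])

rows = digram_shares("abab ab", min_share=0.3)  # tiny hypothetical sample
buffer = io.StringIO()
write_csv(rows, buffer)
print(buffer.getvalue())
```

To write a file that Excel can open, pass a handle from `open("digrams.csv", "w", newline="", encoding="utf-8")` instead of the in-memory buffer.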
I am very interested in the result of your investigation. Please keep me up to date!