A Probabilistic Approach to Diachronic Phonology
Alexandre Bouchard-Côté, Percy Shuo Liang, Daniel Klein and Thomas Griffiths
NDSEG Fellowship, Microsoft, National Science Foundation and FQRNT
Languages evolve over time, with words changing in form, meaning, and the ways in which they can be combined into sentences. Several centuries of linguistic analysis have shed light on some of the key properties of this evolutionary process, but many open questions remain. The study of how languages change over time is known as diachronic (or historical) linguistics.
Most of what we know about language change comes from the comparative method, in which words from different languages are compared in order to identify their relationships. The goal is to identify regular sound correspondences between languages and to use them to infer the forms of proto-languages and the phylogenetic relationships between languages. The analysis is based on sounds because phonological changes are generally more systematic than syntactic or morphological changes.
Comparisons of words from different languages have traditionally been carried out by hand, which introduces an element of subjectivity into diachronic linguistics. Early attempts to quantify the similarity between languages made drastic simplifying assumptions that drew strong criticism from diachronic linguists. In particular, many of these approaches reduce the relationship between a word in two languages to a single bit (related or not), rather than allowing for gradations based on correspondences between sequences of phonemes.
We take a quantitative approach to diachronic linguistics that alleviates this problem by operating at the phoneme level. Our approach combines the advantages of the classical, phoneme-based comparative method with the robustness of corpus-based probabilistic models. Because the model is fully generative, it can be used to solve a variety of problems. For example, we can reconstruct ancestral word forms, or inspect the rules learned along each branch of a phylogeny to identify sound laws. Alternatively, we can observe a word in one or more modern languages, say French and Spanish, and query the corresponding word form in another language, say Italian. Finally, models of this kind could serve as a building block in a system for inferring the topology of phylogenetic trees.
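The difference between a single-bit coding and phoneme-level gradations can be made concrete with a small Python sketch. It uses plain Levenshtein distance over phoneme sequences as a stand-in for a graded measure; this is not the probabilistic model described here, only an illustration of the information a single bit throws away. The transcriptions of the words for "night" (all descended from Latin *noctem*) are simplified, and the names `NIGHT`, `levenshtein` and `graded_similarity` are hypothetical.

```python
# Simplified, illustrative phoneme transcriptions of the words for "night",
# all descended from Latin 'noctem'. Each list element is one phoneme.
NIGHT = {
    "Italian": ["n", "o", "t", "t", "e"],
    "Spanish": ["n", "o", "tʃ", "e"],
    "French": ["n", "ɥ", "i"],
}


def levenshtein(a, b):
    """Minimum number of phoneme insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # delete x
                           cur[j - 1] + 1,           # insert y
                           prev[j - 1] + (x != y)))  # substitute x by y (free if equal)
        prev = cur
    return prev[-1]


def graded_similarity(a, b):
    """Similarity in [0, 1]: 1.0 for identical sequences, lower as more edits are needed."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


for other in ("Spanish", "French"):
    # A single-bit coding records only that the two words are related: 1 in both cases.
    bit = 1
    score = graded_similarity(NIGHT[other], NIGHT["Italian"])
    print(f"{other}-Italian: bit={bit}, graded={score:.2f}")
```

The bit is the same for both pairs, whereas the graded score separates the close Spanish-Italian pair (0.60, two edits) from the more divergent French-Italian pair (0.20, four edits).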
Figure 1: We estimate a contextualized model of phonological change expressed as a probability distribution over rules applied to individual phonemes
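The idea of a probability distribution over rules applied to individual phonemes in context can be sketched in a few lines of Python. This is a simplified illustration under assumptions of our own, not the estimated model: each ancestral phoneme is rewritten independently, conditioned only on its immediate left and right neighbours; only substitutions and deletions are allowed (no insertions); and the rule table `RULES` with its probabilities is invented rather than learned. Because the rewrites are independent given the ancestral word, P(descendant | ancestor) can be computed exactly with a small dynamic program over alignments. All names (`RULES`, `rule_distribution`, `contexts`, `sample_descendant`, `descendant_prob`) are hypothetical.

```python
import random

BOUNDARY = "#"  # pads the word so every phoneme has a left and a right neighbour

# Hypothetical rule table keyed by (left neighbour, phoneme, right neighbour).
# Each entry is a distribution over outputs; "" means the phoneme is deleted.
# The numbers are invented for illustration, not estimated from data.
RULES = {
    ("o", "k", "t"): {"t": 0.8, "k": 0.15, "": 0.05},  # assimilation kt > tt
    ("t", "o", BOUNDARY): {"o": 0.9, "": 0.1},          # invented loss of final vowel
}


def rule_distribution(left, phoneme, right):
    """Distribution over outputs for one phoneme in context; unlisted contexts keep it unchanged."""
    return RULES.get((left, phoneme, right), {phoneme: 1.0})


def contexts(word):
    """Yield (left, phoneme, right) triples, using BOUNDARY at the word edges."""
    padded = [BOUNDARY] + list(word) + [BOUNDARY]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def sample_descendant(word, rng):
    """Rewrite every ancestral phoneme in parallel, each according to its own context."""
    out = []
    for left, p, right in contexts(word):
        dist = rule_distribution(left, p, right)
        choice = rng.choices(list(dist), weights=list(dist.values()))[0]
        if choice:  # "" means the phoneme was deleted
            out.append(choice)
    return out


def descendant_prob(ancestor, descendant):
    """P(descendant | ancestor), summed over all ways of keeping, changing or deleting phonemes."""
    m = len(descendant)
    # f[j]: probability that the ancestral phonemes processed so far produced descendant[:j]
    f = [1.0] + [0.0] * m
    for left, p, right in contexts(ancestor):
        dist = rule_distribution(left, p, right)
        g = [0.0] * (m + 1)
        for j in range(m + 1):
            g[j] = f[j] * dist.get("", 0.0)  # this phoneme is deleted
            if j > 0:
                # this phoneme is rewritten as descendant[j - 1]
                g[j] += f[j - 1] * dist.get(descendant[j - 1], 0.0)
        f = g
    return f[m]


latin_octo = ["o", "k", "t", "o"]
print(round(descendant_prob(latin_octo, ["o", "t", "t", "o"]), 3))  # Italian 'otto'
print(sample_descendant(latin_octo, random.Random(0)))
```

With this invented table, the only derivation of *otto* from *okto* keeps every phoneme except that k assimilates to t, so its probability is 0.8 × 0.9 = 0.72. In a real system the rule probabilities would be estimated from data for each branch of the phylogeny rather than set by hand.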
Figure 2: An example of reconstruction of phonological rules (see paper for details)
- A. Bouchard-Côté, P. Liang, T. Griffiths, and D. Klein, "A Probabilistic Approach to Diachronic Phonology," Proceedings of EMNLP, 2007.