Frequency, length and homonymy. An analysis of homonyms in the Italian basic vocabulary


Abstract


My aim in this paper is to explore the relationship between word frequency, word length, and homonymy through an analysis of the roughly 7,000 highest-frequency lexemes that make up the basic vocabulary of Italian (Vocabolario di Base, VDB). The data confirm that the development of homonymy is strongly related to word length: both in the overall lexicon and within the VDB, word forms involved in homonymy are shorter than those that are not. At the same time, a strong relationship emerges between word frequency and homonymy, since VDB lexemes are involved in homonymy to a much greater extent than lexemes in any other usage band: the percentage of lexemes whose forms have homonyms is 55% for the VDB, against 10% to 24% for less frequent lexemes. Word length and word frequency seem to act as two independent variables in favoring homonymy: at equal frequency, shorter words have more homonyms, and at equal length, more frequent words have more homonyms. This finding is consistent with the hypothesis that the richness of homonymy in the high-frequency lexicon is due not only to the shortness of these words (i.e., the fact that the shorter a word is, the more likely it is that another word accidentally shares its form), but also to an organizing principle of language: given the disambiguating power of context, languages may assign a greater load of ambiguity to the words that are easiest to process, namely shorter and more frequent ones.

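The claim that length and frequency act independently rests on a stratified comparison: the homonymy rate is computed separately for each combination of frequency band and length class, so that one variable can be compared while the other is held fixed. A minimal sketch of that tabulation follows. The records, the band labels ("basic", "other"), the length threshold, and the function name are all hypothetical and chosen only for illustration; they are not the paper's data or procedure.

```python
from collections import defaultdict

# Toy records, invented for illustration only: (form, frequency_band, has_homonym).
# The homonymy flags are NOT taken from the paper or from any dictionary.
LEXEMES = [
    ("riso", "basic", True),
    ("vite", "basic", True),
    ("finestra", "basic", True),
    ("domanda", "basic", False),
    ("lama", "other", True),
    ("pula", "other", False),
    ("crogiolo", "other", False),
    ("zampogna", "other", False),
]


def homonymy_rates(lexemes, short_max=4):
    """Return the share of forms with homonyms per (band, length class) cell.

    short_max is an assumed cut-off: forms with at most this many letters
    count as "short", all others as "long".
    """
    counts = defaultdict(lambda: [0, 0])  # cell -> [with_homonyms, total]
    for form, band, has_homonym in lexemes:
        length_class = "short" if len(form) <= short_max else "long"
        cell = counts[(band, length_class)]
        cell[0] += int(has_homonym)
        cell[1] += 1
    return {key: with_h / total for key, (with_h, total) in counts.items()}


if __name__ == "__main__":
    for (band, length_class), rate in sorted(homonymy_rates(LEXEMES).items()):
        print(f"{band:5s} {length_class:5s} {rate:.0%}")
```

Reading the resulting table along one dimension at a time mirrors the argument in the abstract: within a single band, compare short against long forms; within a single length class, compare the basic vocabulary against the other bands.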


DOI Code: 10.1285/i22390359v19p61

Keywords: Homonymy; Word frequency; Word length; Lexical semantics; Statistical linguistics


Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Italy License.