phonItalia: A Phonological Lexicon for Italian

PhonItalia is an open-access lexical database that provides phonological representations for 120,000 Italian word-forms. Each entry comes with a comprehensive range of information, including syllable boundary and stress markings, uniqueness points, neighbourhood estimates, and other measures, together with written word-frequency and part-of-speech markers taken from the CoLFIS orthographic corpus (Laudanna, Thornton, Brown, Burani, & Marconi, 1995; Bertinetto et al., 2005). Using data derived from this core lexicon, an additional range of databases has also been compiled to provide frequency-of-use statistics for Italian phonemes, syllables, syllable onsets and codas, plus character and phoneme bigrams.

Further information on the methods and details of the database, additional summary lexical statistics, and a demonstration of an application of the data to aphasic speech errors are available in the following publication. Please use it to cite phonItalia.

Goslin, J., Galluzzi, C., & Romani, C. (2013). PhonItalia: a phonological lexicon for Italian. Behavior Research Methods. DOI: 10.3758/s13428-013-0400-8

PhonItalia and all derived databases are freely available for non-commercial research use under a Creative Commons license. They are available below in Excel (.xlsx) and tab-delimited text (.txt) formats.
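As a rough illustration of working with the tab-delimited text format, the following Python sketch reads a tab-separated table into a list of dictionaries. It assumes the file starts with a header row. The column names and values in the sample are invented for the example and are not the actual phonItalia fields, which are described in the Information document.

```python
import csv
import io


def read_tab_delimited(handle):
    """Return the rows of a tab-delimited table as dicts keyed by the header row."""
    reader = csv.DictReader(handle, delimiter="\t")
    return list(reader)


# Tiny in-memory stand-in for a downloaded file.
# The column names and values here are hypothetical, not real phonItalia data.
SAMPLE = "word\tfrequency\ncasa\t10\ngatto\t3\n"

rows = read_tab_delimited(io.StringIO(SAMPLE))
print(rows[0]["word"], rows[0]["frequency"])
```

For a real download, the same function could be given a file opened with open(path, encoding="utf-8", newline=""), where the path and the encoding are assumptions to check against the actual file. All values are read as strings, so numeric fields need converting before use.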

phonItalia version 1.1 includes modifications to the syllable stress position information for 11,523 word-form representations (9.6% of all word-forms), and to the phonological representations of 79 word-forms. These modifications were provided by Giacomo Spinelli. All lexical statistics and derived databases have been recalculated from the modified word-forms.

 

phonItalia version 1.10

Release date: 16th July 2014

Information

Brief description of the fields and phonemic alphabet used in the lexicon

phonItalia

Main lexicon and all derived databases in text and Excel format.

Word Forms

120,000 Italian word forms with their phonological representations

Phones

The 29 Italian phones used in the lexicon with total frequency of use statistics, as well as statistics specific to particular syllable position (syllable onset, nucleus, or coda).

Syllables

All 3631 phonological syllables found in the lexicon with total frequency of use statistics, along with statistics for the occurrence of syllables in particular word position (monosyllable, word onset, medial, and final).

Onsets

All 131 syllable onsets with total frequency of use statistics, and statistics relative to particular word position (word onset, medial, and in geminate).

Codas

All 60 syllable codas with total frequency of use statistics, and statistics relative to particular word position (word final, medial, and in geminate).

Biphones

All 576 biphones with total frequency of use statistics, and statistics relative to word and syllable position (word initial, medial, and final, syllable initial, medial, final, and cross-syllable).

Characters

Frequency of use statistics for the 27 characters used in the word-forms.

Bigrams

All 477 character bigrams with total frequency of use statistics, and statistics relative to word position (word initial, medial, and final).
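To make the idea of position-specific statistics concrete, here is a minimal Python sketch that counts character bigrams overall and by word position. It is not the method used by the italianLexicon program. The function name, the weighting scheme, and the choice to count the single bigram of a two-letter word as both initial and final are all assumptions made for illustration.

```python
from collections import Counter


def bigram_position_counts(words):
    """Count character bigrams in total and by word position.

    `words` maps each word-form to a weight: 1 for type counts,
    or a corpus frequency for token counts (hypothetical scheme).
    """
    counts = {name: Counter() for name in ("total", "initial", "medial", "final")}
    for word, weight in words.items():
        last_start = len(word) - 2  # start index of the final bigram
        for i in range(last_start + 1):
            bigram = word[i:i + 2]
            counts["total"][bigram] += weight
            if i == 0:
                counts["initial"][bigram] += weight
            if i == last_start:
                counts["final"][bigram] += weight
            if 0 < i < last_start:
                counts["medial"][bigram] += weight
    return counts


# Toy input: three word-forms, each with weight 1.
toy = {"casa": 1, "cane": 1, "sano": 1}
counts = bigram_position_counts(toy)
print(counts["initial"]["ca"], counts["medial"]["an"], counts["final"]["sa"])
```

The same pattern would extend to biphones by iterating over phoneme sequences instead of character strings, assuming each phoneme is represented as one symbol.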

 

 

 

 

italianLexicon Program

Program and source code used to generate all lexical statistics in phonItalia

 

 

All enquiries, corrections, or requests for further information can be directed to information@phonitalia.org

phonItalia was originally developed by Jeremy Goslin (1), Claudia Galluzzi (2), & Cristina Romani (3)

(1) School of Psychology, University of Plymouth, UK.

(2) Fondazione Santa Lucia, I.R.C.C.S., Roma, Italy.

(3) School of Life and Health Sciences, Aston University, UK.


Further contributions to the development of this resource have been provided by:
Giacomo Spinelli (syllable stress position and phonological representations)
 

 

Creative Commons License

phonItalia is distributed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.