This website is dedicated to spoken corpora, mainly for Albanian, Bosnian and Serbian. It also hosts other corpora and resources for these languages.
I am currently working on making the corpora available in TEI XML and on integrating NoSketchEngine and SpoCo into the website so that all corpora can be searched online. Some corpora already exist in SketchEngine and can be shared on request.
The following spoken corpora are available at the moment:
- BosCo – Spoken Corpus of Bosnian
- CALD – Dialect Corpus of Albanian
- CRONUS – A corpus for the analysis of Serbian spoken narratives
- SrMaCo – Spoken Corpus of the Serbian minority in Hungary
The following written corpora are available or under construction:
- Serbian parliamentary debates
- Newspaper corpora for Albanian, Bosnian, Hungarian and Montenegrin
- Law corpus for Albanian, Croatian and Serbian
Furthermore, the following resources are provided here:
- MULTEXT-East specifications for Albanian