TY - GEN
T1 - TOWARDS A MULTILINGUAL DICTIONARY OF DISCOURSE MARKERS: Automatic extraction of units from parallel corpus
AU - Renau, Irene
AU - Nazar, Rogelio
N1 - Publisher Copyright: © 2022, European Association for Lexicography. All rights reserved.
PY - 2022
Y1 - 2022
N2 - This paper presents a project to build a multilingual dictionary of discourse markers. During its first stage, which consisted of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtained an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure to automatically fill in most of its fields (e.g., the type of discourse marker and the equivalents in other languages) before human intervention.
AB - This paper presents a project to build a multilingual dictionary of discourse markers. During its first stage, which consisted of collecting the list of headwords, we used a parallel corpus to automatically extract units from texts written in Spanish, Catalan, English, French and German. We also applied a method to create a taxonomy structure for automatically organising the markers in clusters. As a result, we obtained an extensive, corpus-driven list of headwords. We present a prototype of the microstructure of the dictionary in the form of a standard XML database and describe the procedure to automatically fill in most of its fields (e.g., the type of discourse marker and the equivalents in other languages) before human intervention.
KW - Computational lexicography
KW - corpus-driven lexicography
KW - discourse markers
KW - multilingual lexicography
UR - http://www.scopus.com/inward/record.url?scp=85140448847&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:85140448847
SN - 9783937241876
T3 - EURALEX Proceedings
SP - 262
EP - 272
BT - Proceedings of the 20th EURALEX International Congress, 2022
A2 - Klosa-Kückelhaus, Annette
A2 - Engelberg, Stefan
A2 - Möhrs, Christine
A2 - Storjohann, Petra
PB - European Association for Lexicography
T2 - 20th EURALEX International Congress, 2022
Y2 - 12 July 2022 through 16 July 2022
ER -