TY - JOUR
T1 - Towards advanced collocation error correction in Spanish learner corpora
AU - Ferraro, Gabriela
AU - Nazar, Rogelio
AU - Alonso Ramos, Margarita
AU - Wanner, Leo
N1 - Funding Information:
Partially funded by the Spanish Ministry of Science and Innovation under contract numbers FFI2008-06479-C02-01/02 and FFI2011-30219-C02-01/02.
PY - 2014/3
Y1 - 2014/3
AB - Collocations in the sense of idiosyncratic binary lexical co-occurrences are one of the biggest challenges for any language learner. Even advanced learners make collocation mistakes: they literally translate collocation elements from their native tongue, create new words as collocation elements, choose a wrong subcategorization for one of the elements, etc. Therefore, automatic collocation error detection and correction is increasingly in demand. However, while state-of-the-art models predict with reasonable accuracy whether a given co-occurrence is a valid collocation or not, only a few of them manage to suggest appropriate corrections with an acceptable hit rate. Most often, a ranked list of correction options is offered, from which the learner then has to choose. This is clearly unsatisfactory. Our proposal focuses on this critical part of the problem in the context of the acquisition of Spanish as a second language. For collocation error detection, we use a frequency-based technique. To improve collocation error correction, we discuss three different metrics with respect to their capability to select the most appropriate correction of miscollocations found in our learner corpus.
KW - CALL
KW - Collocation
KW - Collocation error
KW - Collocation error correction
KW - Collocation error detection
KW - Miscollocation
UR - http://www.scopus.com/inward/record.url?scp=84897029145&partnerID=8YFLogxK
U2 - 10.1007/s10579-013-9242-3
DO - 10.1007/s10579-013-9242-3
M3 - Article
AN - SCOPUS:84897029145
SN - 1574-020X
VL - 48
SP - 45
EP - 64
JO - Language Resources and Evaluation
JF - Language Resources and Evaluation
IS - 1
ER -