TY - GEN
T1 - Two-step flow in bilingual lexicon extraction from unrelated corpora
AU - Nazar, Rogelio
AU - Wanner, Leo
AU - Vivaldi, Jorge
PY - 2008
Y1 - 2008
N2 - This paper presents a language-independent methodology for automatically extracting bilingual lexicon entries from the web without the need for resources such as parallel or comparable corpora, POS tagging, or an initial bilingual lexicon. It is suitable for specialized domains where bilingual lexicon entries are scarce. The input to the process is a source-language corpus that serves as an example of real usage of the units to be translated. The process is a two-step flow: first, single-word units are extracted from the source-language corpus; then, the multi-word units in which those single-word units are instantiated are extracted. For each multi-word unit, we check whether it appears in target-language texts on the web. The target-language unit that appears most frequently across the sets of multi-word units is usually the correct translation of the initial single-word source-language entry.
AB - This paper presents a language-independent methodology for automatically extracting bilingual lexicon entries from the web without the need for resources such as parallel or comparable corpora, POS tagging, or an initial bilingual lexicon. It is suitable for specialized domains where bilingual lexicon entries are scarce. The input to the process is a source-language corpus that serves as an example of real usage of the units to be translated. The process is a two-step flow: first, single-word units are extracted from the source-language corpus; then, the multi-word units in which those single-word units are instantiated are extracted. For each multi-word unit, we check whether it appears in target-language texts on the web. The target-language unit that appears most frequently across the sets of multi-word units is usually the correct translation of the initial single-word source-language entry.
KW - Bilingual lexicon extraction
KW - Corpus linguistics
KW - Knowledge-poor methods
KW - Machine translation
KW - Specialized terminology
KW - Statistical methods
UR - http://www.scopus.com/inward/record.url?scp=76749087320&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:76749087320
SN - 9783000257704
T3 - Proceedings of the 12th European Association for Machine Translation Conference, EAMT 2008
SP - 140
EP - 149
BT - Proceedings of the 12th European Association for Machine Translation Conference, EAMT 2008
T2 - 12th Conference of the European Association for Machine Translation, EAMT 2008
Y2 - 22 September 2008 through 23 September 2008
ER -