Two-step flow in bilingual lexicon extraction from unrelated corpora

Rogelio Antonio Nazar, Leo Wanner, Jorge Vivaldi

Research output: Chapter in book/report/conference proceeding, conference contribution, peer-reviewed

2 citations (Scopus)

Abstract

This paper presents a language-independent methodology for automatically extracting bilingual lexicon entries from the web without the need for resources such as parallel or comparable corpora, POS tagging, or an initial bilingual lexicon. It is suited to specialized domains where bilingual lexicon entries are scarce. The input to the process is a corpus in the source language, which serves as an example of real usage of the units to be translated. The process is a two-step flow: first we extract single-word units from the source-language corpus, and then the multi-word units in which those single-word units are instantiated. For each multi-word unit, we check whether it appears in target-language texts on the web. The target-language unit that appears most frequently across the sets of multi-word units is usually the correct translation of the initial single-word source-language entry.
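
The two-step flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the single-word unit is taken as given, multi-word units are approximated by plain bigrams, web retrieval is replaced by a hypothetical callable `fetch_target_texts` that returns target-language texts containing a given multi-word unit, and an `ignore` set stands in for whatever filtering of source-language and function words the actual method applies. All function and parameter names are invented for the example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def multiword_units(corpus, word, n=2):
    """Collect the n-grams of the source corpus that contain `word`.

    Assumption: plain n-grams stand in for multi-word units.
    """
    units = Counter()
    for sentence in corpus:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            gram = tokens[i:i + n]
            if word in gram:
                units[" ".join(gram)] += 1
    return units


def rank_translations(units, fetch_target_texts, ignore=frozenset()):
    """Rank target-language tokens by how many multi-word units they co-occur with.

    `fetch_target_texts` is a hypothetical stand-in for a web search: it
    takes a multi-word unit and returns target-language texts containing it.
    Each token gets at most one vote per multi-word unit.
    """
    votes = Counter()
    for unit in units:
        seen = set()
        for text in fetch_target_texts(unit):
            seen.update(tokenize(text))
        votes.update(seen - set(ignore))
    return votes.most_common()


# Toy data (invented for illustration): English source, Spanish target.
corpus = [
    "Heart failure is common",
    "The heart rate increased",
    "Chronic heart failure",
]
fake_web = {
    "heart failure": ["insuficiencia cardiaca (heart failure) es frecuente"],
    "heart rate": ["frecuencia cardiaca o heart rate"],
}

units = multiword_units(corpus, "heart")
ranking = rank_translations(
    units,
    lambda unit: fake_web.get(unit, []),
    ignore={"heart", "failure", "rate", "es", "o"},
)
print(ranking[0])  # ('cardiaca', 2)
```

In the toy data, "cardiaca" is the only token that co-occurs with both "heart failure" and "heart rate", so it receives the most votes. Counting one vote per multi-word unit, rather than raw token frequency, is a design choice of this sketch that mirrors the abstract's wording "across the sets of multi-word units".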

Original language: English
Title of host publication: Proceedings of the 12th European Association for Machine Translation Conference, EAMT 2008
Pages: 140-149
Number of pages: 10
Publication status: Published - 2008
Published externally
Event: 12th Conference of the European Association for Machine Translation, EAMT 2008 - Hamburg, Germany
Duration: 22 Sep 2008 to 23 Sep 2008

Publication series

Name: Proceedings of the 12th European Association for Machine Translation Conference, EAMT 2008

Conference

Conference: 12th Conference of the European Association for Machine Translation, EAMT 2008
Country: Germany
City: Hamburg
Period: 22/09/08 to 23/09/08
