A suite to compile and analyze an LSP corpus

Rogelio Nazar, Jorge Vivaldi, Teresa Cabré

Research output: Chapter in book/report/conference proceeding, Conference contribution, Peer-reviewed

4 citations (Scopus)

Abstract

This paper presents a suite of tools for extracting specialized corpora from the web and then analyzing them, mainly with statistical techniques. It is an integrated system that combines original and standard tools, and its modular design makes it easy to reintegrate the components into other systems. The first part of the paper describes the original techniques, which categorize documents as relevant or irrelevant to the corpus under construction; a document is considered relevant if it is a specialized text in the selected technical domain. Evaluation figures are provided for this original part but not for the second part, corpus analysis, which consists of algorithms well known in Natural Language Processing, such as KWIC (keyword-in-context) search, measures of vocabulary richness, and the sorting of n-grams by frequency of occurrence or by measures of statistical association, distribution or similarity.
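The corpus-analysis part relies on standard NLP techniques. The sketch below illustrates three of them in plain Python: KWIC concordancing, the type-token ratio as one simple measure of vocabulary richness, and bigram ranking by raw frequency and by pointwise mutual information (PMI) as one example of an association measure. It is a minimal sketch under stated assumptions, not the authors' implementation: the regex tokenizer, the function names (`tokenize`, `kwic`, `type_token_ratio`, `bigrams_by_frequency`, `bigrams_by_pmi`) and the choice of PMI are all hypothetical illustrations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase word characters only (the paper does not specify one).
    return re.findall(r"\w+", text.lower())


def kwic(tokens, keyword, width=3):
    """Return (left context, keyword, right context) for every occurrence of keyword."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


def type_token_ratio(tokens):
    # Vocabulary richness: number of distinct word forms divided by number of running words.
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def bigrams_by_frequency(tokens):
    # Adjacent word pairs sorted by raw frequency of occurrence, most frequent first.
    return Counter(zip(tokens, tokens[1:])).most_common()


def bigrams_by_pmi(tokens, min_freq=2):
    # Association measure (assumed here: PMI) = log2( P(x, y) / (P(x) * P(y)) ).
    # Bigrams below min_freq are skipped because PMI overrates rare pairs.
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n = len(tokens)
    scores = {}
    for (x, y), freq in bigrams.items():
        if freq >= min_freq:
            p_xy = freq / (n - 1)  # there are n - 1 bigram positions
            p_x, p_y = unigrams[x] / n, unigrams[y] / n
            scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    toks = tokenize("The heart rate rises. Heart rate monitors record the heart rate of the patient.")
    print(kwic(toks, "monitors", width=2))
    print(type_token_ratio(toks))
    print(bigrams_by_frequency(toks)[:3])
    print(bigrams_by_pmi(toks))
```

On a real specialized corpus, the frequency and PMI rankings usually diverge: frequency favours pairs containing function words, while association measures push domain terms such as "heart rate" up the list, which is why term-oriented tools often offer both.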

Original language: English
Host publication title: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 1164-1169
Number of pages: 6
ISBN (electronic): 2951740840, 9782951740846
Status: Published - 2008
Externally published: Yes
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: 28 May 2008 to 30 May 2008

Publication series

Name: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

Conference

Conference: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country/Territory: Morocco
City: Marrakech
Period: 28/05/08 to 30/05/08
