Sunday, August 19, 2007

A talk given by Prof. Mitch Marcus last Friday

> Title: Unsupervised induction of morphological structure
> Abstract:
> We will discuss the problem of unsupervised morphological and part of
> speech (POS) acquisition in realistic settings. From studies of tagged
> corpora, we show that there is a sparse data problem in morphology,
> which raises the question of how rare forms may be learned. We then show
> that it is often the case that the base form of a word is present among
> the different inflections of a lexeme, which suggests that rare forms
> can be learned by association with a base form. We introduce new
> representations for morphological structure which express the
> morphophonological transduction behavior of these base forms, and
> present an algorithm to acquire these structures automatically from an unlabeled
> corpus. We apply the algorithm to a range of Indo-European languages
> including Slovene, English, and Spanish.

1. Met the same group of people (mostly young researchers) again.
2. I asked two questions about how to deal with sparse data. As I understood the answers:
a) Prune the search space by analyzing features of the data.
b) Add background knowledge.
3. Jin said he is the 牛魔王 (Bull Demon King) in their field! Orz...

