Sunday, July 22, 2007

Conditional Random Fields (CRFs)


Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.

Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.

Tuesday, July 10, 2007

Semantics, Syntax and pragmatics

from: Semantic Tagging - Susanne Ekeklint
The difference between syntactic tagging and semantic tagging is that the categories that are used to mark the entities in the latter case are of a semantic kind. Semantics has to do with intentions and meanings. POS tagging may sometimes be considered semantic but is usually seen as syntactic tagging. By tradition there is a separation between the form side (syntax) and the content side (semantics) of a phrase and the intention of this is to make a distinction between what is being said to how it is being said. Levison (1983) describes the historical background of the terms syntax, semantics and pragmatics by referring to Charles Morries's(1971) distinctions, within the sudy of "the relations of signs", or semiotics.

Syntactics (or syntax) being the study of "the formal relation of signs to one another", semantics the study of "the relations of signs to the objects to which the signs are applicable" (their designata), and pragmatics, the study of "the relation of signs to interpreters". (Morris 1938:6, quoted in Levinson 1938:1)

Levinson says that there is a "pure study" in each one of the three ares; it is however a known fact that in practice the areas often overlaps. Semantics includes the study of syntax and pragmatics includes the study of semantics (Allwood, 1993)

Levinson Stephen C.(1983) Pragmatics. Cambridge University Press
Morris, Charles W. (1971) Writings on the General Theory of Signs. The Hague: Mouton.
Morris 1938:6

Sunday, July 1, 2007

The slide about text catgorization

Mainly about the content of the chapter 16 of the book Foundations of Statistical Natural Language Processing

Here is the slide.