Thursday, March 6, 2008





#include <stdio.h>

int main(int argc, char **argv)
{
    /* print the string "Hello world!" */
    printf("Hello world!\n");

    return 0;
}



Friday, February 8, 2008

Comments on Rethinking the Semantic Web

Yesterday I read the article Rethinking the Semantic Web (part1, part2) by Rob McCool.
It points out several fundamental defects underlying the Semantic Web idea and proposes a Named Entity Web as the solution. Although it was published back in 2005, the issues it discusses are only now becoming widely recognized and attracting great interest.

Briefly, I list the features of the Semantic Web, from the point of view of human understandability and machine processability, as follows.
  • 1. All information is expressed in (RDF) triples.
  • 2. The meaning of triples is interpreted by the ontologies that they cite.
  • 3. To allow machines to process triples more cleverly, OWL, based on frame logic, is introduced.

The latter two issues relate closely to knowledge representation (KR). In his article, Rob McCool discusses the defects of the Semantic Web in terms of its KR origins. I quote the relevant statements from the paper below:
  1. KR uses the fundamental mathematics of Codd's theory to translate information, which humans represent with natural language, into sets of tables that use well-defined schemas to define what can be entered in the rows and columns.
    --[comment] the originate
  2. Because information theory removes nearly all context from information, KR represents only facts.
    --[comment] Whereas web users are mostly interested in context-related information.
  3. Complex relationships, exceptions to rules and ideas that resist simplistic classifications pose significant design challenges to information bases.

Thus, they pose a fundamental barrier, in terms of richness of representation as well as creation and maintenance, compared to the written language that people use.

I fully agree with the author that “new representations must be easy to translate to and from natural language” and that "any other approach ignores the representation problem, assumes that context-free facts and logical rules are sufficient, and will fail."

In part 2 of the paper, the author proposes a Named Entity Web (NEW). It removes classes, relations and triples from the Semantic Web formats in order to provide a less ambitious, more feasible version of the Semantic Web. In the NEW proposal:
1. the basic element is an entity, which can be thought of as a simple business-card-style record.
2. entities do not need the consistency and formalism that ontologies work so hard to ensure.
3. entities can be created by, for example, users or manufacturers, for themselves.
4. entities are embedded in HTML files and are thus connected to their context.
5. the semantics of an entity can be clarified when necessary.
6. problems related to consistency, semantics or trust can be handled by existing techniques such as PageRank and search engines.

I agree with points 2, 3, 5 and 6. But I still doubt whether entities are a better representation frame than triples (besides, the paper does not give enough details). As for point 4, I don't think embedding entities in HTML files is the only way to connect machine-readable information with its context.

What I am trying to do is:

Sunday, January 27, 2008


\includegraphics[width=100pt, bb=0 0 94.19mm 94.19mm]{fig.eps}
\makeatletter\def\@captype{figure}\makeatother\caption{\scriptsize An infobox of the Wikipedia page.}
\begin{tabular}{|l|l|}
\hline
\multicolumn{2}{|c|}{Classes from Wikipedia structure}\\
\hline
% & Feature & Example \\
genre 70 & members 57\\
origin 48 & member role 48 \\
years active 44 & a music group 39\\
member occupations 26 & album Name 18 \\
lyrics 13 & album no. 10 \\
single 10 & born in 8\\
\hline
\multicolumn{2}{|c|}{extra classes from corpus}\\
\hline
die in 7 & former Members 4 \\
alias 3 & instruments 3\\
awards 3 & label 2 \\
associate act 1 & birth name 1 \\
\hline
\end{tabular}
\makeatletter\def\@captype{table}\makeatother\caption{Relation classes.}

Wednesday, January 16, 2008

one figure over two columns && figure array


\caption{hello} \label{hello}
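Since the code of the original post did not survive, here is a minimal sketch of one common way to place a single figure across both columns (assuming a standard two-column article class and the graphicx package; the file name and width are placeholders):

% spans both columns; in two-column mode, figure* floats to the top of a page
\begin{figure*}[t]
  \centering
  \includegraphics[width=\textwidth]{fig.eps} % placeholder file name
  \caption{hello} \label{hello}
\end{figure*}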

Monday, January 14, 2008

LaTeX: typesetting single-line and multi-line formulas

from :





$$formula \eqno number$$ puts the equation number on the right
$$formula \leqno number$$ puts the equation number on the left


For example, $$a^2+b^2=c^2 \eqno (**)$$

In general, a displayed formula $$…$$ can also be written as \[…\]


\begin{align}
a &= b \\
c &= d
\end{align}



left & center & right\\
left & center & right\\




1) The gather environment (each equation is simply centered; no & alignment marker is used):
\begin{gather}
a = b \\
c = d
\end{gather}

2) The align environment (equations are aligned at the & marker), e.g.:
\begin{align}
a &= b \\
c &= d
\end{align}

3) The equation-group environments above always occupy a full line, no matter how small each formula is. If you use the corresponding \gathered or \aligned environment instead, the group only takes up the actual width of the formulas and is treated as one big symbol that can be combined with other symbols.

$$\begin{aligned}
a &= b+c \\
d &= b+c
\end{aligned} \qquad a=d$$







Saturday, December 29, 2007

feature selection

Because I wanted to look into random forests, I found this book:
Feature Extraction, Foundations and Applications


Tuesday, December 4, 2007

Friday, November 16, 2007

Notes about SVM & the Kernel Method

[1] An Introduction to Support Vector Machines and other kernel-based learning methods
Nello Cristianini, John Shawe-Taylor
[2] New Methods in Data Mining: Support Vector Machines (数据挖掘中的新方法-支持向量机)
邓乃扬 (Deng Naiyang), 田英杰 (Tian Yingjie)
[3] Machine Learning (机器学习)
Tom M. Mitchell
Chinese translation by 曾华军, 张银奎, et al.
[5] An Introduction to Kernel-Based Learning Algorithms
Klaus-Robert Müller, Sebastian Mika, Gunnar Rätsch, Koji Tsuda, and Bernhard Schölkopf
1.Some background of computational learning theory
Empirical Risk Minimization (ERM)
Probably Approximately Correct Learning (PAC)
--risk function and expected risk ([2] p131)
Vapnik-Chervonenkis dimension
[1] Chapter4 Generalization Theory
[2] p161 section4.8
[3] chapter seven

*They tend to solve the problem:


Roughly speaking, the VC dimension measures how many (training) points can be shattered (i.e., separated) for all possible labelings using functions of the class. ([5] section II)
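To make this concrete, here is one standard form of the VC generalization bound (the form quoted in [5]; reproduced from memory, so treat it as a sketch rather than the exact statement): with probability at least $1-\delta$ over $n$ training examples, every function $f$ in a class of VC dimension $h$ satisfies
\[ R(f) \;\le\; R_{\mathrm{emp}}(f) + \sqrt{\frac{h\left(\ln\frac{2n}{h}+1\right) - \ln\frac{\delta}{4}}{n}}, \]
where $R(f)$ is the expected risk and $R_{\mathrm{emp}}(f)$ the empirical (training) risk.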
2.Optimization Theory
[1] chapter 5
[2] chapter 1
The problems addressed in 1 have a similar form: the hypothesis function should be chosen to minimize (or maximize) a certain functional. Optimization theory is the branch of mathematics concerned with characterizing the solutions of classes of such problems, and developing effective algorithms for finding them.
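A typical instance of such a problem (covered in both [1] and [2]) is the maximal-margin hyperplane, sketched here for the separable, hard-margin case as a constrained quadratic program:
\[ \min_{\mathbf{w},\,b}\ \tfrac{1}{2}\|\mathbf{w}\|^2 \quad \text{subject to} \quad y_i\bigl(\langle\mathbf{w},\mathbf{x}_i\rangle + b\bigr) \ge 1,\quad i=1,\dots,n. \]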

3. kernel
*Frequently the target concept cannot be expressed as a simple linear combination of the given attributes, but in general requires that more abstract features of the data be exploited. Kernel representations offer an alternative solution by projecting the data into a high-dimensional feature space to increase the computational power of the linear learning machines. ([1] p26)

[comment:] "the advantage of using the machines in the dual representation derives from the fact that in this representation the number of tunable parameters does not depend on the number of attributes being used."
-- The number of parameters depends on the definition of the kernel; if you define a "bad" kernel, there can be many parameters as well.
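For reference, the dual representation referred to in the quote writes the classifier with one coefficient per training example rather than per attribute (a sketch of the standard form):
\[ f(\mathbf{x}) = \operatorname{sgn}\!\left( \sum_{i=1}^{n} \alpha_i\, y_i\, k(\mathbf{x}_i, \mathbf{x}) + b \right), \]
so the tunable parameters $\alpha_1,\dots,\alpha_n$ grow with the number of training examples $n$, independently of the number of attributes.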

With kernels we can compute a linear hyperplane classifier in a higher-dimensional Hilbert space. We prefer linear models because:

1) there is the intuition that a "simple" (e.g., linear) function that explains most of the data is preferable to a complex one (Occam's razor). ([5] section II)

2) In practice the bound on the general expected error computed by VC dimension is often neither easily computable nor very helpful. Typical problems are that the upper bound on the expected test error might be trivial (i.e., larger than one), the VC dimension of the function class is unknown or it is infinite (in which case one would need an infinite amount of training data). ([5], section II )

For linear methods, the VC dimension can be estimated as the number of free parameters, but for most other methods (including k-nearest neighbors), accurate estimates for VC-dimension are not available.

3) for the class of hyperplanes the VC dimension itself can be bounded in terms of another quantity, the margin (the margin is defined as the minimal distance of a sample to the decision surface). ([5] section II)

4. Existing Kernels
  • Polynomial (homogeneous): k(\mathbf{x},\mathbf{x}')=(\mathbf{x} \cdot \mathbf{x'})^d
  • Polynomial (inhomogeneous): k(\mathbf{x},\mathbf{x}')=(\mathbf{x} \cdot \mathbf{x'} + 1)^d
  • Radial Basis Function: k(\mathbf{x},\mathbf{x}')=\exp(-\gamma \|\mathbf{x} - \mathbf{x'}\|^2), for γ > 0
  • Gaussian Radial basis function: k(\mathbf{x},\mathbf{x}')=\exp\left(- \frac{\|\mathbf{x} - \mathbf{x'}\|^2}{2 \sigma^2}\right)
  • Sigmoid: k(\mathbf{x},\mathbf{x}')=\tanh(\kappa \mathbf{x} \cdot \mathbf{x'}+c), for some (not every) κ > 0 and c < 0

In short, not the dimensionality but the complexity of the function class matters. Once the data are mapped into a suitable feature space, all one needs for separation is a linear hyperplane. However, it becomes rather tricky to control the feature space for large real-world problems. So even if one could control the statistical complexity of this function class, one would still run into intractability problems while executing an algorithm in this space. Fortunately, for certain feature spaces F and corresponding mappings Φ, there is a highly effective trick for computing scalar products in feature spaces using kernel functions. ([5] section II)
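The "trick" mentioned above is the kernel identity: a kernel evaluates the scalar product in the feature space without ever computing the mapping explicitly,
\[ k(\mathbf{x},\mathbf{x}') = \langle \Phi(\mathbf{x}), \Phi(\mathbf{x}') \rangle. \]
For example, for the homogeneous polynomial kernel with d = 2 in $\mathbb{R}^2$, the implicit feature map is $\Phi(\mathbf{x}) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$, and one can check that $\langle\Phi(\mathbf{x}),\Phi(\mathbf{x}')\rangle = (\mathbf{x}\cdot\mathbf{x}')^2$.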

Wednesday, October 24, 2007

LaTeX formulas

First of all, you need to use the ntheorem package.

% package for theorem-like environments; the amsmath option
% is for compatibility with the AMS LaTeX packages

%=== together with the ntheorem package above, produce various theorem
%=== structures and redefine some body-text headings ===
\theorembodyfont{\normalfont\rm\CJKfamily{kai}} \theoremindent0em
\theoremseparator{\hspace{1em}} \theoremnumbering{arabic}
%\theoremsymbol{} % the symbol automatically added at the end of a theorem
%\newtheorem{definition}{\hei 定义}[section]
\theorembodyfont{\normalfont \rm \CJKfamily{song}} \theoremindent0em
\theoremseparator{\hspace{1em}} \theoremsymbol{$\blacksquare$}
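A minimal sketch of how such settings are typically completed and used (the environment name and heading text are placeholders; ntheorem also offers \theoremstyle and \theoremheaderfont if you want to customize further):

\newtheorem{definition}{Definition}[section]

\begin{definition}[An example]
Body text of the definition.
\end{definition}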


Friday, September 7, 2007

Jensen-Shannon divergence

definition given by wikipedia

In probability theory and statistics, the Jensen-Shannon divergence is a popular method of measuring the similarity between two probability distributions.
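For concreteness, the usual definition, written in terms of the Kullback-Leibler divergence D_KL (a sketch):
\[ \mathrm{JSD}(P \,\|\, Q) = \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M), \qquad M = \tfrac{1}{2}(P+Q). \]
Unlike the raw KL divergence, it is symmetric and always finite, which is one reason it is convenient as a similarity measure.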

used as a similarity measure of entities in
Chen, J. and D. Ji, et al. (2006). Relation Extraction Using Label Propagation Based Semi-Supervised Learning. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.

Sunday, August 26, 2007

Lazy Learning

In artificial intelligence, lazy learning is a learning method in which generalization beyond the training data is delayed until a query is made to the system, as opposed to eager learning, where the system tries to generalize from the training data before receiving queries.


Hausdorff distance

what is Hausdorff distance ?
An introduction

Named after Felix Hausdorff (1868-1942), Hausdorff distance is the « maximum distance of a set to the nearest point in the other set »
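Written out for two non-empty sets A and B in a metric space with distance d (a sketch):
\[ d_H(A,B) = \max\Bigl\{\ \sup_{a\in A}\,\inf_{b\in B} d(a,b),\ \ \sup_{b\in B}\,\inf_{a\in A} d(a,b)\ \Bigr\}. \]
The two inner terms are the directed distances from one set to the other; taking the maximum makes the measure symmetric.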

Sunday, August 19, 2007

A talk given by Prof. Mitch Marcus last Friday

> Title: Unsupervised induction of morphological structure
> Abstract:
> We will discuss the problem of unsupervised morphological and part of
> speech (POS) acquisition in realistic settings. From studies of tagged
> corpora, we show that there is a sparse data problem in morphology,
> which raises the question of how rare forms may be learned. We then show
> that it is often the case that the base form of a word is present among
> the different inflections of a lexeme, which suggests that rare forms
> can be learned by association with a base form. We introduce new
> representations for morphological structure which express the
> morphophonological transduction behavior of these base forms, and
> present
> an algorithm to acquire these structures automatically from an unlabeled
> corpus. We apply the algorithm to a range of Indo-European languages
> including Slovene, English, and Spanish.

1. met the same group of people (well, I mean young researchers basically) again.
2. I asked two questions on how to deal with sparse data. Based on what I understood:
a) To prune the space by analyzing features of the data.
b) To add background knowledge.
3. Jin said he is the 牛魔王 (the "Bull Demon King") of their field! Orz...

Friday, August 17, 2007



\begin{figure}
\centering \subfigure[subCaption_1 ]{\includegraphics[width=200pt]{1.eps}}
\label{fig:selFilter} \subfigure[subCaption_2]{\includegraphics[width=200pt]{2.eps}}
\caption{Query} \label{Fig:CellDropRates}
\end{figure}

Tuesday, August 7, 2007

Maximum Likelihood Estimate (MLE)

1. from wikipedia:

Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution from a given data set. That is to say, you have a sample of data

X_{1}, \dots, X_{n} \!

and some kind of model for data, and you want to estimate parameters of the distribution.

2. from
Maximum likelihood estimation begins with the mathematical expression known as a likelihood function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data given the chosen probability model. This expression contains the unknown parameters. Those values of the parameter that maximize the sample likelihood are known as the maximum likelihood estimates.
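As a small worked sketch (assuming i.i.d. samples drawn from a density f(x; θ)), the likelihood and the maximum likelihood estimate are
\[ L(\theta) = \prod_{i=1}^{n} f(X_i;\theta), \qquad \hat{\theta}_{\mathrm{MLE}} = \arg\max_{\theta} L(\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \log f(X_i;\theta), \]
where maximizing the log-likelihood is equivalent and usually more convenient.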

the dis/advantages are discussed, as well as the software.

3. about smoothed maximum likelihood estimates.
One purpose of the smoothed estimates is to account for sparseness in counts for distributions with a lot of history by backing off to less sparse estimates.
(McDonald, R. (2005). Extracting Relations from Unstructured Text, Department of Computer and Information Science, University of Pennsylvania.)

Sunday, July 22, 2007

Conditional Random Fields (CRFs)


Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.
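For reference, in the linear-chain case described in Wallach's report, the model takes the form (a sketch): given an observation sequence x and a label sequence y,
\[ p(\mathbf{y}\mid\mathbf{x}) = \frac{1}{Z(\mathbf{x})} \exp\!\left( \sum_{t} \sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t) \right), \]
where the f_k are feature functions, the λ_k their weights, and Z(x) a normalization constant summing over all possible label sequences; this global normalization is what avoids the label bias problem mentioned above.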

Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.

Tuesday, July 10, 2007

Semantics, Syntax and pragmatics

from: Semantic Tagging - Susanne Ekeklint
The difference between syntactic tagging and semantic tagging is that the categories used to mark the entities in the latter case are of a semantic kind. Semantics has to do with intentions and meanings. POS tagging may sometimes be considered semantic but is usually seen as syntactic tagging. By tradition there is a separation between the form side (syntax) and the content side (semantics) of a phrase, and the intention of this is to make a distinction between what is being said and how it is being said. Levinson (1983) describes the historical background of the terms syntax, semantics and pragmatics by referring to Charles Morris's (1971) distinctions within the study of "the relations of signs", or semiotics.

Syntactics (or syntax) being the study of "the formal relation of signs to one another", semantics the study of "the relations of signs to the objects to which the signs are applicable" (their designata), and pragmatics, the study of "the relation of signs to interpreters". (Morris 1938:6, quoted in Levinson 1983:1)

Levinson says that there is a "pure study" in each one of the three areas; it is, however, a known fact that in practice the areas often overlap. Semantics includes the study of syntax, and pragmatics includes the study of semantics (Allwood, 1993).

Levinson Stephen C.(1983) Pragmatics. Cambridge University Press
Morris, Charles W. (1971) Writings on the General Theory of Signs. The Hague: Mouton.
Morris, Charles W. (1938) Foundations of the Theory of Signs. Chicago: University of Chicago Press.

Sunday, July 1, 2007

The slide about text categorization

Mainly about the content of Chapter 16 of the book Foundations of Statistical Natural Language Processing.

Here is the slide.

Sunday, June 17, 2007

Generative Model and Discriminative Model

Ref: MLWiki
A generative model is one which explicitly states how the observations are assumed to have been generated. Hence, it defines the joint probability of the data and latent variables of interest.

See also: Generative Model from wikipedia.

ref: A simple comparison
generative models (model the likelihood and the prior) <-- e.g., Naive Bayes
discriminative models (model the posterior directly) <-- e.g., SVM
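The two are related by Bayes' rule (a sketch, with x the features and y the class):
\[ p(y\mid x) = \frac{p(x\mid y)\,p(y)}{p(x)} \;\propto\; p(x\mid y)\,p(y), \]
so a generative model specifies the joint distribution p(x, y) = p(x|y)p(y) and obtains the posterior via Bayes' rule, while a discriminative model parameterizes p(y|x) directly.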

ref: Classify Semantic Relations in Bioscience Texts.

Generative models learn the prior probability of the class and the probability of the features given the class; they are the natural choice in cases with hidden variables (partially observed or missing data). Since labeled data is expensive to collect, these models may be useful when no labels are available. However, in this paper we test the generative models on fully observed data and show that, although not as accurate as the discriminative model, their performance is promising enough to encourage their use for the case of partially observed data.

Discriminative models learn the probability of the class given the features. When we have fully observed data and we just need to learn the mapping from features to classes (classification), a discriminative approach may be more appropriate.

It must be pointed out that the neural network (discriminative model) is much slower than the graphical models (HMM-like generative models), and requires a great deal of memory.

Wednesday, June 6, 2007


Chap 5 of foundations of statistical natural language processing

- a collocation is an expression consisting of two or more words that correspond to some conventional way of saying things.

- Collocations are characterized by limited compositionality.
(we call a natural language expression compositional if the meaning of the expression can be predicted from the meaning of the parts.)
Collocations are not fully compositional in that there is usually an element of meaning added to the combination.
--> non-compositionality
--> non-substitutability
--> non-modifiability

-term: the word term has a different meaning in information retrieval. There it refers to both words and phrases.

- a number of approaches to finding collocations:
a) selection by frequency:
raw frequency doesn't work.
With part-of-speech tag patterns, one gets surprisingly good results. <-- Justeson and Katz's method. Hint: a simple quantitative technique combined with a small amount of linguistic knowledge goes a long way.
Works well for fixed phrases.

b) selection based on the mean and variance of the distance between the focal word and the collocating word.
Scenario: the distance between the two words is not constant, so a fixed-phrase approach would not work.

Collocational window (usually a window of 3 to 4 words on each side of a word).

Mean and variance of the offsets between the two words in a corpus.

c)hypothesis testing (********)

In b) we cannot be sure that the observed high frequency and low variance for two words are not accidental. So we also take into account how much data we have seen. Even if there is a remarkable pattern, we discount it if we haven't seen enough data to be certain that it couldn't be due to chance.

--> null hypothesis.
--> t test: assumes that the probabilities are approximately normally distributed.
The t test looks at the mean and variance of a sample of measurements, where the null hypothesis is that the sample is drawn from a distribution with mean μ. The test looks at the difference between the observed and expected means, scaled by the variance of the data, and tells us how likely one is to get a sample of that mean and variance (or a more extreme mean and variance) assuming that the sample is drawn from a normal distribution with mean μ (see the formulas sketched after this list).

--> Chi-square test
The essence of the test is to compare the observed frequencies in the table with the frequencies expected for independence. If the difference between observed and expected frequencies is large, then we can reject the null hypothesis of independence.
d) (pointwise) mutual information

--> likelihood ratios:
more appropriate for sparse data than the chi-square test
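For reference, here are sketches of two of the measures mentioned above, as I recall them from Manning & Schütze, Chapter 5. The t statistic used in c) is
\[ t = \frac{\bar{x} - \mu}{\sqrt{s^2/N}}, \]
where x̄ is the sample mean, s² the sample variance, N the sample size, and μ the mean expected under the null hypothesis of independence. The pointwise mutual information used in d) is
\[ I(x, y) = \log_2 \frac{P(x\,y)}{P(x)\,P(y)}. \]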

Controlled Language

What are Controlled Natural Languages?
"Controlled Natural Languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Traditionally, controlled languages fall into two major categories: those that improve readability for human readers, particularly non-native speakers, and those that improve computational processing of the text."


Sunday, May 6, 2007


Changing the appearance of items in LaTeX


e.g., change the "dot" to "-"

\item aaaaaa
\item bbbbbbb
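Since the surrounding code of the original snippet was lost, here is a minimal sketch of one common way to do this (redefining \labelitemi for the first itemize level; the item text is just the placeholder carried over from above):

\renewcommand{\labelitemi}{-}
\begin{itemize}
\item aaaaaa
\item bbbbbbb
\end{itemize}

Alternatively, \item[-] changes the label of a single item only.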

Examples of Math

from Information Retrieval in Folksonomies: Search and Ranking
Andreas Hotho, Robert Jäschke, Christoph Schmitz, Gerd Stumme

Thursday, May 3, 2007


\begin{figure}
\centering \mbox{
\subfigure[title for sub figure 1]{
a sentence\\
a sentence\\
a sentence\\
a sentence\\
\label{fig:tripleTagsA} } % closing brace for the first subfigure (label name is a placeholder)

\subfigure[title for sub figure b]{
a sentence\\
a sentence\\
a sentence\\
a sentence\\
a sentence\\
\label{fig:tripleTagsB} }
} \caption{Triple tags}
\label{fig:tripleTags} % labels should not contain spaces
\end{figure}


Wednesday, May 2, 2007


\documentclass[times, 10pt,twocolumn]{article}
\usepackage{amsmath,amsthm,amssymb} % load the AMS math environments
\newtheorem{theorem}{Theorem}       % for theorem-like statements (the original note says: for axioms)
\newtheorem{definition}{Definition} % needed before \begin{definition} can be used

\begin{definition}[Name of the Definition]
content of the definition
\end{definition}


Monday, April 30, 2007

About RDF query

1. This document provides a survey of RDF query languages and implementations and describes their capabilities in the terms described in the RDF Query and Rules Framework.

2.SPARQL: A query platform for Web 2.0 and the Semantic Web

A full list of different query language implementations can be seen elsewhere. But these languages lack both a common syntax and a common semantics. In fact, the existing query languages cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. SPARQL was designed to fill this gap.

Wednesday, April 25, 2007


1. generative vs. discriminative
2. inductive vs. recursive

Monday, April 9, 2007

about Kernel method

tutorial on kernel method

Bernhard Schölkopf. Statistical learning and kernel methods. MSR-TR 2000-23, Microsoft Research, 2000.

Kernel methods retain the original representation of objects and use the objects in algorithms only via computing a kernel function between a pair of objects. A kernel function is a similarity function satisfying certain properties. More precisely, a kernel function K over the object space X is a binary function K: X × X → [0, ∞) mapping a pair of objects x, y ∈ X to their similarity score K(x, y). A kernel function is required to be symmetric and positive semidefinite.
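Spelled out (a sketch): symmetry means K(x, y) = K(y, x), and positive semidefiniteness means that for every finite set of objects x_1, ..., x_n and real coefficients c_1, ..., c_n,
\[ \sum_{i=1}^{n}\sum_{j=1}^{n} c_i\, c_j\, K(x_i, x_j) \;\ge\; 0, \]
i.e., every Gram matrix built from K is positive semidefinite. This is what guarantees that K corresponds to a scalar product in some feature space.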

Monday, April 2, 2007

NOTE: enable curl extension from PHP


@ In php.ini, find the line extension=php_curl.dll and remove the comment character (;) in front of it.
@ Copy libeay32.dll, ssleay32.dll, php5ts.dll and php_curl.dll into the system32 directory (or add the directory containing these files to the system's PATH variable).


<?php

Thursday, March 29, 2007

Saturday, March 24, 2007


a) svo
b) event


2. Found three parsers to set up the back-end environment:
a) syntactic parser - Charniak
b) dependency parser



3. Set up a client/server environment.

4. So I asked for the Windows version of the Connexor parser. I really wanted the Linux version, but since I don't have a static, externally accessible IP, I had to take the Windows version. The reasons behind this are too painful to go into...



Thursday, March 22, 2007



T-FaNT 07 (Tokyo Forum on Advanced NLP and TM)


Saturday, March 17, 2007

Why LNK2005 errors occur (reposted)



Thursday, March 8, 2007

Dependency Parsing

A Fundamental Algorithm for Dependency Parsing

Notes of Content:

  • 1. Phrase-structure (constituency) parsing vs. dependency parsing.
  • 2. Constituency grammar vs. dependency grammar.
  • 3. Dependency tree:
    • if two words are connected by a dependency relation, they are the head and the dependent, connected by the link;
    • in the dependency tree, constituents (phrases) still exist.
  • 4. In 1965 it was proved that dependency grammar and constituency grammar are strongly equivalent - that they can generate the same sentences and make the same structural claims about them - provided the constituency grammar is restricted in a particular way.

Wednesday, March 7, 2007

Introduction to Syntactic Parsing

Introduction to Syntactic Parsing
A good introduction for the beginners.

by Roxana Girju
November 18, 2004

Tuesday, March 6, 2007

Five basic sentence patterns in English

First of all, thanks to Jin for his help
Reference 1
Reference 2
Reference 3

From reference 2.
Now, I'm sure you know that there are five basic sentence patterns, consisting of the necessary elements S (subject), V (verb), O (object), and C (complement).

I list the 5 patterns here:
1. S + V (subject + verb)
2. S + V + C (subject + verb + complement)
3. S + V + O (subject + verb + object)
4. S + V + O + O (subject + verb + indirect object + direct object)
5. S + V + O + C (subject + verb + object + object complement)


Wednesday, February 28, 2007

Good starting point

A new start, although it comes a little late. But better late than never.

Today I talked with my professor and was told that I *can* choose between working on the topic he assigned and the one I have wanted to work on for so long.

But the bad news is, seems it's difficult for a foreigner with little knowledge of Japanese to find a research position in

Saturday, February 17, 2007


Friday, February 16, 2007




Thursday, February 15, 2007

harmonic series and Riemann Zeta Function

>>Harmonic series: sigma (1/n), n = 1 .. infinity

>>Riemann Zeta Function

the most common form of Riemann Zeta Function:
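The formula that presumably belonged here is the standard Dirichlet-series form, valid for Re(s) > 1:
\[ \zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^{s}}. \]
Note that the harmonic series above is the case s = 1, for which the series diverges.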


Monday, January 29, 2007

inf (glb) and sup

from mathforum

inf means "infimum," or "greatest lower bound." This is
slightly different from minimum in that the greatest lower bound is
defined as:

x is the infimum of the set S [in symbols, x = inf (S)] iff:

a) x is less than or equal to all elements of S
b) there is no other number larger than x which is less than or equal
to all elements of S.

Basically, (a) means that x is a lower bound of S, and (b) means that
x is greater than all other lower bounds of S.
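The two conditions can be written compactly (a sketch):
\[ x = \inf S \iff \bigl(\forall s\in S:\ x\le s\bigr)\ \text{and}\ \bigl(\forall y:\ (\forall s\in S:\ y\le s)\Rightarrow y\le x\bigr). \]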

This differs from min (S) in that min (S) has to be a member of S.
Suppose that S = {2, 1.1, 1.01, 1.001, 1.0001, 1.00001, ...}. This
set has no smallest member, no minimum. However, it's trivial to show
that 1 is its infimum; clearly all elements are greater than or equal
to 1, and if we thought that something greater than 1 was a lower
bound, it'd be easy to show some member of S which is less than it.

So that's the difference between inf and min. It's worth noting that
every set has an inf (assuming minus infinity is okay), and that the
two concepts are the same for finite sets.

glb is another way of writing inf (short for "greatest lower bound").

The corresponding notions from above are sup and lub, which are short for "supremum" and "least upper bound."

Saturday, January 20, 2007

Latent Semantic Index (LSI)

1) Here is a great explanation for a layman.

or latent semantic analysis

"The term ’semantics’ is applied to the science and study of meaning in language, and the meaning of characters, character strings and words. Not just the language and words themselves, but the true meaning being conveyed in the context in which they are being used.

In 2002 a company called Applied Semantics, an innovator in the use of semantics in text processing, launched a program known as AdSense, which was a form of contextual advertising whereby adverts were placed on website pages which contained text that was relevant to the subject of the adverts.

The matching up of text and adverts was carried out by software in the form of mathematical formulae known as algorithms. It was claimed that these formulae used semantics to analyze the meaning of the text within the web page. In fact, what it initially seemed to do was to match keywords within the page with keywords used in the adverts, though some further interpretation of meaning was evident in the way that some relevant adverts were correctly placed without containing the same keyword character string as used on the web page.


2) from Patterns in Unstructured Data - Discovery, Aggregation, and Visualization

Wednesday, December 27, 2006

model-theoretic semantics

>>"The basic of model-theoretic semantics can be roughly described as the following.
For a formal language L, a model M consists of descriptions about
objects and their factual relations in a domain. The descriptions are written in
another language Lm, which is a meta-language, and can either be a natural
language, like English, or another formal language. An interpretation I maps
the words in L onto the objects and relations in M. According to this theory,
the meaning of a word in L is defined as its image in M under I, and whether
a statement in L is true is determined by whether it is mapped by I onto a
fact in M."

from "Experience-Grounded Semantics:A theory for intelligent systems"

>>According to model-theoretic semantics, for any formal language L, the necessary
and sufficient condition for its terms to have meaning and for its statements to have truth value is the existence of a model. In different models, the meaning of terms and truth value of statements may change; however, these changes are not caused by using the language. A reasoning system R that processes sentences in L does not depend on the semantics of L when the system runs. That means, on the one hand, that R needs no access to the meanings of terms and truth values of statements — it can distinguish terms only by their forms, and derive statements from other statements only according to its (syntactically defined) inference rules, but it puts little constraint on how the language can be interpreted. On the other hand, what knowledge R has and what operations R performs have no influence on the meaning and truth
value of the terms and sentences involved. Such a treatment is desired in pure mathematics.

Tuesday, December 26, 2006

syntax && semantic relation.

the work of NLP vs. semantic theory:

Syntax is an independent field of study concerned with determining what the grammatical structures are;

semantic theory interprets those structures.


What is semantic? - beginning

"semantic web, semantic web, semantic web..."
"Stop, may I ask a question? what do you mean by 'semantic'?"

I hope one day I can answer this question clearly.
So from now on I will write down related pieces that I come across and the points I have understood. Discussions are warmly welcome, if you are interested.

I will create a new label named "WhatIsSemantic", so everyone (of course, mainly me, I guess) can check all the related posts easily. Thus, the titles of future posts will not necessarily be "What is semantic (#)".

Tuesday, December 19, 2006

C++ memo

>>default constructor & copy constructor & copy assignment operator

>>int* p1, p2  is equivalent to  {int *p1; int p2}  (only p1 is a pointer)
int *p1, *p2  is equivalent to  {int *p1; int *p2}

>>pointer & reference
[comment] basically, a reference is an alias.

>>const int* p vs. int * const p
const int *p : a pointer to a constant int (the pointee cannot be modified through p)
int * const p : a constant pointer to int (the pointer itself cannot be reseated)
const int* const p : a constant pointer to a constant int