## Thursday, March 6, 2008

### 用Listings高亮代码

http://en.wikibooks.org/wiki/LaTeX/Packages/Listings
pdf

\documentclass{article}
\usepackage{listings}

\begin{document}

\begin{lstlisting}[language=C]
int main(int argc, char ** argv)
{
/* print a string "Hello world!"
printf("Hello world!\n");

return 0;
}
\end{lstlisting}

\end{document}

## Friday, February 8, 2008

### Comments on Rethinking the Semantic Web

Yesterday I read the article Rethinking the Semantic Web (part1, part2) by Rob mcCool.
It points out several fundamental defects underlying the Semantic Web idea and proposes an Named Entity Web as the solution. Published in the year of 2005, the issues discussed are becoming widely realized and arousing great interests.

Briefly speaking, I list the features, from the human-understandability and machine processibility's point of view, of the Semantic Web as follows.
• 1. all information is in (RDF) triples.
• 2. The meaning of triples is interpreted by ontologies that they cite.
• 3. In order to allow machines to process triples more cleverly, OWL, basing on the Frame logic, is introduced.

The latter two issues relate closely to Knowledge representation. And in his article, Rob mcCool discussed the defection of the Semantic Web from its KR origination. I would like to quote the statements in the paper as below:
1. KR uses the fundamental mathematics of Codd's theory to translate information, which human represent with natural language(, into sets of tables that use well defined schema to define what can be entered in the rows and columns).
--[comment] the originate
2. Because information theory removes nearly all context from information, KR represents only fact.
--[comment] Where web users are mostly interested in context related information.
3. Complex relationships, exceptions to rules and ideas that resist simplistic classifications pose significant design challenges to information bases.

Thus, they pose a fundamental barrier, in terms of richness of representation as well as creation and maintenance, compared to the written language that people use.

I totally agree with the author in that “New representations must be easy to translate to and from natural language” and that "any other approach ignores the representation problem, assumes that context-free facts and logical rules are sufficient, and will fail."

In part2 of the paper, the author proposed an name-entity-web(NEW). It removes classes, relations and triples from Semantic Web formats in order to provide a less ambitious version of the Semantic Web which is more feasible. In the proposal of NEW
1. the basic element an entity which can be thought of as taking a simple business-card style.
2. the entities doesn't need the consistency and formalism that ontologies work so hard to ensure
3. The entities can be created by, for example, users or manufactures, for themselves.
4. The entities are embedded in HTML files, thus is connected to its context.
5. Semantics of the entity can be clarified when necessary.
6. Problems related to consistency, semantics or trusts can be solved by current techniques like page rank, search engine and so on.

I agree with 2, 3, 5 and 6. But I am still doubting that whether entity is a better representation frame (besides the paper doesn't give enough details) than triples. As for point 4, I don't think entities being embedded into the HTML files is the only approach to connect machine readable
information with its context.

What I am trying to do is:
http://www.miv.t.u-tokyo.ac.jp/papers/yangj-WI07.pdf

## Sunday, January 27, 2008

### 用Minipage并排图形和表格

\begin{minipage}[htb]{.45\textwidth}
\begin{minipage}{.4\textwidth}
\centering
\includegraphics[width=100pt, bb=0 0 94.19mm 94.19mm]{fig.eps}
\label{fig:myfig}
\end{minipage}
\begin{minipage}{.45\textwidth}
\centering
\tiny
\begin{tabular}{|cc|}
\hline
\multicolumn{2}{|c|}{Classes from Wikipedia structure}\\
\hline
% & Feature & Example \\
genre 70 & members 57\\
origin 48 &member role 48 \\
years active 44 & a music group 39\\
member occupations 26& album Name 18 \\
lyrics 13 &album no. 10 \\
single 10 & born in 8\\
\hline
\multicolumn{2}{|c|}{extra classes from corpus}\\
\hline
die in 7 & former Members 4 \\
alias 3 &instruments 3\\
awards 3 & label 2 \\
associate act 1 & birth name 1 \\
\hline
\end{tabular}
\makeatletter\def\@captype{table}\makeatother\caption{Relation classes.}
\label{tbl:mytbl}
\end{minipage}
\end{minipage}

## Wednesday, January 16, 2008

### one figure over two columns &&figure array

\begin{figure*}[tb]实现跨行

\usepackage{graphicx}

---------------------------------------------------------------
\begin{figure*}[tb]
\centering
\subfigure[1]{\includegraphics[width=400pt]{D:/0work/res/pic/1.eps}}
%\mbox{\hspace{0.5cm}}
\\
\subfigure[2]{\includegraphics[width=140pt]{D:/0work/res/pic/2.eps}}
\subfigure[3]{\includegraphics[width=140pt]{D:/0work/res/pic/3.eps}}
%\mbox{\hspace{0.5cm}}
\subfigure[4]{\includegraphics[width=140pt]{D:/0work/res/pic/4.eps}}
\renewcommand{\figurename}{figure}
\caption{hello} \label{hello}
\end{figure*}

## Monday, January 14, 2008

### latex 单行或多行公式的排版

from : http://www.boyeut.com/2007/05/equation.html

1.自动编号的单行公式环境是



$$公式 \eqno 编号$$ 将编号放在右边
$$公式 \leqno 编号$$ 将编号放在左边

2.单个公式很长，需要换行，但仅允许生成一个编号时，可以用split命令

\begin{split}
a &= b \\
c &= d
\end{split}

3.多行公式：

\begin{eqnarray}

\end{eqnarray}

\setcounter{equation}{数}

4.方程组的排版：

1) gather环境

\begin{gather}
a &= b \\
c &= d \\

\end{gather}
>>1.如果其中某几行使用同一个编号，则需要内嵌一个split环境。
>>2.命令\notag可使当前行不编号。
2) align环境

3) 以上几种方程组环境，无论每个公式多小，都会占满一行。使用相应的\gathered，\aligned环境，则只占据公式的实际宽度，整体作为一个特大的符号与其他符号一同处理。

\left.
\begin{aligned}[b]
a &= b+c \\
d &= b+c
\end{aligned}
\right\}
\Longrightarrow

【参考文献】
1.陈志杰等，LATEX入门与提高（第二版），高等教育出版社，2006.5

Latex-定理编号的引用

Latex-case环境

Latex-定理定义的排版

## Saturday, December 29, 2007

### feature selection

Feature Extraction, Foundations and Applications

See here

## Friday, November 16, 2007

### Notes about SVM & the Kernel Method

Reference:
[1] An Introduction to Support Vector Machines and other kernel-based learning methods
Nello Cristianini, John Shawer-Taylor
[2] 数据挖掘中的新方法-支持向量机

[3] 机器学习
Tom M. Mitchell

[4] http://en.wikipedia.org/wiki/Support_vector_machine
[5] An introduction to Kernel-Based Learning Algorithms
Klaus-Robert Muller, Sebastian Mika, Gunnar Ratsch, Koji Tsuda, and Bernhard Scholkopf
----------------------------------------------------------------------------------
Issues:
1.Some background of computational learning theory
Empirical Risk Minimization (ERM)
Probably Approximately Correct Learning (PAC)
--risk function and expected risk ([2] p131)
Vapnik-Chervonenkis dimension
refer:
[1] Chapter4 Generalization Theory
[2] p161 section4.8
[3] chapter seven

*They tend to solve the problem:

*求解分类问题，也许我们会想到如下途径：首先用训练集估计概率分布，然后根据所得到的概率分布求得决策函数。但是这样做违反了利用有限数量心细解决问题的一项基本原则。这项原则就是：在求解一个问题过程中，不应把解决另一个更为一般的问题作为其中的一个步骤。当前求解的问题是分类问题。而概率分布的估计是一个更为一般的问题。事实上，在统计学中，知道了概率分布几乎可以认为知道了一切，因为根据它便能解决各种各样的问题。因此我么内部应该通过估计概率分布而求解分类问题。[2]p141

*学习算法中我们需要选择适当的假设集F。实际上这里的关键因素是假设集F的大小，或F的丰富程度，或者说F的“表达能力”。VC维是对这种“表达能力”的一种描述。
Roughly speaking, the VC dimension measures how many (training) points can be shattered (i.e., separated) for all possible labelings using functions of the class. ([5] section II)
*
2.Optimization Theory
[1] chapter 5
[2] chapter 1
The problems addressed in 1 have a similar form: the hypothesis function should be chosen to minimize (or maximize) a certain functional. Optimization theory is the branch of mathematics concerned with characterizing the solutions of classes of such problems, and developing effective algorithms for finding them.

3. kernel
*Frequently the target concept cannot be expressed as a simple linear combination of the given attributes, but in general requires that more abstract features of the data be exploited. Kernel representations offer alternative solution by projecting the data into a high dimensional feature space to increase the computational power of the linear learning machines. [1]p26

[comment:] "the advantage of using the machines in the dual representation derives from the fact that in this representation the number of tunable parameters does not depend on the number of attributes being used."
-- The number of parameters depends on the definition of the kernel. if you define a "bad" kernel, there will be many parameters as well.

**
With kernels we can compute the linear hyperplane classifier in a Hilbert space with higher dimension. We prefer linear models because:

1) there is the intuition that a "simple" (e.g., linear) function that explains most of the data is preferable to a complex one (Occam's razor). ([5] section II)

2) In practice the bound on the general expected error computed by VC dimension is often neither easily computable nor very helpful. Typical problems are that the upper bound on the expected test error might be trivial (i.e., larger than one), the VC dimension of the function class is unknown or it is infinite (in which case one would need an infinite amount of training data). ([5], section II )

For linear methods, the VC dimension can be estimated as the number of free parameters, but for most other methods (including k-nearest neighbors), accurate estimates for VC-dimension are not available.

3) for the class of hyperplanes the VC dimension itself can be bounded in terms of an other quantity, ther margin. (the margin is defined as trhe minimal distance of a sample to the decision surface) ([5] section II )

4. Existing Kernels
• Polynomial (homogeneous): $k(\mathbf{x},\mathbf{x}')=(\mathbf{x} \cdot \mathbf{x'})^d$
• Polynomial (inhomogeneous): $k(\mathbf{x},\mathbf{x}')=(\mathbf{x} \cdot \mathbf{x'} + 1)^d$
• Radial Basis Function: $k(\mathbf{x},\mathbf{x}')=\exp(-\gamma \|\mathbf{x} - \mathbf{x'}\|^2)$, for γ > 0
• Gaussian Radial basis function: $k(\mathbf{x},\mathbf{x}')=\exp\left(- \frac{\|\mathbf{x} - \mathbf{x'}\|^2}{2 \sigma^2}\right)$
• Sigmoid: $k(\mathbf{x},\mathbf{x}')=\tanh(\kappa \mathbf{x} \cdot \mathbf{x'}+c)$, for some (not every) κ > 0 and c <>
([4])

In short, not the mimensionalitybut the complexity of the function class matters. , all one needs for separation is a linear hyperplane. However, it becomes rather tricky to control the feature space for large real-world problems. So even i one could control the statistical complexity of this function class, one would still run into intractability problems while executing an algorithm in this space. Fortunately, for certain feature spaces F and corresponding mapping phai, there is a highly effective trick for computing scalar products in feature spaces using kernel functions. ([5] section II)

## Wednesday, October 24, 2007

### Latex 公式

from http://blog.sina.com.cn/s/blog_53be544e010003hn.html

\usepackage[amsmath,thmmarks]{ntheorem}
% 定理类环境宏包，其中 amsmath 选项
% 用来兼容 AMS LaTeX 的宏包

%=== 配合上面的ntheorem宏包产生各种定理结构,重定义一些
%正文相关标题 ===
\theoremstyle{plain}
\theorembodyfont{\normalfont\rm\CJKfamily{kai}} \theoremindent0em
\theoremseparator{\hspace{1em}} \theoremnumbering{arabic}
%\theoremsymbol{} %定理结束时自动添加的标志
\newtheorem{definition}{\hspace{2em}定义}[section]
%\newtheorem{definition}{\hei 定义}[section]
%!!!注意当section为中国数字时，[section]不可用！
\newtheorem{proposition}{\hspace{2em}命题}[section]
\newtheorem{property}{\hspace{2em}性质}[section]
\newtheorem{lemma}{\hspace{2em}引理}[section]
%\newtheorem{lemma}[definition]{引理}
\newtheorem{theorem}{\hspace{2em}定理}[section]
\newtheorem{axiom}{\hspace{2em}公理}[section]
\newtheorem{corollary}{\hspace{2em}推论}[section]
\newtheorem{exercise}{\hspace{2em}习题}[section]
\theoremsymbol{$\blacksquare$}
\newtheorem{example}{\hspace{2em}例}[section]
\theoremstyle{nonumberplain}
\theorembodyfont{\normalfont \rm \CJKfamily{song}} \theoremindent0em
\theoremseparator{\hspace{1em}} \theoremsymbol{$\blacksquare$}
\newtheorem{proof}{\hspace{2em}证明}

## Friday, September 7, 2007

### Jensen-Shannon divergence

definition given by wikipedia

In probability theory and statistics, the Jensen-Shannon divergence is a popular method of measuring the similarity between two probability distributions.

used as a similarity measure of entities in
Chen, J. and D. Ji, et al. (2006). Relation Extraction Using Label Propagation Based Semi-Supervised Learning. Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, Sydney, Australia.

## Sunday, August 26, 2007

### Lazy Learning

http://en.wikipedia.org/wiki/Lazy_learning

In artificial intelligence, lazy learning is a learning method in which generalization beyond the training data is delayed until a query is made to the system, as opposed to in eager learning, where the system tries to generalize the training data before receiving queries.

...

### Hausdorff distance

what is Hausdorff distance ?
An introduction

Named after Felix Hausdorff (1868-1942), Hausdorff distance is the « maximum distance of a set to the nearest point in the other set »

## Sunday, August 19, 2007

### A talk given by Prof. Mitch Marcus on last Friday

> タイトル：Unsupervised induction of morphological structure
>
> 概要：
> We will discuss the problem of unsupervised morphological and part of
> speech (POS) acquisition in realistic settings. From studies of tagged
> corpora, we show that there is a sparse data problem in morphology,
> which raises the question of how rare forms may be learned. We then show
> that it is often the case that the base form of a word is present among
> the different inflections of a lexeme, which suggests that rare forms
> can be learned by association with a base form. We introduce new
> representations for morphological structure which express the
> morphophonological transduction behavior of these base forms, and
> present
> an algorithm to acquire these structures automatically from an unlabeled
> corpus. We apply the algorithm to a range of Indo-European languages
> including Slovene, English, and Spanish.
>

comment:
1. met the same group of people (well, I mean young researchers basically) again.
2. I asked two questions on how to deal with sparse data. Based on what I understood:
a) To prune the space by analyzing features of the data.
b) To add back ground knowledge.
3. Jin said he is the 牛魔王 in their field! Orz...

## Friday, August 17, 2007

### Latex中纵向排列两个子图

\usepackage{subfigure}

\begin{figure}[tb]
\centering \subfigure[subCaption_1 ]{\includegraphics[width=200pt]{1.eps}}
\label{fig:selFilter} \subfigure[subCaption_2]{\includegraphics[width=200pt]{2.eps}}
\label{fig:tripleQuery}
\caption{Query} \label{Fig:CellDropRates}
\end{figure}

## Tuesday, August 7, 2007

### Maxium Likelihood Estimate (MLE)

1. from wikipedia:

Maximum likelihood estimation (MLE) is a popular statistical method used to make inferences about parameters of the underlying probability distribution from a given data set. That is to say, you have a sample of data

X_{1}, \dots, X_{n} \!

and some kind of model for data, and you want to estimate parameters of the distribution.

2. from http://www.itl.nist.gov/div898/handbook/eda/section3/eda3652.htm
Maximum likelihood estimation begins with the mathematical expression known as a likelihood function of the sample data. Loosely speaking, the likelihood of a set of data is the probability of obtaining that particular set of data given the chosen probability model. This expression contains the unknown parameters. Those values of the parameter that maximize the sample likelihood are known as the maximum likelihood estimates.

the dis/advantages are discussed, as well as the software.

3. about smoothed maximum likelihood estimates.
One purpose of the smoothed estimates is too account for sparseness in counts for distributions with a lot of history by backing off to less sparse estimates.
(McDonald, R. (2005). Extracting Relations from Unstructured Text, Department of Computer and Information Science, University of Pennsylvania.)

## Sunday, July 22, 2007

### Conditional Random Fields (CRFs)

from
http://www.inference.phy.cam.ac.uk/hmw26/crf/

Conditional random fields (CRFs) are a probabilistic framework for labeling and segmenting structured data, such as sequences, trees and lattices. The underlying idea is that of defining a conditional probability distribution over label sequences given a particular observation sequence, rather than a joint distribution over both label and observation sequences. The primary advantage of CRFs over hidden Markov models is their conditional nature, resulting in the relaxation of the independence assumptions required by HMMs in order to ensure tractable inference. Additionally, CRFs avoid the label bias problem, a weakness exhibited by maximum entropy Markov models (MEMMs) and other conditional Markov models based on directed graphical models. CRFs outperform both MEMMs and HMMs on a number of real-world tasks in many fields, including bioinformatics, computational linguistics and speech recognition.

tutorial:
Hanna M. Wallach. Conditional Random Fields: An Introduction. Technical Report MS-CIS-04-21. Department of Computer and Information Science, University of Pennsylvania, 2004.

## Tuesday, July 10, 2007

### Semantics, Syntax and pragmatics

from: Semantic Tagging - Susanne Ekeklint
--------------
The difference between syntactic tagging and semantic tagging is that the categories that are used to mark the entities in the latter case are of a semantic kind. Semantics has to do with intentions and meanings. POS tagging may sometimes be considered semantic but is usually seen as syntactic tagging. By tradition there is a separation between the form side (syntax) and the content side (semantics) of a phrase and the intention of this is to make a distinction between what is being said to how it is being said. Levison (1983) describes the historical background of the terms syntax, semantics and pragmatics by referring to Charles Morries's(1971) distinctions, within the sudy of "the relations of signs", or semiotics.

Syntactics (or syntax) being the study of "the formal relation of signs to one another", semantics the study of "the relations of signs to the objects to which the signs are applicable" (their designata), and pragmatics, the study of "the relation of signs to interpreters". (Morris 1938:6, quoted in Levinson 1938:1)

Levinson says that there is a "pure study" in each one of the three ares; it is however a known fact that in practice the areas often overlaps. Semantics includes the study of syntax and pragmatics includes the study of semantics (Allwood, 1993)

--------------
Levinson Stephen C.(1983) Pragmatics. Cambridge University Press
Morris, Charles W. (1971) Writings on the General Theory of Signs. The Hague: Mouton.
Morris 1938:6

## Sunday, July 1, 2007

### The slide about text catgorization

Mainly about the content of the chapter 16 of the book Foundations of Statistical Natural Language Processing

Here is the slide.

## Sunday, June 17, 2007

### Generative Model and Disriminative Model

==
Ref: MLWiki
A generative model is one which explicitly states how the observations are assumed to have been generated. Hence, it defines the joint probability of the data and latent variables of interest.

==

==
ref: A simple comparison
of
generative model (model likelihood and prior) <-- NB..
and discriminative model (model posterior) <-- SVM

==
ref: Classify Semantic Relations in Bioscience Texts.

Generative models learn the prior probability of the class and the probability of the features given the class; they are the natural choice in cases with hidden variables (partially observed or missing data). Since labeled data is expensive to collect, these models may be useful when no labels are available. However, in this paper we test the generative models on fully observed data and show that , although not as accurate as the discriminative model, their performance is promising enough to encourage their use for the case of partially observed data.

Discriminative models learn the probability of the class given the features. When we have fully observed data and we just need to learn the mapping from features to classes(classification), a discriminative approach may be more appropriate.

It must be pointed out that the neural network (discriminative model) is much slower than the graphical models (HMM-like generative models), and requires a great deal of memory.

## Wednesday, June 6, 2007

### Collocations

Chap 5 of foundations of statistical natural language processing

- a collocation is an expression consisting of two or more words that correspond to some conventional say of saying things.

- Collocations are characterized by limited compositionality.
(we call a natural language expression compositional if the meaning of the expression can be predicted from the meaning of the parts.)
Collocations are note fully compositional in that there is usually an element of meaning added to the combination.
--> non-compositionality
--> non-substitutability
--> non-modifiability

-term: the word term has a different meaning in information retrieval. There it refers to both words and phrases.

- a number of approaches to finding collocations:
a)selections by frequency,
raw frequency doesn't work.
With part of speech tag patterns, one gets surprisingly good result.<-- Justeson and Katz' method. hints: a simple quantitative technique combined with a small amount of linguistic knowledge goes a longway.
works well for fixed phrases.

b)selection based on mean and variance of the distance between focal word collocating word
scenario: the distance between two words in not constant so a fixed phrase approach would not work.

collocational window (usually a window of 3 to 4 words on each side fo a word)

Mean and variance o the offsets between two words in a corpus.

c)hypothesis testing (********)

in b) we can not make for sure that the high frequency and low variance of two words can be accidental. So we are also taking into account how much data we have seen. Even if there is a remarkable pattern, we will discount it if we haven't seen enough data to be certain that it couldn't be due to chance.

-->null hypothesis.
-->t test : assume that the probabilities are approximately normally distributed.
The t test looks at the mean and variance of a sample of measurements, where the null hypothesis is that the sample is drawn from a distribution with mean miu. The test looks at the difference between the observed and expected means, scaled by the variance of the data, and tells us how likely one is to get a sample of that mean and variance ( or a more extreme mean and variance) assuming that the sample is drawn from a normal distribution with mean miu.
(todo)

--> Chi-square test
The essence of the test is to compare the observed frequencies in the table with the frequencies expected for independence. If the difference between observed and expected frequencies is large, then we can eject the null hypothesis of independence.
d)mutual information

--> likelihood ratios:
more appropriate for sparse data than the chi-square test

### Controlled Language

What are Controlled Natural Languages?
"Controlled Natural Languages are subsets of natural languages whose grammars and dictionaries have been restricted in order to reduce or eliminate both ambiguity and complexity. Traditionally, controlled languages fall into two major categories: those that improve readability for human readers, particularly non-native speakers, and those that improve computational processing of the text."

## Sunday, May 6, 2007

### 更改latex 中item的样子

\renewcommand{\labelitemi}{$\star$}

e.g.change the "dot" to "-"

\renewcommand{\labelitemi}{-}
\begin{itemize}
\item aaaaaa
\item bbbbbbb
\end{itemize}
\end{definition}

### Examples of Math

from Information Retrieval in Folksonomies: Search and Ranking
Andreas Hotho, Robert Jaschke, Christoph Schmitz, Gerd Stumme

## Thursday, May 3, 2007

### Latex并列子图中使用minipage

\begin{figure}[htb]
\centering \mbox{
\subfigure[title for sub figure 1]{
\begin{minipage}[c]{3.7cm}
\small
a sentence\\
a sentence\\
a sentence\\
a sentence\\
\end{minipage}
\label{fig:labelOfSubFigureA}

\subfigure[title for sub figure b]{
\begin{minipage}[c]{3.7cm}
\small
a sentence\\
a sentence\\
a sentence\\
a sentence\\
a sentence\\
\end{minipage}
\label{fig:tripleTagsB} }
} \caption{Triple tags}
\label{fig:label for sub figure B}
\end{figure}

--------------------------------------
e.g.

## Wednesday, May 2, 2007

### latex中输入定义

\documentclass[times, 10pt,twocolumn]{article}
....
\usepackage{amsmath,amsthm,amssymb} % 引入 AMS 數學環境g
....
\newtheorem{definition}{Definition}
\newtheorem{theorem}{Theorem}(for 公理)

\begin{document}
...
(如果只是definition，之用引入amssymb包)
\begin{definition}[Name of the Definition]
content of the definition
\end{definition}

....

## Monday, April 30, 2007

1. http://www.w3.org/2001/11/13-RDF-Query-Rules/
This document provides a survey of RDF query language and implementations and describes their capabilities in terms described in RDF Query and Rules Framework.

2.SPARQL: A query platform for Web 2.0 and the Semantic Web

A full list of different query language implementations can bee seen at http://www.w3.org/2001/11/13-RDF-Query-Rules/. But these languages lack both a common syntax and a common semantics. In fact, the existing query languages cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. And SPARQL had to fill this gap.

## Wednesday, April 25, 2007

### vocabulary

1. generative v.s. discriminative
2. inductive v.s. recursive

## Monday, April 9, 2007

tutorial on kernel method

Bernhard Schölkopf. Statistical learning and kernel methods. MSR-TR 2000-23, Microsoft Research, 2000.

Kernel methods retain te original representation of objects and use the object in algorithms only via computing a kernel function between a pair of objects. A kernel function is a similarity function satisfying certain properties. More precisely, a kernel function K over the object space X is binary function K:X*X ->[0, infinite] mapping a pair of objects x,y \in X to their similarity score K(x,y). A kernel function is required to be symmetric and positive-semidefinite.

## Monday, April 2, 2007

### NOTE: enable curl extension from PHP

@已经内置有php_curl.dll,在ext目录下,此DLL用于支持SSL和zlib.
@在php.ini中找到有extension=php_curl.dll, 去掉前面的注释.
@设置extension_dir=c:\php\ext,
@将:libeay32.dll, ssleay32.dll, php5ts.dll, php_curl.dll 都拷贝到system32目录下 (或将这些文件的目录加入系统的path变量中)
@重启apache/IIS即可.

----------------------------------

＜? php
print_r(curl_version());
?>

## Thursday, March 29, 2007

### Conference Information

NLP,
conference from the linguist list

## Saturday, March 24, 2007

### 最近做的事情

0.考虑了一下triple模型的表达能里和局限。涉及：
a) svo
b) event

1.参加T-FaNT。给自己的启发不小。大概考虑了一下自己的试验领域和评价方法。

2. 找了三个parser搭后台环境
a) syntex parser - charniak
http://www.cfilt.iitb.ac.in/~anupama/charniak.php
b) dependency parser
minipar: http://www.cs.ualberta.ca/~lindek/minipar.htm
connexor: http://www.connexor.com/software/

minipar里面的std库和我的不一样。在windows和linux下面链接均未果。又分别找了五个朋友帮忙在他们的环境里链接一次。只有那个越南的同学编译成功了。汗。结果我就写信跟作者要源代码。当然自己也知道不是很可能。抱着万一成功的想法碰碰运气而已。

3.搭建一个client / server的环境。
DN人很好，提供了一台机器可以拿来当服务器。虽然配置不是很高。但是作为服务肯定是够了。但是我们实验室的ip虽然是静态的，被配置成不能作为服务器使用。只有内网机器可以互相访问apache/tomcat服务。

4.所以要了connexor parser的windows版。我很想要linux版的。但是因为没有静态可外部访问的ip。只好要了windows版的，这其中的原因，苦不堪言……

5.跟老板谈了一下自己的时间安排。
-----------------------------------------------

## Thursday, March 22, 2007

### T-FaNT

T-FaNT 07 (Tokyo Forum on Advanced NLP and TM)

msdn也是很有用的。

## Thursday, March 8, 2007

### Dependency Parsing

Ref:
A Fundamental Algorithm for Dependency Parsing

Notes of Content:

• 1. phrase-structure (constituency) parser v.s. dependency parsing.
• 2. constituency grammar v.s. dependency grammar.
• 3. Dependency tree:
• if two words are connected by a dependency relation:they are head and dependent, connected by the link
• in the dependency tree, constituents (phrases) still exist.
• 4. year of 1965, it is proved that dependency grammar and constituency grammar are strongly equivalent - that they can generate the same sentences and make the same structural claims about them - provided the constituency grammar is restricted in a particular way.

## Wednesday, March 7, 2007

### Introduction to Syntactic Parsing

Introduction to Syntactic Parsing
A good introduction for the beginners.

by Roxana Girju
Novermber 18, 2004

## Tuesday, March 6, 2007

### Five basic sentence patterns in English

First of all, thanks to Jin for his help
Reference 1
Reference 2
Reference 3

-----------------------
From reference 2.
"
Now, I'm sure you know that there are five basic sentence patterns, consisting of necessay elements which are S(subject), V(verb), O(object), and C(complement).

I list out the 5 patterns here:

S+V
S+V+C
S+V+O
S+V+O+C
S+V+O+O
"

## Wednesday, February 28, 2007

### Good starting point

A new start, although comes a little bit late. But so much better than never.

Today I talked with professor and was told I *can* decide to work on the topic he assigned or the one I have been wanting to do for so long.

But the bad news is, seems it's difficult for a foreigner with little knowledge of Japanese to find a research position in Japan...wow....

## Friday, February 16, 2007

### latex中插入并列子图

\begin{figure}[htb]
\centering
\mbox{
\subfigure[信元丢失率与加速因子之间的关系]{\includegraphics[scale=1.0]{temp1.e
\subfigure[信元丢失率与缓冲大小之间的关系]{\includegraphics[scale=1.0]{temp2.e
ps}}
}
\caption{影响信元丢失率的因素}
\label{Fig:CellDropRates}
\end{figure}

## Thursday, February 15, 2007

### harmonic series and Riemann Zeta Function

>>Harmonic series: sigma (1/n) n = 0 .. infinit

>>Riemann Zeta Function

the most common form of Riemann Zeta Function:

>>

## Monday, January 29, 2007

### inf (glb) and sup

from mathforum

inf means "infimum," or "greatest lower bound." This is
slightly different from minimum in that the greatest lower bound is
defined as:

x is the infimum of the set S [in symbols, x = inf (S)] iff:

a) x is less than or equal to all elements of S
b) there is no other number larger than x which is less than or equal
to all elements of S.

Basically, (a) means that x is a lower bound of S, and (b) means that
x is greater than all other lower bounds of S.

This differs from min (S) in that min (S) has to be a member of S.
Suppose that S = {2, 1.1, 1.01, 1.001, 1.0001, 1.00001, ...}. This
set has no smallest member, no minimum. However, it's trivial to show
that 1 is its infimum; clearly all elements are greater than or equal
to 1, and if we thought that something greater than 1 was a lower
bound, it'd be easy to show some member of S which is less than it.

So that's the difference between inf and min. It's worth noting that
every set has an inf (assuming minus infinity is okay), and that the
two concepts are the same for finite sets.

glb is another way of writing inf (sort for "greatest lower bound")

------------------------
sup and lub, which are short for "supremum" and "least upper bound."

## Saturday, January 20, 2007

### Latent Semantic Index (LSI)

1) here is a great explaination to a layman.

or latent semantic analysis

"The term ’semantics’ is applied to the science and study of meaning in language, and the meaning of characters, character strings and words. Not just the language and words themselves, but the true meaning being conveyed in the context in which they are being used.

In 2002 a company called Applied Semantics, an innovator in the use of semantics in text processing, launched a program known as AdSense, which was a form of contextual advertising whereby adverts were placed on website pages which contained text that was relevant to the subject of the adverts.

The matching up of text and adverts was carried out by software in the form of mathematical formulae known as algorithms. It was claimed that these formulae used semantics to analyze the meaning of the text within the web page. In fact, what it initially seemed to do was to match keywords within the page with keywords used in the adverts, though some further interpretation of meaning was evident in the way that some relevant adverts were correctly placed without containing the same keyword character string as used on the web page.

"

2)from Patterns in Unstructured Data -Discovery, Aggregation, and Visualization

## Wednesday, December 27, 2006

### model-theoretic semantics

>>"The basic of model-theoretic semantics can be roughly described as the following.
For a formal language L, a model M consists of descriptions about
objects and their factual relations in a domain. The descriptions are written in
another language Lm, which is a meta-language, and can either be a natural
language, like English, or another formal language. An interpretation I maps
the words in L onto the objects and relations in M. According to this theory,
the meaning of a word in L is defined as its image in M under I, and whether
a statement in L is true is determined by whether it is mapped by I onto a
fact in M."

from "Experience-Grounded Semantics:A theory for intelligent systems"
P4

>>According to model-theoretic semantics, for any formal language L, the necessary
and sufficient condition for its terms to have meaning and for its statements to have truth value is the existence of a model. In different models, the meaning of terms and truth value of statements may change; however, these changes are not caused by using the language. A reasoning system R that processes sentences in L does not depend on the semantics of L when the system runs. That means, on the one hand, that R needs no access to the meanings of terms and truth values of statements — it can distinguish terms only by their forms, and derive statements from other statements only according to its (syntactically defined) inference rules, but it puts little constraint on how the language can be interpreted. On the other hand, what knowledge R has and what operations R performs have no influence on the meaning and truth
value of the terms and sentences involved. Such a treatment is desired in pure mathematics.

## Tuesday, December 26, 2006

### syntax && semantic relation.

the work of NLP V.S. semantic theory.

from
http://www.cogsci.uni-osnabrueck.de/~gkatz/Semantics/SlidesPDF/Lecture%20Two.pdf
-------------------------
Syntax is an independent filed of study concerned with determining what the grammatical structures are

semantic theory interprets those structures.

-------------------------

### What is semantic? - beginning

"semantic web, semantic web, semantic web..."
"Stop, may I ask a question? what do you mean by 'semantic'?"

I hope one day I can answer this question clearly.
So from now on I will write down related pieces I met, the points I understood. And discussions are warmly welcomed. If you are interested in.

I will create a new label with its name "WhatIsSemantic", so everyone (of course, mainly me, I guess) can check all the posts easily. Thus, the titles of the future posts will not necessarily be "What is semantic(#)".

## Tuesday, December 19, 2006

### C++ memo

>>default constructor & copy constructor & copy assign constructor

>>int* p1, p2 = {int *p;int p2}
int *p1, *p2 = {int *p1; int *p2}

>>pointer & reference
http://zhidao.baidu.com/question/4616759.html
http://blog.sina.com.cn/u/4b03937a0100062c
[comment] basically, reference is an alais.

>>const int* p v.s int * const p
http://slam-dunk-for-life.spaces.live.com/blog/

const int *p 指向常量的指针变量
int * const p 指向变量的指针常量
const int* const p 指向常量的指针常量

>>kevin's
http://kevin22hao.spaces.live.com/blog/

>>http://www.cplusplus.com/doc/tutorial/

>>http://www.cplusplus.com/doc/tutorial/