Friday, February 8, 2008

Comments on Rethinking the Semantic Web

Yesterday I read the article Rethinking the Semantic Web (part 1, part 2) by Rob McCool.
It points out several fundamental defects underlying the Semantic Web idea and proposes a Named Entity Web as the solution. The article was published in 2005, and the issues it discusses are now becoming widely recognized and attracting great interest.

Briefly, from the point of view of human understandability and machine processability, I would summarize the features of the Semantic Web as follows.
  • 1. All information is in (RDF) triples.
  • 2. The meaning of triples is interpreted by ontologies that they cite.
  • 3. In order to allow machines to process triples more cleverly, OWL, which is based on description logics, is introduced.
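
To make point 1 concrete, here is a minimal sketch in Python of what "everything is a triple" looks like. It is my own illustration, not taken from the article: the ex: names and the match helper are made up, and a real system would use an RDF library rather than plain tuples.

```python
# Hypothetical example data: each fact is one (subject, predicate, object) triple.
triples = [
    ("ex:RobMcCool", "rdf:type", "ex:Person"),
    ("ex:RobMcCool", "ex:wrote", "ex:RethinkingTheSemanticWeb"),
    ("ex:RethinkingTheSemanticWeb", "rdf:type", "ex:Article"),
]


def match(pattern, data):
    """Return the triples that fit a pattern; None acts as a wildcard."""
    return [
        t for t in data
        if all(p is None or p == v for p, v in zip(pattern, t))
    ]


# "What did Rob McCool write?" becomes a pattern with an open object slot.
written = match(("ex:RobMcCool", "ex:wrote", None), triples)

# "Which things have a declared type?" leaves subject and object open.
typed = match((None, "rdf:type", None), triples)
```

Note that the tuples themselves say nothing about what ex:wrote means; that is left to the ontology they cite, which is point 2.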

The latter two points relate closely to knowledge representation (KR). In his article, Rob McCool discusses the defects that the Semantic Web inherits from its KR origins. I would like to quote some statements from the paper below:
  1. KR uses the fundamental mathematics of Codd's theory to translate information, which humans represent with natural language, into sets of tables that use well-defined schema to define what can be entered in the rows and columns.
    --[comment] the origin
  2. Because information theory removes nearly all context from information, KR represents only facts.
    --[comment] whereas Web users are mostly interested in context-related information.
  3. Complex relationships, exceptions to rules and ideas that resist simplistic classifications pose significant design challenges to information bases.

Thus, KR formats pose a fundamental barrier, in terms of richness of representation as well as creation and maintenance, compared with the written language that people use.
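
A small sketch of how I read this argument, using Python's built-in sqlite3 module. The sentence, the table name and its columns are invented for illustration only; the point is that the schema decides what can be stored, and everything else in the sentence is dropped.

```python
import sqlite3

# A natural-language statement carrying context (stance, qualification).
sentence = (
    "Rob McCool argued in 2005 that the Semantic Web, as designed, would fail."
)

# Hypothetical schema: only these three columns can be filled in.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE wrote ("
    "author TEXT NOT NULL, title TEXT NOT NULL, year INTEGER NOT NULL)"
)
conn.execute(
    "INSERT INTO wrote VALUES (?, ?, ?)",
    ("Rob McCool", "Rethinking the Semantic Web", 2005),
)

cursor = conn.execute("SELECT author, title, year FROM wrote")
columns = [d[0] for d in cursor.description]  # column names from the schema
row = cursor.fetchone()                       # the bare fact that survives
conn.close()
```

The row keeps who, what and when, but "argued", "as designed" and "would fail" have no column to live in, which is the kind of lost context the quoted statements describe.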

I totally agree with the author that "New representations must be easy to translate to and from natural language" and that "any other approach ignores the representation problem, assumes that context-free facts and logical rules are sufficient, and will fail."


In part 2 of the paper, the author proposes a Named Entity Web (NEW). It removes classes, relations and triples from Semantic Web formats in order to provide a less ambitious but more feasible version of the Semantic Web. In the NEW proposal:
1. The basic element is an entity, which can be thought of as a simple business-card-style record.
2. Entities do not need the consistency and formalism that ontologies work so hard to ensure.
3. Entities can be created by, for example, users or manufacturers, for themselves.
4. Entities are embedded in HTML files and are thus connected to their context.
5. The semantics of an entity can be clarified when necessary.
6. Problems related to consistency, semantics or trust can be solved by current techniques such as PageRank, search engines and so on.
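
Since the paper gives few details, the following is only my guess at what points 1 and 4 could look like in practice. The markup (a span with class "entity" and data-* attributes), the product and the maker are all hypothetical, not a format defined by the author. The sketch uses Python's standard html.parser to pull out the business-card fields together with the surrounding text.

```python
from html.parser import HTMLParser

# Hypothetical page: the class name and data-* attributes are invented here.
PAGE = (
    '<p>I bought an <span class="entity" data-name="Example Phone" '
    'data-kind="product" data-maker="Example Corp">Example Phone</span> '
    "last week and the battery is great.</p>"
)


class EntityExtractor(HTMLParser):
    """Collect business-card style entities and the page text around them."""

    def __init__(self):
        super().__init__()
        self.entities = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if attrs.get("class") == "entity":
            # Strip the "data-" prefix to get plain card fields.
            card = {
                key[len("data-"):]: value
                for key, value in attrs.items()
                if key.startswith("data-")
            }
            self.entities.append(card)

    def handle_data(self, data):
        self.text_parts.append(data)


parser = EntityExtractor()
parser.feed(PAGE)
parser.close()

# The context is simply the human-readable sentence the entity sits in.
context = " ".join("".join(parser.text_parts).split())
```

Here the entity has no class hierarchy or relations, only a flat set of fields, and its context is whatever text the page author wrote around it.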

I agree with 2, 3, 5 and 6. But I still doubt whether an entity is a better representation frame than triples (besides, the paper doesn't give enough details). As for point 4, I don't think embedding entities in HTML files is the only way to connect machine-readable information with its context.

What I am trying to do is:
http://www.miv.t.u-tokyo.ac.jp/papers/yangj-WI07.pdf
http://midi.jie.yang.googlepages.com/tita