Saturday, May 12, 2007

Semantic Web

Yes, yet another topic for the same course. Need to check out: RDF, KQML, KIF, the Piggybank extension for Firefox, OWL, DAML, etc.

Web mining

Another topic for the 'AI and Web' course. Yet to start exploring...

http://infolab.stanford.edu/~ullman/mining/webmining.html

Web mining tutorial:

(also contains other tutorials: World Wide Web Personalization, Genetic Algorithms, Robust Statistics, Prototype Based Clustering, Relational Clustering, Robust Clustering, Unsupervised Clustering)

Search Engines

Notes and resources on this topic; may be relevant for the planned course on 'AI and Web'.

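Ref1:
A good reference for Google's PageRank algorithm, including an intuitive and a formal characterisation.
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where PR(X) is the PageRank of page X, d is the damping factor, C(X) is the count of links leaving page X, and T1..Tn are the pages pointing to A. Up to normalisation, this is the principal eigenvector of the normalised link matrix of the Web once the damping term is included. It also matches the random surfer model: a surfer starts at a random page and, with probability d, clicks a link on the current page; with probability 1-d he gets bored, leaves the current trail, and restarts at another random page. He never uses the back button. The PageRank of a page corresponds to the fraction of time he spends on it. With the formula as written, the values sum to the number of pages (when every page has at least one outgoing link), so divide by that number to read them as probabilities.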
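To get a feel for the formula, here is a minimal Python sketch that simply applies PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)) repeatedly until the values settle. The function name `pagerank`, the three-page `toy_web` graph, the value d = 0.85 and the fixed iteration count are all my own assumptions for illustration, not taken from the reference. It also assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1 - d) + d * sum(PR(T) / C(T)).

    links maps each page to the list of pages it links to.
    Assumes every page has at least one outgoing link.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # arbitrary starting guess
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum the rank shares of every page T that points to this page.
            incoming = sum(
                pr[t] / len(links[t]) for t in pages if page in links[t]
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
```

In this toy graph C ends up with the highest rank, since it receives all of B's rank and half of A's, and the three values add up to 3, the number of pages. A real implementation would need to handle pages with no outgoing links and would use sparse matrices rather than nested loops.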


http://citeseer.ist.psu.edu/henzinger02challenges.html
Challenges in web search engines.

Propagating Trust and Distrust to Demote Web Spam (2006)


The Happy Searcher: Challenges in Web Information Retrieval (2004)


Web Spam Taxonomy (2005)