Saturday, May 12, 2007
Semantic Web
Yes, yet another topic for the same course. Need to check out: RDF, KQML, KIF, the Piggy Bank extension for Firefox, OWL, DAML, etc.
Web mining
Another topic for the 'AI and Web' course. Yet to start exploring...
http://infolab.stanford.edu/~ullman/mining/webmining.html
Web mining tutorial:
(also contains other tutorials: World Wide Web Personalization, Genetic Algorithms, Robust Statistics, Prototype Based Clustering, Relational Clustering)
Search Engines
Notes and resources on this topic: may be relevant for the course on 'AI and Web' being planned.
Ref1 :
A good reference for Google's PageRank algorithm, including both an intuitive and a formal characterisation.
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))
where PR(X) is the PageRank of page X, d is the damping factor, C(X) is the number of links leaving page X, and T1 ... Tn are the pages pointing to A. Up to normalisation, this is the principal eigenvector of the normalised link matrix of the Web. It is also the probability that a random surfer visits the given page: he starts at a random page and keeps clicking on a link from the current page; with probability 1-d he gets bored, leaves the current trail and restarts from another random page. He never uses the back button.
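To get a feel for the formula, here is a minimal sketch that applies it repeatedly until the values settle. It is not taken from the reference; the function name pagerank, the three-page example graph, the default d = 0.85 and the fixed iteration count are all assumptions made for illustration.

```python
def pagerank(links, d=0.85, iterations=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the pages it links to (hypothetical input format).
    Pages with no out-links simply leak rank in this sketch; real systems
    handle them specially.
    """
    # Collect every page that appears as a source or a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # C(T): number of distinct links leaving each page.
    out_count = {p: len(set(links.get(p, ()))) for p in pages}

    # For each page A, the list of pages T that point to A.
    incoming = {p: [] for p in pages}
    for source, targets in links.items():
        for target in set(targets):
            incoming[target].append(source)

    # Start every page at 1.0 and apply the formula repeatedly.
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[p])
            for p in pages
        }
    return pr


# Tiny made-up web: A links to B and C, B links to C, C links back to A.
example_links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(example_links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

With this form of the formula the ranks add up to the number of pages rather than to 1 (as long as every page has at least one out-link), so dividing each value by the page count gives the random-surfer probabilities.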
http://citeseer.ist.psu.edu/henzinger02challenges.html
Challenges in web search engines.
Propagating Trust and Distrust to Demote Web Spam (2006)
The Happy Searcher: Challenges in Web Information Retrieval (2004)
Web Spam Taxonomy (2005)