Saturday, May 12, 2007

Search Engines

Notes and resources on this topic, which may be relevant for the planned course on 'AI and Web'.

Ref1:
A good reference for Google's PageRank algorithm, including both an intuitive and a formal characterisation.

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where PR(X) is the PageRank of page X, T1 ... Tn are the pages pointing to A, C(T) is the number of links leaving page T, and d is a damping factor between 0 and 1 (Brin and Page usually set it to 0.85). The PageRanks correspond to the principal eigenvector of the normalised link matrix of the Web. They can also be read through a "random surfer" who starts at a random page and keeps clicking on links from the current page, never using the back button. At each step he follows one of the current page's links with probability d; with probability (1-d) he gets bored, leaves the current trail and restarts at another random page. Up to normalisation, a page's PageRank is the probability that this surfer is visiting it.
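A minimal Python sketch of both readings of the formula, assuming a hypothetical toy web of three pages (A, B, C) stored as a dict of out-links. The function names pagerank and random_surfer are illustrative only, not from any library. The first function simply iterates the formula until the values stop changing; the second simulates the random surfer, so its visit frequencies multiplied by the number of pages should approach the PageRank values. That comparison assumes every page has at least one out-link, as in the example; the formula above does not say what to do with pages that have none, so each sketch handles them in its own (assumed) way.

```python
import random


def pagerank(links, d=0.85, iterations=200, tol=1e-12):
    """Iterate PR(A) = (1-d) + d * (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)).

    links maps each page to the list of pages it links to.
    Assumption: rank reaching a page with no out-links is simply not passed on.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    # C(T): number of links leaving page T
    out_count = {p: len(links.get(p, [])) for p in pages}
    # incoming[A]: the pages T1..Tn that point to A
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for dst in targets:
            incoming[dst].append(src)

    pr = {p: 1.0 for p in pages}  # starting guess; the fixed point does not depend on it
    for _ in range(iterations):
        new = {
            a: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[a])
            for a in pages
        }
        delta = max(abs(new[p] - pr[p]) for p in pages)
        pr = new
        if delta < tol:
            break
    return pr


def random_surfer(links, d=0.85, steps=200_000, seed=0):
    """Estimate the fraction of steps a random surfer spends on each page.

    With probability d he follows a random out-link of the current page;
    otherwise he gets bored and jumps to a page chosen uniformly at random.
    Assumption: on a page with no out-links he always jumps. No 'back' button.
    """
    rng = random.Random(seed)
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    visits = dict.fromkeys(pages, 0)
    current = rng.choice(pages)
    for _ in range(steps):
        visits[current] += 1
        out = links.get(current, [])
        if out and rng.random() < d:
            current = rng.choice(out)
        else:
            current = rng.choice(pages)
    return {p: n / steps for p, n in visits.items()}


if __name__ == "__main__":
    # Hypothetical toy web: A links to B and C, B links to C, C links to A.
    web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    pr = pagerank(web)
    freq = random_surfer(web)
    n = len(pr)
    for page in sorted(pr):
        # PageRank next to (visit frequency x number of pages)
        print(page, round(pr[page], 3), round(freq[page] * n, 3))
```

For this toy web the iteration settles at roughly PR(A) = 1.163, PR(B) = 0.644 and PR(C) = 1.192, which sum to 3, the number of pages. With the (1-d) form of the formula the ranks sum to N rather than 1, which is why the surfer's visit probabilities have to be scaled by N to match.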


http://citeseer.ist.psu.edu/henzinger02challenges.html
Challenges in Web Search Engines (2002)

Propagating Trust and Distrust to Demote Web Spam (2006)


The Happy Searcher: Challenges in Web Information Retrieval (2004)


Web Spam Taxonomy (2005)
