Friday, June 8, 2007
Saturday, May 12, 2007
Semantic Web
Yes, yet another topic for the same course. Need to check out: RDF, KQML, KIF, the Piggybank extension to Firefox, OWL, DAML, etc.
Web mining
Another topic for the 'AI and Web' course. Yet to start exploring...
http://infolab.stanford.edu/~ullman/mining/webmining.html
Web mining tutorial:
(also contains other tutorials: World Wide Web Personalization, Genetic Algorithms, Robust Statistics, Prototype Based Clustering, Relational Clustering)
Search Engines
Notes and resources on this topic: may be relevant for the course on 'AI and Web' being planned.
Ref1 :
A good reference for Google's PageRank algorithm, including an intuitive and a formal characterisation.
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where PR(X) is the page rank of page X, d is the damping factor, C(X) is the count of links leaving page X, and T1 ... Tn are the pages pointing to A. Up to scaling, this is the principal eigenvector of the normalised (and damped) link matrix of the Web. It also corresponds to the probability that a random surfer visits the given page: he starts at a random page and keeps clicking on a link from the current page; with probability 1-d he gets bored, leaves the current trail and restarts at another random page. He never uses the back button.
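To make the formula concrete, here is a minimal sketch that simply iterates it until the values settle. It is not taken from the reference. The three-page `toy_web`, the function name `pagerank`, the default d = 0.85 and the fixed iteration count are all assumptions made for illustration, and the sketch assumes every page has at least one outgoing link.

```python
def pagerank(links, d=0.85, iterations=50):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) over pages T linking to A.

    `links` maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link.
    """
    pages = list(links)
    pr = {page: 1.0 for page in pages}  # start every page at rank 1
    for _ in range(iterations):
        new_pr = {}
        for page in pages:
            # Sum PR(T)/C(T) over every page T that points to `page`.
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page, rank in sorted(ranks.items()):
    print(page, round(rank, 3))
```

On this toy graph C ends up ranked highest, then A, then B: C collects links from both other pages, and A inherits most of C's rank through C's single outgoing link. With this form of the formula the ranks sum to the number of pages rather than to 1.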
http://citeseer.ist.psu.edu/henzinger02challenges.html
Challenges in web search engines.
Propagating Trust and Distrust to Demote Web Spam (2006)
The Happy Searcher: Challenges in Web Information Retrieval (2004)
Web Spam Taxonomy (2005)
Thursday, March 15, 2007
Sunday, February 18, 2007
E-learning resources
Some e-books on this:
http://books.google.co.in/books?vid=ISBN839223376X&id=NUBKhkISigAC
Saturday, January 27, 2007
Natural Language Processing - resources...
I offered to give a talk on "open source" and "natural language processing" at Linux Asia, and was doing a scan to find what resources are available. Here are some gems I found.
Projects in NLP
http://www.cs.mu.oz.au/research/lt/student-projects.html
Toolkits, etc
- nltk.sourceforge.net - comprehensive NL toolkit including various datasets, etc.
- BOW toolkit: http://www.cs.cmu.edu/~mccallum/bow/ [A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering]
- CLUTO - a clustering toolkit. www-users.cs.umn.edu/~karypis/cluto