Saturday, May 12, 2007

Semantic Web

Yes, yet another topic for the same course. Need to check out: RDF, KQML, KIF, the Piggybank extension for Firefox, OWL, DAML, etc.

Web mining

Another topic for the 'AI and Web' course. Yet to start exploring...

http://infolab.stanford.edu/~ullman/mining/webmining.html

Web mining tutorial:

(also contains other tutorials: World Wide Web Personalization, Genetic Algorithms, Robust Statistics, Prototype Based Clustering, Relational Clustering, Robust Clustering, Unsupervised Clustering)

Search Engines

Notes and resources on this topic: may be relevant for the course on 'AI and Web' being planned.

Ref1 :
A good reference for Google's PageRank algorithm, including both an intuitive and a formal characterisation.
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where PR(X) is the PageRank of page X, d is the damping factor, C(X) is the count of links leaving page X, and T1 ... Tn are the pages pointing to A. This is the same as the principal eigenvector of the normalised link matrix of the Web. It also corresponds, up to normalisation, to the probability that a random surfer visits the given page: he starts at a random page and keeps clicking on a link from the current page. With probability d he follows a link; with probability 1-d he gets bored, leaves the current trail and restarts at another random page. He never uses the back button.
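To get a feel for the formula, here is a minimal sketch that simply applies it repeatedly until the values settle. The function name `pagerank`, the dict-of-lists graph format, the default d = 0.85 and the fixed iteration count are my own assumptions for illustration; they are not taken from the reference, and a real search engine would do this very differently at scale.

```python
def pagerank(links, d=0.85, iterations=50):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) over a small link graph.

    links: dict mapping each page to the list of pages it links to
           (hypothetical input format chosen for this sketch).
    """
    # Collect every page that appears as a source or as a target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # incoming[p] lists the pages T that point to p.
    incoming = {p: [] for p in pages}
    for src, targets in links.items():
        for t in targets:
            incoming[t].append(src)

    # C(T): number of links leaving page T.
    out_count = {p: len(links.get(p, [])) for p in pages}

    # Start every page at 1.0 and re-apply the formula a fixed number of times.
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(pr[t] / out_count[t] for t in incoming[p])
            for p in pages
        }
    return pr


if __name__ == "__main__":
    # Toy graph: A links to B and C, B links to C, C links back to A.
    toy = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, rank in sorted(pagerank(toy).items()):
        print(page, round(rank, 3))
```

With this unnormalised form of the formula the values add up to the number of pages (when every page has at least one outgoing link), so divide by the page count to read them as probabilities. Pages with no outgoing links are not given any special treatment here.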


http://citeseer.ist.psu.edu/henzinger02challenges.html
Challenges in web search engines.

Propagating Trust and Distrust to Demote Web Spam (2006)


The Happy Searcher: Challenges in Web Information Retrieval (2004)


Web Spam Taxonomy (2005)

Thursday, March 15, 2007

Some amazing sites:

http://www.gurlzgroup.blogspot.com/ -- They have got some amazing pictures including roadside paintings with a real 3-D feel, nature photos, cars, etc.

Saturday, January 27, 2007

Natural Language Processing - resources...

I offered to give a talk on "open source" and "natural language processing" at Linux Asia, and was doing a scan to find what resources are available. Here are some gems I found.

Projects in NLP

http://www.cs.mu.oz.au/research/lt/student-projects.html


Toolkits, etc
  • nltk.sourceforge.net - comprehensive NL toolkit including various datasets, etc.
  • BOW toolkit: http://www.cs.cmu.edu/~mccallum/bow/ [A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering]
  • CLUTO - a clustering toolkit. www-users.cs.umn.edu/~karypis/cluto