Saturday, May 12, 2007

Semantic Web

Yes, yet another topic for the same course. Need to check out: RDF, KQML, KIF, the Piggybank extension for Firefox, OWL, DAML, etc.

Web mining

Another topic for the 'AI and Web' course. Yet to start exploring...

http://infolab.stanford.edu/~ullman/mining/webmining.html

Web mining tutorial:

(the same site also contains other tutorials)
  • World Wide Web Personalization
  • Genetic Algorithms
  • Robust Statistics
  • Prototype Based Clustering
  • Relational Clustering
  • Robust Clustering
  • Unsupervised Clustering

Search Engines

Notes and resources on this topic: may be relevant for the course on 'AI and Web' being planned.

Ref1 :
A good reference for Google's PageRank algorithm, including an intuitive/formal characterisation.
PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn)), where PR(X) is the page rank of page X, d is the damping factor, C(X) is the count of links leaving page X, and T1 ... Tn are the pages pointing to A. This corresponds to the principal eigenvector of the normalised link matrix of the Web. Up to normalisation, it is also the probability that a random surfer visits the given page: he starts at a random page and keeps clicking on a link from the current page with probability d. With probability 1-d he gets bored, leaves the current trail, and restarts from another random page. He never uses the back button.
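
The formula can be tried out on a toy graph. The following is only a minimal sketch, assuming a made-up three-page web; the function name `pagerank`, the value d = 0.85, and the fixed iteration count are my own choices for illustration and are not taken from the reference. It simply repeats the update PR(A) = (1-d) + d (sum of PR(T)/C(T)) until the values settle.

```python
def pagerank(links, d=0.85, iterations=100):
    """Repeatedly apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) for pages T linking to A.

    `links` maps each page to the list of pages it links to.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)

    # Start every page with rank 1 (an arbitrary starting point).
    pr = {page: 1.0 for page in pages}

    for _ in range(iterations):
        # Every page receives the (1-d) term regardless of incoming links.
        new_pr = {page: 1.0 - d for page in pages}
        for source, targets in links.items():
            if not targets:
                continue  # assumption: a page with no outgoing links passes on nothing
            share = d * pr[source] / len(targets)  # d * PR(T)/C(T)
            for target in targets:
                new_pr[target] += share
        pr = new_pr
    return pr


# Hypothetical three-page web: A links to B and C, B links to C, C links to A.
toy_web = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = pagerank(toy_web)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

With this form of the formula the ranks add up to the number of pages rather than to 1, so divide by the page count if a probability is wanted.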


http://citeseer.ist.psu.edu/henzinger02challenges.html
Challenges in web search engines.

Propagating Trust and Distrust to Demote Web Spam (2006)


The Happy Searcher: Challenges in Web Information Retrieval (2004)


Web Spam Taxonomy (2005)

Thursday, March 15, 2007

Some amazing sites:

http://www.gurlzgroup.blogspot.com/ -- They have got some amazing pictures including roadside paintings with a real 3-D feel, nature photos, cars, etc.

Saturday, January 27, 2007

Natural Language Processing - resources...

I offered to give a talk on "open source" and "natural language processing" at Linux Asia, and did a scan to find what resources are available. Here are some gems I found.

Projects in NLP

http://www.cs.mu.oz.au/research/lt/student-projects.html


Toolkits, etc
  • nltk.sourceforge.net - comprehensive NL toolkit including various datasets, etc.
  • BOW toolkit: http://www.cs.cmu.edu/~mccallum/bow/ [A Toolkit for Statistical Language Modeling, Text Retrieval, Classification and Clustering]
  • CLUTO - a clustering toolkit. www-users.cs.umn.edu/~karypis/cluto

Thursday, January 18, 2007

IJCAI Proceedings online

All the proceedings of the IJCAI conference series are now being made available online. Watch the www.ijcai.org site for details. From the www.ijcai-07.org site you can also watch all the invited talks of the IJCAI-07 conference in video format.

Thursday, January 4, 2007

Doing a project with me?

If you are doing your degree project with me, you agree to the following conditions.

  1. If you are interested in just getting a signature for the project, with minimal work, don't waste your time. You surely won't get the signature! Continue only if you are seriously interested in doing something different, challenging, etc. and doing it well.
  2. You must sustain your interest in the project throughout the duration, and let me know if you lose interest at any point. I may be able to rekindle the interest in some cases.
  3. I will point-blank refuse to sign the report unless I am happy with the work AND the report.
  4. Note that report-writing is an expensive process. You must start at least 3 months before submission, and work at it steadily. I require a minimum of two complete passes through the full report before it is ready for submission. If you have excellent communication skills in writing, you may survive with one - but that is my call, and you most likely do not fit into this category.
  5. No document is to be submitted to the college without my concurrence.
  6. You must send periodic progress reports (at least once in 15 days) without fail. Over a month without progress reports implies that either you or the project does not exist - the project will not be resumed after this.
  7. If you are serious and take the project seriously, you will enjoy the project, will have a project and a report that you can feel proud of, and may be able to get a paper about your project published in a conference. I may even talk about your project in some of my presentations and on my webpage.
  8. You should keep track of all correspondence with me - written, e-mail, notes, etc. These will be useful when you write your report, or when you are lost at times in the project. I normally behave as a stateless server - so I expect you to clarify points discussed earlier, if I am lost(!) or if I digress from the path taken earlier.
  9. For any material that you read towards the project, ensure that you keep a proper record including the full title, author names, reference information, etc. These will be required for the reference section of your report and will also help you track down that paper later on.

Wednesday, January 3, 2007

FOSS tools/software

Conduct a survey: phpsurveyor: http://sourceforge.net/projects/phpsurveyor/

Monday, January 1, 2007