Soft Co-Occurrence Clustering for Natural Language Understanding

Fair Isaac Corporation Computer Science, 2006-07

Liaison(s): Frank Elliott
Advisor(s): Christine Alvarado
Student(s): Stephen Jones (PM), Christopher Kain, George Tucker, Craig Weidert

Co-clustering is a statistical technique that groups objects that share similar features. It has applications in many fields, including natural language processing. Current co-clustering algorithms limit each item to one cluster, but in many cases items fall naturally into more than one cluster (e.g., the word “may” in natural language processing). The team used a Dirichlet mixture model to implement “soft” co-clustering, assigning each item a probability of being in each cluster.
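
To make the idea of a soft assignment concrete, the following is a minimal illustrative sketch, not the team's implementation. It assumes a simplified one-sided model: a mixture of multinomials fit by EM, with additive smoothing in the spirit of a symmetric Dirichlet prior. The function name soft_cluster, the parameter alpha, and the count-matrix input format are all hypothetical choices made for this example.

```python
import math
import random


def soft_cluster(counts, n_clusters, alpha=1.0, n_iters=50, seed=0):
    """Soft-cluster items with EM on a smoothed mixture of multinomials.

    counts[i][j] is how often item i co-occurs with feature j (assumed input).
    Returns resp, where resp[i][k] is the estimated probability that
    item i belongs to cluster k. Each row of resp sums to 1.
    """
    rng = random.Random(seed)
    n_items, n_feats = len(counts), len(counts[0])

    # Random soft initialization: every item gets some mass in every cluster.
    resp = []
    for _ in range(n_items):
        row = [rng.random() + 1e-3 for _ in range(n_clusters)]
        total = sum(row)
        resp.append([r / total for r in row])

    for _ in range(n_iters):
        # M-step: cluster weights, smoothed by alpha pseudo-counts.
        weights = [
            (sum(resp[i][k] for i in range(n_items)) + alpha)
            / (n_items + alpha * n_clusters)
            for k in range(n_clusters)
        ]
        # M-step: per-cluster feature distributions, also smoothed by alpha.
        theta = []
        for k in range(n_clusters):
            row = [
                alpha + sum(resp[i][k] * counts[i][j] for i in range(n_items))
                for j in range(n_feats)
            ]
            total = sum(row)
            theta.append([v / total for v in row])

        # E-step: posterior cluster probabilities, computed in log space
        # and shifted by the maximum to avoid underflow.
        for i in range(n_items):
            log_post = [
                math.log(weights[k])
                + sum(counts[i][j] * math.log(theta[k][j]) for j in range(n_feats))
                for k in range(n_clusters)
            ]
            peak = max(log_post)
            unnorm = [math.exp(lp - peak) for lp in log_post]
            total = sum(unnorm)
            resp[i] = [u / total for u in unnorm]

    return resp


if __name__ == "__main__":
    # Toy data (invented for illustration): rows are words, columns are contexts.
    toy_counts = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 5, 4], [2, 2, 2, 2]]
    for row in soft_cluster(toy_counts, n_clusters=2):
        print([round(p, 3) for p in row])
```

Unlike a hard clustering, each item here receives a full probability vector over clusters, so an ambiguous item is not forced into a single group. A true co-clustering would also group the features at the same time; this sketch clusters only the items.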