Saturday, October 14, 2006

Free machine learning research idea

So this was a machine learning paper I've been meaning to write but probably will not get around to ...

In co-training, one uses two feature sets ("views") that describe the same data items and trains a classifier on each. The labels one classifier assigns to unlabeled data become training data for the other, with ostensibly improved results. If each classifier is at least weakly predictive and the two feature sets are reasonably independent given the label, this can produce good results even with a very small amount of labeled data.
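To make the loop concrete, here is a minimal sketch in Python. Everything in it is my own illustrative choice rather than a reference implementation: each view is a single number, the "classifier" is a nearest-centroid rule, confidence is the gap between the two centroid distances, and the names `fit_centroids`, `predict`, and `co_train` are made up for this example. It also assumes two classes and a seed set containing at least one example of each.

```python
def fit_centroids(points, labels):
    """Nearest-centroid 'classifier' for one view: the mean of each class."""
    cents = {}
    for c in set(labels):
        vals = [p for p, lab in zip(points, labels) if lab == c]
        cents[c] = sum(vals) / len(vals)
    return cents

def predict(cents, x):
    """Return (label, confidence). Confidence is the margin between the
    distance to the runner-up centroid and the distance to the nearest one."""
    d = sorted((abs(x - m), c) for c, m in cents.items())
    return d[0][1], d[1][0] - d[0][0]

def co_train(view_a, view_b, seed_labels, rounds=10, per_round=2):
    """Alternate between views. Each view's classifier is trained on the
    shared labeled pool, then adds its most confident guesses to that pool,
    so the other view's classifier trains on them next."""
    labeled = dict(seed_labels)  # item index -> label
    for _ in range(rounds):
        for view in (view_a, view_b):
            idx = list(labeled)
            cents = fit_centroids([view[i] for i in idx],
                                  [labeled[i] for i in idx])
            unlabeled = [i for i in range(len(view)) if i not in labeled]
            if not unlabeled:
                return labeled
            scored = sorted(((predict(cents, view[i]), i) for i in unlabeled),
                            key=lambda t: -t[0][1])
            for (lab, _conf), i in scored[:per_round]:
                labeled[i] = lab
    return labeled

# Toy data (hypothetical): two well-separated classes, with different
# deterministic jitter in each view, and only two labeled seed items.
truth = [i % 2 for i in range(20)]
view_a = [10.0 * t + (i * 7 % 5 - 2) * 0.5 for i, t in enumerate(truth)]
view_b = [10.0 * t + (i * 3 % 7 - 3) * 0.4 for i, t in enumerate(truth)]
seed = {0: 0, 1: 1}
result = co_train(view_a, view_b, seed)
```

On data this clean either view would do fine alone; the point is only to show the mechanics of passing labels back and forth.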

A similar idea could extend to clustering. One could break a set of features into two or more mutually independent (or reasonably close to independent) subsets and cluster on each subset in turn, using the clustering from one to improve the clustering of the other, for example by weighting the features of one subset to reflect the clustering found on the other. Would this perform better than clustering on the full feature set alone? Under what circumstances?
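One possible reading of the clustering idea, sketched in Python. This is a guess at a starting point, not a worked-out method: the weighting rule (the share of a feature's variance explained by the other view's clusters), the weighted k-means step, and the names `feature_weights`, `weighted_kmeans`, and `co_cluster` are all assumptions made for illustration.

```python
def mean(xs):
    return sum(xs) / len(xs)

def centroids(rows, assign, k):
    """Per-cluster mean of each feature (an empty cluster falls back to all rows)."""
    cents = []
    for c in range(k):
        members = [r for r, a in zip(rows, assign) if a == c] or rows
        cents.append([mean([m[d] for m in members]) for d in range(len(rows[0]))])
    return cents

def feature_weights(rows, assign, k):
    """Weight each feature by the fraction of its variance explained by
    the given clustering: 1 - within-cluster SS / total SS."""
    cents = centroids(rows, assign, k)
    weights = []
    for d in range(len(rows[0])):
        grand = mean([r[d] for r in rows])
        total = sum((r[d] - grand) ** 2 for r in rows)
        within = sum((r[d] - cents[a][d]) ** 2 for r, a in zip(rows, assign))
        weights.append(max(0.0, 1.0 - within / total) if total > 0 else 0.0)
    return weights

def weighted_kmeans(rows, assign, weights, k, iters=10):
    """Plain k-means updates, using a feature-weighted squared distance."""
    for _ in range(iters):
        cents = centroids(rows, assign, k)
        assign = [min(range(k),
                      key=lambda c: sum(w * (x - m) ** 2
                                        for w, x, m in zip(weights, r, cents[c])))
                  for r in rows]
    return assign

def co_cluster(view_a, view_b, init_assign, k=2, rounds=5):
    """Alternate views: the clustering from one view sets the feature
    weights (and the starting clusters) for the other."""
    assign = list(init_assign)
    for _ in range(rounds):
        w_b = feature_weights(view_b, assign, k)
        assign_b = weighted_kmeans(view_b, assign, w_b, k)
        w_a = feature_weights(view_a, assign_b, k)
        assign = weighted_kmeans(view_a, assign_b, w_a, k)
    return assign

# Toy data (hypothetical): each view has one informative feature and one
# noise feature; the starting clustering has two items swapped.
truth = [0, 1, 0, 1, 0, 1, 0, 1]
view_a = [[5.0 * t, n] for t, n in zip(truth, [0.0, 0.0, 4.0, 4.0, 0.0, 0.0, 4.0, 4.0])]
view_b = [[3.0 * t, n] for t, n in zip(truth, [4.0, 4.0, 4.0, 4.0, 0.0, 0.0, 0.0, 0.0])]
init = [0, 1, 0, 1, 0, 1, 1, 0]
result = co_cluster(view_a, view_b, init)
```

The toy data is far too easy to say anything about the real question. Whether this beats clustering on the full feature set, and when, would need experiments on data where the noise features actually mislead a single clustering.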
