Building effective classification systems is a central task in data mining and machine learning.
Usually, a classification algorithm builds a model from a given set of data records in which the labels
are known, and later, the learned model is used to assign labels to new data points. Applications of
such classification setting abound in many fields, for instance, in text categorization, fraud detection,
optical character recognition, or medical diagnosis, to cite some.
For all these applications, a desired property of a good classifier is the power of generalization
to new, unknown instances. The detection and characterization of statistically significant predictive
patterns is crucial for obtaining a good classification accuracy that generalizes beyond the training
data. Unfortunately, it is very often the case that the number of available data points with labels is
not sufficient.