Permutation Tests for Studying Classifier Performance
Markus Ojala
Helsinki Institute for Information Technology

Building effective classification systems is a central task in data mining and machine learning.
Usually, a classification algorithm builds a model from a set of data records whose labels
are known, and the learned model is later used to assign labels to new data points. Applications of
this classification setting abound in many fields, including text categorization, fraud detection,
optical character recognition, and medical diagnosis.
For all these applications, a desired property of a good classifier is the ability to generalize
to new, unseen instances. Detecting and characterizing statistically significant predictive
patterns is crucial for obtaining classification accuracy that generalizes beyond the training
data. Unfortunately, the number of available labeled data points is very often
insufficient.