TY - GEN
T1 - Classifying remote sensing data with support vector machines and imbalanced training data
AU - Waske, Björn
AU - Benediktsson, Jon Atli
AU - Sveinsson, Johannes R.
PY - 2009
Y1 - 2009
AB - The classification of remote sensing data with imbalanced training data is addressed. The classification accuracy of a supervised method is affected by several factors, such as the classifier algorithm, the input data and the available training data. The use of an imbalanced training set, i.e., the number of training samples from one class is much smaller than from other classes, often results in low classification accuracies for the small classes. In the present study support vector machines (SVM) are trained with imbalanced training data. To handle the imbalanced training data, the training data are resampled (i.e., bagging) and a multiple classifier system, with SVM as base classifier, is generated. In addition to the classifier ensemble a single SVM is applied to the data, using the original balanced and the imbalanced training data sets. The results underline that the SVM classification is affected by imbalanced data sets, resulting in dominant lower classification accuracies for classes with fewer training data. Moreover the detailed accuracy assessment demonstrates that the proposed approach significantly improves the class accuracies achieved by a single SVM, which is trained on the whole imbalanced training data set.
KW - Bagging
KW - Imbalanced training data
KW - Land cover classification
KW - Multispectral
KW - Support vector machines
UR - http://www.scopus.com/inward/record.url?scp=70349329573&partnerID=8YFLogxK
U2 - 10.1007/978-3-642-02326-2_38
DO - 10.1007/978-3-642-02326-2_38
M3 - Conference contribution
AN - SCOPUS:70349329573
SN - 3642023258
SN - 9783642023255
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 375
EP - 384
BT - Multiple Classifier Systems - 8th International Workshop, MCS 2009, Proceedings
T2 - 8th International Workshop on Multiple Classifier Systems, MCS 2009
Y2 - 10 June 2009 through 12 June 2009
ER -