Information, Vol. 9, Issue 12 (2018)
ARTICLE
TITLE

LICIC: Less Important Components for Imbalanced Multiclass Classification

Vincenzo Dentamaro, Donato Impedovo and Giuseppe Pirlo

Abstract

Multiclass classification in cancer diagnostics using DNA or gene expression signatures, as well as classification of bacterial species fingerprints in MALDI-TOF mass spectrometry data, is challenging because the data are imbalanced and the number of dimensions is high relative to the number of instances. In this study, a new oversampling technique called LICIC is presented as a tool for countering both class imbalance and the well-known "curse of dimensionality" problem. The method preserves non-linearities within the dataset while creating new instances without adding noise. It is compared with other oversampling methods, such as Random Oversampling, SMOTE, Borderline-SMOTE, and ADASYN. F1 scores show the validity of this new technique when used with imbalanced, multiclass, and high-dimensional datasets.
