Safe Level Graph for Synthetic Minority Over-sampling Techniques
Abstract: In the class imbalance problem, most existing classifiers, which are designed for balanced class distributions, fail to recognize minority classes because a large number of negative instances can dominate the few positive instances. Borderline-SMOTE and Safe-Level-SMOTE are over-sampling techniques that handle this situation by generating synthetic instances in different regions. The former operates on the border of a minority class, while the latter works inside the class, far from the border. Unfortunately, a data miner has no convenient way to determine which SMOTE variant suits a given dataset. In this paper, a safe level graph is proposed as a guideline tool that describes the characteristics of a minority class in an imbalanced dataset and helps select an appropriate SMOTE. When the advice of the safe level graph is followed, the experimental success rate reaches 73% when the F-measure is used as the performance measure and 78% for satisfactory AUCs.
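As an illustration only, the sketch below computes a per-instance safe level. The abstract does not define the quantity, so the sketch assumes the definition commonly associated with Safe-Level-SMOTE: the number of minority-class instances among the k nearest neighbours of a minority instance. The function name `safe_levels`, its parameters, and the use of Euclidean distance are hypothetical choices, not taken from the paper. How the safe level graph itself is constructed from such values is not stated here and is not reproduced.

```python
import numpy as np


def safe_levels(X, y, k=5, minority_label=1):
    """Return, for each minority instance, the count of minority-class
    instances among its k nearest neighbours (assumed definition)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    minority_idx = np.flatnonzero(y == minority_label)
    levels = []
    for i in minority_idx:
        # Euclidean distance from instance i to every instance (assumption).
        dist = np.linalg.norm(X - X[i], axis=1)
        dist[i] = np.inf  # exclude the instance itself
        neighbours = np.argsort(dist, kind="stable")[:k]
        levels.append(int(np.sum(y[neighbours] == minority_label)))
    return np.array(levels)


# Toy data: three minority points clustered together and one minority
# point surrounded by majority points.
X_demo = [[0.0], [0.1], [0.2], [5.0], [4.9], [5.1], [5.2], [10.0]]
y_demo = [1, 1, 1, 1, 0, 0, 0, 0]
demo_levels = safe_levels(X_demo, y_demo, k=2)
```

Under this assumed definition, low values suggest minority instances that sit near or inside the majority region, and high values suggest instances deep inside the minority class, which matches the contrast the abstract draws between the border and the interior.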