Borderline Over-sampling for Imbalanced Data Classification

5th International Workshop on Computational Intelligence & Applications Proceedings : IWCIA 2009, pp. 24-29, published 2009-11
File information (attachment)
A1005.pdf 391 KB Type : Full text
Title ( eng )
Borderline Over-sampling for Imbalanced Data Classification
Creators
Nguyen Hien M.
Cooper Eric W.
Kamei Katsuari
Source title
5th International Workshop on Computational Intelligence & Applications Proceedings : IWCIA 2009
Start page 24
End page 29
Abstract
Traditional classification algorithms often perform poorly on imbalanced data sets, in which some classes are heavily outnumbered by the remaining classes. For this kind of data, minority class instances, which are usually of much greater interest, are often misclassified. This paper proposes a method to deal with such data by changing the class distribution through over-sampling at the borderline between the minority class and the majority class of the data set. A Support Vector Machine (SVM) classifier is then trained to predict new, unknown instances. Compared to other over-sampling methods, the proposed method focuses only on the minority class instances lying around the borderline, because this area is the most crucial for establishing the decision boundary. Furthermore, new instances are generated in such a manner that the minority class area is expanded further toward the side of the majority class at places where few majority class instances appear. Experimental results show that the proposed method can achieve better performance than some other over-sampling methods, especially on data sets with a low degree of overlap, owing to its ability to expand the minority class area in such cases.
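The idea summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how borderline instances are detected or how new instances are placed, so the k-nearest-neighbour borderline rule, the 0.5 threshold, the extrapolation and interpolation formulas, and every name below (`borderline_oversample`, `k`, `maj_frac`) are assumptions made for illustration only. The SVM training step is omitted.

```python
import numpy as np

def borderline_oversample(X_min, X_maj, n_new, k=5, seed=0):
    """Illustrative sketch: synthesize up to n_new minority points near the borderline.

    Assumptions (not taken from the paper): borderline points are found with a
    k-nearest-neighbour rule, and the 0.5 threshold decides between expanding
    outward (extrapolation) and filling inward (interpolation).
    """
    rng = np.random.default_rng(seed)
    n_min, dim = X_min.shape
    if n_min < 2:
        return np.empty((0, dim))

    X_all = np.vstack([X_min, X_maj])
    is_maj = np.r_[np.zeros(n_min, dtype=bool), np.ones(len(X_maj), dtype=bool)]

    # Fraction of majority points among the k nearest neighbours of each minority point.
    d_all = np.linalg.norm(X_min[:, None, :] - X_all[None, :, :], axis=2)
    d_all[np.arange(n_min), np.arange(n_min)] = np.inf  # exclude the point itself
    k_all = min(k, len(X_all) - 1)
    nn_all = np.argsort(d_all, axis=1)[:, :k_all]
    maj_frac = is_maj[nn_all].mean(axis=1)

    # Assumed borderline rule: at least one majority point among the neighbours.
    border = np.flatnonzero(maj_frac > 0)
    if border.size == 0:
        return np.empty((0, dim))

    # Nearest minority neighbours of each minority point.
    d_min = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(d_min, np.inf)
    nn_min = np.argsort(d_min, axis=1)[:, :min(k, n_min - 1)]

    new_points = []
    for _ in range(n_new):
        i = rng.choice(border)
        j = rng.choice(nn_min[i])
        step = rng.uniform(0.0, 1.0)
        if maj_frac[i] < 0.5:
            # Few majority points nearby: step away from the minority neighbour,
            # which pushes the minority area outward.
            new = X_min[i] + step * (X_min[i] - X_min[j])
        else:
            # Crowded by majority points: stay between two minority points.
            new = X_min[i] + step * (X_min[j] - X_min[i])
        new_points.append(new)
    return np.array(new_points)

# Usage on synthetic two-class data (hypothetical values).
demo_rng = np.random.default_rng(1)
X_maj_demo = demo_rng.normal(0.0, 1.0, size=(200, 2))
X_min_demo = demo_rng.normal(2.0, 0.5, size=(20, 2))
X_new_demo = borderline_oversample(X_min_demo, X_maj_demo, n_new=30)
print(X_new_demo.shape)
```

The synthetic points would then be appended to the minority class before training a classifier. If no minority point has a majority neighbour, the sketch returns an empty array rather than sampling away from the borderline.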
NDC classification
Technology / Engineering [ 500 ]
Language
English
Resource type Conference paper
Publisher
IEEE SMC Hiroshima Chapter
Date of issue 2009-11
Rights
(c) Copyright by IEEE SMC Hiroshima Chapter.
Publication type Version of Record (publisher's version, including early release)
Access rights Open access
Source identifier
[ISSN] 1883-3977
[URI] http://www.hil.hiroshima-u.ac.jp/iwcia/2009/