HANDLING MISSING VALUES IN CORN PRODUCTION QUALITY DATA USING THE K-NN IMPUTATION METHOD WITH THE C4.5 ALGORITHM
Corn is a staple crop in Indonesia, where much of the population depends on the agricultural sector for its livelihood. Increasing corn productivity also requires attention to the quality of the corn produced. Through empirical observation, this research explores and extracts data using data mining concepts so that otherwise neglected data becomes useful. Determining the quality of corn production is therefore an important task, and classification can help farmers carry it out. Missing values are a problem for maintaining data quality. They can arise for several reasons, one of which is errors made during data entry. Missing values become a serious problem when a large amount of data is affected, because they then strongly influence the survey results. This research therefore proposes the K-NN imputation method to handle missing values. The results show that the C4.5 algorithm reached a classification accuracy of 92.90% on the corn production dataset when the missing values were left untreated. When the missing values were first handled with K-NN imputation, the best accuracy, obtained at k = 5, was 94.50%. The proposed method thus improved accuracy by 1.60 percentage points.
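As a rough illustration of the imputation step described above, the following is a minimal sketch of K-NN imputation in plain Python. It is not the authors' implementation: the function name `knn_impute`, the use of Euclidean distance over the observed features, the restriction of neighbours to complete rows, and the small numeric sample are all assumptions made for this example. The abstract does not state the distance measure or how categorical attributes were treated.

```python
import math


def knn_impute(rows, k=5):
    """Fill None entries with the mean of that feature over the k nearest complete rows.

    Hypothetical helper for illustration only; assumes all features are numeric.
    """
    complete = [r for r in rows if None not in r]
    if not complete:
        raise ValueError("need at least one complete row to impute from")

    imputed = []
    for row in rows:
        if None not in row:
            imputed.append(list(row))
            continue

        # Indices of the features that are present in this incomplete row.
        observed = [i for i, v in enumerate(row) if v is not None]

        def dist(other):
            # Euclidean distance computed only over the observed features.
            return math.sqrt(sum((row[i] - other[i]) ** 2 for i in observed))

        # The k complete rows closest to the incomplete row.
        neighbours = sorted(complete, key=dist)[:k]

        # Replace each missing entry with the neighbours' mean for that feature.
        filled = [
            v if v is not None else sum(n[i] for n in neighbours) / len(neighbours)
            for i, v in enumerate(row)
        ]
        imputed.append(filled)
    return imputed


# Made-up sample data: the last row is missing its second feature.
data = [
    [1.0, 2.0, 3.0],
    [1.1, 2.1, 3.2],
    [0.9, 1.9, 2.8],
    [8.0, 9.0, 10.0],
    [1.0, None, 3.0],
]
result = knn_impute(data, k=3)
print(result[4])
```

In this sample the three nearest complete rows are the first three, so the missing entry is filled with the mean of their second feature and the distant fourth row is ignored. The imputed table would then be passed to a decision-tree learner such as C4.5 for classification, which this sketch does not cover.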
Copyright (c) 2019 Moch. Lutfi, Mochamad Hasyim
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Copyright in each article belongs to the author.
- The authors acknowledge RESISTOR Journal as the publisher of first publication, under the Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) License.
- Authors may enter into separate arrangements for the non-exclusive distribution of manuscripts published in this journal in other versions (e.g. depositing them in the author's institutional repository or publishing them in a book), provided they acknowledge that the manuscripts were first published in RESISTOR Journal.