Optimization of Body Mass Index Classification Using Machine Learning Approach for Early Detection of Obesity Risk
Abstract
This study aims to optimize early-stage classification of obesity risk using Principal Component Analysis (PCA), a widely used dimensionality-reduction technique in machine learning. PCA reduces the dimensionality of the data while retaining most of its informative variance, and the resulting reduction in model complexity can lower the risk of overfitting. The obesity dataset is classified using K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting Linear, and XGBoost. Each algorithm is chosen for its particular strengths: KNN for nonlinear data, SVM for high-dimensional data, and Random Forest and XGBoost for complex data patterns. Performance is evaluated using accuracy, precision, recall, and F1-score. The results show that Random Forest and XGBoost achieve the highest accuracy, especially when all dataset features are used without PCA reduction. The findings are intended to inform the choice of algorithm for obesity classification, support early detection, and facilitate decision making in health analysis.
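As an illustration of the evaluation metrics named in the abstract, the following minimal sketch computes accuracy, precision, recall, and F1-score for a multi-class prediction in plain Python. It is not the study's own code: the function name `evaluate`, the class labels, and the toy predictions are hypothetical, and macro averaging across classes is an assumption, since the abstract does not say how the per-class scores were combined.

```python
from collections import Counter


def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Macro averaging (an unweighted mean over classes) is an assumption;
    the abstract does not state which averaging scheme was used.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    def safe_div(num, den):
        # Define the ratio as 0.0 when a class has no predictions or no samples.
        return num / den if den else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels and predictions, for illustration only.
y_true = ["normal", "normal", "overweight", "overweight", "obese", "obese"]
y_pred = ["normal", "overweight", "overweight", "overweight", "obese", "normal"]
report = evaluate(y_true, y_pred)
print({name: round(value, 3) for name, value in report.items()})
```

Applying the same function to each classifier's predictions, once on the full feature set and once on the PCA-reduced features, would give the kind of side-by-side comparison the abstract describes.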
Copyright (c) 2025 Journal of Applied Business and Technology

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.













