Optimization of Body Mass Index Classification Using Machine Learning Approach for Early Detection of Obesity Risk

  • Dewi Nasien, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Steven Owen, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Fenly Fenly, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Johanes Johanes, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Frendly Lombu, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Leo Leo, Institut Bisnis dan Teknologi Pelita Indonesia, Indonesia
  • Zirawani Baharum, Universiti Kuala Lumpur, Malaysia
Keywords: Obesity, PCA, Classification, Machine Learning

Abstract

This study aims to optimize early-stage classification of obesity risk using Principal Component Analysis (PCA), a widely used dimensionality-reduction technique in machine learning. PCA reduces the dimensionality of the data while retaining most of its important information, and it lowers model complexity, which otherwise tends to increase the risk of overfitting. The obesity dataset is classified using K-Nearest Neighbor (KNN), Support Vector Machine (SVM), Decision Tree, Random Forest, Gradient Boosting Linear, and XGBoost. Each algorithm is chosen for its particular strengths: KNN for nonlinear data, SVM for high-dimensional data, and Random Forest and XGBoost for complex data patterns. Performance is evaluated using accuracy, precision, recall, and F1-score. The results show that Random Forest and XGBoost achieve the highest accuracy, especially when all dataset features are used without PCA reduction. The findings are intended to inform the choice of algorithm for obesity classification, support early detection, and facilitate decision making in health analysis.
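As an illustration of the evaluation step, the following minimal Python sketch computes accuracy, precision, recall, and F1-score for a multi-class prediction. It is not the authors' code: the function name `evaluate_predictions`, the class labels, and the toy predictions are hypothetical. Macro-averaging over classes is also an assumption, since the abstract does not state how per-class scores are combined.

```python
from collections import Counter


def evaluate_predictions(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1-score."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, guess in zip(y_true, y_pred):
        if truth == guess:
            tp[truth] += 1      # correct prediction for this class
        else:
            fp[guess] += 1      # predicted class gains a false positive
            fn[truth] += 1      # true class gains a false negative

    def safe_div(num, den):
        # Treat an undefined ratio (zero denominator) as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        p = safe_div(tp[label], tp[label] + fp[label])
        r = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1_scores.append(safe_div(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,   # macro average (assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Hypothetical labels and predictions, for illustration only.
y_true = ["normal", "normal", "overweight", "overweight", "obese", "obese"]
y_pred = ["normal", "overweight", "overweight", "overweight", "obese", "normal"]

report = evaluate_predictions(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

The same function could be applied to each classifier's predictions, with and without PCA, to produce the kind of side-by-side comparison the abstract describes.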

Published
2025-09-30
How to Cite
Nasien, D., Owen, S., Fenly, F., Johanes, J., Lombu, F., Leo, L., & Baharum, Z. (2025). Optimization of Body Mass Index Classification Using Machine Learning Approach for Early Detection of Obesity Risk. Journal of Applied Business and Technology, 6(3), 193-200. https://doi.org/10.35145/jabt.v6i3.201