Comparison of Feature Selection with Information Gain Method in Decision Tree, Regression Logistic and Random Forest Algorithms

  • Muhammad Sholeh Universitas AKPRIND Indonesia, Indonesia
  • Uning Lestari Universitas AKPRIND Indonesia, Indonesia
  • Dina Andayati Universitas AKPRIND Indonesia, Indonesia
Keywords: Feature Selection, Classification Model, Data Science

Abstract

Feature selection is one approach to improving a classification model. It identifies the most informative features and discards those that do not contribute directly to the target feature, with the aim of increasing model accuracy. This research compares the accuracy of models trained without feature selection against models trained after feature selection, using the decision tree, logistic regression and random forest algorithms. The research method for feature selection in data science comprises understanding the domain and dataset, exploratory analysis, data cleaning, measuring feature relevance with a criterion such as Information Gain, and ranking the features. The results are evaluated and validated using model performance metrics before and after feature selection. This process helps ensure that relevant features are selected, improving accuracy. The research used the Lung Cancer Prediction dataset, which consists of 306 rows and 16 attributes. The results show that feature selection can improve the performance of a classification model by removing features that do not contribute to the target. Among the decision tree, logistic regression and random forest classifiers, the highest accuracy, 0.968, was obtained by logistic regression with 5 selected features.
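
The Information Gain ranking step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, which is not shown on this page. It assumes categorical features, defines Information Gain as the entropy of the target minus its expected entropy after splitting on a feature, and uses invented toy data; the column names and the helper functions `entropy`, `information_gain` and `rank_features` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """Target entropy minus its expected entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_features(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Toy data invented for illustration only (not the Lung Cancer Prediction dataset).
columns = {
    "smoking":  [1, 1, 1, 1, 0, 0, 0, 0],
    "coughing": [1, 1, 1, 0, 0, 0, 0, 1],
    "gender":   [1, 0, 1, 0, 1, 0, 1, 0],
}
target = [1, 1, 1, 1, 0, 0, 0, 0]

ranking = rank_features(columns, target)
top_k = [name for name, _ in ranking[:2]]  # keep the k highest-ranked features
```

In a comparison like the one reported, each classifier would then be trained once on all features and once on the `top_k` subset, and the two accuracy scores compared.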

Published
2024-09-30
How to Cite
Sholeh, M., Lestari, U., & Andayati, D. (2024). Comparison of Feature Selection with Information Gain Method in Decision Tree, Regression Logistic and Random Forest Algorithms. Journal of Applied Business and Technology, 5(3), 146-153. https://doi.org/10.35145/jabt.v5i3.155