Enhancing Software Effort Estimation in Healthcare Informatics: A Comparative Analysis of Machine Learning Models with Correlation-Based Feature Selection
Abstract
Software effort estimation is one of the most crucial processes in software project management, particularly in the healthcare industry. It involves predicting the effort needed to develop and support software applications. Delivering clinical software projects on time and within budget requires accurate estimates combined with efficient planning. This paper applies machine learning models to improve software effort estimation using five biomedical datasets: Breast Cancer Wisconsin, COVID-19, Sleepy Drivers EEG Brainwave, Heart Disease Prediction, and Food Nutrition. Each dataset is cleaned and prepared by handling missing values, converting categorical features, and splitting the data into training and testing sets, and is then used to train four popular machine learning models: Linear Regression, Gradient Boosting, Random Forest, and Decision Tree. In addition, correlation-based feature selection is applied to the feature matrix to investigate the influence of statistically related features and to improve reliability. Model effectiveness is evaluated with two performance metrics: the coefficient of determination (R²) and Root Mean Squared Error (RMSE). The results show that, with correlation-based feature selection, Linear Regression and Gradient Boosting perform substantially better than the other models. R² scores are particularly high for the Food Nutrition, Breast Cancer, and COVID-19 datasets, while RMSE is lowest for the COVID-19 dataset, indicating high accuracy.
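The pipeline summarized above (preprocessing, correlation-based feature selection, model training, and evaluation with R² and RMSE) can be sketched as follows for the Linear Regression case. This is a minimal illustration under stated assumptions, not the authors' implementation: it uses synthetic data rather than any of the paper's datasets, mean imputation for missing values, drop-first one-hot encoding for categorical features, an 80/20 train/test split, and an absolute Pearson-correlation threshold of 0.3 (a hypothetical value; the abstract does not state the selection criterion). It relies on NumPy only, and the helper names (`one_hot`, `split`, `impute`, `select_by_correlation`, `fit_ols`, `predict_ols`) are illustrative.

```python
import numpy as np

# --- Preprocessing helpers (assumed choices, for illustration only) ----------

def one_hot(values):
    """Encode a 1-D array of category labels as 0/1 columns (first category dropped)."""
    categories = np.unique(values)
    return (values[:, None] == categories[1:]).astype(float)

def split(X, y, test_frac=0.2, seed=0):
    """Shuffle rows and hold out `test_frac` of them as a test set."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    return X[train], X[test], y[train], y[test]

def impute(X, fill):
    """Replace NaNs in column j with fill[j] (here: training-set column means)."""
    X = X.copy()
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = fill[cols]
    return X

# --- Correlation-based feature selection (hypothetical threshold) ------------

def select_by_correlation(X, y, threshold=0.3):
    """Return indices of columns whose |Pearson r| with y is at least `threshold`."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.flatnonzero(np.abs(r) >= threshold)

# --- Linear regression: ordinary least squares with an intercept -------------

def fit_ols(X, y):
    A = np.column_stack([np.ones(len(y)), X])   # prepend a column of ones
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef                                 # coef[0] is the intercept

def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]

# --- Evaluation metrics -------------------------------------------------------

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# --- Demo on synthetic data (NOT one of the paper's datasets) ----------------

rng = np.random.default_rng(42)
n = 200
numeric = rng.normal(size=(n, 5))
# Target depends only on numeric columns 0 and 1; the rest are noise features.
y = 3.0 * numeric[:, 0] - 2.0 * numeric[:, 1] + rng.normal(scale=0.5, size=n)
numeric[rng.random(numeric.shape) < 0.05] = np.nan   # knock out ~5% of entries
site = rng.choice(np.array(["A", "B", "C"]), size=n)  # irrelevant categorical feature
X = np.column_stack([numeric, one_hot(site)])         # 5 numeric + 2 one-hot columns

X_train, X_test, y_train, y_test = split(X, y)
col_means = np.nanmean(X_train, axis=0)               # learned on training rows only
X_train, X_test = impute(X_train, col_means), impute(X_test, col_means)

selected = select_by_correlation(X_train, y_train, threshold=0.3)
coef = fit_ols(X_train[:, selected], y_train)
y_pred = predict_ols(coef, X_test[:, selected])

r2, err = r2_score(y_test, y_pred), rmse(y_test, y_pred)
print("selected columns:", selected.tolist())
print(f"R^2 = {r2:.3f}, RMSE = {err:.3f}")
```

To compare the other three models, the OLS fit could be swapped for scikit-learn's `GradientBoostingRegressor`, `RandomForestRegressor`, or `DecisionTreeRegressor`, trained on the same selected columns. Computing both the imputation means and the feature correlations on the training rows only keeps test data from leaking into preprocessing and feature selection.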
This work is licensed under a Creative Commons Attribution 4.0 International License.