Exploring Gene Expression and Clinical Data for Identifying Prostate Cancer Severity Levels using Machine Learning Methods


Marouf A. A., Alhajj R., Rokne J. G., Ghose S., Bismar T. A.

2023 IEEE Canadian Conference on Electrical and Computer Engineering, CCECE 2023, Regina, Canada, 24 - 27 September 2023, vol.2023-September, pp.186-191

  • Publication Type: Conference Paper / Full Text Paper
  • Volume: 2023-September
  • DOI: 10.1109/ccece58730.2023.10288946
  • City of Publication: Regina
  • Country of Publication: Canada
  • Pages: pp.186-191
  • Keywords: Gleason Grading Group, Machine Learning, Prostate cancer, Random Forest, Severity levels
  • Affiliated with Istanbul Medipol University: Yes

Abstract

Prostate cancer (PCa) is the most common type of cancer in men worldwide. It starts in the prostate, a small walnut-shaped male gland, and can metastasize from there to other organs. If it is detected and diagnosed early, the survival rate may increase to 95%. Early detection and diagnosis are therefore important tasks performed by a pathologist, who identifies the severity level using a scale called the Gleason grading group (GGG). To determine the GGG, the pathologist examines a biopsy sample and assigns it a grade of low, intermediate, or high, then assesses a second sample in the same manner. Adding these two scores gives the total Gleason score. In this paper, we explore tissue microarray (TMA) and clinical data collected by pathologists of Alberta Precision Laboratory to predict the severity level of prostate cancer using various machine learning methods. Traditional classifiers (Naïve Bayes, Decision Tree, Support Vector Machine with a radial basis function (RBF) kernel, and Logistic Regression) and ensemble classifiers (Random Forest and Bagging with k-nearest neighbors) are applied through a machine learning pipeline that includes imputation and sampling techniques. An integrated SMOTE-Tomek Links method is adopted to handle the class imbalance problem. The highest accuracy obtained is 99.64%, achieved by the Random Forest method.
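
The abstract names an integrated SMOTE-Tomek Links step for class imbalance but gives no implementation details. The following is a minimal, standard-library sketch of the general idea only, not the authors' pipeline: SMOTE oversamples the minority class by interpolating between nearby minority samples, and Tomek link removal then deletes opposite-class pairs that are each other's nearest neighbor. The function names `smote` and `remove_tomek_links`, the toy two-feature data, the values of `k` and `seed`, and the choice to drop both members of each link are all assumptions made for illustration.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a randomly
    chosen minority sample and one of its k nearest minority neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        # Indices of the k nearest minority neighbors of sample i (excluding i).
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: math.dist(minority[i], minority[j]),
        )[:k]
        j = rng.choice(neighbors)
        gap = rng.random()  # position along the segment between the two samples
        synthetic.append(
            tuple(a + gap * (b - a) for a, b in zip(minority[i], minority[j]))
        )
    return synthetic


def remove_tomek_links(samples, labels):
    """Drop both members of every Tomek link: a pair of samples with
    different labels that are each other's nearest neighbor."""
    def nearest_index(i):
        return min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(samples[i], samples[j]),
        )

    nearest = [nearest_index(i) for i in range(len(samples))]
    linked = {
        i
        for i, j in enumerate(nearest)
        if nearest[j] == i and labels[i] != labels[j]
    }
    keep = [i for i in range(len(samples)) if i not in linked]
    return [samples[i] for i in keep], [labels[i] for i in keep]


# Toy data (hypothetical): label 0 is the majority class, label 1 the minority.
majority = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.3), (1.5, 1.1), (1.1, 1.6), (2.9, 3.0)]
minority = [(3.0, 3.0), (3.2, 2.8), (2.8, 3.3)]

# Step 1: oversample the minority class until both classes have equal size.
new_points = smote(minority, n_new=len(majority) - len(minority))
samples = majority + minority + new_points
labels = [0] * len(majority) + [1] * (len(minority) + len(new_points))

# Step 2: clean the class boundary by removing Tomek links.
clean_samples, clean_labels = remove_tomek_links(samples, labels)
```

In practice a maintained library implementation would likely be preferable to this sketch, and resampling should be applied only to the training split so that synthetic points do not leak into the evaluation data.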