Tweet and user validation with supervised feature ranking and rumor classification

Sailunaz K., Kawash J., Alhajj R.

Multimedia Tools and Applications, vol.81, no.22, pp.31907-31927, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 81 Issue: 22
  • Publication Date: 2022
  • Doi Number: 10.1007/s11042-022-12616-6
  • Journal Name: Multimedia Tools and Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, FRANCIS, ABI/INFORM, Applied Science & Technology Source, Compendex, Computer & Applied Sciences, INSPEC, zbMATH
  • Page Numbers: pp.31907-31927
  • Keywords: Social media, Twitter, Classification, Rumors, Ranking, Support vector machine, Naive Bayes, Random forest, Logistic regression, CNN, LSTM
  • Istanbul Medipol University Affiliated: Yes


Filtering fake news from social network posts and detecting the users responsible for generating and propagating these rumors have become two major issues with the increased popularity of social networking platforms. Because any user can post anything on social media and that post can instantly propagate all over the world, it is important to recognize whether a post is a rumor. Twitter is one of the most popular social networking platforms used for news broadcasting, mostly as tweets and retweets. Hence, validating tweets and users based on their posts and behavior on Twitter has become a social, political and international issue. In this paper, we proposed a method to classify rumor and non-rumor tweets using a novel tweet and user feature ranking approach with Decision Tree and Logistic Regression, applied to both tweet and user features extracted from the benchmark rumor dataset ‘PHEME’. The effect of the ranking model was then shown by classifying the dataset with the ranked features and comparing the results with basic classifications using various combinations of features. Both supervised classification algorithms (namely, Support Vector Machine, Naïve Bayes, Random Forest and Logistic Regression) and deep learning algorithms (namely, Convolutional Neural Network and Long Short-Term Memory) were used for rumor detection. The classification accuracy showed that the feature ranking classification results were comparable to the original classification performances. The ranking models were also used to list the topmost tweets and users under different conditions, and the results showed that even though the features were ranked differently by LR and RF, the topmost tweets and users for both rumors and non-rumors were the same.
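
As a rough illustration of supervised feature ranking followed by classification on the ranked features, the sketch below fits a small logistic regression by gradient descent, orders features by the absolute size of their coefficients, and retrains on the top-ranked subset. It is not the authors' implementation: the data are synthetic rather than PHEME, the feature names are hypothetical, only the logistic regression side is shown (no decision tree), and ranking by coefficient magnitude is an assumed scoring rule.

```python
import math
import random

# Hypothetical feature names; the paper's actual PHEME features are not listed here.
FEATURE_NAMES = ["retweet_count", "follower_count", "tweet_length"]


def make_synthetic_data(n_samples=200, seed=0):
    """Build toy standardized features: one strong, one weak, one pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2                      # 1 = rumor, 0 = non-rumor
        sign = 2 * label - 1               # +1 for rumors, -1 for non-rumors
        X.append([
            rng.gauss(1.0 * sign, 1.0),    # strongly informative
            rng.gauss(0.3 * sign, 1.0),    # weakly informative
            rng.gauss(0.0, 1.0),           # uninformative
        ])
        y.append(label)
    return X, y


def predict_proba(w, b, x):
    """Sigmoid of the linear score w.x + b."""
    z = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=300):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rank_features(w, names):
    """Order feature names by decreasing absolute coefficient."""
    order = sorted(range(len(w)), key=lambda j: abs(w[j]), reverse=True)
    return [names[j] for j in order]


def accuracy(w, b, X, y):
    """Fraction of samples whose thresholded prediction matches the label."""
    hits = sum((predict_proba(w, b, x) >= 0.5) == bool(t) for x, t in zip(X, y))
    return hits / len(y)


X, y = make_synthetic_data()
w, b = train_logistic_regression(X, y)
ranking = rank_features(w, FEATURE_NAMES)
print("ranking:", ranking)

# Retrain on the top-2 ranked features only and compare training accuracy.
top = [FEATURE_NAMES.index(name) for name in ranking[:2]]
X_top = [[x[j] for j in top] for x in X]
w_top, b_top = train_logistic_regression(X_top, y)
print("all features:", accuracy(w, b, X, y))
print("top-2 features:", accuracy(w_top, b_top, X_top, y))
```

Coefficient magnitudes are only comparable here because the toy features share a similar scale; real tweet and user features would need standardizing first, and accuracy should be measured on held-out data rather than the training set.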