Empowering Detection of Malicious Social Bots and Content Spammers on Twitter by Sentiment Analysis
Abstract
The role of Twitter as a platform for sharing opinions has grown in recent years, especially since it has been widely adopted by public figures such as politicians, show-business personalities, and other influencers to communicate with the public. For the same reason, the use of social bots to manipulate information and influence people's opinions is also growing. In this paper, we use a supervised classification model to distinguish bots from legitimate users on Twitter. More specifically, we show the importance of sentiment features in distinguishing bot accounts from human accounts. Moreover, we evaluate our detection model on Russian bot accounts, the most recent set of social bots to appear on Twitter, to show that these techniques may be easily adapted to work on new, unseen types of social bots.