Systematic Review Protocol

This research study covers three types of SMM: false information, bots, and malicious campaigns. As mentioned in Section 2, we searched and selected papers following a systematic literature review (SLR) process. We form separate search queries for each of the three SMM types and run them against databases such as Scopus, the IEEE Digital Library, the ACM Digital Library, and Google Scholar. To query papers from these databases, we combine keywords along four dimensions: purpose, type of SMM, method, and platform.

First, we query false information-related articles using search terms built from the corresponding keywords. Similarly, we build query terms from the bot detection keywords and, likewise, from the malicious campaign keywords. The figure below details the search terms used, together with their synonyms.

Finally, we combine the queries for all three forms and retrieve the final set of papers.
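For illustration, the sketch below shows one way the four keyword dimensions can be combined into a boolean query string for a database such as Scopus. The keyword lists are examples only, not the exact terms used in our searches.

```python
# Illustrative sketch: build a boolean search query from four keyword dimensions.
# The keyword lists below are examples, not the exact terms used in the review.
purpose = ["detection", "characterization"]
smm_type = ["fake news", "rumor", "misinformation"]   # false-information keywords
method = ["machine learning", "deep learning"]
platform = ["Twitter", "Facebook", "online social network"]

def or_group(terms):
    """Join the synonyms of one dimension with OR."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

# Combine the four dimensions with AND to form the final query string
query = " AND ".join(or_group(dim) for dim in [purpose, smm_type, method, platform])
print(query)
```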

Since SMM is a recent threat in the context of OSNs, we restrict the search timeline to 2015-2022 to cover both earlier and current work on SMM. In a few places, we cite papers published before 2015 to support definitions and provide the necessary background. We first exclude papers that do not pertain to the chosen SMM factors or are not written in English. Next, we read the title and abstract of each paper to retain the most notable works.

Next, we proceed to our second-level reading process using a set of criteria. The criteria include looking into aspects such as the category of false information type (fake news, rumor, hoax, propaganda, phishing), as all are commonly used definitions for false information. Other criteria include the OSN platform (such as Twitter and Facebook), the method or technique used, and the most prevalent features in the detection process. The bot type categories (spambot, social bot, follower bot, scam bot), with other aforementioned criteria, define the analysis criteria for bot detection-related articles. Likewise, we analyze malicious campaign detection papers on the defined criteria. This results in 34 papers for analysis in false information detection, 35 in bot detection, 28 in malicious campaign detection, and 3 in all three forms combined.

Bot Percentage on Twitter

Different OSNs serve different roles for their users. Twitter, for example, is treated as a source of information even by journalists. As discussed in Section 3 concerning bots, for Twitter in particular we collected the bot percentages reported by multiple studies on their datasets, which range from 5% to 25%, as shown in the table below. We note that Botometer is a widely used bot detection tool, typically applied with a threshold of 0.5.

Reference | Sample Size | Method | Threshold | Bot Percentage
[31] | >1000 (mDAU) | Twitter Inc. | N/A | 5% to 8.5%
[32] | 1,000 | Botometer | >= 0.5 | 6%
[33] | 254,492 | Multiple tools | >= 0.8 | 8%
[16] | 2,725,269 | Botometer | >= 0.5 | 13.2%
[8] | 14 million | Botometer | >= 0.43 | 9% to 15%
[34] | 88 million | Manual rules | >= 0.5 | 21%
[19] | 10,000 | Botometer | >= 0.5 | 22%
[35] | 1,476,700 | Botometer | >= 0.76 | 5% to 25%
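As a simple illustration of how such percentages are derived, the sketch below applies a score threshold to a set of hypothetical per-account bot scores; the studies above use thresholds between 0.43 and 0.8.

```python
# Minimal sketch: estimate the bot percentage of a sample by thresholding bot scores.
# The scores and threshold are hypothetical; real studies use Botometer or similar tools.
scores = {"acct_1": 0.12, "acct_2": 0.55, "acct_3": 0.81, "acct_4": 0.07}
threshold = 0.5

bots = [account for account, score in scores.items() if score >= threshold]
print(f"{100 * len(bots) / len(scores):.1f}% of sampled accounts exceed the threshold")
```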

OSN Manipulation Landscape

In support of the discussion presented in Section 3, we identified the OSN manipulation coverage of different studies and present it in the table below. The table highlights that the majority of works do not consider other SMM elements; for instance, false information works do not focus on bots and vice versa. Moreover, works that focus on bot detection only extract sentiment features from the information posted by the user rather than verifying its falsity and providing a score. Lastly, we did not find enough technical papers on malicious campaign detection to include in the table. This gap between the different elements of SMM shows that more interdisciplinary research against SMM is required: works that focus on false information detection must also look into the accounts propagating false information and their campaign strategies. In the table below, papers on false information are shown in light cyan and bot detection papers in light purple.

References compared: [16] [17] [18] [19] [20] [21] [22] [23] [24] [25] [26] [27] [28] [29] [30]

  • Information Reporting: Error, Missing, Biased, Event, Random, Predictive, Time-critical
  • Intent: Misinformation, Disinformation, Malinformation
  • Claim-Type: Rumor, Propaganda, Fake News, Spam, Online Hate
  • Content Context: Satire, False Connection, Misleading Content, False Context, Manipulated Content, Fabricated Content
  • User Belief: Confirmation Bias, Naive Realism, Homeostasis, Homophily
  • Account Type: Human Account, Bot Account, Cyborg Account
  • Bot Type: Social Bot, Follower Bot, Doppelganger Bot, Political Bot, Finance Bot, Spam Bot
  • Campaign Promotion: Organic, Inorganic
  • Campaign Controller: Single-based, Group-based

False and Real Information Diffusion

Section 4.1.1 delves into the diffusion dynamics associated with false information. In this section, we present our analysis of the cumulative distribution of false (or fake) and real information (or news) spread in terms of tweets, retweets, and replies. The figure below depicts the normalized cumulative count and distribution of these categories for both real and fake news. Notably, we observe that fake information tends to generate a sudden surge of retweets and tweets, surpassing the response garnered by real information.

[Figure: four panels (a)-(d); see caption below.]

Caption: Real vs. fake news temporal normalized cumulative count and normalized distribution in terms of tweets, retweets, and replies for the Politics category. Here, t1-t10 in real news refers to the duration between 2018-03 and 2018-12, and t1-t8 in fake news refers to the duration between 2016-07 and 2017-09.
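For illustration, the following sketch outlines how such normalized cumulative counts can be computed, assuming a hypothetical CSV of posts with a timestamp, a post type (tweet, retweet, reply), and a real/fake label; the file and column names are placeholders.

```python
# Minimal sketch of the normalized cumulative counts plotted above.
# "politics_posts.csv" and its columns (created_at, type, label) are hypothetical.
import pandas as pd

df = pd.read_csv("politics_posts.csv", parse_dates=["created_at"])

for label in ["real", "fake"]:
    for post_type in ["tweet", "retweet", "reply"]:
        sub = df[(df["label"] == label) & (df["type"] == post_type)]
        monthly = sub.set_index("created_at").resample("M").size()  # counts per month
        cumulative = monthly.cumsum()
        normalized = cumulative / cumulative.iloc[-1]               # normalized cumulative count
        print(label, post_type, normalized.round(2).to_dict())
```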

False Information Characterization

In Section 4.2, false information works are discussed in a literature review, as shown in Table 4. This appendix characterizes the surveyed works in terms of their collection methods, platforms, dataset sizes, languages, and modalities. In more detail, the following characterization is presented:

  • Data Collection Method: Researchers have collected and annotated datasets manually (⚫), gathered them from existing data repositories (), or used a combination of both ().
  • Platform: Name of the platform from which the dataset is collected.
  • Data Collection Tier: The level of data collection. For instance, Tier 0 (◯◯◯): a) only tweet text content, b) tweet text and image, c) news text content only, or d) news text and image; Tier 1 (⚫◯◯): a) tweet + account information, b) news information + meta-data (speaker, context, party affiliations), c) image data only + meta-data, or d) image + text in image + meta-data; Tier 2 (⚫⚫◯): account information + timeline data; Tier 3 (⚫⚫⚫): account information + timeline data + friends' timelines.
  • Size: Count of the dataset size.
  • Classes: Number of class instances and labels. Binary involves two classes, whereas multi-class problems consist of three or more classes.
  • Availability: Whether code is available (✩), data is available (), or both (✪). Generally, due to ethics and privacy concerns, many researchers do not release their datasets.
  • Feature and Model Selection (FS + Model): How features were extracted and the model was built.
  • Popularity: Code popularity on GitHub in terms of repository stars, a rough indicator of how useful and influential the work is.
  • Duration: Temporal information about when the dataset was collected.
  • Domain: Category of false information, such as politics, entertainment, etc.
  • Language: Dominant language of the dataset on which detection is conducted.
  • Modality: Type of content, i.e., unimodal (✎, text-based or image-based) or multimodal (📷, text- and image-based).

Bot Feature Overview

Both inferential and descriptive approaches use various features in their models. We find five dimensions of feature groups for every account: user profile-based, content-based, temporal-based, device-based, and network-based features. However, we combine the device features with the user features and the temporal features with the network features, as one can be derived from the other.

  1. User Features: User features include meta-data such as the number of tweets, the number of followers/following, the account creation date, verified status, and a few others, such as the short bio description. These features are simple, ranging from numerical and binary to textual. For example, Beskow et al. [1] used 6 user features, Hayawi et al. [2] used 12 features, Yang et al. used 8 direct metadata user features and 12 derived features, and Kudugunta et al. [3] used 10 account-level features (such as status count and list count) and 6 tweet-level features (such as the number of hashtags and URLs) to distinguish bots. At the other end, the Botometer tool [4] uses up to 1,200 features to determine bot likelihood.

    Furthermore, these features can be used to train a supervised classifier such as a Random Forest, which has been widely adopted in user feature-based bot detection research [5], since a “Random forest model can learn nonlinear decision boundaries, can handle a large amount of dataset and unnormalized features for training" [6] (see the first sketch after this list). However, due to the simplicity of the features, such models are vulnerable to feature tampering by adversaries. For example, maintaining a balance between follower and following counts takes little effort for a malicious user.

  2. Network Features: Twitter conversations are a combination of tweets and retweets. Schuchard and Crooks [7] showed that retweet networks can reveal engagement activity between bots and humans as well as intra-group interactions. Fewer than 8% of the total unique accounts were bots; however, more than 20% of the accounts in the top-100 and top-25 out-degree centrality rankings were bots. This implies persistent activity by bots to engage with humans despite their small population size.

    Moving ahead, Varol et al. [8] use various network features such as network density, clustering coefficient, retweet network, and in-strength and out-strength (weighted degree) distributions. Such features help anomaly detection and provide signals for looking further into dense networks of followers and friends of suspicious accounts. Other recent works focus on graph-based techniques using network features, such as node centrality [9] or graph neural networks [10] (see the second sketch after this list). However, graph-based techniques and network features are computationally expensive in terms of collection time. Collecting large-scale datasets from Twitter is a technical hindrance due to Twitter rate limits [11], [12]. According to Beskow and Carley [1], it takes about 20 hours to collect the network information of 250 accounts. Thus, most works remain at Tier 2 of data collection, which consists of user and timeline information.

  3. Takeaway A1: Network features are paramount for detection, though researchers avoid them due to data collection time and resource constraints.

  4. Content and Language Features: Bot accounts post malicious URLs, misinformation content, and malware on OSNs [13]. Thus, content information provides valuable context for detecting bots. Even though different bots post on different topics, tweet-based bot detection uses similar features such as tweet text, tweet length, number of hashtags, URLs and mentions used, is_possibly_sensitive, similarity between tweets, sentiment of the tweet, and originality, along with the user meta-data [14]. In another direction, Heidari and Jones [15] proposed using sentiment features from content to detect bots, as bots may be purposely skewed toward a position on a topic. However, we noticed that most works focus on user metadata and neglect content information for social bot detection.

    Research Gap A1: We note that limited work leverages features from the content posted by bots. These may be important features for differentiating between benign and malicious bots.
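As an illustration of the user-feature approach discussed in item 1, the sketch below trains a Random Forest on a handful of profile meta-data features; the input file, column names, and label are hypothetical placeholders, not a dataset from the surveyed works.

```python
# Minimal sketch of user-metadata-based bot classification with a Random Forest.
# "accounts.csv", its columns, and the "is_bot" label are hypothetical.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

df = pd.read_csv("accounts.csv")  # one row per account, with an is_bot label

# Simple profile features of the kind surveyed above (counts, binary flags, derived ratios)
df["followers_to_friends"] = df["followers_count"] / (df["friends_count"] + 1)
features = ["statuses_count", "followers_count", "friends_count", "listed_count",
            "verified", "default_profile", "account_age_days", "followers_to_friends"]

X_train, X_test, y_train, y_test = train_test_split(
    df[features], df["is_bot"], test_size=0.2, random_state=42, stratify=df["is_bot"])

clf = RandomForestClassifier(n_estimators=200, random_state=42)
clf.fit(X_train, y_train)
print(classification_report(y_test, clf.predict(X_test)))
```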
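Similarly, for the network features discussed in item 2, the sketch below computes density, clustering, and out-degree centrality on a retweet network with networkx; the edge list file is a hypothetical placeholder.

```python
# Minimal sketch of retweet-network features with networkx.
# "retweets.edgelist" is hypothetical; an edge u -> v means account u retweeted account v.
import networkx as nx

G = nx.read_edgelist("retweets.edgelist", create_using=nx.DiGraph, nodetype=str)

density = nx.density(G)
clustering = nx.average_clustering(G.to_undirected())
out_deg = nx.out_degree_centrality(G)

# Rank accounts by out-degree centrality; prior work reports bots to be
# over-represented at the top of such rankings
top_25 = sorted(out_deg, key=out_deg.get, reverse=True)[:25]
print(f"density={density:.4f}, avg clustering={clustering:.4f}")
print("Top-25 out-degree centrality accounts:", top_25)
```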

Case Study: Analysis of Coordinated Crypto-Scams (Example of Trigger Bots)

As mentioned in Section 5.1.2, researchers often use the descriptive approach, where clusters help to analyze the activity of accounts. The descriptive approach also helps to verify claims about certain behaviors. One such activity we address in this appendix section relates to trigger bots, a new family of spambots on Twitter. As background, a frequent assumption is that humans are inefficient at detecting social spambots. Despite this, we observed a few genuine users on Twitter claiming they had found bots. We looked into a few other conversations and found genuine accounts expressing concern about bots that flock to the comment section when certain magic words are used on Twitter. We call such bots trigger bots. For example, one user wrote the word ''Metamask'' (a well-known cryptocurrency wallet) and reported that many bots replied after seeing the word ''Metamask'' due to their automated crawling behavior. Interestingly, a few bots replied by sharing a malicious phishing link.

Following the analysis of these user observations, we conducted a viability test. First, we tweeted using the word ''Metamask'' in a sentence and expressed that we needed help with our wallet, to provide some auxiliary context in the tweet. Within the first 10 hours, nine replies were received, and upon checking the Botometer score of each profile, we found that most had a score of more than 4 on a scale of 0-5, indicating a strong possibility of a bot-like account. In the second test, we posted a tweet containing only the word ''Metamask'' without any additional context. The first reply was received in under nine seconds, and two more were received in less than seven minutes. One of the replies suggested we contact a phishing email address for account recovery, claiming to have experienced a similar issue. Given the lack of context in the tweet, we hypothesize that the bots respond using a crawling method triggered by the magic words. (See the table below for more potential trigger words.)
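For reference, a sketch of the score check described above using the botometer-python package is shown below. It assumes Botometer v4 RapidAPI keys and Twitter v1.1 credentials, which may no longer be obtainable; the handles and keys are placeholders.

```python
# Sketch of checking Botometer scores for accounts that replied to the bait tweet.
# Assumes the botometer-python package with RapidAPI and Twitter v1.1 credentials;
# all keys and handles below are placeholders.
import botometer

twitter_app_auth = {
    "consumer_key": "...",
    "consumer_secret": "...",
    "access_token": "...",
    "access_token_secret": "...",
}
bom = botometer.Botometer(wait_on_ratelimit=True,
                          rapidapi_key="...",
                          **twitter_app_auth)

for handle in ["@reply_account_1", "@reply_account_2"]:   # accounts that replied to the bait tweet
    result = bom.check_account(handle)
    overall = result["display_scores"]["universal"]["overall"]  # 0-5 scale used above
    print(handle, overall, "likely bot" if overall > 4 else "unclear")
```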

Next, we aimed to explore the trigger bot network, since these accounts had profile images and a few followers, mimicking real profiles. For this, we followed two steps. First, we selected a trigger bot seed account. Second, we built a network of the seed bot's followers and their respective friends (followings). For visualization, we used the Gephi tool. As shown in Figure (a), the dense pathogenic network in orange appears to follow the genuine users (users verified by Twitter) shown in blue. We then filtered out the genuine users and used Gephi (with the Yifan Hu layout) to dive deeper into this dense network. Interestingly, this resulted in two concentric circles with varied characteristics. The accounts in the inner circle are explicitly tasked to follow the accounts in the outer circle, making the outer accounts look popular and credible. We believe this is why every account in the inner circle had zero incoming links and 190 outgoing links, while the accounts in the outer circle had either 124 or 125 incoming links, as shown in Figure (b).
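The sketch below illustrates how such a follower network can be assembled and exported for Gephi once the follower and friend lists have been collected; the account names are placeholders, not the actual trigger bot accounts.

```python
# Sketch: build the seed-centered follower/friend network and export it for Gephi.
# The seed, followers, and friend lists are hypothetical placeholders.
import networkx as nx

seed = "seed_bot"
seed_followers = ["acct_a", "acct_b", "acct_c"]
friends_of = {
    "acct_a": ["outer_1", "outer_2"],
    "acct_b": ["outer_1", "outer_3"],
    "acct_c": ["outer_2", "outer_3"],
}

G = nx.DiGraph()
for follower in seed_followers:
    G.add_edge(follower, seed)                 # follower -> seed
    for friend in friends_of.get(follower, []):
        G.add_edge(follower, friend)           # follower -> accounts it follows

# In/out degree per account reveals the concentric-circle pattern described above
for node in G.nodes:
    print(node, G.in_degree(node), G.out_degree(node))

nx.write_gexf(G, "trigger_bot_network.gexf")   # load this file in Gephi (e.g., Yifan Hu layout)
```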

For context, we found that only the outer profiles were used for malicious activities, such as posting phishing URLs and carrying out crypto scams. These accounts were more persistent and visible, and tweeted with high responsiveness. We then used the Botometer service to identify the bot accounts in the two circles. We note that Botometer could only find bots in the outer circle; the accounts in the inner circle never posted anything, producing no timeline activity for detection. The bot accounts are shown in red in Figure (c). As of 6 July 2022, only three accounts had been suspended by Twitter, shown in yellow in Figure (c), whereas a month later Twitter had discovered and suspended most bot accounts from the outer circle, as shown in Figure (d). This corroborates our findings about the crypto-scam trigger bots. However, the inner circle accounts were still active, since they did not post anything, and could be repurposed for other uses. This highlights the need to link bot accounts to the malicious activities they conduct in order to detect them accurately.

Takeaway A2: We explored a new type of bot, named trigger bots, whose behavior is triggered by certain magic words.

[Figure: four panels (a)-(d); see caption below.]

Caption: Visualization of crypto-scam (trigger) bots: (a) dense network of bots in orange interacting with genuine users shown in blue. On examining the orange network of trigger bots, we identified two concentric circles of accounts exhibiting two different roles--inner circle bots tasked only with following outer circle bots, and outer circle bots responsible for posting scam messages. These bots have similar (in and out) degree metrics, as depicted in (b); (c) presents the accounts in this bot network classified as human, bot, or suspended using the Botometer tool as of 06/07/2022; and (d) the same classification as of 30/08/2022. We used two different dates to show that many of the accounts from the outer circle are now suspended, but none of the accounts in the inner circle are suspended due to the absence of timeline post activity.

Sentence | Potential Trigger Word
got hacked on Instagram | hacked
need someone to help me write this essay | essay
trust wallet | trust wallet
my account got disabled | account disabled
need logo school | logo
need design | design
need gfx | gfx
buy followers | buy followers
bot cashapp venmo | cashapp, venmo
i need a sugar daddy, sugar daddy cheated | sugar daddy
need this on a tshirt | tshirt
password managers | password managers
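As a simple illustration of how tweets containing such bait phrases can be flagged, the sketch below matches text against the trigger words from the table; the matching logic is ours and not part of the original study.

```python
# Illustrative sketch: flag tweets that contain potential trigger words from the table above.
import re

TRIGGER_WORDS = [
    "hacked", "essay", "trust wallet", "account disabled", "logo", "design",
    "gfx", "buy followers", "cashapp", "venmo", "sugar daddy", "tshirt",
    "password managers", "metamask",
]

def find_triggers(text: str) -> list[str]:
    """Return the trigger words found in a tweet (case-insensitive, whole phrases)."""
    lowered = text.lower()
    return [w for w in TRIGGER_WORDS if re.search(r"\b" + re.escape(w) + r"\b", lowered)]

print(find_triggers("Need help, my Metamask wallet got hacked"))  # ['hacked', 'metamask']
```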

Bot Detection and Feature Manipulation

This appendix section describes the different considerations for features used in bot detection. As discussed in Subsection 5.2.2, bot developers take advantage of adversarial attacks on bot detectors. As shown in the table below, most works use standard features instead of robust features; hence, the bot detectors remain vulnerable to adversarial attacks.

Work Type | References
Bot detection with standard features | [8, 36, 37, 38, 39, 40]
Bot detection with robust features | [41, 42]
Bot detection with adversarial training on standard features | [43, 44, 45, 46]
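To make the feature-tampering concern concrete, the sketch below probes whether overwriting easily controlled meta-data features (e.g., the follower/friend balance) changes a detector's prediction; the classifier and profile are hypothetical, for example the Random Forest from the earlier user-feature sketch.

```python
# Illustrative sketch: test whether tampering with cheap-to-change metadata features
# flips a detector's prediction. The classifier (clf) and account row are hypothetical.
import pandas as pd

def tampering_flips_label(clf, account_row: dict, tampered_values: dict) -> bool:
    """Return True if overwriting the given features changes the predicted label."""
    original = pd.DataFrame([account_row])
    tampered = pd.DataFrame([{**account_row, **tampered_values}])
    return clf.predict(original)[0] != clf.predict(tampered)[0]

# Hypothetical usage: an adversary balances the follower/friend ratio at little effort
# flipped = tampering_flips_label(clf, bot_profile,
#                                 {"followers_count": 900, "friends_count": 1000})
```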

References

  1. Beskow, D. M., & Carley, K. M. (2018, July). Bot-hunter: a tiered approach to detecting & characterizing automated activity on twitter. In Conference paper. SBP-BRiMS: International conference on social computing, behavioral-cultural modeling and prediction and behavior representation in modeling and simulation (Vol. 3, No. 3).
  2. Hayawi, K., Mathew, S., Venugopal, N., Masud, M. M., & Ho, P. H. (2022). DeeProBot: a hybrid deep neural network model for social bot detection based on user profile data. Social Network Analysis and Mining, 12(1), 43.
  3. Kudugunta, S., & Ferrara, E. (2018). Deep neural networks for bot detection. Information Sciences, 467, 312-322.
  4. Davis, C. A., Varol, O., Ferrara, E., Flammini, A., & Menczer, F. (2016, April). Botornot: A system to evaluate social bots. In Proceedings of the 25th international conference companion on world wide web (pp. 273-274).
  5. Latah, M. (2020). Detection of malicious social bots: A survey and a refined taxonomy. Expert Systems with Applications, 151, 113383.
  6. Vargas, L., Emami, P., & Traynor, P. (2020, November). On the detection of disinformation campaign activity with network analysis. In Proceedings of the 2020 ACM SIGSAC Conference on Cloud Computing Security Workshop (pp. 133-146).
  7. Schuchard, R. J., & Crooks, A. T. (2021). Insights into elections: An ensemble bot detection coverage framework applied to the 2018 US midterm elections. Plos one, 16(1), e0244309.
  8. Varol, O., Ferrara, E., Davis, C., Menczer, F., & Flammini, A. (2017, May). Online human-bot interactions: Detection, estimation, and characterization. In Proceedings of the international AAAI conference on web and social media (Vol. 11, No. 1, pp. 280-289).
  9. Dehghan, A., Siuta, K., Skorupka, A., Dubey, A., Betlen, A., Miller, D., ... & Pralat, P. (2022). Detecting Bots in Social-Networks Using Node and Structural Embeddings.
  10. Feng, S., Tan, Z., Wan, H., Wang, N., Chen, Z., Zhang, B., ... & Luo, M. (2022). TwiBot-22: Towards graph-based Twitter bot detection. arXiv preprint arXiv:2206.04564.
  11. Wright, J., & Anise, O. (2018). Don’t@ me: Hunting twitter bots at scale. Blackhat USA.
  12. Twitter Inc., “Rate Limits,” https://developer.twitter.com/en/docs/rate-limits, 2023, [Online; accessed 19-Mar-2023].
  13. Derhab, A., Alawwad, R., Dehwah, K., Tariq, N., Khan, F. A., & Al-Muhtadi, J. (2021). Tweet-based bot detection using big data analytics. IEEE Access, 9, 65988-66005.
  14. Morstatter, F., Wu, L., Nazer, T. H., Carley, K. M., & Liu, H. (2016, August). A new approach to bot detection: striking the balance between precision and recall. In 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM) (pp. 533-540). IEEE.
  15. Heidari, M., & Jones, J. H. (2020, October). Using bert to extract topic-independent sentiment features for social media bot detection. In 2020 11th IEEE Annual Ubiquitous Computing, Electronics & Mobile Communication Conference (UEMCON) (pp. 0542-0547). IEEE.
  16. Vosoughi, S., Roy, D., & Aral, S. (2018). The spread of true and false news online. science, 359(6380), 1146-1151.
  17. Inwood, O., & Zappavigna, M. (2022). A Systemic Functional Linguistics Approach to Analyzing White Supremacist and Conspiratorial Discourse on YouTube. The Communication Review, 25(3-4), 204-234.
  18. Nakamura, K., Levy, S., & Wang, W. Y. (2019). r/fakeddit: A new multimodal benchmark dataset for fine-grained fake news detection. arXiv preprint arXiv:1911.03854.
  19. Shu, K., Mahudeswaran, D., Wang, S., Lee, D., & Liu, H. (2020). Fakenewsnet: A data repository with news content, social context, and spatiotemporal information for studying fake news on social media. Big data, 8(3), 171-188.
  20. Stein, J., Frey, V., & van de Rijt, A. (2023). Realtime user ratings as a strategy for combatting misinformation: an experimental study. Scientific reports, 13(1), 1626.
  21. Wang, W. Y. (2017). "Liar, liar pants on fire": A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648.
  22. Huh, M., Liu, A., Owens, A., & Efros, A. A. (2018). Fighting fake news: Image splice detection via learned self-consistency. In Proceedings of the European conference on computer vision (ECCV) (pp. 101-117).
  23. Dou, Y., Shu, K., Xia, C., Yu, P. S., & Sun, L. (2021, July). User preference-aware fake news detection. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval (pp. 2051-2055).
  24. Wang, Y., Ma, F., Jin, Z., Yuan, Y., Xun, G., Jha, K., ... & Gao, J. (2018, July). Eann: Event adversarial neural networks for multi-modal fake news detection. In Proceedings of the 24th acm sigkdd international conference on knowledge discovery & data mining (pp. 849-857).
  25. Z. A. Estela, “Are you fake news,” https://github.com/N2ITN/are-you-fake-news, 2022, [Online; accessed 18-July-2022].
  26. Hayawi, K., Mathew, S., Venugopal, N., Masud, M. M., & Ho, P. H. (2022). DeeProBot: a hybrid deep neural network model for social bot detection based on user profile data. Social Network Analysis and Mining, 12(1), 43.
  27. Subrahmanian, V. S., Azaria, A., Durst, S., Kagan, V., Galstyan, A., Lerman, K., ... & Menczer, F. (2016). The DARPA Twitter bot challenge. Computer, 49(6), 38-46.
  28. Abu-El-Rub, N., & Mueen, A. (2019, May). Botcamp: Bot-driven interactions in social campaigns. In The world wide web conference (pp. 2529-2535).
  29. Kudugunta, S., & Ferrara, E. (2018). Deep neural networks for bot detection. Information Sciences, 467, 312-322.
  30. Yang, K. C., Varol, O., Hui, P. M., & Menczer, F. (2020, April). Scalable and generalizable social bot detection through data selection. In Proceedings of the AAAI conference on artificial intelligence (Vol. 34, No. 01, pp. 1096-1103).
  31. U.S. Securities and Exchange Commission, “US Security and Exchange Commission Twitter Report,” https://www.sec.gov/Archives/edgar/data/1418091/000156459014003474/twtr-10q_20140630.htm?_ga=1.106844928.2072504916.1401902059, 2014, [Online; accessed 23-Feb-2023].
  32. Shao, C., Ciampaglia, G. L., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2018). The spread of low-credibility content by social bots. Nature communications, 9(1), 1-9.
  33. Schuchard, R. J., & Crooks, A. T. (2021). Insights into elections: An ensemble bot detection coverage framework applied to the 2018 US midterm elections. Plos one, 16(1), e0244309.
  34. Wright, J., & Anise, O. (2018). Don’t@ me: Hunting twitter bots at scale. Blackhat USA.
  35. Keller, T. R., & Klinger, U. (2019). Social bots in election campaigns: Theoretical, empirical, and methodological implications. Political Communication, 36(1), 171-189.
  36. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2016). DNA-inspired online behavioral modeling and its application to spambot detection. IEEE Intelligent Systems, 31(5), 58-64.
  37. Cresci, S., Di Pietro, R., Petrocchi, M., Spognardi, A., & Tesconi, M. (2017). Social fingerprinting: detection of spambot groups through DNA-inspired behavioral modeling. IEEE Transactions on Dependable and Secure Computing, 15(4), 561-576.
  38. Mbona, I., & Eloff, J. H. (2023). Classifying social media bots as malicious or benign using semi-supervised machine learning. Journal of Cybersecurity, 9(1), tyac015.
  39. Sayyadiharikandeh, M., Varol, O., Yang, K. C., Flammini, A., & Menczer, F. (2020, October). Detection of novel social bots by ensembles of specialized classifiers. In Proceedings of the 29th ACM international conference on information & knowledge management (pp. 2725-2732).
  40. Van Der Walt, E., & Eloff, J. (2018). Using machine learning to detect fake identities: bots vs humans. IEEE access, 6, 6540-6549.
  41. Rout, R. R., Lingam, G., & Somayajulu, D. V. (2020). Detection of malicious social bots using learning automata with url features in twitter network. IEEE Transactions on Computational Social Systems, 7(4), 1004-1018.
  42. Xu, T., Goossen, G., Cevahir, H. K., Khodeir, S., Jin, Y., Li, F., ... & Pearce, P. (2021). Deep entity classification: Abusive account detection for online social networks. In 30th {USENIX} Security Symposium ({USENIX} Security 21).
  43. Cresci, S., Petrocchi, M., Spognardi, A., & Tognazzi, S. (2019, June). Better safe than sorry: an adversarial approach to improve social bot detection. In Proceedings of the 10th ACM Conference on Web Science (pp. 47-56).
  44. Cresci, S., Petrocchi, M., Spognardi, A., & Tognazzi, S. (2021). The coming age of adversarial social bot detection. First Monday.
  45. Le, T., Tran-Thanh, L., & Lee, D. (2022, April). Socialbots on fire: Modeling adversarial behaviors of socialbots via multi-agent hierarchical reinforcement learning. In Proceedings of the ACM Web Conference 2022 (pp. 545-554).
  46. Shao, H., Yao, S., Jing, A., Liu, S., Liu, D., Wang, T., ... & Abdelzaher, T. (2020, August). Misinformation detection and adversarial attack cost analysis in directional social networks. In 2020 29th International Conference on Computer Communications and Networks (ICCCN) (pp. 1-11). IEEE.