Predicting whether content will go viral in social networks is attractive for applications such as viral marketing, advertising and entertainment, but it remains a challenge in today's big data era. Previous works mainly focus on predicting the potential popularity of content rather than the time at which that popularity is reached. This work proposes a novel yet practical iterative algorithm to predict virality timing, in which the correlation between the timing and the growth of content popularity is captured through users' sharing activities. Such data not only correlates the dynamics and associated timings of social cascades of viral content, but also allows the prediction to self-correct in each iteration.
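The iterative, self-correcting idea can be illustrated with a minimal Java sketch. It is not the algorithm of [1]-[3]. It assumes that cumulative share counts arrive at strictly increasing times and that growth is roughly linear between two consecutive observations. The class name `IterativeTimingSketch` and the method `observe` are hypothetical. Each new observation replaces the previous growth-rate estimate, so an early over- or under-estimate is corrected as the cascade unfolds.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of an iterative, self-correcting virality-timing
 * predictor. It assumes growth is roughly linear between consecutive
 * observations; it is NOT the algorithm of [1]-[3].
 */
public class IterativeTimingSketch {
    private final long target;                                    // viral target: desired cumulative audience
    private final List<double[]> observations = new ArrayList<>(); // each entry is {time, cumulativeCount}

    public IterativeTimingSketch(long target) {
        this.target = target;
    }

    /**
     * Records a new observation and returns the updated prediction of the time
     * at which the cumulative count reaches the target. Returns the current time
     * once the target is reached, and NaN while no positive growth rate is available.
     */
    public double observe(double time, long cumulativeCount) {
        if (!observations.isEmpty()
                && time <= observations.get(observations.size() - 1)[0]) {
            throw new IllegalArgumentException("observation times must increase");
        }
        observations.add(new double[] {time, cumulativeCount});
        if (cumulativeCount >= target) {
            return time; // target reached at or before this observation
        }
        if (observations.size() < 2) {
            return Double.NaN; // a single point gives no growth rate yet
        }
        double[] prev = observations.get(observations.size() - 2);
        // Re-estimate the growth rate from the latest interval only, so every new
        // observation overrides (self-corrects) the previous estimate.
        double rate = (cumulativeCount - prev[1]) / (time - prev[0]);
        if (rate <= 0) {
            return Double.NaN; // the cascade stalled in the latest interval
        }
        return time + (target - cumulativeCount) / rate;
    }

    public static void main(String[] args) {
        IterativeTimingSketch predictor = new IterativeTimingSketch(1000);
        System.out.println(predictor.observe(0, 100)); // NaN
        System.out.println(predictor.observe(1, 200)); // 9.0
        System.out.println(predictor.observe(2, 400)); // 5.0
    }
}
```

With a target of 1000, the observations (0, 100), (1, 200) and (2, 400) yield NaN, 9.0 and 5.0: the faster growth seen in the second interval pulls the predicted timing forward. A practical predictor would rely on richer cascade features, such as those described in [1]-[3], rather than a single two-point rate.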

[1] investigates how cascade dynamics can be incorporated to predict the virality timing of a single cascade for a given viral target. The required information is the infection duration (the time taken for a node to be infected) and the cascade growth, both of which are commonly available on many social networks today. [2] extends this by considering multiple cascades associated with a single piece of content and by using community information as an upper bound to improve prediction accuracy. [3] examines cascades' rates of growth and formation to achieve accurate prediction when the number of infecting nodes cannot be accurately estimated.
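The two inputs named for [1], infection duration and cascade growth, can be derived from a cascade recorded as (node, parent, infection time) triples. The Java sketch below is a hypothetical illustration: the `Infection` record, the per-hop growth ratio, and the geometric extrapolation in `hopsToTarget` are assumptions made here, not the model of [1]. It also assumes events are time-ordered so that each parent appears before its children.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: derives infection duration and per-hop cascade growth
 * from one cascade, then extrapolates with a geometric-growth assumption.
 * Illustrative only, not the method of [1].
 */
public class CascadeStatsSketch {
    /** One infection event; parent is null for the seed node. */
    record Infection(String node, String parent, double time) {}

    /** Mean time between a parent's infection and its child's infection. */
    static double meanInfectionDuration(List<Infection> cascade) {
        Map<String, Double> infectedAt = new HashMap<>();
        for (Infection inf : cascade) {
            infectedAt.put(inf.node(), inf.time());
        }
        double sum = 0;
        int count = 0;
        for (Infection inf : cascade) {
            Double parentTime = inf.parent() == null ? null : infectedAt.get(inf.parent());
            if (parentTime == null) {
                continue; // seed node, or parent outside the observed data
            }
            sum += inf.time() - parentTime;
            count++;
        }
        return count == 0 ? Double.NaN : sum / count;
    }

    /** Number of infected nodes at each hop from the seed (events must be time-ordered). */
    static List<Integer> generationSizes(List<Infection> cascade) {
        Map<String, Integer> depth = new HashMap<>();
        List<Integer> sizes = new ArrayList<>();
        for (Infection inf : cascade) {
            Integer parentDepth = inf.parent() == null ? Integer.valueOf(-1) : depth.get(inf.parent());
            if (parentDepth == null) {
                continue; // parent not seen yet: skip this orphan event
            }
            int d = parentDepth + 1;
            depth.put(inf.node(), d);
            while (sizes.size() <= d) {
                sizes.add(0);
            }
            sizes.set(d, sizes.get(d) + 1);
        }
        return sizes;
    }

    /**
     * Extra hops needed for the cumulative size to reach the target, assuming each
     * new generation equals the previous one times the last observed growth ratio.
     * Returns -1 if the target is not reached within maxHops.
     */
    static int hopsToTarget(List<Integer> sizes, long target, int maxHops) {
        double cumulative = 0;
        for (int s : sizes) {
            cumulative += s;
        }
        if (cumulative >= target) {
            return 0;
        }
        int last = sizes.size() - 1;
        if (last < 1) {
            return -1; // two generations are needed to estimate a growth ratio
        }
        double ratio = sizes.get(last).doubleValue() / sizes.get(last - 1).doubleValue();
        double generation = sizes.get(last).doubleValue();
        for (int hops = 1; hops <= maxHops; hops++) {
            generation *= ratio;
            cumulative += generation;
            if (cumulative >= target) {
                return hops;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<Infection> cascade = List.of(
                new Infection("A", null, 0),
                new Infection("B", "A", 2),
                new Infection("C", "A", 4),
                new Infection("D", "B", 5),
                new Infection("E", "B", 5),
                new Infection("F", "C", 6),
                new Infection("G", "C", 8));
        double d = meanInfectionDuration(cascade);
        List<Integer> sizes = generationSizes(cascade);
        int hops = hopsToTarget(sizes, 50, 100);
        // Crude timing estimate: last observed infection time + hops * mean duration.
        double lastTime = cascade.get(cascade.size() - 1).time();
        System.out.println(d + " " + sizes + " " + hops + " " + (lastTime + hops * d));
    }
}
```

On this seven-node example the program prints `3.0 [1, 2, 4] 3 17.0`: the mean infection duration is 3.0, each hop doubles the newest generation, and a target of 50 is reached after 3 further hops, about 17 time units after the seed.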

Data Collection

User activities for online content of interest are collected with scrapers written in Java. Social network APIs are used to obtain the information needed for prediction: the users who post the content, those who repost, comment on, share, or like it, and the time of each of these actions.
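A scraped activity can be stored as a flat record of content, user, action and timestamp. The Java sketch below assumes such a layout (the `Activity` record, the `Action` enum and epoch-second timestamps are hypothetical, not the scrapers' actual schema) and counts the audience as the number of distinct users who have acted on the content so far, which is one possible definition among several.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical record layout for scraped user activities, plus a helper that
 * turns them into a cumulative-audience curve for one content item.
 */
public class ActivityLogSketch {
    enum Action { POST, REPOST, COMMENT, SHARE, LIKE }

    /** One scraped action: who did what to which content, and when (epoch seconds). */
    record Activity(String contentId, String userId, Action action, long timestamp) {}

    /** Maps each event time to the number of distinct users engaged up to that time. */
    static SortedMap<Long, Integer> cumulativeAudience(List<Activity> log, String contentId) {
        List<Activity> events = new ArrayList<>();
        for (Activity a : log) {
            if (a.contentId().equals(contentId)) {
                events.add(a);
            }
        }
        events.sort(Comparator.comparingLong(Activity::timestamp));
        Set<String> users = new HashSet<>();
        SortedMap<Long, Integer> curve = new TreeMap<>();
        for (Activity a : events) {
            users.add(a.userId()); // repeat actions by the same user do not grow the audience
            curve.put(a.timestamp(), users.size());
        }
        return curve;
    }

    public static void main(String[] args) {
        List<Activity> log = List.of(
                new Activity("c1", "u1", Action.POST, 100),
                new Activity("c1", "u2", Action.REPOST, 160),
                new Activity("c2", "u4", Action.POST, 150),
                new Activity("c1", "u1", Action.COMMENT, 200),
                new Activity("c1", "u3", Action.LIKE, 260));
        System.out.println(cumulativeAudience(log, "c1")); // {100=1, 160=2, 200=2, 260=3}
    }
}
```

The resulting time-ordered curve is the kind of input a timing predictor consumes; activity on other content (here `c2`) is filtered out, and a second action by `u1` leaves the count unchanged.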

Social network | Description
Twitter        | 1,033,344 tweets on various events, e.g., Chinese New Year, the Bali 9 execution, and the death of Mr. Lee Kuan Yew.

Other datasets scraped by different authors from Digg and Twitter have also been used in this work.
K. Lerman and R. Ghosh, "Information Contagion: An Empirical Study of the Spread of News on Digg and Twitter Social Networks," in Proc. of the 4th International AAAI Conference on Weblogs and Social Media, 2010, pp. 90-97.
R. A. Baños et al., "Diffusion dynamics with changing network composition," Entropy, vol. 15, no. 11, 2013, pp. 4553-4568.

Data Analytics

User activities for a piece of content of interest are used to predict when it will reach a desired audience size. This analysis is implemented in a MATLAB-based program, and its community detection module adopts the modularity maximization approach.
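Modularity maximization searches for the partition of the user graph that maximizes the modularity Q = sum over communities c of [L_c / m - (D_c / (2m))^2], where m is the number of edges, L_c the number of edges inside community c, and D_c the total degree of the nodes in c. The hypothetical Java sketch below only evaluates Q for a given partition of an undirected, unweighted graph; the search heuristic itself (e.g., Louvain or greedy agglomerative merging) is not shown, and the source does not state which one the MATLAB module uses.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: modularity of a given partition of an undirected, unweighted graph. */
public class ModularitySketch {
    /**
     * Q = sum over communities c of [ L_c / m - (D_c / (2m))^2 ], where m is the
     * number of edges, L_c the edges inside c, and D_c the total degree of c.
     * community[v] is the community label of node v.
     */
    static double modularity(int[][] edges, int[] community) {
        int m = edges.length;
        Map<Integer, Integer> internal = new HashMap<>();  // L_c per community
        Map<Integer, Integer> degreeSum = new HashMap<>(); // D_c per community
        for (int[] e : edges) {
            int cu = community[e[0]];
            int cv = community[e[1]];
            degreeSum.merge(cu, 1, Integer::sum); // each edge adds one degree to each endpoint
            degreeSum.merge(cv, 1, Integer::sum);
            if (cu == cv) {
                internal.merge(cu, 1, Integer::sum);
            }
        }
        double q = 0;
        for (Map.Entry<Integer, Integer> entry : degreeSum.entrySet()) {
            double lc = internal.getOrDefault(entry.getKey(), 0);
            double share = entry.getValue() / (2.0 * m);
            q += lc / m - share * share;
        }
        return q;
    }

    public static void main(String[] args) {
        // Two triangles {0,1,2} and {3,4,5} joined by the bridge edge 2-3.
        int[][] edges = {{0, 1}, {1, 2}, {0, 2}, {3, 4}, {4, 5}, {3, 5}, {2, 3}};
        System.out.println(modularity(edges, new int[] {0, 0, 0, 1, 1, 1})); // approximately 5/14
        System.out.println(modularity(edges, new int[] {0, 0, 0, 0, 0, 0})); // 0.0
    }
}
```

Splitting the two triangles gives Q = 2 * (3/7 - 1/4) = 5/14, about 0.357, while putting every node in one community gives Q = 0; a maximization routine would prefer the first partition.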


[1] M. Cheung, J. She, and L. Cao, "Predicting the content virality in social cascade," in Proc. of the IEEE International Conference on Cyber, Physical and Social Computing, Aug. 2013, pp. 970-975.

[2] A. Junus, M. Cheung, J. She, and Z. Jie, "Community-aware prediction of virality timing using big data of social cascades," in Proc. of the 1st IEEE International Conference on Big Data Services and Applications, Apr. 2015, pp. 487-492.

[3] A. Junus and J. She, "Outbreak time prediction using social cascades," in Proc. of the 8th IEEE International Conference on Social Computing and Networking (SocialCom 2015). (accepted)