Violence Detection in Videos: Bibliography

1 Jun 2024


(1) Praveen Tirupattur, University of Central Florida.


[1] E. Acar, S. Spiegel, and S. Albayrak. Mediaeval 2011 affect task: Violent scene detection combining audio and visual features with svm. In MediaEval, 2011.

[2] R. Blake and M. Shiffrar. Perception of human motion. Annu. Rev. Psychol., 58: 47–73, 2007.

[3] Blog-FB. Facebook statistics, 2015. URL what-the-shift-to-video-means-for-creators/. Online: accessed 08-Jan-2016.

[4] D. Borth, R. Ji, T. Chen, T. Breuel, and S.-F. Chang. Large-scale visual sentiment ontology and detectors using adjective noun pairs. In Proceedings of the 21st ACM international conference on Multimedia, pages 223–232. ACM, 2013.

[5] G. Bradski. Opencv. Dr. Dobb’s Journal of Software Tools, 2000.

[6] B. J. Bushman and L. R. Huesmann. Short-term and long-term effects of violent media on aggression in children and adults. Archives of Pediatrics & Adolescent Medicine, 160(4):348–352, 2006.

[7] R. Cai, L. Lu, A. Hanjalic, H.-J. Zhang, and L.-H. Cai. A flexible framework for key audio effects detection and auditory context inference. Audio, Speech, and Language Processing, IEEE Transactions on, 14(3):1026–1039, 2006.

[8] Y. Chan, R. Harvey, and D. Smith. Building systems to block pornography. In Challenge of Image Retrieval, BCS Electronic Workshops in Computing series, pages 34–40, 1999.

[9] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011. Software available at

[10] M.-y. Chen and A. Hauptmann. Mosift: Recognizing human actions in surveillance videos. 2009.

[11] W.-H. Cheng, W.-T. Chu, and J.-L. Wu. Semantic context detection based on hierarchical audio models. In Proceedings of the 5th ACM SIGMM international workshop on Multimedia information retrieval, pages 109–115. ACM, 2003.

[12] C. Clarin, J. Dionisio, M. Echavez, and P. Naval. Dove: Detection of movie violence using motion intensity analysis on skin and blood. PCSC, 6:150–156, 2005.

[13] T. J. Clarke, M. F. Bradshaw, D. T. Field, S. E. Hampson, D. Rose, et al. The perception of emotion from body movement in point-light displays of interpersonal dialogue. Perception-London, 34(10):1171–1180, 2005.

[14] A. Datta, M. Shah, and N. D. V. Lobo. Person-on-person violence detection in video data. In Pattern Recognition, 2002. Proceedings. 16th International Conference on, volume 1, pages 433–438. IEEE, 2002.

[15] C.-H. Demarty, C. Penet, G. Gravier, and M. Soleymani. The mediaeval 2011 affect task. 2010.

[16] C.-H. Demarty, C. Penet, G. Gravier, and M. Soleymani. A benchmarking campaign for the multimodal detection of violent scenes in movies. In Computer Vision–ECCV 2012. Workshops and Demonstrations, pages 416–425. Springer, 2012.

[17] C.-H. Demarty, B. Ionescu, Y.-G. Jiang, V. L. Quang, M. Schedl, and C. Penet. Benchmarking violent scenes detection in movies. In Content-Based Multimedia Indexing (CBMI), 2014 12th International Workshop on, pages 1–6. IEEE, 2014.

[18] C.-H. Demarty, C. Penet, B. Ionescu, G. Gravier, and M. Soleymani. Multimodal violence detection in hollywood movies: State-of-the-art and benchmarking. In Fusion in Computer Vision, pages 185–208. Springer, 2014.

[19] C.-H. Demarty, C. Penet, M. Soleymani, and G. Gravier. Vsd, a public dataset for the detection of violent scenes in movies: design, annotation, analysis and evaluation. Multimedia Tools and Applications, pages 1–26, 2014.

[20] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the em algorithm. Journal of the royal statistical society. Series B (methodological), pages 1–38, 1977.

[21] O. Deniz, I. Serrano, G. Bueno, and T. Kim. Fast violence detection in video. In The 9th International Conference on Computer Vision Theory and Applications (VISAPP), 2014.

[22] F. Eyben and B. Schuller. opensmile:): the munich open-source large-scale multimedia feature extractor. ACM SIGMultimedia Records, 6(4):4–13, 2015.

[23] F. Eyben, F. Weninger, N. Lehment, B. Schuller, and G. Rigoll. Affective video retrieval: Violence detection in hollywood movies by large-scale segmental feature extraction. PloS one, 8(12):e78506, 2013.

[24] G. Farnebäck. Two-frame motion estimation based on polynomial expansion. In Image Analysis, pages 363–370. Springer, 2003.

[25] M. Flood. The harms of pornography exposure among children and young people. Child abuse review, 18(6):384–400, 2009.

[26] P. Gill, M. Arlitt, Z. Li, and A. Mahanti. Youtube traffic characterization: a view from the edge. In Proceedings of the 7th ACM SIGCOMM conference on Internet measurement, pages 15–28. ACM, 2007.

[27] Y. Gong, W. Wang, S. Jiang, Q. Huang, and W. Gao. Detecting violent scenes in movies by auditory and visual cues. In Advances in Multimedia Information Processing-PCM 2008, pages 317–326. Springer, 2008.

[28] T. Hassner, Y. Itcher, and O. Kliper-Gross. Violent flows: Real-time detection of violent crowd behavior. In Computer Vision and Pattern Recognition Workshops (CVPRW), 2012 IEEE Computer Society Conference on, pages 1–6. IEEE, 2012.

[29] S. Hidaka. Identifying kinematic cues for action style recognition. Cognitive Science Society, 2012.

[30] T. Hofmann. Unsupervised learning by probabilistic latent semantic analysis. Machine learning, 42(1-2):177–196, 2001.

[31] L. R. Huesmann and L. D. Eron. Television and the aggressive child: A cross-national comparison. Routledge, 2013.

[32] L. R. Huesmann and L. D. Taylor. The role of media violence in violent behavior. Annu. Rev. Public Health, 27:393–415, 2006.

[33] Y.-G. Jiang, Q. Dai, C. C. Tan, X. Xue, and C.-W. Ngo. The shanghai-hongkong team at mediaeval2012: Violent scene detection using trajectory-based features. In MediaEval, 2012.

[34] Y.-G. Jiang, Q. Dai, X. Xue, W. Liu, and C.-W. Ngo. Trajectory-based modeling of human actions with motion reference points. In Computer Vision–ECCV 2012, pages 425–438. Springer, 2012.

[35] M. J. Jones and J. M. Rehg. Statistical color models with application to skin detection. International Journal of Computer Vision, 46(1):81–96, 2002.

[36] V. Lam, D.-D. Le, S. Phan, S. Satoh, D. A. Duong, and T. D. Ngo. Evaluation of low-level features for detecting violent scenes in videos. In Soft Computing and Pattern Recognition (SoCPaR), 2013 International Conference of, pages 213–218. IEEE, 2013.

[37] I. Laptev. On space-time interest points. International Journal of Computer Vision, 64(2-3):107–123, 2005.

[38] J. Lin and W. Wang. Weakly-supervised violence detection in movies with audio and video based co-training. In Advances in Multimedia Information Processing-PCM 2009, pages 930–935. Springer, 2009.

[39] D. G. Lowe. Distinctive image features from scale-invariant keypoints. International journal of computer vision, 60(2):91–110, 2004.

[40] K. J. Mitchell, D. Finkelhor, and J. Wolak. The exposure of youth to unwanted sexual material on the internet: a national survey of risk, impact, and prevention. Youth & Society, 34(3):330–358, 2003.

[41] J. Nam, M. Alghoniemy, and A. H. Tewfik. Audio-visual content-based violent scene characterization. In Image Processing, 1998. ICIP 98. Proceedings. 1998 International Conference on, volume 1, pages 353–357. IEEE, 1998.

[42] E. B. Nievas, O. D. Suarez, G. B. García, and R. Sukthankar. Violence detection in video using computer vision techniques. In Computer Analysis of Images and Patterns, pages 332–339. Springer, 2011.

[43] D. OpticalFlow. Optical flow implementation, 2015. URL http: // tracking.html#calcopticalflowfarneback. Online: accessed 21-Oct-2015.

[44] C. Parker. An analysis of performance measures for binary classifiers. In Data Mining (ICDM), 2011 IEEE 11th International Conference on, pages 517–526. IEEE, 2011.

[45] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[46] J. Platt et al. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. Advances in large margin classifiers, 10(3):61–74, 1999.

[47] M. Pogrebnyak, D. Timoshenko, I. Burcev, and A. Kulinkin. Adult-content detection in video with the use of nvidia gpu. 2015.

[48] L. Richardson. Beautiful soup. Crummy: The Site, 2013. URL http://www.

[49] C. J. V. Rijsbergen. Information Retrieval. Butterworth-Heinemann, Newton, MA, USA, 2nd edition, 1979. ISBN 0408709294.

[50] M. Saerbeck and C. Bartneck. Perception of affect elicited by robot motion. In Proceedings of the 5th ACM/IEEE international conference on Human-robot interaction, pages 53–60. IEEE Press, 2010.

[51] M. Schedl, M. Sjöberg, I. Mironica, B. Ionescu, V. L. Quang, and Y.-G. Jiang. Vsd2014: A dataset for violent scenes detection in hollywood movies and web videos. Sixth Sense, 6(2.00):12–40.

[52] C. Schulze, D. Henter, D. Borth, and A. Dengel. Automatic detection of csa media by multi-modal feature fusion for law enforcement support. In Proceedings of International Conference on Multimedia Retrieval, page 353. ACM, 2014.

[53] M. Sokolova and G. Lapalme. A systematic analysis of performance measures for classification tasks. Information Processing & Management, 45(4):427–437, 2009.

[54] G. Sparks. Media effects research: A basic overview. Cengage Learning, 2015.

[55] A. Tompkins. The psychological effects of violent media on children. AllPsych Journal, 14, 2003.

[56] M. Wesch. Youtube statistics, 2008. URL thoughts/youtube-statistics/. Online: accessed 08-Jan-2016.

[57] Wikipedia. Optical flow, 2015. URL flow. Online: accessed 21-Oct-2015.

This paper is available on arXiv under a CC 4.0 license.