HATE SPEECH DETECTION IN VIDEOS USING THE KNN AND NAIVE BAYES METHODS

Christopher Kelvin Pintoro Kwan; Vincentius Riandaru Prasetyo; Fitri Dwi Kartikasari

Christopher Kelvin Pintoro Kwan, Fakultas Teknik, Universitas Surabaya, Raya Kalirungkut, Surabaya 60293
Vincentius Riandaru Prasetyo, Fakultas Teknik, Universitas Surabaya, Raya Kalirungkut, Surabaya 60293
Fitri Dwi Kartikasari, Fakultas Teknik, Universitas Surabaya, Raya Kalirungkut, Surabaya 60293


Keywords: hate speech, machine learning, knn, naive bayes

Abstract

Hate speech has had many negative impacts in Indonesia, including riots, physical and verbal altercations, and divisions in society. Social media is where hate speech spreads fastest, and it appears not only in text posts but also, quite commonly, in videos. Most machine learning models for hate speech detection currently handle text only, so this research builds a model that applies machine learning to detect hate speech in videos. The model first converts the input video into text using the Google API. Classification is then carried out with KNN, which determines whether the video contains hate speech, and with Naive Bayes, which classifies the context of the video. On an unbalanced dataset, hate speech classification scored 74% and video context classification reached an accuracy of 45%. On a balanced dataset on which overfitting occurred, accuracy was 93% for hate speech classification and 55% for video context classification. The test results indicate that the model can achieve good accuracy when the dataset is balanced across labels and no label is overfitted.
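The two classification stages can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: the abstract does not state the features, the value of k, the distance measure, or the context labels, so the word-count features, cosine similarity, `k=3`, the labels `politics` and `religion`, the toy transcripts, and every function name below are assumptions made for illustration. The speech-to-text step is omitted because the abstract names only "Google API"; the sketch assumes each video has already been converted to a transcript.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, very simple tokenizer)."""
    return text.lower().split()


def cosine(a, b):
    """Cosine similarity between two word-count Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_predict(train, text, k=3):
    """Majority vote among the k training transcripts most similar to `text`."""
    query = Counter(tokenize(text))
    ranked = sorted(
        train,
        key=lambda item: cosine(query, Counter(tokenize(item[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, samples):
        self.total = len(samples)
        self.class_counts = Counter(label for _, label in samples)
        self.word_counts = defaultdict(Counter)
        for text, label in samples:
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, text):
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / self.total)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy transcripts; the paper's dataset is not reproduced here.
hate_train = [
    ("those people are trash and should leave", "hate"),
    ("i hate that group they are trash", "hate"),
    ("that group should leave they are vermin", "hate"),
    ("the weather is lovely today", "not_hate"),
    ("we cooked dinner with friends today", "not_hate"),
    ("the match was fun to watch", "not_hate"),
]
context_train = [
    ("the election candidate and the party campaign", "politics"),
    ("vote for the party in the election", "politics"),
    ("the mosque and the church held prayer", "religion"),
    ("prayer at the temple and the church", "religion"),
]
context_model = NaiveBayes().fit(context_train)


def classify_transcript(transcript):
    """Run both stages on one transcript: KNN for hate speech, Naive Bayes for context."""
    return {
        "hate_speech": knn_predict(hate_train, transcript, k=3),
        "context": context_model.predict(transcript),
    }


print(classify_transcript("the party are trash and should leave the election"))
```

With so few samples the predictions only demonstrate the mechanics. The abstract's point about balance applies here as well: in KNN, a label that dominates the training set also dominates the neighbour vote, which is one reason accuracy on an unbalanced dataset can be misleading.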



