Automatic Text Summarization Based on Statistical Approaches for Indonesian-Language Documents (Automatic Text Summarization Berdasarkan Pendekatan Statistika pada Dokumen Berbahasa Indonesia)


Abstract
Driven by modern technological innovation, data and text grow more abundant every year. With this much text, automatic text summarization is needed more than ever. Automatic text summarization is the creation of a shortened version of a text by a computer program, such that the result still contains the most important points of the original text. Statistical approaches are one family of automatic text summarization methods. Five statistical approaches are used here: the aggregation similarity method, the frequency method, the location method, the title method (if the text has a title), and the tf-based query method (if the text does not have a title). Cosine similarity is used to compute the title method, the aggregation similarity method, and the tf-based query method. Two types of validation are performed: system validation and user validation. System validation compares the similarity between a human-written summary and the program-generated summary, and yields an accuracy of 76.7647% for summaries 30% the length of the original journal article. User validation yields an accuracy of 82%. Based on both validations, the statistical approaches are suitable for automatic text summarization.
Keywords: automatic text summarization, statistical approaches, Indonesian document, cosine similarity
Abstrak (translated from Indonesian): With advances in technology, the amount of data and text will grow ever more abundant year after year. With so much text, automatic text summarization is needed to help summarize it. Automatic text summarization is defined as producing a shortened version of a text with a computer program, where the result still retains the important information: the basic ideas and the words or sentences that can represent the whole original text. One method of automatic text summarization is the statistical approach. Five statistical approaches are used: the aggregation similarity method, the frequency method, the location method, the title method (if the text has a title), and the tf-based query method (if the text has no title). Cosine similarity is used to compute the title method, the tf-based query method, and the aggregation similarity method. Two kinds of validation are performed. The first is system validation, which compares the similarity between the program's summary and a human summary and yields an accuracy of 76.7647% for summaries 30% the length of the original journal. The second is user validation, which yields an accuracy of 81%. In conclusion, because both user validation and system validation give reasonably good results, the statistical approach is suitable for automatic text summarization.
Kata kunci (translated from Indonesian): automatic text summarization, statistical approaches, cosine similarity, Indonesian-language documents
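The abstract states that cosine similarity is used to score sentences in the title method and that summaries are cut to 30% of the original length. A minimal Python sketch of that idea follows. It is not the authors' implementation: the tokenizer, the raw term-frequency weighting, the rounding rule for the 30% cut, and the names `term_frequencies`, `cosine_similarity`, and `summarize_by_title` are all assumptions made for illustration, and the other four methods are not covered.

```python
import math
import re
from collections import Counter


def term_frequencies(text):
    """Count lowercase alphabetic tokens (assumed tokenizer, no stemming)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def summarize_by_title(title, sentences, ratio=0.3):
    """Keep the sentences most similar to the title, in document order.

    `ratio` is the fraction of sentences to keep (assumed to be what
    "30% length" means); at least one sentence is always returned.
    """
    title_tf = term_frequencies(title)
    scores = [cosine_similarity(term_frequencies(s), title_tf) for s in sentences]
    keep = max(1, round(len(sentences) * ratio))
    # sorted() is stable, so ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:keep])]


if __name__ == "__main__":
    title = "Peringkasan teks otomatis"
    sentences = [
        "Peringkasan teks otomatis menghasilkan versi singkat dari teks.",
        "Cuaca hari ini cerah.",
        "Teks semakin melimpah setiap tahun.",
        "Saya suka kopi.",
    ]
    for sentence in summarize_by_title(title, sentences, ratio=0.5):
        print(sentence)
```

A full system along the lines of the abstract would presumably combine this score with the aggregation similarity, frequency, and location scores; how those are weighted is not stated in the abstract, so the sketch ranks by title similarity alone.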

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
- Articles published in Keluwih: Jurnal Sains dan Teknologi are licensed under a Creative Commons Attribution-ShareAlike 4.0 International license. You are free to copy, transform, or redistribute articles for any lawful purpose in any medium, provided you give appropriate credit to the original author(s) and the journal, link to the license, indicate if changes were made, and redistribute any derivative work under the same license.
- Copyright on articles is retained by the respective author(s), without restrictions. A non-exclusive license is granted to Keluwih: Jurnal Sains dan Teknologi to publish the article and identify itself as its original publisher, along with the commercial right to include the article in a hardcopy issue for sale to libraries and individuals.
- By publishing in Keluwih: Jurnal Sains dan Teknologi, authors grant any third party the right to use their article to the extent provided by the Creative Commons Attribution-ShareAlike 4.0 International license.