Combination of TF-IDF and Rabin-Karp for Detecting Document Similarity in Student Thesis Abstracts
DOI: https://doi.org/10.53513/jsk.v8i1.10611
Keywords: thesis abstract, plagiarism, TF-IDF, Rabin-Karp
Abstract
Final-semester students are required to complete a final project in the form of research relevant to their field of study, with the aims of finding innovative solutions and developing critical thinking skills. However, plagiarism is a common problem in such work. Plagiarism is defined as taking someone else's work, including their opinions, and claiming it as one's own. Technology can therefore be used to detect similarities among the abstracts that students submit with their thesis title proposals, allowing plagiarism to be detected early. The corpus was taken from the final-project directories of the Computer Engineering Study Program (98 entries) and the Civil Engineering Study Program (40 entries). Applying the TF-IDF and Rabin-Karp algorithms, this study found that TF-IDF can measure the importance of a word in a document relative to the entire corpus, and that Rabin-Karp is effective at detecting matching patterns across several corpora, with a measured pattern-matching accuracy of 70%.
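The abstract names the two techniques but not their parameters, so the following Python code is only a minimal sketch of how they are commonly combined, under stated assumptions. TF-IDF is computed as tf = count / document length and idf = log(N / df). For Rabin-Karp, every 5-character window of the normalized text is hashed with a rolling hash, and two abstracts are scored with the Dice coefficient over their hash sets. The tokenizer, k = 5, the base 256, the modulus 1,000,003, the Dice scoring, and the function names (`tokenize`, `tf_idf`, `kgram_hashes`, `rabin_karp_similarity`) are hypothetical choices for illustration, not the study's configuration.

```python
import math
import re
from collections import Counter

BASE = 256        # hypothetical radix for the rolling hash
MOD = 1_000_003   # hypothetical prime modulus


def tokenize(text: str) -> list[str]:
    # Assumption: lowercase, keep ASCII letters/digits; no stemming or stop-word removal.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs: list[str]) -> list[dict[str, float]]:
    """Weight each term as tf * idf, with tf = count / doc length, idf = log(N / df).

    A term that appears in every document gets idf = 0 (no discriminating power).
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter(term for toks in tokenized for term in set(toks))
    weights = []
    for toks in tokenized:
        counts = Counter(toks)
        weights.append(
            {t: (c / len(toks)) * math.log(n_docs / df[t]) for t, c in counts.items()}
        )
    return weights


def kgram_hashes(text: str, k: int = 5) -> set[int]:
    """Rabin-Karp rolling hashes of every k-character window of the normalized text."""
    s = "".join(tokenize(text))  # assumption: compare letters/digits only
    if len(s) < k:
        return set()
    high = pow(BASE, k - 1, MOD)  # weight of the character leaving the window
    h = 0
    for ch in s[:k]:
        h = (h * BASE + ord(ch)) % MOD
    hashes = {h}
    for i in range(k, len(s)):
        # Remove s[i - k], shift the window left by one position, append s[i].
        h = ((h - ord(s[i - k]) * high) * BASE + ord(s[i])) % MOD
        hashes.add(h)
    return hashes


def rabin_karp_similarity(a: str, b: str, k: int = 5) -> float:
    """Dice coefficient 2|A & B| / (|A| + |B|) over the two k-gram hash sets."""
    ha, hb = kgram_hashes(a, k), kgram_hashes(b, k)
    if not ha and not hb:
        return 0.0
    return 2 * len(ha & hb) / (len(ha) + len(hb))


if __name__ == "__main__":
    # Hypothetical abstracts, not taken from the study's corpus.
    corpus = [
        "Detecting plagiarism in thesis abstracts with the Rabin-Karp algorithm",
        "Detecting similarity in thesis abstracts with TF-IDF weighting",
        "Structural analysis of reinforced concrete in multi-storey buildings",
    ]
    top_terms = sorted(tf_idf(corpus)[0].items(), key=lambda kv: -kv[1])[:3]
    print("top TF-IDF terms, abstract 0:", top_terms)
    print("sim(0, 1):", round(rabin_karp_similarity(corpus[0], corpus[1]), 3))
    print("sim(0, 2):", round(rabin_karp_similarity(corpus[0], corpus[2]), 3))
```

Because distinct k-grams can share a hash value, a fingerprint match here is a candidate rather than proof; classic Rabin-Karp pattern search confirms each hash hit by comparing the characters directly.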
License
This work is licensed under a Creative Commons Attribution 4.0 International License.