Penerapan Metode N-Gram dan Cosine Similarity Dalam Pencarian Pada Repositori Artikel Jurnal Publikasi

Indra Gita Anugrah

doi:10.47065/bits.v3i3.1058

Abstract

Digital repositories are an important source of information, especially within organizations. A digital repository stores a variety of digital documents for its users; a repository of published journal articles is one example. The number of articles in such a repository grows by hundreds or even thousands every day, and the articles typically come in a variety of formats and languages. As a result, search results tend to have a relatively low level of relevance. Applying an information retrieval system to the repository is therefore important for optimizing search results. Preprocessing is one of the most important stages in developing a retrieval system, in particular the choice of stemming algorithm that produces the base words (terms) later used to determine the similarity between queries and documents during a search. N-Gram is a method that decomposes a string into character sequences; it can be used to analyze words or sentences and identify the language they are written in, which in turn determines the stemming algorithm to apply. Cosine Similarity is a method for determining the level of similarity by calculating the cosine of the angle between the query vector and the document vector. In this study, a repository is built that implements a retrieval system using N-Gram and Cosine Similarity, and the system's performance is then measured: averaged over Indonesian-language and English-language queries, accuracy is 0.967, precision is 0.851, and recall is 0.869.
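The role of N-Gram in choosing a stemmer can be illustrated with a minimal sketch. It is not the paper's implementation: the trigram size, the two tiny sample texts, the overlap score, and the names `char_ngrams`, `profile`, and `detect_language` are all assumptions made for illustration. The idea is to decompose text into overlapping character n-grams and pick the language whose n-gram profile overlaps most with the query.

```python
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> list[str]:
    """Decompose text into overlapping character n-grams (lowercased)."""
    s = " ".join(text.lower().split())  # normalize whitespace
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def profile(text: str, n: int = 3) -> Counter:
    """Count how often each character n-gram occurs in the text."""
    return Counter(char_ngrams(text, n))


# Hypothetical, very small reference texts; a real system would use
# much larger corpora per language.
SAMPLES = {
    "id": "pencarian dokumen pada repositori artikel jurnal yang digunakan "
          "untuk penelitian dan pengukuran kemiripan",
    "en": "searching documents in the journal article repository that is "
          "used for research and measuring the similarity",
}
PROFILES = {lang: profile(text) for lang, text in SAMPLES.items()}


def detect_language(text: str) -> str:
    """Return the language whose profile shares the most n-grams with text."""
    query_profile = profile(text)
    # Counter & Counter keeps the minimum count of each shared n-gram.
    scores = {
        lang: sum((query_profile & ref).values())
        for lang, ref in PROFILES.items()
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    print(char_ngrams("data"))
    print(detect_language("pengukuran kemiripan dokumen"))
    print(detect_language("the similarity of documents"))
```

In a pipeline like the one the abstract describes, the detected language code would then decide which stemming algorithm is applied during preprocessing.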

Summary

INTRODUCTION

Advances in information and communication technology have had a major impact on human life, one example being the use of digital document collections. A repository is one such digital collection and can hold many kinds of material, including research articles and journals. To optimize search, a mechanism is needed for measuring the similarity between the submitted query and the documents in the repository. Cosine Similarity is one method for measuring document similarity: it compares the query vector with the document vector, after both the query and the documents have been represented in the Vector Space Model (VSM). This study applies the N-Gram algorithm to detect language and the Cosine Similarity method to measure the similarity between queries and documents represented in the VSM [7]. The document collection consists of journal articles, which are matched against the query entered during a repository search so that the results can serve as references when writing journal articles. Applying N-Gram and Cosine Similarity is expected to improve search performance in digital repositories.
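The comparison of a query vector with document vectors in the Vector Space Model can be sketched as follows. This is an illustrative sketch only, not the author's code: the whitespace tokenizer, the `log(N / df) + 1` weighting, the sample documents, and the names `tfidf_vectors`, `cosine`, and `rank` are assumptions. Stemming and stop-word removal are omitted for brevity. Cosine similarity is the dot product of the two vectors divided by the product of their lengths.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace (no stemming in this sketch)."""
    return text.lower().split()


def tfidf_vectors(docs: list[str]):
    """Build sparse TF-IDF vectors (dicts) for the documents."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents containing each term.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Assumed weighting: log(N / df) + 1, so no term gets weight zero.
    idf = {t: math.log(n_docs / count) + 1.0 for t, count in df.items()}
    vectors = [
        {t: tf * idf[t] for t, tf in Counter(toks).items()}
        for toks in tokenized
    ]
    return vectors, idf


def cosine(u: dict, v: dict) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank(query: str, docs: list[str]) -> list[tuple[float, int]]:
    """Return (score, document index) pairs, most similar first."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection are ignored.
    q = {t: tf * idf[t] for t, tf in Counter(tokenize(query)).items() if t in idf}
    return sorted(((cosine(q, v), i) for i, v in enumerate(vectors)), reverse=True)


if __name__ == "__main__":
    docs = [
        "cosine similarity for document search",
        "stemming algorithm for indonesian text",
        "n-gram language detection",
    ]
    for score, idx in rank("document search", docs):
        print(f"{score:.3f}  {docs[idx]}")
```

A score of 1 means the query and document vectors point in the same direction, while 0 means they share no weighted terms.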

RESEARCH METHODOLOGY

Language Detection

Document Preprocessing

Term Weighting

Cosine Similarity

Confusion Matrix

RESULTS AND DISCUSSION

CONCLUSION

Journal: Building of Informatics, Technology and Science (BITS)
Publication Date: Dec 31, 2021
License type: CC BY 4.0
