Cross-Language Information Retrieval for Poetry Form of Literature-Based on Machine Transliteration using Modified Cosine Similarity Algorithm
Abstract
Introduction: Web content retrieval is growing rapidly. Regional users access web content directly through search-based information retrieval, and the amount of content and literature available in regional languages is increasing day by day.
Objectives: This work presents a method for finding semiotic similarities in poetic data such as stanzas, lyrics, and poems. The study describes a methodology for retrieving semiotic similarities between regional content available on the World Wide Web.
Methods: The proposed system recognizes and recommends regional information based on user queries, regardless of whether the query is written in Devanagari or Romanized script. Given a query in either script, the method fetches matching entries from the Marathi database to implement search-based information retrieval. To achieve this, the system converts Devanagari-script poetry into Roman script for the subsequent processing steps, using defined custom mappings and a custom transcription function. Pre-processing ensures that the data is clean and in a suitable format for generating embeddings and performing similarity searches. The embeddings generated for the regional input query and for the transliterated poetry stanza are sequentially combined to improve the algorithm's accuracy in identifying the input across different scripts.
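The pre-processing and transliteration step can be illustrated with a minimal Python sketch. The mapping tables and the function names `preprocess`, `transliterate`, and `to_roman` are hypothetical and cover only a handful of characters; they are not the custom mappings or transcription function used in the study, and the sketch does not handle conjuncts beyond the virama or the deletion of a word-final inherent vowel.

```python
import re
import unicodedata

# Hypothetical, deliberately small mapping tables (assumed for illustration only).
VOWELS = {"अ": "a", "आ": "aa", "इ": "i", "ई": "ii", "उ": "u", "ऊ": "uu", "ए": "e", "ओ": "o"}
CONSONANTS = {
    "क": "k", "ग": "g", "ज": "j", "त": "t", "द": "d", "न": "n", "प": "p",
    "म": "m", "य": "y", "र": "r", "ल": "l", "व": "v", "स": "s", "ह": "h",
}
# Dependent vowel signs (matras), written as escapes because they are combining marks.
MATRAS = {
    "\u093e": "aa", "\u093f": "i", "\u0940": "ii", "\u0941": "u",
    "\u0942": "uu", "\u0947": "e", "\u094b": "o",
}
VIRAMA = "\u094d"    # suppresses the inherent vowel of a consonant
ANUSVARA = "\u0902"  # nasal sign, romanized here simply as "n"


def preprocess(text):
    """Normalize Unicode, drop punctuation (including dandas), collapse spaces."""
    text = unicodedata.normalize("NFC", text)
    text = re.sub(r"[\u0964\u0965,.;:!?\"'()\-]", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def transliterate(text):
    """Map Devanagari to Roman; characters outside the tables pass through unchanged."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in CONSONANTS:
            out.append(CONSONANTS[ch])
            if nxt in MATRAS:        # explicit vowel sign replaces the inherent "a"
                out.append(MATRAS[nxt])
                i += 1
            elif nxt == VIRAMA:      # virama: no vowel after this consonant
                i += 1
            else:
                out.append("a")      # inherent vowel
        elif ch in VOWELS:
            out.append(VOWELS[ch])
        elif ch == ANUSVARA:
            out.append("n")
        else:
            out.append(ch)
        i += 1
    return "".join(out)


def to_roman(text):
    """Bring a query or stanza in either script into one cleaned Roman form."""
    cleaned = preprocess(text)
    if any("\u0900" <= ch <= "\u097f" for ch in cleaned):
        return transliterate(cleaned)
    return cleaned
```

With a shared Roman representation, a query typed as `राम` and one typed as `Raama` both reduce to the same string before embeddings are generated.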
Results: An input query written in Roman script is transliterated into Marathi using the customized transcription function before its embedding is generated. The adapted cosine semiotic similarity value is used to compare the embeddings, which enables the model to fetch the most semantically matched poetry stanza.
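The comparison of embeddings can be sketched as follows. This is an illustrative example under stated assumptions: it uses the standard cosine similarity, cos(a, b) = (a · b) / (‖a‖ ‖b‖), rather than the adapted variant, whose definition is not given here, and it substitutes character trigram counts for the model's embeddings. The names `char_ngram_embedding`, `cosine_similarity`, and `retrieve` are hypothetical.

```python
import math
from collections import Counter


def char_ngram_embedding(text, n=3):
    """Stand-in embedding (assumption): sparse counts of character n-grams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(a, b):
    """Standard cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector has no direction; treat as no similarity
    return dot / (norm_a * norm_b)


def retrieve(query, stanzas, top_k=1):
    """Return the top_k (score, stanza) pairs, highest similarity first."""
    query_vec = char_ngram_embedding(query)
    scored = [(cosine_similarity(query_vec, char_ngram_embedding(s)), s) for s in stanzas]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


# Made-up romanized stanzas, used only to exercise the functions.
stanzas = ["mana majhe", "raama naama", "paaus aalaa"]
best_score, best_stanza = retrieve("raama", stanzas)[0]
```

In a real system the trigram counts would be replaced by learned embeddings; the ranking logic stays the same as long as the similarity function returns a higher value for closer vectors.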
Conclusions: The proposed customized cosine semiotic similarity for retrieval achieves an accuracy of 92% and a loss of 0.18.