Abstract:
Though widely used across many research repositories, keyword search may not be
sufficient for people who are becoming more familiar with the use of chatbots like
ChatGPT. The proposed system will serve as a search engine for the UPM IRS,
which is a repository for the university’s theses. The system will use the vector
space model to retrieve documents: the user’s query is embedded directly into
a vector and compared by cosine similarity with the vectors held in a vector store.
Retrieval-Augmented Generation (RAG) will then be used: the top documents
will be passed to a large language model (LLM), which generates an overview
of them. The combination of a semantic retrieval method and an LLM
yielded a good user experience and relevant results for users.
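
As an illustration of the retrieval step described above, the following is a minimal sketch and not the system's actual implementation. It assumes that documents and the query have already been embedded; the three-dimensional vectors, the thesis identifiers, and the helper names (cosine_similarity, top_k, build_overview_prompt) are hypothetical and chosen only for the example. The prompt builder merely assembles the text that would be handed to an LLM; no model is called.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k(query_vec, store, k=2):
    """Return the ids of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in store.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def build_overview_prompt(query, doc_ids):
    """Assemble the text that would be passed to an LLM for the overview."""
    listing = "\n".join(f"- {doc_id}" for doc_id in doc_ids)
    return f"Question: {query}\nSummarize these documents:\n{listing}"


# Hypothetical toy vector store: document id -> embedding.
store = {
    "thesis_a": [0.9, 0.1, 0.0],
    "thesis_b": [0.1, 0.9, 0.2],
    "thesis_c": [0.7, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of the user's query

best = top_k(query_vec, store, k=2)
prompt = build_overview_prompt("example query", best)
```

With these toy vectors, thesis_a and thesis_c rank above thesis_b because their directions are closest to that of the query vector. A real deployment would obtain the embeddings from an embedding model and delegate the nearest-neighbour search to the vector store.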