Abstract:
A text retrieval system that uses Latent Semantic Analysis (LSA) for indexing is developed. A collection of 106 documents is represented as vectors in a 377-dimensional term space, where the number of dimensions corresponds to the number of content words extracted from all the document titles in the database. The 377 by 106 matrix representing the entire data set is decomposed using singular value decomposition, and the resulting matrices are truncated to 10 orthogonal factors. The recombination of the truncated matrices forms the basis for computing the distance of each document from a query vector, which is obtained by treating the query as a pseudo-document. Results indicate that indexing using LSA is a promising tool for improving retrieval.
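
The pipeline summarized above (term-by-document matrix, singular value decomposition, truncation, recombination, comparison with a query pseudo-document) can be sketched in a few lines of NumPy. This is only an illustrative sketch and not the system's actual code: the function name lsa_rank, the 5-term by 4-document toy matrix, and the choice of k = 2 are hypothetical, and cosine similarity is assumed as the comparison measure because the abstract does not say which distance is used. In the setting described here, the matrix would be 377 by 106 and k would be 10.

```python
import numpy as np

def lsa_rank(term_doc, query_vec, k):
    """Rank documents against a query using a rank-k LSA approximation.

    term_doc  : (terms x documents) count matrix
    query_vec : query treated as a pseudo-document in term space
    k         : number of singular factors to keep
    """
    # Thin SVD: term_doc = U @ diag(s) @ Vt
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)

    # Keep the k largest factors and recombine the truncated matrices.
    approx = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # Cosine similarity between the query and each document column
    # (assumed measure; the abstract only says "distance").
    denom = np.linalg.norm(approx, axis=0) * np.linalg.norm(query_vec)
    denom[denom == 0] = 1.0  # avoid division by zero for empty columns
    scores = (query_vec @ approx) / denom

    # Document indices, most similar first.
    order = np.argsort(-scores)
    return order, scores

# Hypothetical toy data. Rows are the terms
# retrieval, index, semantic, matrix, svd; columns are four documents.
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Query containing only the term "retrieval".
q = np.array([1, 0, 0, 0, 0], dtype=float)

order, scores = lsa_rank(A, q, k=2)
print("ranking:", order)
print("scores:", np.round(scores, 3))
```

In this toy case the two documents that share vocabulary with the query are ranked ahead of the two that do not, which is the behavior the truncation is meant to preserve while discarding minor factors.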