Browsing by Author "Shakenov, Nurzhan"
- Development of a linguistic support model for information retrieval for cloud library systems
  Publication. Shakenov, Nurzhan; Lopes, Rui Pedro; Tungatarova, Aigul
  This dissertation addresses the limitations of traditional keyword-based search methods in cloud library systems by developing a robust linguistic support model. Leveraging advanced techniques in text extraction, embedding generation, and semantic search, this study aims to enhance the accuracy and relevance of search results. Document ingestion and text extraction were performed using PyMuPDF, ensuring high-quality data for subsequent processes. Text embeddings generated by LangChain’s Mistral model were stored in the Chroma vector database, facilitating efficient retrieval. A user-friendly interface developed with Flask enabled seamless user interaction. The project faced challenges such as API key requirements for GPT-2, text extraction accuracy, and large-scale data handling, which were addressed through alternative tools and methodologies. The results demonstrate significant improvements in search accuracy and relevance, aligning with recent advancements in NLP. Future work will focus on enhancing data preprocessing, expanding datasets, and integrating more advanced search algorithms. This study contributes valuable insights into the practical application of NLP techniques in cloud library systems, offering a foundation for further research and development in the field.
- Development of a linguistic support model for information retrieval for cloud library systems
  Publication. Shakenov, Nurzhan; Lopes, Rui Pedro; Tungatarova, Aigul
  This dissertation addresses the limitations of traditional keyword-based search methods in cloud library systems by developing a robust linguistic support model. Leveraging advanced techniques in text extraction, embedding generation, and semantic search, this study aims to enhance the accuracy and relevance of search results. Document ingestion and text extraction were performed using PyMuPDF, ensuring high-quality data for subsequent processes. Text embeddings generated by LangChain’s Mistral model were stored in the Chroma vector database, facilitating efficient retrieval. A user-friendly interface developed with Flask enabled seamless user interaction. The project faced challenges such as API key requirements for GPT-2, text extraction accuracy, and large-scale data handling, which were addressed through alternative tools and methodologies. The results demonstrate significant improvements in search accuracy and relevance, aligning with recent advancements in NLP. Future work will focus on enhancing data preprocessing, expanding datasets, and integrating more advanced search algorithms. This study contributes valuable insights into the practical application of NLP techniques in cloud library systems, offering a foundation for further research and development in the field.
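
The abstract above describes an embed, store, and retrieve pipeline. The following is a minimal sketch of that flow under loud assumptions: it is not the dissertation's code, and it deliberately swaps in stand-ins so that it runs with the standard library alone. A toy word-count vector replaces the Mistral embeddings, and a hypothetical `InMemoryVectorStore` class replaces Chroma. Because word counts only capture lexical overlap, the sketch illustrates the store-and-rank mechanics rather than true semantic matching. All names and sample texts are illustrative.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a learned model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for a vector database: keeps vectors in a list."""

    def __init__(self):
        self._items = []

    def add(self, doc_id, text):
        # Embed once at ingestion time and keep the vector with its id.
        self._items.append((doc_id, embed(text)))

    def search(self, query, k=3):
        # Embed the query, score every stored vector, return the top k.
        q = embed(query)
        scored = [(cosine(q, vec), doc_id) for doc_id, vec in self._items]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [(doc_id, round(score, 3)) for score, doc_id in scored[:k]]


# Illustrative documents; in the described system the text would come
# from PDF extraction rather than hard-coded strings.
store = InMemoryVectorStore()
store.add("doc-1", "Vector databases store text embeddings for semantic search")
store.add("doc-2", "Flask is a lightweight web framework for Python")
store.add("doc-3", "PyMuPDF extracts text from PDF documents")

results = store.search("extract text from a PDF", k=2)
print(results)
```

In a setup closer to the one the abstract describes, `embed` would call an embedding model and the store would be a persistent vector database, while the ranking step by vector similarity would keep the same shape.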
