Information Retrieval and Query Ranking of Unstructured Data in Dataspace using Vector Space Model


Niranjan Lal, Shamimul Qamar, Savita Shi

Abstract

A vast amount of data is available on the web in the form of web pages, in the cloud, and in the repositories of organizations. Companies, enterprises, and other organizations store their data digitally. This data may be text, streamed data, images, Facebook data, Twitter data, videos, or other documents available on the Internet, related to areas such as manufacturing, engineering, and medicine, and it is collectively called a dataspace. The data available over the Internet may be structured, unstructured, or without any format. Each organization stores data differently, but from the user's point of view searching and retrieval should be easy: users should be able to find relevant, accurate information efficiently. A proper model, search engine, or interface for finding information is therefore needed. Retrieving information from the Internet and from large databases is difficult and time-consuming, especially when that information is unstructured. Several algorithms and techniques have been developed in data mining and information retrieval, yet retrieving data from large databases remains problematic. In this paper, the Vector Space Model (VSM) of information retrieval is used: documents and queries are represented as vectors whose dimensions correspond to index terms, and these vectors form the index that represents the unstructured data. VSM is widely used for retrieving documents and data because it is simple and works efficiently on large datasets.
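To make the vector representation concrete, here is a minimal sketch, assuming a naive lowercase/whitespace tokenizer and raw term counts (the abstract does not specify either). Every distinct term in the collection becomes one dimension, and both documents and the query are mapped onto that same set of dimensions. The names `tokenize`, `build_vocabulary`, and `to_vector` are hypothetical, chosen only for illustration.

```python
from collections import Counter


def tokenize(text):
    # Hypothetical naive tokenizer: lowercase and split on whitespace (no stemming).
    return text.lower().split()


def build_vocabulary(docs):
    # Every distinct term in the collection becomes one dimension of the vector space.
    return sorted({term for doc in docs for term in tokenize(doc)})


def to_vector(text, vocabulary):
    # Raw term counts, one entry per vocabulary term; terms outside the vocabulary are ignored.
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


docs = ["data retrieval in dataspace", "unstructured data on the web"]
vocab = build_vocabulary(docs)
doc_vectors = [to_vector(d, vocab) for d in docs]
query_vector = to_vector("unstructured data", vocab)
```

Because the query is projected onto the same dimensions as the documents, any query term that never occurs in the collection simply contributes nothing to the vector.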
VSM relies on term weighting of document vectors and proceeds in three steps: (1) the documents are indexed so that relevant data can be retrieved; (2) the indexed terms are weighted so that the appropriate documents can be retrieved for the end user; and (3) a similarity measure between each document and the end user's query is computed to rank the documents by relevance, most often the cosine measure. We found that retrieving data or information based on these similarity measures is easier and yields a better, more efficient technique for information retrieval.
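The three steps can be sketched end to end as follows. This is an illustrative sketch, not the authors' implementation: the abstract does not name a weighting scheme, so TF-IDF with idf = log(N / df) is assumed here as a common choice, and ranking uses the cosine measure cos(q, d) = (q · d) / (|q| |d|). All function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical naive tokenizer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def build_index(docs):
    # Step 1: inverted index mapping each term to {doc_id: term frequency}.
    index = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return index


def idf(index, n_docs):
    # Assumed weighting: inverse document frequency log(N / df) per indexed term.
    return {term: math.log(n_docs / len(postings)) for term, postings in index.items()}


def weight_documents(index, idf_weights, n_docs):
    # Step 2: tf-idf weight for every (document, term) pair, stored as sparse dicts.
    vectors = [dict() for _ in range(n_docs)]
    for term, postings in index.items():
        for doc_id, tf in postings.items():
            vectors[doc_id][term] = tf * idf_weights[term]
    return vectors


def cosine(a, b):
    # Cosine similarity of two sparse vectors; 0.0 if either vector has zero length.
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(docs, query):
    # Step 3: score every document against the weighted query, highest first.
    index = build_index(docs)
    idf_weights = idf(index, len(docs))
    doc_vectors = weight_documents(index, idf_weights, len(docs))
    query_vector = {
        t: tf * idf_weights[t]
        for t, tf in Counter(tokenize(query)).items()
        if t in idf_weights  # query terms absent from the index are dropped
    }
    scores = [(doc_id, cosine(query_vector, vec)) for doc_id, vec in enumerate(doc_vectors)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = [
    "vector space model for retrieval",
    "cosine similarity ranks documents",
    "unstructured data in a dataspace",
]
ranking = rank(docs, "cosine ranking of documents")
```

Here only the second document shares terms with the query, so it ranks first with a cosine score of about 0.707, and the other two score 0.0. Note that under this idf formula a term occurring in every document receives weight zero, which is one reason real systems often use smoothed variants.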


How to Cite
Lal, N., Qamar, S., & Shi, S. (2018). Information Retrieval and Query Ranking of Unstructured Data in Dataspace using Vector Space Model. International Journal on Future Revolution in Computer Science & Communication Engineering, 4(1), 17–24. Retrieved from http://ijfrcsce.org/index.php/ijfrcsce/article/view/957