DSpace Repository

Improving the effectiveness of information retrieval with genetic programming

dc.contributor.advisor Lubinsky DJ en
dc.contributor.author Oren N en
dc.date.accessioned 2016-09-22T11:18:05Z
dc.date.available 2016-09-22T11:18:05Z
dc.date.created 1999 en
dc.date.submitted 2002 en
dc.identifier.uri http://hdl.handle.net/20.500.11892/109383
dc.description.abstract Information retrieval (IR) systems are responsible for storing and retrieving large amounts of data efficiently. An important subtask of IR is searching, which deals with retrieving documents stored within the system in response to a user's information needs.

Many IR systems deal with text documents, and a myriad of techniques has been proposed to fulfil this task. All techniques attempt to classify documents based on their relevance to the user's needs. Amongst the most popular approaches, due to both their simplicity and relative effectiveness, are vector-based techniques built on a scheme utilising the frequency of words in the document and in the document collection. One class of these schemes is known as tf.idf. While popular, tf.idf schemes have very little theoretical grounding, and in many cases their performance is not adequate.

This research seeks to find alternative vector schemes to tf.idf. This is achieved by using a technique from machine learning known as genetic programming. Genetic programming attempts to search a program space for "good" solutions in a stochastic, directed manner. The search is motivated by evolution, in that good programs are more likely to be combined to form new programs, while poor solutions die off.

Within this research, evolved programs consisted of a subset of possible classifiers, and one program was deemed better than another if it better classified documents as relevant or irrelevant to a user query.

The contribution of this research is an evaluation of the effectiveness of using genetic programming to create classifiers in a number of IR settings. A number of findings were made. The approach proposed here is often able to outperform the basic tf.idf method: on the CISI and CF datasets, improvements of just under five percent were observed. Furthermore, the form of the evolved programs indicates that classifiers with a structure different to tf.idf may have a useful role to play in information retrieval methods. Lastly, this research proposes a number of additional areas of investigation that may further enhance the effectiveness of this technique. en
dc.language English en
dc.title Improving the effectiveness of information retrieval with genetic programming en
dc.type Masters degree en
dc.description.degree MSc (Computer Science) en
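
For readers unfamiliar with the tf.idf baseline mentioned in the abstract, the following is a minimal illustrative sketch, not the thesis's implementation. It assumes one common variant (raw term frequency multiplied by log(N/df)) with cosine similarity for ranking; the record does not say which variant the thesis used. The function names `tfidf_vectors`, `cosine` and `rank` are hypothetical, as is the toy collection in the usage note.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Weight each tokenised document with tf * log(N / df).

    Assumed variant: raw term frequency and an unsmoothed logarithmic idf.
    Returns the per-document weight dicts and the idf table.
    """
    n_docs = len(docs)
    # Document frequency: the number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / count) for term, count in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least similar to the query."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms that never occur in the collection get zero weight.
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query).items()}
    scores = [cosine(q_vec, v) for v in vectors]
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
```

As a usage example, calling `rank(["information", "retrieval"], docs)` on a toy collection whose second document is `["information", "retrieval", "search"]` places that document first. In the approach the abstract describes, genetic programming would search for alternatives to the fixed `tf * idf` weighting expression used here.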

Files in this item

There are no files associated with this item.
