Tuesday, July 25, 2006

Building Search Engines. Design - Architecture and other

1) Building a Vector Space Search Engine

a. Create a list of unique words for each document.

b. Remove the junk words (stop words) like a, and, the etc.

c. Create a vector for each document against the above word list, with 1 where the word from the list is present in the document and 0 where it is not. E.g. [1, 0, 1, 0, 0, 0, 1]

d. Create a vector for the query words and match it against the vector of each document using the cosine formula. Then display the list of documents whose cosine value is above some threshold between 0 and 1.

e. Get the implementation from [http://www.perl.com/pub/a/2003/02/19/engine.html]
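Steps a to d can be sketched in a few lines of Python. This is only a rough illustration, not the implementation from the link above. It assumes the per-document word lists are merged into one shared word list, and the stop-word set, the function names and the default threshold of 0.2 are all made up for the example.

```python
import math
import re

# Hypothetical stop-word list; a real engine would use a much longer one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "is", "to"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words (step b)."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_vocabulary(documents):
    """Steps a and b: sorted list of unique non-stop words over all documents."""
    return sorted({w for doc in documents for w in tokenize(doc)})


def to_vector(text, vocabulary):
    """Step c: binary vector, 1 if the word from the list occurs in the text."""
    words = set(tokenize(text))
    return [1 if w in words else 0 for w in vocabulary]


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def search(query, documents, threshold=0.2):
    """Step d: return (document index, cosine) pairs at or above the threshold, best first."""
    vocabulary = build_vocabulary(documents)
    query_vec = to_vector(query, vocabulary)
    results = []
    for index, doc in enumerate(documents):
        score = cosine(query_vec, to_vector(doc, vocabulary))
        if score >= threshold:
            results.append((index, score))
    return sorted(results, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    docs = [
        "The cat sat on the mat",
        "A dog chased the cat",
        "Search engines index documents",
    ]
    for index, score in search("cat and dog", docs):
        print(f"{score:.3f}  {docs[index]}")
```

With binary vectors the cosine only measures how many words the query and the document share, relative to their sizes. Weighting the entries (for example by word frequency) would be a separate design choice that the steps above do not cover.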
