Since Google took over our lives and turned "googling" into a verb for searching, building a search engine has been considered a cool thing to do. I have crossed paths with search engines several times in my life. I even worked at a company that was, I guess, pretending to build a search engine (I was there for just one month before I realized they were not serious). I also got some background from text mining courses (both on Coursera and at the University of Manchester), and I came to a point in my research where I had to build a search engine.

Not many people today build a search engine from scratch, since there are several search engine libraries out there, and one of the most famous is Apache Lucene. Here I will show how to build a simple search engine on top of Lucene. There are already many Lucene tutorials and books. However, some of them are too long, and others do not explain the background and just show code. I would like to strike a balance here by also describing some of the principles, without going too deep into details.

A search engine is a piece of software that helps users find the most relevant document in some data store as fast as possible. By data store we mean something that stores documents filled with some information. It can be a hard disk with PDF, TXT and Word documents, a database (relational or non-relational), or a network of web pages. A more correct name for the process of finding the right information in a data store is information retrieval.

Information retrieval is different from information extraction. Information retrieval is about getting the right document; the user then has to read the document and find the information of interest. Information extraction is about finding the exact piece of information the user is looking for, so they don't need to read the document, or even open it, to find it.
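To make the idea of retrieval a bit more concrete, here is a minimal sketch of a toy search engine over an in-memory data store. It uses only the Java standard library, not Lucene, and the class name `TinySearchEngine`, its methods, and the sample documents are all made up for illustration. It assumes a very naive approach: lowercase the text, split it into words, and rank documents by how many distinct query words they contain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy engine for illustration only; this is NOT how Lucene is implemented.
class TinySearchEngine {
    // Inverted index: term -> names of the documents that contain the term.
    private final Map<String, Set<String>> index = new HashMap<>();

    // Naive analyzer (an assumption): lowercase, split on anything that is not a-z or 0-9.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Put a document into the data store by indexing every term of its text.
    void add(String docName, String text) {
        for (String term : tokenize(text)) {
            index.computeIfAbsent(term, k -> new TreeSet<>()).add(docName);
        }
    }

    // Retrieval: return document names, best match first.
    // Score = number of distinct query terms found in the document; ties stay alphabetical.
    List<String> search(String query) {
        Map<String, Integer> scores = new TreeMap<>();
        for (String term : new LinkedHashSet<>(tokenize(query))) {
            for (String doc : index.getOrDefault(term, Collections.<String>emptySet())) {
                scores.merge(doc, 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(scores.keySet());
        ranked.sort((a, b) -> Integer.compare(scores.get(b), scores.get(a)));
        return ranked;
    }

    public static void main(String[] args) {
        TinySearchEngine engine = new TinySearchEngine();
        engine.add("lucene.txt", "Apache Lucene is a search engine library");
        engine.add("cooking.txt", "How to cook pasta quickly");
        engine.add("ir.txt", "Information retrieval finds the right document; a search engine helps");

        System.out.println(engine.search("search engine library")); // prints [lucene.txt, ir.txt]
        System.out.println(engine.search("pasta"));                 // prints [cooking.txt]
    }
}
```

Note that the result is a list of document names, not an answer: the user still has to open the document and read it. That is what makes this retrieval rather than extraction. A real library such as Lucene does much more (proper text analysis, relevance scoring, on-disk indexes), so treat this only as a picture of the basic idea.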