Wednesday, June 22, 2005

Review - Lucene in Action

Lucene in Action
by Erik Hatcher, Otis Gospodnetic

5 out of 5 stars

Lucene is an open source search engine library that provides a sophisticated API for indexing documents and adding advanced search capabilities. Although using Lucene is not particularly difficult, its available documentation, as with many open source projects, leaves something to be desired. This book nicely fills that gap.

The book starts with an introduction explaining both what Lucene is and what it isn't. The next couple of chapters show how to use the Lucene classes to index documents and then search for them. The authors then show how to improve searches with different analyzers, including how to write your own custom analyzers. Custom analyzers can, for example, allow searches that match common misspellings or words that sound alike. The book moves on to the advanced search features available to the developer and explains how to add your own features to Lucene. Since Lucene works only with text, the authors next show how to extract text from formats such as Word documents, HTML documents, and PDFs so that Lucene can index and search them. The main portion of the book wraps up with a look at some of the available tools and extensions, which provide useful additional functionality such as highlighting search terms in the matching documents. The final chapter presents real-life case studies of Lucene contributed by various authors. Some of the writing there is rather weak and, in at least some cases, seems to be little more than advertising for the sites and products involved.

The book is very well written and gives a good in-depth exploration of Lucene. The authors provide plenty of code snippets demonstrating Lucene's features, as well as a complete application to review. Anyone who is interested in using Lucene and wants more than the sparse documentation available should consider getting this book. One thing that annoyed me was the constant pushing of JUnit. Most of the code samples include traces of unit testing, and seeing blocks of code with "assertEquals" everywhere was distracting, to say the least. The authors should have considered that not everyone uses JUnit, and that extra off-topic lines only add confusion when you are trying to understand the code.

This earned 5 stars on Amazon. The book is published by Manning.

This review and all my other reviews can be seen on My Amazon Reviews page.
