The SearchFactory object keeps track of the
underlying Lucene resources for Hibernate Search. It is also a convenient
way to access Lucene natively. The SearchFactory
can be accessed from a FullTextSession:
Example 8.1. Accessing the SearchFactory
FullTextSession fullTextSession = Search.getFullTextSession(regularSession);
SearchFactory searchFactory = fullTextSession.getSearchFactory();
You can always access the Lucene directories through plain Lucene;
the Directory structure is in no way different with or without Hibernate
Search. However, there are more convenient ways to access a given
Directory. The SearchFactory keeps track of the
DirectoryProviders per indexed class. One directory
provider can be shared amongst several indexed classes if the classes
share the same underlying index directory. While usually not the case, a
given entity can have several DirectoryProviders if
the index is sharded (see Section 3.2, “Sharding indexes”).
Example 8.2. Accessing the Lucene Directory
DirectoryProvider[] provider = searchFactory.getDirectoryProviders(Order.class);
org.apache.lucene.store.Directory directory = provider[0].getDirectory();
In this example, directory points to the Lucene index storing
Order information. Note that the obtained Lucene
directory must not be closed; closing it is Hibernate Search's
responsibility.
Queries in Lucene are executed on an IndexReader.
Hibernate Search caches all index readers to maximize performance. Your
code can access these cached resources, but you have to follow some "good
citizen" rules.
Example 8.3. Accessing an IndexReader
DirectoryProvider orderProvider = searchFactory.getDirectoryProviders(Order.class)[0];
DirectoryProvider clientProvider = searchFactory.getDirectoryProviders(Client.class)[0];
ReaderProvider readerProvider = searchFactory.getReaderProvider();
IndexReader reader = readerProvider.openReader(orderProvider, clientProvider);
try {
    //do read-only operations on the reader
}
finally {
    readerProvider.closeReader(reader);
}
The ReaderProvider (described in Reader strategy) will open an IndexReader
on top of the index(es) referenced by the directory providers. Because
this IndexReader is shared amongst several clients,
you must adhere to the following rules:
- Never call indexReader.close(); always call readerProvider.closeReader(reader), preferably in a finally block.
- Don't use this IndexReader for
  modification operations (you would get an exception). If you want to
  use a read/write index reader, open one from the Lucene Directory
  object.
Aside from those rules, you can use the IndexReader freely,
especially to do native queries. Using the shared
IndexReaders will make most queries more
efficient.
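The reason for the first rule becomes clearer with a toy model of a shared reader. The sketch below is plain Java with no Lucene or Hibernate Search dependency. ToyReader, ToyReaderProvider and SharedReaderDemo are hypothetical names, and reference counting is only one assumed way such sharing could work, not a description of Hibernate Search's internals. It shows that closing through the provider releases one client's claim while the underlying resource stays open for the others.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Demo entry point: two clients share one reader through the provider.
final class SharedReaderDemo {
    public static void main(String[] args) {
        ToyReaderProvider provider = new ToyReaderProvider();
        ToyReader first = provider.openReader();
        ToyReader second = provider.openReader();   // same instance as first

        provider.closeReader(first);
        System.out.println(second.isOpen());        // prints "true": second client can still read

        provider.closeReader(second);
        System.out.println(second.isOpen());        // prints "false": last client released it
    }
}

// Hypothetical stand-in for a shared index reader; not a Lucene class.
final class ToyReader {
    private final AtomicInteger clients = new AtomicInteger();
    private volatile boolean open = true;

    void acquire() {
        clients.incrementAndGet();
    }

    // Drops one client's claim; the resource is freed only when nobody uses it.
    void release() {
        if (clients.decrementAndGet() == 0) {
            open = false;
        }
    }

    boolean isOpen() {
        return open;
    }
}

// Hypothetical provider that hands the same cached reader to every caller.
final class ToyReaderProvider {
    private ToyReader cached;

    synchronized ToyReader openReader() {
        if (cached == null || !cached.isOpen()) {
            cached = new ToyReader();   // open lazily, or reopen after full release
        }
        cached.acquire();
        return cached;
    }

    synchronized void closeReader(ToyReader reader) {
        reader.release();
    }
}
```

In this model, a client that closed the reader directly would free the resource while other clients still hold it, which is the failure the rule guards against.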
Lucene allows the user to customize its scoring formula by extending
org.apache.lucene.search.Similarity. The abstract
methods defined in this class match the factors of the following formula
calculating the score of query q for document d:
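$$\text{score}(q,d) = \text{coord}(q,d) \cdot \text{queryNorm}(q) \cdot \sum_{t \text{ in } q} \left( \text{tf}(t \text{ in } d) \cdot \text{idf}(t)^2 \cdot t.\text{getBoost}() \cdot \text{norm}(t,d) \right)$$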
| Factor | Description |
|---|---|
| tf(t in d) | Term frequency factor for the term (t) in the document (d). |
| idf(t) | Inverse document frequency of the term. |
| coord(q,d) | Score factor based on how many of the query terms are found in the specified document. |
| queryNorm(q) | Normalizing factor used to make scores between queries comparable. |
| t.getBoost() | Field boost. |
| norm(t,d) | Encapsulates a few (indexing time) boost and length factors. |
It is beyond the scope of this manual to explain this
formula in more detail. Please refer to
Similarity's Javadocs for more information.
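To make the formula concrete, the following sketch evaluates it by hand for a one-term query. It is plain Java with no Lucene dependency. The class ScoreSketch, its method names and every input value are hypothetical. The tf and idf definitions (square root of the frequency, and 1 + ln(numDocs / (docFreq + 1))) are assumed to mirror DefaultSimilarity; check the Javadocs of your Lucene version before relying on them.

```java
final class ScoreSketch {

    // Assumed to mirror DefaultSimilarity: square root of the raw term frequency.
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // Assumed to mirror DefaultSimilarity: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    // Contribution of a single term t to the sum in the formula.
    static double termScore(double freq, int docFreq, int numDocs, double boost, double norm) {
        double idf = idf(docFreq, numDocs);
        return tf(freq) * idf * idf * boost * norm;
    }

    // score(q,d) from coord, queryNorm and the per-term contributions.
    static double score(double coord, double queryNorm, double... termScores) {
        double sum = 0.0;
        for (double s : termScores) {
            sum += s;
        }
        return coord * queryNorm * sum;
    }

    public static void main(String[] args) {
        // Hypothetical inputs: the term occurs 4 times in d and appears in
        // 9 of 10 documents; boost, norm, coord and queryNorm are all 1.
        double term = termScore(4.0, 9, 10, 1.0, 1.0);
        System.out.println(score(1.0, 1.0, term));   // prints 2.0
    }
}
```

With these inputs tf is 2 and idf is 1, so the score equals the term frequency factor alone, which makes it easy to see how each remaining factor would scale the result.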
Hibernate Search provides two ways to modify Lucene's similarity
calculation. First, you can set the default similarity by specifying the
fully qualified class name of your Similarity
implementation using the property
hibernate.search.similarity. The default value is
org.apache.lucene.search.DefaultSimilarity.
Additionally, you can override the default similarity at the class level using
the @Similarity annotation.
@Entity
@Indexed
@Similarity(impl = DummySimilarity.class)
public class Book {
...
}
As an example, let's assume it is not important how often a
term appears in a document. Documents with a single occurrence of the term
should be scored the same as documents with multiple occurrences. In this
case your custom implementation of the method tf(float freq)
should return 1.0.
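The effect of such an override can be previewed without Lucene. The sketch below is plain Java. TfComparison and its two constants are hypothetical names, the frequencies are made up, and the square-root variant is assumed to correspond to the default behaviour. In a real project the constant variant would sit in the tf(float freq) method of a Similarity implementation such as the DummySimilarity referenced above.

```java
import java.util.function.DoubleUnaryOperator;

final class TfComparison {

    // Assumed default: the factor grows with the square root of the frequency.
    static final DoubleUnaryOperator SQRT_TF = Math::sqrt;

    // Behaviour described in the text: every matching document gets the same factor.
    static final DoubleUnaryOperator CONSTANT_TF = freq -> 1.0;

    public static void main(String[] args) {
        double[] frequencies = {1.0, 4.0, 9.0};   // hypothetical occurrence counts
        for (double freq : frequencies) {
            System.out.println(freq
                    + " -> sqrt tf " + SQRT_TF.applyAsDouble(freq)
                    + ", constant tf " + CONSTANT_TF.applyAsDouble(freq));
        }
    }
}
```

With the constant variant, a document containing the term nine times contributes the same term frequency factor as one containing it once, so the remaining factors of the formula decide the ranking.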
When two entities share the same index, they must declare the
same Similarity implementation. Classes in the same
class hierarchy always share the index, so overriding the
Similarity implementation in a subtype is not allowed.