Query Language
QueryParser
To escape special characters use a backslash (). Special characters are:
\ + - ! ( ) : ^ ] { } ~ * ?
Boolean operators:
AND, OR and NOT
Terms listed without an operator use an implicit operator which by default is:
OR
To change the default use an instance of QueryParser rather than the static parse method.
QueryParser parser = new QueryParser("fieldname", analyzer);
parser.setOperator(QueryParser.DEFAULT_OPERATOR_AND);
Negating a term using NOT
must be combined with at least one nonnegated
term. (See book, “Lucene in Action”, Section 3.5.2)
Each of the uppercase word operators has a shortcut syntax:
Verbose Shortcut
a AND b +a +b
a OR b a b
a AND NOT b +a -b
Grouping
You can construct complex nested clauses using parentheses:
(agile OR exteme) AND methodology
Field Selection
A default field is provided to the parse method. Use field selector notation:
title:lucene
You can group field selection over several terms:
title:(a b c)
Range Queries
Text or date range queries use bracketed syntax with TO between the beginning and ending term. The type of bracket determines whether the range is inclusive (square brackets) or exclusive (curly brackets).
pubmonth:[200401 TO 200412]
{200401 TO 200412}
Note: Nondate range queries use the beginning and ending terms as the user entered them i.e. the terms are not analyzed. The terms must not contain whitespace. (the example queries are not date fields but text in the format YYYYMM) Date parsing is slightly weird. See book “Lucene in Action”, Section 3.5.5
Phrase Queries
Terms in double quotes create a PhraseQuery
:
\"This is Some Phrase*\"
The query will be analyzed, so in this example the StandardAnalyzer
parses
the query down to “some phrase”.
Note: The asterisk does not result in a wildcard search. The slop factor is zero unless you specify it using a trailing tilde:
\"sloppy phrase\"~5"
Note: A sloppy phrase query doesn’t require that the terms match in the same order. (For a solution, book “Lucene in Action”, section 6.3.4)
Note: Single-term phrases are converted to a TermQuery (not a PhraseQuery). See book “Lucene in Action”, Section 3.4.5 for info on PhraseQuery
Wildcard and prefix queries
If a term contains an asterisk or a question mark, it’s considered a WildcardQuery. When the term only contains a trailing asterisk it’s considered a PrefixQuery:
PrefixQuery*
Note: Both are converted to lowercase by default (this can be controlled). Note: Wildcards at the beginning of a term are prohibited.
Fuzzy Queries
A trailing tilde creates a fuzzy query on the preceding term. Note: The same performance caveats apply as the WildcardQuery. See book, “Lucene in Action”, Section 3.4.7
Boosting Queries
A carat (^) followed by a floating point number sets the boost factor for the preceding query.
junit^2.0 testing
Another example:
boostkeywords:queen^1.5 body:queen
See book, “Lucene in Action”, Section 3.3