Lucene search query syntax
Read May 12, 2009, 08:39:05 am #0
rimi

Lucene search query syntax

Hi, I am using the Lucene search engine in my application. I am having trouble searching for a particular term such as "AWT-T". I want only the data containing exactly "AWT-T" to be shown, but right now all the data matching "AWT" is shown as well.


Please help me.
Read May 12, 2009, 07:24:34 pm #1
sids

Re: Lucene search query syntax

Lucene uses a Tokenizer to break up text into individual tokens, and those tokens are what get indexed and searched for. You seem to be using a tokenizer that also splits words at hyphens (StandardTokenizer, one of the more commonly used tokenizers, behaves this way). If you would like to change the tokenization scheme, either use some other suitable tokenizer, if one exists, or write your own; it's pretty easy. Do ensure that you use the same tokenizer for both indexing and searching; otherwise the universe will come to an end (or something to that effect).
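
To make the effect concrete, here is a toy sketch in plain Java. It does not use Lucene's API: TokenizerSketch and its methods are made-up names, and it assumes the query tokens are simply OR-ed together. It only illustrates why splitting at hyphens lets a search for "AWT-T" hit documents that contain just "AWT", while a whitespace-only split keeps the term whole.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy model of tokenization for illustration; this is NOT Lucene's API.
class TokenizerSketch {

    // Splits on anything that is not a letter or digit: "AWT-T" -> [awt, t].
    static List<String> splitOnPunctuation(String text) {
        return split(text, "[^\\p{L}\\p{N}]+");
    }

    // Splits on whitespace only: "AWT-T" stays a single token, [awt-t].
    static List<String> splitOnWhitespace(String text) {
        return split(text, "\\s+");
    }

    private static List<String> split(String text, String regex) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split(regex)) {
            if (!piece.isEmpty()) {
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    // Assumption: a document matches if it shares at least one token with the query.
    static boolean matchesAny(List<String> docTokens, List<String> queryTokens) {
        for (String q : queryTokens) {
            if (docTokens.contains(q)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc = "AWT panel"; // should not be returned for a search on AWT-T
        String query = "AWT-T";

        // Hyphen-splitting on both sides: the shared token "awt" causes a match.
        System.out.println(matchesAny(splitOnPunctuation(doc), splitOnPunctuation(query))); // true

        // Whitespace-only splitting on both sides: "awt-t" is not in the document.
        System.out.println(matchesAny(splitOnWhitespace(doc), splitOnWhitespace(query))); // false
    }
}
```

Note that the same splitting rule is applied to both the document and the query in each case, which mirrors the advice about using one tokenizer for indexing and searching.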

You can use Luke to see how your query is being broken up. It can also show you how your indexed text was broken up. Luke is an indispensable tool for when you have to figure out such things.

You can find some useful notes on Lucene Analyzers/Tokenizers and how to customise them here: http://mext.at/?p=26

Hope this helps.


http://www.grok.in/
"Ignorance killed the cat, curiosity was framed."
Read May 12, 2009, 07:33:14 pm #2
sids

Re: Lucene search query syntax

The above notwithstanding, if you are just looking to modify the syntax of the query to solve the problem, you can try putting double quotes around the word: "AWT-T". Enclosing text in double quotes is Lucene's syntax for forcing a phrase search. However, this will not work if the Analyzer that was used when indexing drops single-letter tokens, because the "T" would then never have made it into the index. Again, Luke can help you ascertain this.
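
Roughly, a phrase search requires the query tokens to appear next to each other and in order. The toy sketch below (plain Java, not Lucene's API; PhraseSketch and its methods are hypothetical names, and real Lucene tracks token positions in more detail) shows both the quoted search working and the caveat about dropped single-letter tokens.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of phrase matching for illustration; this is NOT Lucene's API.
class PhraseSketch {

    // True if the phrase tokens occur consecutively, in order, in the document tokens.
    static boolean containsPhrase(List<String> doc, List<String> phrase) {
        if (phrase.isEmpty()) {
            return false;
        }
        for (int start = 0; start + phrase.size() <= doc.size(); start++) {
            if (doc.subList(start, start + phrase.size()).equals(phrase)) {
                return true;
            }
        }
        return false;
    }

    // Stand-in for an analyzer that throws away single-letter tokens.
    static List<String> dropSingleLetters(List<String> tokens) {
        List<String> kept = new ArrayList<>();
        for (String t : tokens) {
            if (t.length() > 1) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> phrase = Arrays.asList("awt", "t");          // tokens of "AWT-T"
        List<String> wanted = Arrays.asList("the", "awt", "t", "module");
        List<String> unwanted = Arrays.asList("the", "awt", "panel");

        System.out.println(containsPhrase(wanted, phrase));   // true
        System.out.println(containsPhrase(unwanted, phrase)); // false

        // If single-letter tokens are dropped, the phrase shrinks to [awt]
        // and the unwanted document matches again.
        System.out.println(containsPhrase(dropSingleLetters(unwanted), dropSingleLetters(phrase))); // true
    }
}
```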


Read May 14, 2009, 02:58:29 am #3
llabhilash

Re: Lucene search query syntax

Sorry to barge in, but I thought this was relevant.

The Lucene search quality reports page (http://wiki.apache.org/lucene-java/TREC_2007_Million_Queries_Track_-_IBM_Haifa_Team) says that using "Sweet Spot Similarity" improves the search results, along with other factors like "lexical affinities", tf normalisation and a query expansion step.

I could not find any material on "Sweet Spot Similarity" or on how to make Lucene use it instead of the default similarity score.

Any pointers would be of great help.

Thanks.
Read May 14, 2009, 07:05:52 am #4
aditya

Re: Lucene search query syntax

AFAIK sweet-spot similarity is a notion of similarity defined in Lucene. Here the sweet spot refers to document length. If a document is shorter or longer than the sweet spot of document length, it is penalised during retrieval. The javadoc page has some details on the implementation.

http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/misc/SweetSpotSimilarity.html
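
As far as I recall from that javadoc, the length normalisation is described as 1/sqrt(steepness * (|x - min| + |x - max| - (max - min)) + 1), where x is the number of terms in the field; please verify this against the version you use. A minimal sketch of just that formula in plain Java follows. It does not call Lucene, the class name is made up, and the min, max and steepness values are arbitrary examples.

```java
// Sketch of the sweet-spot length normalisation formula only; this is NOT Lucene's class.
class SweetSpotSketch {

    // Returns 1.0 for lengths inside [min, max] and a smaller value outside that range.
    static double lengthNorm(int numTerms, int min, int max, double steepness) {
        double distance = Math.abs(numTerms - min) + Math.abs(numTerms - max) - (max - min);
        return 1.0 / Math.sqrt(steepness * distance + 1.0);
    }

    public static void main(String[] args) {
        int min = 100;          // hypothetical lower edge of the sweet spot
        int max = 1000;         // hypothetical upper edge of the sweet spot
        double steepness = 0.5; // hypothetical penalty steepness

        System.out.println(lengthNorm(500, min, max, steepness));  // inside the sweet spot
        System.out.println(lengthNorm(97, min, max, steepness));   // slightly too short
        System.out.println(lengthNorm(1003, min, max, steepness)); // slightly too long
    }
}
```

Inside the range the distance term is zero, so every document there gets the same norm; outside it the norm falls off on both sides, which is the penalty for being shorter or longer than the sweet spot.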

Also, this is the full paper behind the page you were referring to, and it seems to have some details on it.

http://elvis.slis.indiana.edu/irpub/TREC/TREC2007_NOTEBOOK/NOTEBOOK.PAPERS/ibm_haifa_mq.pdf

Hope this helps.