org.apache.lucene.search.function

Programmatic control over document scores.
The function package provides tight control over document scores.
WARNING: The status of the search.function package is experimental. The APIs introduced here might change in the future and will not be supported anymore in such a case.
Two types of queries are available in this package:
  1. Custom Score queries - set the score of a matching document to a mathematical expression over the scores that contained (sub) queries assign to that document.
  2. Field score queries - base the score of a document on numeric values of indexed fields.
 
Some possible uses of these queries:
  1. Normalizing the document scores by values indexed in a special field - for instance, experimenting with a different doc length normalization.
  2. Introducing some static scoring element to the score of a document - for instance, using some topological attribute of the links to/from a document.
  3. Computing the score of a matching document as an arbitrary odd function of its score by a certain query.
Performance and Quality Considerations:
  1. When scoring by values of indexed fields, these values are loaded into memory. Unlike the regular scoring, where the required information is read from disk as necessary, here field values are loaded once and cached by Lucene in memory for further use, anticipating reuse by further queries. While all this is carefully cached with performance in mind, it is recommended to use these features only when the default Lucene scoring does not match your "special" application needs.
  2. Use only with carefully selected fields, because in most cases, search quality with regular Lucene scoring would outperform that of scoring by field values.
  3. Values of a field used for scoring should be consistent. Do not apply these queries to a field containing arbitrary (long) text. Do not mix kinds of values in the same field if that field is used for scoring.
  4. Smaller (shorter) field tokens mean less RAM (something always desired). When using FieldScoreQuery, select the shortest FieldScoreQuery.Type that is sufficient for the field values used.
  5. Reusing IndexReaders/IndexSearchers is essential, because the caching of field tokens is based on an IndexReader. Whenever a new IndexReader is used, values currently in the cache cannot be used and new values must be loaded from disk. So replace/refresh readers/searchers in a controlled manner.
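
The last consideration can be illustrated with a minimal sketch in plain Java. It is not Lucene's FieldCache: the class PerReaderCacheSketch, the loader function and the use of plain Object instances as stand-ins for readers are all assumptions made for illustration. It only shows why a new reader forces values to be loaded again.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Hypothetical stand-in for a per-reader field value cache; not Lucene's FieldCache. */
final class PerReaderCacheSketch {
    // Weak keys: an entry can be collected once its reader is no longer referenced.
    private final Map<Object, float[]> cache = new WeakHashMap<>();
    private int loads = 0;

    /** Returns the cached values for this reader, loading them only on first use. */
    synchronized float[] get(Object reader, Function<Object, float[]> loader) {
        float[] values = cache.get(reader);
        if (values == null) {
            values = loader.apply(reader); // stands in for the expensive read from disk
            cache.put(reader, values);
            loads++;
        }
        return values;
    }

    /** Number of times values had to be loaded rather than served from the cache. */
    synchronized int loadCount() {
        return loads;
    }

    public static void main(String[] args) {
        PerReaderCacheSketch cache = new PerReaderCacheSketch();
        Function<Object, float[]> loader = reader -> new float[] {7f, 3f, 5f};
        Object readerA = new Object();
        Object readerB = new Object();
        cache.get(readerA, loader); // first use of readerA: load
        cache.get(readerA, loader); // same reader: served from the cache
        cache.get(readerB, loader); // new reader: cached values cannot be reused
        System.out.println("loads = " + cache.loadCount());
    }
}
```

In this sketch, querying twice with the same reader costs one load, while switching to a second reader costs another, which is why readers should be replaced in a controlled manner.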
History and Credits:
  • A large part of the code of this package originated from Yonik's FunctionQuery code that was imported from Solr (see LUCENE-446).
  • The idea behind CustomScoreQuery is borrowed from the "Easily create queries that transform sub-query scores arbitrarily" contribution by Mike Klaas (see LUCENE-850), though the implementation and API here are different.
Code sample:

Note: code snippets here should work, but they were never really compiled, so the test sources TestCustomScoreQuery, TestFieldScoreQuery and TestOrdValues may also be useful.

  1. Using field (byte) values as scores:

    Indexing:

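          Document d1 = new Document();
          // Untokenized field holding the numeric value to be used as the score
          Field f = new Field("score", "7", Field.Store.NO, Field.Index.UN_TOKENIZED);
          f.setOmitNorms(true);
          d1.add(f);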
        

    Search:

          Query q = new FieldScoreQuery("score", FieldScoreQuery.Type.BYTE);
        
    Document d1 above would get a score of 7.
  2. Manipulating scores

    Dividing the original score of each document by the square root of its docid (just to demonstrate what it takes to manipulate scores this way):

          Query q = queryParser.parse("my query text");
          CustomScoreQuery customQ = new CustomScoreQuery(q) {
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
              // doc is the document id; cast because Math.sqrt returns a double (doc 0 yields Infinity)
              return (float) (subQueryScore / Math.sqrt(doc));
            }
          };
        

    For more informative debug info on the custom query, also override the name() method:

          CustomScoreQuery customQ = new CustomScoreQuery(q) {
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
              // doc is the document id; cast because Math.sqrt returns a double (doc 0 yields Infinity)
              return (float) (subQueryScore / Math.sqrt(doc));
            }
            public String name() {
              return "1/sqrt(docid)";
            }
          };
        

    Taking the square root of the original score and multiplying it by a "short field driven score", i.e., the short value that was indexed for the scored doc in a certain field:

          Query q = queryParser.parse("my query text");
          FieldScoreQuery qf = new FieldScoreQuery("shortScore", FieldScoreQuery.Type.SHORT);
          CustomScoreQuery customQ = new CustomScoreQuery(q,qf) {
            public float customScore(int doc, float subQueryScore, float valSrcScore) {
              // cast because Math.sqrt returns a double
              return (float) (Math.sqrt(subQueryScore) * valSrcScore);
            }
            public String name() {
              return "shortVal*sqrt(score)";
            }
          };
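
The arithmetic of the two score manipulations above can be checked without an index. The following is a minimal sketch in plain Java; the class CustomScoreMathSketch and its method names are hypothetical and not part of Lucene. It also shows an edge case worth keeping in mind: dividing by the square root of document id 0 gives an infinite score.

```java
/** Hypothetical helper that mirrors the score formulas of the samples; not part of Lucene. */
final class CustomScoreMathSketch {

    /** Original score divided by the square root of the document id. */
    static float overSqrtDoc(int doc, float subQueryScore) {
        return (float) (subQueryScore / Math.sqrt(doc));
    }

    /** Square root of the original score multiplied by a field driven score. */
    static float sqrtTimesField(float subQueryScore, float valSrcScore) {
        return (float) (Math.sqrt(subQueryScore) * valSrcScore);
    }

    public static void main(String[] args) {
        System.out.println(overSqrtDoc(4, 2f));     // 2 / sqrt(4)
        System.out.println(sqrtTimesField(4f, 3f)); // sqrt(4) * 3
        System.out.println(Float.isInfinite(overSqrtDoc(0, 2f))); // division by sqrt(0)
    }
}
```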
        
Java Source File Name | Type | Comment
ByteFieldSource.java | Class | Expert: obtains single byte field values from the FieldCache (org.apache.lucene.search.FieldCache) using getBytes() and makes those values available as other numeric types, casting as needed.

WARNING: The status of the search.function package is experimental.

CustomScoreQuery.java | Class | Query that sets document score as a programmatic function of several (sub) scores:
  1. the score of its subQuery (any query)
  2. (optional) the score of its ValueSourceQuery (or queries); for the simplest and most convenient use case this query would be a FieldScoreQuery (org.apache.lucene.search.function.FieldScoreQuery)
Subclasses can modify the computation by overriding CustomScoreQuery.customScore(int,float,float).

WARNING: The status of the search.function package is experimental.

DocValues.java | Class | Expert: represents field values as different types. Normally created via a ValueSource (org.apache.lucene.search.function.ValueSource) for a particular field and reader.

WARNING: The status of the search.function package is experimental.
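
The idea of exposing one stored type as several numeric types, casting as needed, can be sketched as follows. This is a simplified plain-Java illustration under stated assumptions: the class ByteDocValuesSketch is hypothetical, its accessor names are chosen for the sketch, and it is backed by an in-memory byte array rather than by the FieldCache.

```java
/** Hypothetical per-document values backed by one byte per document; not Lucene's DocValues. */
final class ByteDocValuesSketch {
    private final byte[] values; // one value per document, indexed by document id

    ByteDocValuesSketch(byte[] values) {
        this.values = values.clone(); // defensive copy
    }

    // Each accessor widens the stored byte to the requested type.
    float floatVal(int doc) { return values[doc]; }
    int intVal(int doc) { return values[doc]; }
    long longVal(int doc) { return values[doc]; }
    double doubleVal(int doc) { return values[doc]; }
    String strVal(int doc) { return Byte.toString(values[doc]); }

    public static void main(String[] args) {
        ByteDocValuesSketch vals = new ByteDocValuesSketch(new byte[] {7, -3});
        System.out.println(vals.floatVal(0));
        System.out.println(vals.intVal(1));
    }
}
```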

FieldCacheSource.java | Class | Expert: A base class for ValueSource implementations that retrieve values for a single field from the FieldCache (org.apache.lucene.search.FieldCache).

Fields used herein must be indexed (it doesn't matter whether these fields are stored or not).

It is assumed that each such indexed field is untokenized, or at least has a single token in a document. For documents with multiple tokens of the same field, behavior is undefined (It is likely that current code would use the value of one of these tokens, but this is not guaranteed).

Documents with no tokens in this field are assigned the zero value.
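
The single-token assumption and the zero default can be sketched in plain Java. The class SingleTokenValuesSketch and its input shape (a list of token lists, one per document) are assumptions made for illustration. Rejecting documents with several tokens is a choice of this sketch only, since the text above leaves that behavior undefined.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of one numeric value per document; not Lucene code. */
final class SingleTokenValuesSketch {

    /** tokensPerDoc.get(i) holds the tokens of the scored field in document i. */
    static int[] toIntValues(List<List<String>> tokensPerDoc) {
        int[] values = new int[tokensPerDoc.size()]; // Java initializes every slot to zero
        for (int doc = 0; doc < values.length; doc++) {
            List<String> tokens = tokensPerDoc.get(doc);
            if (tokens.isEmpty()) {
                continue; // no token in this field: the document keeps the zero value
            }
            if (tokens.size() > 1) {
                // Sketch-only choice: fail loudly instead of picking one token silently.
                throw new IllegalArgumentException("more than one token in doc " + doc);
            }
            values[doc] = Integer.parseInt(tokens.get(0));
        }
        return values;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("7"),
                Collections.<String>emptyList(),
                Arrays.asList("42"));
        System.out.println(Arrays.toString(toIntValues(docs)));
    }
}
```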

FieldScoreQuery.java | Class | A query that scores each document as the value of the numeric input field.

The query matches all documents, and scores each document according to the numeric value of that field.

FloatFieldSource.java | Class | Expert: obtains float field values from the FieldCache (org.apache.lucene.search.FieldCache) using getFloats() and makes those values available as other numeric types, casting as needed.

WARNING: The status of the search.function package is experimental.

FunctionTestSetup.java | Class |
IntFieldSource.java | Class | Expert: obtains int field values from the FieldCache (org.apache.lucene.search.FieldCache) using getInts() and makes those values available as other numeric types, casting as needed.

WARNING: The status of the search.function package is experimental.

OrdFieldSource.java | Class | Expert: obtains the ordinal of the field value from the default Lucene FieldCache (org.apache.lucene.search.FieldCache) using getStringIndex().

The native lucene index order is used to assign an ordinal value for each field value.

Example:
If there were only three field values: "apple","banana","pear"
then ord("apple")=1, ord("banana")=2, ord("pear")=3

WARNING: ord() depends on the position in an index and can thus change when other documents are inserted or deleted, or if a MultiSearcher is used.
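
The ordinal assignment in the example can be reproduced with a small plain-Java sketch. The class OrdSketch is hypothetical and works on an in-memory collection of field values rather than on an index. It assumes ordinals follow the sorted order of the distinct values, and it uses String's natural ordering (UTF-16 code units) as an approximation of the index term order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration of ord(); not Lucene code. */
final class OrdSketch {

    /** Maps each distinct field value to its 1-based position in sorted order. */
    static Map<String, Integer> ords(Collection<String> fieldValues) {
        Map<String, Integer> result = new LinkedHashMap<>();
        int next = 1;
        for (String term : new TreeSet<>(fieldValues)) { // sorted, duplicates removed
            result.put(term, next++);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(ords(Arrays.asList("pear", "apple", "banana")));
        // Adding a document with a new value shifts the ordinals of later terms.
        System.out.println(ords(Arrays.asList("pear", "apple", "banana", "avocado")));
    }
}
```

The second call in main illustrates the warning above: once "avocado" is added, "banana" and "pear" receive different ordinals than before.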

ReverseOrdFieldSource.java | Class | Expert: obtains the ordinal of the field value from the default Lucene FieldCache (org.apache.lucene.search.FieldCache) using getStringIndex() and reverses the order.

The native lucene index order is used to assign an ordinal value for each field value.

Field values (terms) are lexicographically ordered by unicode value, and numbered starting at 1.
Example of reverse ordinal (rord):
If there were only three field values: "apple","banana","pear"
then rord("apple")=3, rord("banana")=2, rord("pear")=1

WARNING: rord() depends on the position in an index and can thus change when other documents are inserted or deleted, or if a MultiSearcher is used.
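
The reverse ordinal in the example is consistent with rord = (number of distinct values) - ord + 1. A minimal plain-Java sketch of that relation follows. The class RordSketch is hypothetical, works on an in-memory collection of values, and uses String's natural ordering as an approximation of the index term order.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeSet;

/** Hypothetical illustration of rord(); not Lucene code. */
final class RordSketch {

    /** Maps each distinct field value to its 1-based position counted from the end of the sorted order. */
    static Map<String, Integer> rords(Collection<String> fieldValues) {
        TreeSet<String> sorted = new TreeSet<>(fieldValues); // sorted, duplicates removed
        Map<String, Integer> result = new LinkedHashMap<>();
        int ord = 1;
        for (String term : sorted) {
            result.put(term, sorted.size() - ord + 1); // reverse of the forward ordinal
            ord++;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(rords(Arrays.asList("apple", "banana", "pear")));
    }
}
```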

ShortFieldSource.java | Class | Expert: obtains short field values from the FieldCache (org.apache.lucene.search.FieldCache) using getShorts() and makes those values available as other numeric types, casting as needed.

WARNING: The status of the search.function package is experimental.

TestCustomScoreQuery.java | Class | Test CustomScoreQuery search.
TestFieldScoreQuery.java | Class | Test FieldScoreQuery search.

Tests here create an index with a few documents, each having an int value indexed field and a float value indexed field. The values of these fields are later used for scoring.

The rank tests use Hits to verify that docs are ordered (by score) as expected.

The exact score tests use TopDocs to verify the exact score.

TestOrdValues.java | Class | Test search based on OrdFieldSource and ReverseOrdFieldSource.

Tests here create an index with a few documents, each having an indexed "id" field. The ord values of this field are later used for scoring.

The order tests use Hits to verify that docs are ordered as expected.

The exact score tests use TopDocs to verify the exact score.

ValueSource.java | Class | Expert: source of values for basic function queries.

In its default/simplest form, values (one per doc) are used as the score of that doc.

Values are instantiated as DocValues (org.apache.lucene.search.function.DocValues) for a particular reader.

ValueSource implementations differ in RAM requirements: it would always be a factor of the number of documents, but for each document the number of bytes can be 1, 2, 4, or 8.
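
As a rough worked example of that RAM statement, the cached values cost about (number of documents) x (bytes per value). The sketch below is plain Java; the class ValueSourceRamSketch is hypothetical, and the estimate ignores array headers and any other bookkeeping overhead.

```java
/** Hypothetical back-of-the-envelope estimate of cached value size; not Lucene code. */
final class ValueSourceRamSketch {

    /** Approximate bytes needed to hold one fixed-width value per document. */
    static long estimateBytes(long numDocs, int bytesPerValue) {
        return numDocs * bytesPerValue;
    }

    public static void main(String[] args) {
        long numDocs = 10_000_000L; // assumed index size, for illustration only
        System.out.println("byte:  " + estimateBytes(numDocs, Byte.BYTES));
        System.out.println("short: " + estimateBytes(numDocs, Short.BYTES));
        System.out.println("int:   " + estimateBytes(numDocs, Integer.BYTES));
        System.out.println("float: " + estimateBytes(numDocs, Float.BYTES));
    }
}
```

Under these assumptions, choosing a byte field over an int field for ten million documents cuts the cached array from roughly 40 MB to roughly 10 MB.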

ValueSourceQuery.java | Class | Expert: A Query that sets the scores of documents to the values obtained from a ValueSource (org.apache.lucene.search.function.ValueSource).

The value source can be based on a (cached) value of an indexed field, but it can also be based on an external source, e.g.
