Wednesday, June 13, 2012

Solr - Tomcat UTF-8 Encoding


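To support Unicode characters in queries sent to Solr running on Tomcat, you need one additional setting. Tomcat 6 and 7 decode request URIs, including GET query parameters such as q, as ISO-8859-1 unless the HTTP Connector in conf/server.xml sets URIEncoding="UTF-8":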

<Server ...>
  <Service ...>
    <Connector ... URIEncoding="UTF-8"/>
    ...
  </Service>
</Server>
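Tomcat percent-decodes the URI and then turns the resulting bytes into characters using the connector's URIEncoding. The small Java program below only illustrates that second step; it is not Tomcat's code. It takes the UTF-8 bytes a client sends for a sample term (the term "café" is an arbitrary choice) and decodes them once as ISO-8859-1 and once as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UriEncodingDemo {
    // Decode the UTF-8 bytes of a term with the given charset,
    // mimicking how the connector turns URI bytes into characters
    static String decodeAs(String term, Charset charset) {
        byte[] utf8Bytes = term.getBytes(StandardCharsets.UTF_8);
        return new String(utf8Bytes, charset);
    }

    public static void main(String[] args) {
        String term = "caf\u00e9"; // "café"
        // ISO-8859-1 (no URIEncoding on Tomcat 6/7) yields the garbled "cafÃ©"
        System.out.println(decodeAs(term, StandardCharsets.ISO_8859_1));
        // UTF-8 (URIEncoding="UTF-8") yields "café" again
        System.out.println(decodeAs(term, StandardCharsets.UTF_8));
    }
}
```

The two bytes of "é" (0xC3 0xA9) become two separate Latin-1 characters, which is exactly the kind of mismatch that makes non-ASCII queries fail to match indexed terms.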


Solr - Clean up Index



Sometimes you want to delete all documents from a Solr index without deleting the index directory.
You can do this by requesting the following URL, where host, port and core are placeholders for your installation:
http://host:port/solr/core/update?stream.body=<delete><query>*:*</query></delete>&commit=true
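A browser usually escapes the angle brackets in that URL for you; when the URL is built in code, it is safer to URL-encode the stream.body value yourself. A minimal Java sketch, assuming a hypothetical core URL of http://localhost:8983/solr/core:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class DeleteAllUrl {
    // Builds the delete-all URL for a core base URL such as http://host:port/solr/core
    static String build(String coreUrl) {
        String body = "<delete><query>*:*</query></delete>";
        return coreUrl + "/update?stream.body="
                + URLEncoder.encode(body, StandardCharsets.UTF_8) // escapes < > : /
                + "&commit=true";
    }

    public static void main(String[] args) {
        // Hypothetical core URL; replace with your own host, port and core
        System.out.println(build("http://localhost:8983/solr/core"));
    }
}
```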



Or by POSTing the following XML to the core's /update handler and then committing:
<delete><query>*:*</query></delete>
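One way to send that XML from plain Java (11 or later) is the built-in java.net.http client. The sketch below is an assumption-laden example rather than the only approach: the core URL is hypothetical, commit=true is passed as a request parameter so the delete becomes visible, and the request is only sent when a core URL is given on the command line.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

class DeleteAllPost {
    // Builds (but does not send) a POST of the delete-all XML to the /update handler
    static HttpRequest build(String coreUrl) {
        return HttpRequest.newBuilder(URI.create(coreUrl + "/update?commit=true"))
                .header("Content-Type", "text/xml; charset=UTF-8")
                .POST(HttpRequest.BodyPublishers.ofString("<delete><query>*:*</query></delete>"))
                .build();
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            // No core URL given: just show the request for a hypothetical core
            HttpRequest preview = build("http://localhost:8983/solr/core");
            System.out.println(preview.method() + " " + preview.uri());
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(build(args[0]), HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```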

Cleaning up the index using SolrJ 3.x (CommonsHttpSolrServer; masterIndexUrl is the core's base URL):

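import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

SolrServer server = null;
try {
    // masterIndexUrl, e.g. http://host:port/solr/core
    server = new CommonsHttpSolrServer(masterIndexUrl);
    server.deleteByQuery("*:*");      // match every document
    server.commit(true, true);        // waitFlush, waitSearcher
    server.optimize(true, true);      // optional: reclaim space left by deleted docs
} catch (Exception e) {
    // Undo uncommitted changes; server is still null if the URL was malformed
    if (server != null) {
        try {
            server.rollback();
        } catch (Exception e1) {
            e1.printStackTrace();
        }
    }
    e.printStackTrace();
}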

Solr Sort feature


Users often try to sort on fields such as the document title and do not get the expected results.

A few key things to keep in mind when sorting on a field in Solr:
  • Sorting does not work well on multivalued or tokenized fields; the field should be single-valued (multiValued="false") and untokenized.
  • The field must be indexed to be sortable (indexed="true").

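The Solr documentation puts it this way:
Sorting can be done on the "score" of the document, or on any multiValued="false" indexed="true" field provided that field is either non-tokenized (ie: has no Analyzer) or uses an Analyzer that only produces a single Term (ie: uses the KeywordTokenizer).

The usual workaround is to search on the tokenized title field and sort on an untokenized string copy of it: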

<field name="title" type="text" indexed="true" stored="true"/>

<field name="title_sort" type="string" indexed="true" stored="false"/>

<!-- Copy to a string type field -->
<copyField source="title" dest="title_sort" />
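With the copyField in place, a query searches title and sorts on title_sort through the standard sort parameter (field name followed by asc or desc). The Java sketch below only builds such a request URL; the core URL and the query title:solr are hypothetical examples.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class SortQueryUrl {
    // Searches the tokenized "title" field, sorts on its string copy "title_sort"
    static String build(String coreUrl, String query) {
        return coreUrl + "/select?q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&sort=" + URLEncoder.encode("title_sort asc", StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical core URL and query
        System.out.println(build("http://localhost:8983/solr/core", "title:solr"));
    }
}
```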

Tuesday, June 12, 2012

Chaining Solr copyField

Solr does not chain copyField directives: the destination of one copyField cannot act as the source of another, and copying is not applied recursively.


e.g.
<!-- Field definitions -->
<field name="subject" type="text" indexed="true" stored="true"/>
<field name="subject_text" type="text" indexed="true" stored="true"/>
<field name="text" type="text" indexed="true" stored="true"/>

<!-- Copying subject field to subject_text field -->
<copyField source="subject" dest="subject_text"/>

<!-- subject_text cannot feed into text, so no subject value ends up in text -->
<copyField source="subject_text" dest="text"/>

The Solr documentation at http://wiki.apache.org/solr/SchemaXml#Copy_Fields states:
The copy is done at the stream source level and no copy feeds into another copy.

So a copyField destination cannot be the source of another copyField.
The source must be a field that receives its value from the incoming document; copies do not cascade. To get subject into text, copy it there directly with <copyField source="subject" dest="text"/>.
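To make the "no copy feeds into another copy" rule concrete, here is a toy Java model. It is not Solr's implementation, only a sketch under the assumption that every rule reads from the fields of the original input document; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CopyFieldModel {
    // Toy model: every rule reads from the ORIGINAL input document,
    // so a copyField destination never feeds another rule
    static Map<String, List<String>> apply(Map<String, List<String>> input, List<String[]> rules) {
        Map<String, List<String>> indexed = new HashMap<>();
        input.forEach((field, values) -> indexed.put(field, new ArrayList<>(values)));
        for (String[] rule : rules) {                 // rule[0] = source, rule[1] = dest
            List<String> values = input.get(rule[0]); // never looks at copied values
            if (values != null) {
                indexed.computeIfAbsent(rule[1], k -> new ArrayList<>()).addAll(values);
            }
        }
        return indexed;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = Map.of("subject", List.of("Solr copyField"));
        List<String[]> chained = List.of(
                new String[] {"subject", "subject_text"},
                new String[] {"subject_text", "text"});
        List<String[]> direct = List.of(
                new String[] {"subject", "subject_text"},
                new String[] {"subject", "text"});
        System.out.println("chained, text = " + apply(doc, chained).get("text")); // null
        System.out.println("direct,  text = " + apply(doc, direct).get("text"));  // [Solr copyField]
    }
}
```

In the chained case subject_text is filled but text stays empty; copying subject to both destinations fills both.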