If you have a search requirement where names with accented characters or other special punctuation, like O’Neil, must be searchable whether the user types Oneil or O’Neil, SOLR makes it very easy to implement. For example, a search for either spelling should return all results that have O’Neil or Oneil, and likewise Úna or Una.

All you need to do is the following:

  1. Implement a computed field and assign a custom field type to it.
  2. Once you have built your computed field, go to the schema.xml of the index where you want accent-insensitive search to be implemented.
  3. Add a new dynamic field under the fields section, as shown below. In this example, I am creating a dynamic field with the suffix _ac and the field type auto_complete; you can name both anything you’d like. Any field whose name ends in _ac will use this type. Make sure the field is marked with both indexed and stored set to true.
    <dynamicField name="*_ac" type="auto_complete" indexed="true" stored="true" />
  4. Then, in the same file under the fieldTypes section, define the auto_complete field type so that SOLR folds the accents at both index and query time:
    <fieldType name="auto_complete" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- Char filters run on the raw text before the tokenizer -->
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- Folds remaining non-ASCII characters to their ASCII equivalents, e.g. é to e -->
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <!-- Front edge n-grams for prefix matching; front is the default side, and the old side attribute was removed in later SOLR versions -->
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ASCIIFoldingFilterFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20"/>
      </analyzer>
    </fieldType>
  5. Make sure that the computed field you created uses the auto_complete field type, for example by giving it a name that ends in _ac so that it matches the dynamic field defined in step 3.
  6. Once you have made the changes above, rebuild the indexes and then see the magic.
  7. If you search for Oneil or O’Neil, you should see both Kevin O’Neil and Mark Oneil in the results.

    The above implementation handles all of the accents below:

    • é – e
    • François – Francois
    • D’eli – Deli
    • Frédéric – Frederic
    • Rüd – Rud
    • Enikő – Eniko
    • Horváth – Horvath
    • Benjamín – Benjamin
    • Caliò – Calio
    • Miccichè – Micciche
    • Müller – Muller
    • Loïc – Loic
    • Schröder – Schroder
  8. In the fieldType section, the order of the filters is the most important thing: each filter processes the output of the one before it, so if they are not in the correct order, search won’t work.
  9. All of the accent handling above comes from the two lines below:
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
  10. The mapping-ISOLatin1Accent.txt file should contain the mappings for such “special” characters. In SOLR, this file comes pre-populated by default, with entries such as ü -> u, ä -> a and ß -> ss.
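To see why different spellings end up matching, here is a minimal Python sketch that imitates the index and query analysis chain above on a few of the example names. It is an approximation, not SOLR's own code. ASCII folding is approximated with Unicode NFKD decomposition plus dropping combining marks, and the stop-word and synonym filters are left out. The sketch also assumes that hypothetical mappings such as "’" => "" and "'" => "" have been added to mapping-ISOLatin1Accent.txt, so that the apostrophe in O’Neil is stripped before tokenizing. The function names are illustrative only.

```python
import unicodedata

# Hypothetical extra entries for mapping-ISOLatin1Accent.txt (an assumption):
#   "’" => ""
#   "'" => ""
# They strip apostrophes before tokenizing, as MappingCharFilterFactory would.
APOSTROPHE_MAP = {"\u2019": "", "'": ""}


def char_filter(text: str) -> str:
    """Approximates MappingCharFilterFactory, which runs before the tokenizer."""
    return "".join(APOSTROPHE_MAP.get(ch, ch) for ch in text)


def fold(token: str) -> str:
    """Approximates ASCIIFoldingFilterFactory via NFKD plus dropping combining marks.

    Lucene's real folding table covers more characters than this does.
    """
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def edge_ngrams(token: str, min_gram: int = 3, max_gram: int = 20) -> list[str]:
    """Front edge n-grams, mirroring minGramSize="3" maxGramSize="20".

    Tokens shorter than min_gram produce no terms at all.
    """
    longest = min(max_gram, len(token))
    return [token[:n] for n in range(min_gram, longest + 1)]


def analyze(text: str) -> set[str]:
    """Char filter -> whitespace split -> lowercase -> fold -> edge n-grams.

    Stop-word and synonym filtering are left out of this sketch.
    """
    terms: set[str] = set()
    for token in char_filter(text).split():
        terms.update(edge_ngrams(fold(token.lower())))
    return terms


def matches(query: str, indexed_value: str) -> bool:
    """True if any analyzed query term is also an indexed term (OR semantics)."""
    return bool(analyze(query) & analyze(indexed_value))


if __name__ == "__main__":
    print(matches("Oneil", "Kevin O’Neil"))   # True
    print(matches("O’Neil", "Mark Oneil"))    # True
    print(matches("Frederic", "Frédéric"))    # True
    print(sorted(analyze("Müller")))          # ['mul', 'mull', 'mulle', 'muller']
```

Because the query side also produces edge n-grams, a query like Oneil in this sketch matches on its three-character prefix "one" as well. That is why this field type behaves more like autocomplete than like exact matching.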