
Compare accented letters
Tag(s): Internationalization String/Number


From the javadoc,
the result of String.compareTo() is a negative integer
if this String object lexicographically precedes the
argument string. The result is a positive integer if
this String object lexicographically follows the argument
string. The result is zero if the strings are equal;
compareTo returns 0 exactly when the equals(Object)
method would return true.

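In this code, we have two strings, "état" and "famille". We expect "état" to sort before "famille". But String.compareTo() reports that "famille" comes before "état", because it compares the numeric char values and "é" (U+00E9) has a higher value than "f" (U+0066).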

class Test {
  public static void main(String args[]) {

    String s1 = "état";
    String s2 = "famille";

    // here we are expecting "é" < "f"
    if (s1.compareTo(s2) > 0) {
      // compareTo() says that s1 follows s2, which is not the order we expect!
      System.out.println("not ok " + s1 + " > " + s2);
    }
    /*
      output :
         not ok état > famille
    */
  }
}
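To see where this result comes from, the short sketch below prints the char values that String.compareTo() works with. The class name CharValueDemo and the helper firstDifference are hypothetical names used only for illustration. The accented letter is written as the escape \u00e9 so that the result does not depend on the source file encoding.

```java
class CharValueDemo {

    // Hypothetical helper: mirrors what String.compareTo() returns, that is
    // the difference of the char values at the first index where the strings
    // differ, or the difference of the lengths if there is no such index.
    static int firstDifference(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            if (a.charAt(i) != b.charAt(i)) {
                return a.charAt(i) - b.charAt(i);
            }
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        String s1 = "\u00e9tat";   // "état"
        String s2 = "famille";

        System.out.println((int) s1.charAt(0));      // 233 (U+00E9)
        System.out.println((int) s2.charAt(0));      // 102 (U+0066)
        System.out.println(s1.compareTo(s2));        // 131 = 233 - 102
        System.out.println(firstDifference(s1, s2)); // 131
    }
}
```

The positive value only reflects the position of the characters in Unicode, not the alphabetical order of any language.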
The fix is to use java.text.Collator.compare(). From the javadoc :
Collator.compare() compares the source string to the target string
according to the collation rules for this Collator.
Returns an integer less than, equal to or greater than zero
depending on whether the source String is less than,
equal to or greater than the target string.
import java.text.Collator;

public class Test {
    public static void main(String args[]) {

      String s1 = "état";
      String s2 = "famille";

      // getInstance() uses the default locale; to force a specific one :
      // Collator c = Collator.getInstance(java.util.Locale.FRANCE);
      Collator c = Collator.getInstance();
      c.setStrength(Collator.PRIMARY);
      if (c.compare(s1, s2) < 0) {
          // s1 sorts before s2 according to the collation rules
          System.out.println("ok " + s1 + " < " + s2);
      }
      /*
         output :
           ok état < famille
      */
    }
}
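Since Collator implements java.util.Comparator, it can also be passed directly to a sort. The following is a minimal sketch, not part of the original HowTo: the class SortDemo, its two helper methods and the word list are made up for the example, and Locale.FRANCE is an assumed choice of locale.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

class SortDemo {

    // Sorts a copy of the list by char value, as String.compareTo() does.
    static List<String> naturalSort(List<String> words) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(null); // null means natural ordering
        return copy;
    }

    // Sorts a copy of the list with the collation rules of the given locale.
    static List<String> collatorSort(List<String> words, Locale locale) {
        List<String> copy = new ArrayList<>(words);
        copy.sort(Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        // "famille", "état", "zèbre", "eau"
        List<String> words =
            Arrays.asList("famille", "\u00e9tat", "z\u00e8bre", "eau");

        System.out.println(naturalSort(words));
        System.out.println(collatorSort(words, Locale.FRANCE));
    }
}
```

The sort by char value places "état" last, after "zèbre", while the Collator places it between "eau" and "famille".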

Equality

To compare strings without taking accents into account, so that "é" == "e", we use a Collator with its strength set to PRIMARY.

import java.text.Collator;
// import java.util.Locale;

public class TextTest {

  public static void main(String ... args) {
    String a = "Real";
    String b = "Réal";

    System.out.println(a + " and " + b + " equals? " +
        check(a,b));
    /*
     * output :
     *  Real and Réal equals? true
     */
  }


  public static boolean check(String a, String b) {
    // Collator c = Collator.getInstance(Locale.US);
    //
    // accent and upper/lowercase not taken into account
    Collator c = Collator.getInstance();
    c.setStrength(Collator.PRIMARY);
    return c.compare(a, b) == 0;
  }
}
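The strength setting decides which differences count. As a rough sketch (StrengthDemo and its same() helper are hypothetical names, and Locale.US is an assumed locale), the three common levels can be compared side by side: PRIMARY looks at base letters only, SECONDARY also looks at accents, and TERTIARY also looks at case.

```java
import java.text.Collator;
import java.util.Locale;

class StrengthDemo {

    // Returns true when the two strings compare as equal at the given strength.
    static boolean same(String a, String b, int strength) {
        // explicit locale, an assumption made for this example
        Collator c = Collator.getInstance(Locale.US);
        c.setStrength(strength);
        return c.compare(a, b) == 0;
    }

    public static void main(String[] args) {
        String plain = "Real";
        String accent = "R\u00e9al";   // "Réal"
        String lower = "real";

        // accent difference: ignored at PRIMARY, significant at SECONDARY
        System.out.println(same(plain, accent, Collator.PRIMARY));
        System.out.println(same(plain, accent, Collator.SECONDARY));

        // case difference: ignored at SECONDARY, significant at TERTIARY
        System.out.println(same(plain, lower, Collator.SECONDARY));
        System.out.println(same(plain, lower, Collator.TERTIARY));
    }
}
```

If accents should be ignored but case should not, none of these levels does it alone, so a different approach is needed for that case.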
See this HowTo.