Class SimpleUCharReplacer

  • All Implemented Interfaces:
    Serializable, Cloneable, Map<Integer,String>
    Direct Known Subclasses:
    UCharFilter

    public class SimpleUCharReplacer
    extends HashMap<Integer,String>

    Substitutes Unicode characters with replacement strings.

    This is a much simplified version of UCharReplacer by Markus Gylling from the org.daisy.util package.

    Using this class may change the Unicode character composition between input and output. If you need a particular normalization form, normalize after using this class.
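
    A minimal sketch of the normalization step, assuming the JDK's java.text.Normalizer suits your needs. The class NormalizeAfterReplace and the helper toNfc are hypothetical names, and the literal string stands in for the result of replace(input).

```java
import java.text.Normalizer;

final class NormalizeAfterReplace {

    // Hypothetical helper: composes replacer output to NFC.
    static String toNfc(CharSequence replaced) {
        return Normalizer.normalize(replaced, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Stand-in for the value returned by replace(input):
        // "cafe" followed by U+0301 (combining acute accent), 5 code units.
        CharSequence replaced = "cafe\u0301";
        String nfc = toNfc(replaced);
        System.out.println(nfc.length()); // 4: "e" + U+0301 composed to U+00E9
    }
}
```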

    Usage example:

     SimpleUCharReplacer ucr = new SimpleUCharReplacer();
     ucr.addSubstitutionTable(fileURL);
     ucr.addSubstitutionTable(fileURL2);
     String ret = ucr.replace(input).toString();
     

    The translation table file uses the same XML format as java.util.Properties [1][2]. The key attribute of each entry element holds the hexadecimal representation of a Unicode character (without the characteristic 0x prefix), and the value of the entry element is the replacement string.
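
    To illustrate the file format, the sketch below holds a small table as a string and reads it back with java.util.Properties.loadFromXML. The two entries, the comment, and the names TableFormatExample, TABLE_XML and load are made up for this example; this only shows the format and is not necessarily how addSubstitutionTable reads the file.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

final class TableFormatExample {

    // Hypothetical table: keys are hex code points without the "0x" prefix.
    static final String TABLE_XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
        + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
        + "<properties>\n"
        + "  <comment>Example substitution table</comment>\n"
        + "  <entry key=\"00E6\">ae</entry>\n"
        + "  <entry key=\"2026\">...</entry>\n"
        + "</properties>\n";

    // Parses the XML properties document into key/value pairs.
    static Properties load(String xml) {
        Properties props = new Properties();
        try {
            props.loadFromXML(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        Properties props = load(TABLE_XML);
        // Iteration order of the keys is not guaranteed.
        for (String key : props.stringPropertyNames()) {
            System.out.println("U+" + key + " -> " + props.getProperty(key));
        }
    }
}
```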

    If the key attribute contains exactly one Unicode code point (one character), it is treated literally. It is not interpreted as the hexadecimal representation of another character, even where that would be possible. For example, the key "a" is treated as 0x0061 rather than as 0x000a.
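
    The rule above can be sketched as follows. KeyInterpretation and keyToCodePoint are hypothetical names, and the handling of malformed keys (here a NumberFormatException simply propagates) is an assumption, since the class description does not specify it.

```java
final class KeyInterpretation {

    // Hypothetical helper mirroring the documented rule:
    // a single code point is literal, anything longer is parsed as hex.
    static int keyToCodePoint(String key) {
        if (key.codePointCount(0, key.length()) == 1) {
            return key.codePointAt(0);
        }
        return Integer.parseInt(key, 16);
    }

    public static void main(String[] args) {
        System.out.println(Integer.toHexString(keyToCodePoint("a")));     // 61
        System.out.println(Integer.toHexString(keyToCodePoint("000a")));  // a
        System.out.println(Integer.toHexString(keyToCodePoint("1F600"))); // 1f600
    }
}
```

    Counting code points rather than chars means a single non-BMP character given literally as a key (two UTF-16 code units) is also treated as one literal character in this sketch.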

    Note: there is a significant difference between a Unicode code point (held in a 32-bit int) and a UTF-16 code unit (a char). A code point consists of one or two code units.

    To make sure an int represents a codepoint and not a codeunit, use for example com.ibm.icu.text.Normalizer to NFC compose, followed by com.ibm.icu.text.UCharacterIterator to retrieve possibly non-BMP codepoints from a string.
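
    As an alternative sketch that avoids the ICU dependency, the JDK's own java.text.Normalizer and String.codePoints() can do the same job. This swaps in standard-library calls for the ICU classes named above; CodePointIteration and toCodePoints are hypothetical names.

```java
import java.text.Normalizer;

final class CodePointIteration {

    // NFC-composes the string, then returns its code points (not code units).
    static int[] toCodePoints(String s) {
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        return nfc.codePoints().toArray();
    }

    public static void main(String[] args) {
        // "A" followed by U+1F600, which needs a surrogate pair in UTF-16.
        String s = "A\uD83D\uDE00";
        System.out.println(s.length());             // 3 code units
        System.out.println(toCodePoints(s).length); // 2 code points
    }
}
```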

    • see [1] http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html
    • see [2] http://java.sun.com/dtd/properties.dtd
    See Also:
    Serialized Form
    • Constructor Detail

      • SimpleUCharReplacer

        public SimpleUCharReplacer()
        Creates a new instance.
    • Method Detail

      • addSubstitutionTable

        public void addSubstitutionTable(URL table)
                                  throws IOException
        Adds a substitution table to this instance. See the class description for the format.
        Parameters:
        table - the URL of the substitution table.
        Throws:
        IOException - if the table could not be added
      • replace

        public CharSequence replace(String input)
        Replaces characters in the input according to this object's current configuration.
        Parameters:
        input - the input
        Returns:
        the input with all configured substitutions applied
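
        For orientation, a code-point-wise substitution over a Map<Integer,String> can be sketched as below. This is an illustration of the general technique under the assumption that lookups are done per code point; it is not taken from this class's source, and ReplaceSketch and its sample table are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class ReplaceSketch {

    // Walks the input by code point and appends either the mapped
    // replacement string or the original code point.
    static CharSequence replace(Map<Integer, String> table, String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints().forEach(cp -> {
            String substitute = table.get(cp);
            if (substitute != null) {
                out.append(substitute);
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        table.put(0x00E6, "ae");  // U+00E6, a BMP character
        table.put(0x1F600, ":)"); // U+1F600, a non-BMP character
        System.out.println(replace(table, "\u00E6on \uD83D\uDE00")); // aeon :)
    }
}
```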