Class SimpleUCharReplacer
java.lang.Object
  java.util.AbstractMap<K,V>
    java.util.HashMap<Integer,String>
      org.daisy.dotify.common.text.SimpleUCharReplacer

All Implemented Interfaces:
Serializable, Cloneable, Map<Integer,String>

Direct Known Subclasses:
UCharFilter
public class SimpleUCharReplacer extends HashMap<Integer,String>
Provides substitution for unicode characters with replacement strings.
This is a much simplified version of UCharReplacer by Markus Gylling from the org.daisy.util package.
Using this class may change the Unicode character composition between input and output. If you need a certain normalization form, normalize after using this class.
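As an illustration of normalizing afterwards, here is a minimal sketch that applies NFC to a replaced result. It uses the JDK's java.text.Normalizer rather than the ICU classes mentioned later in this description, and the class and method names (NormalizeAfterSketch, toNfc) are hypothetical.

```java
import java.text.Normalizer;

// Sketch only: shows one way to normalize text after replacement.
// NormalizeAfterSketch and toNfc are hypothetical names, not part of the library.
class NormalizeAfterSketch {

    /** Composes the given character sequence to Unicode normalization form C. */
    static String toNfc(CharSequence replaced) {
        return Normalizer.normalize(replaced, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // 'a' followed by COMBINING RING ABOVE stands in for replacer output here.
        String decomposed = "a\u030a";
        String composed = toNfc(decomposed);
        // NFC composes the pair into the single code point U+00E5.
        System.out.println(decomposed.length() + " -> " + composed.length());
    }
}
```

Any other normalization form can be requested the same way by passing a different Normalizer.Form constant.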
Usage example:
SimpleUCharReplacer ucr = new SimpleUCharReplacer();
ucr.addSubstitutionTable(fileURL);
ucr.addSubstitutionTable(fileURL2);
String ret = ucr.replace(input).toString();
The translation table file uses the same XML format as java.util.Properties [1][2], with the hexadecimal representation (without the characteristic 0x prefix!) of a Unicode character as the key attribute and the replacement string as the value of the entry element.

If the key attribute contains exactly one Unicode code point (one character), it is treated literally. It is not interpreted as a hexadecimal representation of another character, even if that is theoretically possible. For example, if the key is "a", it is treated as 0x0061 rather than as 0x000a.

Note: there is a significant difference between a Unicode code point (32-bit int) and a UTF-16 code unit (= char); a code point consists of one or two code units. To make sure an int represents a code point and not a code unit, use for example com.ibm.icu.text.Normalizer to NFC compose, followed by com.ibm.icu.text.UCharacterIterator to retrieve possibly non-BMP code points from a string.

- see [1] http://java.sun.com/j2se/1.5.0/docs/api/java/util/Properties.html
- see [2] http://java.sun.com/dtd/properties.dtd
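The key rule and the code point versus code unit distinction described above can be sketched as follows. This is an illustration of the documented behaviour, not the library's source; KeyRuleSketch and keyToCodePoint are hypothetical names.

```java
// Sketch only: illustrates the documented key rule; not the library's implementation.
class KeyRuleSketch {

    /**
     * Hypothetical helper that maps an entry key to a code point.
     * A key holding exactly one code point is taken literally;
     * anything else is parsed as hexadecimal without a 0x prefix.
     */
    static int keyToCodePoint(String key) {
        if (key.codePointCount(0, key.length()) == 1) {
            return key.codePointAt(0);
        }
        return Integer.parseInt(key, 16);
    }

    public static void main(String[] args) {
        // "a" is one code point, so it is read literally as U+0061, not as hex 0xA.
        System.out.println(Integer.toHexString(keyToCodePoint("a")));
        // "00e5" is longer than one code point, so it is read as hexadecimal.
        System.out.println(Integer.toHexString(keyToCodePoint("00e5")));
        // A non-BMP character is one code point but two UTF-16 code units.
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(clef.length() + " code units, "
                + clef.codePointCount(0, clef.length()) + " code point");
    }
}
```

Counting code points rather than chars is what keeps a single non-BMP character from being mistaken for a two-character hexadecimal key.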
- See Also:
- Serialized Form
Nested Class Summary
-
Nested classes/interfaces inherited from class java.util.AbstractMap
AbstractMap.SimpleEntry<K extends Object,V extends Object>, AbstractMap.SimpleImmutableEntry<K extends Object,V extends Object>
Constructor Summary
Constructor              Description
SimpleUCharReplacer()    Creates a new instance.
-
Method Summary
Modifier and Type    Method                             Description
void                 addSubstitutionTable(URL table)    Adds a substitution table to this instance.
CharSequence         replace(String input)              Replaces characters in the input according to this object's current configuration.
Methods inherited from class java.util.HashMap
clear, clone, compute, computeIfAbsent, computeIfPresent, containsKey, containsValue, entrySet, forEach, get, getOrDefault, isEmpty, keySet, merge, put, putAll, putIfAbsent, remove, remove, replace, replace, replaceAll, size, values
-
Methods inherited from class java.util.AbstractMap
equals, hashCode, toString
Method Detail
-
addSubstitutionTable
public void addSubstitutionTable(URL table) throws IOException
Adds a substitution table to this instance. See the class description for the format.
Parameters:
table - the URL of the substitution table
Throws:
IOException - if the table could not be added
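How the table is parsed internally is not shown in this documentation. The following is only a sketch of one way to read a table in the format from the class description, assuming java.util.Properties.loadFromXML is used; TableLoadSketch, loadTable, and writeSampleTable are hypothetical names, and the sample entries are made up for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;

// Sketch only: reads a Properties XML table into a code point map.
// This is an assumption about a possible approach, not the library's code.
class TableLoadSketch {

    /** Loads the table at the given URL; fails with IOException if it cannot be read or parsed. */
    static Map<Integer, String> loadTable(URL table) throws IOException {
        Properties props = new Properties();
        try (InputStream in = table.openStream()) {
            props.loadFromXML(in);
        }
        Map<Integer, String> result = new HashMap<>();
        for (String key : props.stringPropertyNames()) {
            // One code point: literal key. Otherwise: hexadecimal without 0x prefix.
            int codePoint = key.codePointCount(0, key.length()) == 1
                    ? key.codePointAt(0)
                    : Integer.parseInt(key, 16);
            result.put(codePoint, props.getProperty(key));
        }
        return result;
    }

    /** Writes a small made-up table to a temporary file and returns its URL. */
    static URL writeSampleTable() throws IOException {
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<!DOCTYPE properties SYSTEM \"http://java.sun.com/dtd/properties.dtd\">\n"
                + "<properties>\n"
                + "  <entry key=\"2026\">...</entry>\n"
                + "  <entry key=\"00e5\">aa</entry>\n"
                + "</properties>\n";
        Path file = Files.createTempFile("table", ".xml");
        Files.write(file, xml.getBytes(StandardCharsets.UTF_8));
        file.toFile().deleteOnExit();
        return file.toUri().toURL();
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, String> table = loadTable(writeSampleTable());
        System.out.println(table.get(0x2026)); // replacement for HORIZONTAL ELLIPSIS
        System.out.println(table.get(0xE5));   // replacement for LATIN SMALL LETTER A WITH RING ABOVE
    }
}
```

Because InvalidPropertiesFormatException is a subclass of IOException, a malformed table surfaces through the same exception type that the method signature above declares.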
-
replace
public CharSequence replace(String input)
Replaces characters in the input according to this object's current configuration.
Parameters:
input - the input
Returns:
a modified string
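To illustrate what code point based replacement over an Integer-to-String map can look like, here is a minimal sketch. It is not the library's implementation; CodePointReplaceSketch and replaceCodePoints are hypothetical names, and the table entries are made up.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: replaces code points using a map, one code point at a time.
// Not the library's implementation.
class CodePointReplaceSketch {

    /** Returns the input with every mapped code point replaced by its string. */
    static String replaceCodePoints(String input, Map<Integer, String> table) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            // codePointAt joins a surrogate pair into one code point.
            int cp = input.codePointAt(i);
            String replacement = table.get(cp);
            if (replacement != null) {
                out.append(replacement);
            } else {
                out.appendCodePoint(cp);
            }
            // Advance by one or two code units, depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<Integer, String> table = new HashMap<>();
        table.put(0x2026, "...");
        table.put(0x1D11E, "[G clef]");
        String input = "Wait\u2026 " + new String(Character.toChars(0x1D11E));
        System.out.println(replaceCodePoints(input, table));
    }
}
```

Iterating by code point instead of by char is what lets a non-BMP key such as 0x1D11E match at all.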