It provides a facility, called autodetect, to detect the encoding of a given string. This is useful for applications in which a client might send strings in an unknown format, a common issue for Internet applications.
int change_mapping(mapping table[], size_t table_sz);
You can modify the mapping behavior of an existing instance of os_str_conv (whether heap- or stack-allocated) by calling os_str_conv::change_mapping(). The override information is stored with that instance and applies to its subsequent conversions.
The override mapping information applies to whatever explicit mapping has been established for the given os_str_conv instance. Instances that use autodetect cannot have their mappings overridden; calling change_mapping() on such an instance returns -1 to indicate this error condition.
The change_mapping() method takes the following two parameters:
mapping_table[]
An array of mapping code pairs that can be allocated locally, globally, or on the heap. If the array is heap-allocated, the user must delete it after calling change_mapping().
Internally, change_mapping() makes a sorted copy of mapping_table[]. The sorting provides quick lookup at run time. The internal copy is freed when the os_str_conv destructor is eventually called.
Note that the mapping pairs are unsigned 32-bit quantities. The LSB (least significant byte) is on the right, so, for example, the single-byte character 0x5C is represented as 0x0000005C, and the two-byte code 0x81,0x54 is 0x00008154.
size_t table_sz
The number of elements in mapping_table[]. Be sure to pass the element count, not the number of bytes in the array.
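The two parameters can be illustrated with a minimal sketch. It assumes a stand-in pair type, mapping_pair, with hypothetical fields from and to, because the layout of the library's mapping type is not shown in this section. The pack_code() helpers and the table values are likewise illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the library's mapping pair type; the real
// type name, field names, and field types may differ.
struct mapping_pair {
    std::uint32_t from;
    std::uint32_t to;
};

// Pack a single-byte character code into a 32-bit quantity.
// The byte ends up in the least significant (rightmost) position.
constexpr std::uint32_t pack_code(unsigned char b) {
    return b;
}

// Pack a two-byte code: the lead byte sits above the trail byte,
// and the trail byte occupies the least significant position.
constexpr std::uint32_t pack_code(unsigned char lead, unsigned char trail) {
    return (static_cast<std::uint32_t>(lead) << 8) | trail;
}

// Example override table; the values are chosen only for illustration.
static const mapping_pair example_table[] = {
    { pack_code(0x5C), pack_code(0x82, 0xA0) },
    { pack_code(0x7E), pack_code(0x82, 0xA2) },
};

// Element count, not byte count: divide the array size by the element size.
constexpr std::size_t example_table_sz =
    sizeof(example_table) / sizeof(example_table[0]);
```

With the real library, a table built this way and its element count would be the two arguments passed to change_mapping(); passing sizeof(example_table) instead would overstate the table size by a factor of the element size.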
There are three overloadings of the os_str_conv::convert() method to provide flexibility for dealing with Unicode strings with different byte-ordering schemes. If a parameter is of char* type, all 16-bit quantities are considered big-endian, regardless of platform. However, if the type is os_unsigned_int16*, the values assigned or read are handled according to the byte order of the platform architecture.
encode_type convert(char* dest, const char* src);
If either dest or src is a buffer containing Unicode characters, these 16-bit characters are considered big-endian, regardless of platform architecture.
encode_type convert(os_unsigned_int16* dest, const char* src);
This overloading interprets 16-bit Unicode buffer dest according to the byte order of the processor used.
encode_type convert(char* dest, const os_unsigned_int16* src);
This overloading interprets 16-bit Unicode buffer src according to the byte order of the processor used.
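The difference between the char* and os_unsigned_int16* overloadings comes down to how a 16-bit unit is read from or written to memory. The following is a minimal sketch of that distinction; read_be16(), write_be16(), read_native16(), and host_is_big_endian() are hypothetical helpers written for this illustration and are not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Read one 16-bit unit from a char buffer as big-endian, independent of
// the host. This mirrors how the char* overloadings are described.
inline std::uint16_t read_be16(const char* p) {
    const unsigned char hi = static_cast<unsigned char>(p[0]);
    const unsigned char lo = static_cast<unsigned char>(p[1]);
    return static_cast<std::uint16_t>((hi << 8) | lo);
}

// Write one 16-bit unit into a char buffer as big-endian.
inline void write_be16(char* p, std::uint16_t v) {
    p[0] = static_cast<char>(v >> 8);
    p[1] = static_cast<char>(v & 0xFF);
}

// Read one 16-bit unit from raw memory in the host's own byte order.
// This mirrors how the os_unsigned_int16* overloadings are described.
inline std::uint16_t read_native16(const char* p) {
    std::uint16_t v;
    std::memcpy(&v, p, sizeof v);
    return v;
}

// Report whether the host stores the most significant byte first.
inline bool host_is_big_endian() {
    const std::uint16_t probe = 0x0102;
    unsigned char first;
    std::memcpy(&first, &probe, 1);
    return first == 0x01;
}
```

On a big-endian host the two readers agree; on a little-endian host read_native16() returns the bytes swapped relative to read_be16(), which is why the choice of overloading matters when a Unicode buffer is shared between platforms.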
virtual size_t get_converted_size(const char* src) const;
Returns the size of the buffer, in units of bytes, required to contain the converted result of the given src string. If src is a Unicode string, its 16-bit characters are considered big-endian, regardless of platform architecture.
Because the entire source string must be examined, the time it takes for this function to complete is proportional to the length of the source string.
If the autodetect mode is used and autodetect fails to determine the encoding of src, get_converted_size() returns 0.
virtual size_t get_converted_size(const os_unsigned_int16* src) const;
Returns the size of the buffer, in units of bytes, required to contain the converted result of the given src string. If src is a Unicode string, its 16-bit characters are interpreted according to the byte order of the processor used.
Because the entire source string must be examined, the time it takes for this function to complete is proportional to the length of the source string.
If the autodetect mode is used and autodetect fails to determine the encoding of src, get_converted_size() returns 0.
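A typical calling pattern is to size the destination buffer with get_converted_size(), check for the 0 result, and only then call convert(). The sketch below shows that pattern against a stand-in class, stub_conv, which is not the real os_str_conv: it only mimics the two member signatures quoted above so the pattern can be compiled without the library. Its detect_ok flag, its copy-only convert(), and the convert_into() helper are all assumptions made for illustration. Whether the reported size includes a terminator is not stated in this section; the stub counts one.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Stand-in, NOT the real os_str_conv. It exists only so that the
// size-then-convert pattern below compiles on its own.
struct stub_conv {
    bool detect_ok;  // hypothetical switch simulating autodetect success

    // Mimics get_converted_size(): returns 0 when detection fails.
    std::size_t get_converted_size(const char* src) const {
        return detect_ok ? std::strlen(src) + 1 : 0;
    }

    // Mimics convert(): a plain copy here; return type simplified to void.
    void convert(char* dest, const char* src) const {
        std::strcpy(dest, src);
    }
};

// Size the destination first, then convert into it.
// Returns false (and leaves 'out' empty) when the size query reports 0.
template <class Conv>
bool convert_into(const Conv& conv, const char* src, std::vector<char>& out) {
    const std::size_t n = conv.get_converted_size(src);
    if (n == 0) {
        out.clear();
        return false;
    }
    out.assign(n, '\0');
    conv.convert(out.data(), src);
    return true;
}
```

Because the size query scans the whole source string, it is worth calling it once per conversion and reusing the result rather than querying repeatedly.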
os_str_conv(encode_type_enum dest, encode_type_enum src=AUTOMATIC);
Instantiates a conversion path.
encode_type_enum can be one of
Updated: 03/31/98 17:25:09