Sybase ASE Character Sets and SymmetricDS

Character Sets, Unitypes and SymmetricDS

In the world of databases, there are many different character sets that can be used. Ranging from the standard UTF-8 character set, to the ISO 8859/1 or Latin-1 character set, each of these has their own specific use and purpose. Additionally, Sybase ASE has specific types called Unitypes that always use the UTF-16 representation of the characters, regardless of the character type selected. Traditionally, this poses quite a few challenges when synchronizing data to another platform, especially when that other platform is using a different character set. Using SymmetricDS, however, we have found a way to make the transition seamless, so all Univarchar, Unichar and Unitext columns are converted and synchronized as their standard varchar, char and text counterparts, while still maintaining the integrity of the data.

Unitypes Explained

Looking into this unique column type a little bit more, it is not hard to see why it would be popular. Unitype columns are named after the Unicode character set. This is a character set designed for all the world’s different languages. This results in Unitype columns being able to store multiple different languages, allowing them to be internationally applicable. Starting with Unichar and Univarchar, they can be used anywhere that you can use char and varchar character data types, however you will not have to make any syntax changes. In addition to this, any queries that contain character literals that cannot be represented in the server’s character set are automatically changed so that they use the Unichar/Univarchar datatype. These columns are stored as 16-bit Unicode values. The Unitext column is similar. This column is stored as UTF-16 encoded values, regardless of the database’s default character set. When you go to pull data out of these Unitype columns, Sybase ASE returns a binary encoded UTF-16 Little Endian byte array. This can lead to problems when the target database that you are trying to synchronize data with is in a UTF-8 character set, and that’s where SymmetricDS can help.

Synchronizing Unitypes

When it comes to the synchronization of Unitype columns, this was quite different from previous endeavors. We had to find a way to take all of the values stored in these Unitype columns, which were using a different character set, and convert them so the UTF-8 target database could understand them. There were a couple of different approaches that could have been taken, such as having the end user create a transform to do this conversion, however we felt that this would be something better worked into the core of SymmetricDS, allowing for users to get this up and running by simply setting a parameter. By setting the sybase.ase.convert.unitypes.for.sync parameter, this automatically notifies SymmetricDS that you will be synchronizing tables that contain Unitype columns, and from there, SymmetricDS handles the rest. Once this parameter is changed, all of the Unitype columns will be read into the system as LONGVARBINARY types. This allows us to preserve the exact byte array. From here, SymmetricDS adds the string “fffe” onto the front of the byte array, as this signifies that the byte array is using Little Endian (LE) UTF-16 notation. After this, we can successfully decode the UTF-16 LE byte array and get the UTF-8 byte representation of it, and convert that into the UTF-8 representation that we need to synchronize. This will allow us to successfully synchronize the Unitype column to its corresponding type in the target table, such as Univarchar to Varchar or Unitext to Text.

Conclusions

Overall, adding the functionality to synchronize such a unique data type was a welcomed experience. This allows the SymmetricDS product to be more versatile, and allows for a larger range of synchronization possibilities. It introduced a lot of new avenues and paths for the product, such as the addition of being able to compare Unitype columns to non-Unitype columns, and correct for any differences between the two databases and columns. By adding this functionality, we believe it not only helps push SymmetricDS to be a better product, but also to be something that more people can use to help better their products or work spaces.