Convert Unicode to Language Text

Question

Hello Community!

I have a CSV file that contains Twitter messages stored in Unicode (UTF-8) format. I am attempting to convert the Unicode back to the characters of the original language. I used the ConvertFromCodePage function, entering the following expression in the Multi-Field Formula tool, in an attempt to convert the Unicode to Japanese (the original language of the tweet).

ConvertFromCodePage([_CurrentField_], 20936)

An example of the Twitter input data is shown below.

I'm at \u6d77\u9bae\u51e6\u3044\u308f\u3044 in \u6211\u5b6b\u5b50\u5e02, \u5343\u8449\u770c https://t.co/yLOdRigcY0

Unfortunately, I received the following output after using the ConvertFromCodePage function, which is not really what I was looking for:

Im At U6d77U9baeU51e6U3044U308fU3044 In U6211U5b6bU5b50U5e02 U5343U8449U770c HttpsTCoYlodrigcy0

I need to obtain the following:

I'm at 海鮮処いわい in 我孫子市, 千葉県 https://t.co/yLOdRigcY0

I've read the post about how to bring double-byte characters (DBCs) into Alteryx at the following link: https://community.alteryx.com/t5/tkb/articleprintpage/tkb-id/knowledgebase/article-id/609

I'm new to Alteryx and believe that I am missing something here and welcome any and all feedback.

Emmanuel_G · Answer

@AdamR_AYX Thanks for your answer ! 🙂

AdamR_AYX · Answer

Hi Xena,

Sorry for the very late reply. I know you have already found a solution to your problem, but just wanted to add some additional details for any future users who come across the same issue.

The actual function calls that you needed here were

CharFromInt(HexToNumber())

The added complication is that these work only on a single character. That is to say

CharFromInt(HexToNumber("6d77")) = 海
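
For anyone who wants to check the single-character conversion outside Alteryx, here is a minimal Python sketch of the same idea. The helper name char_from_hex is made up for illustration and is not an Alteryx function; it simply parses the hex digits as a number and then turns that number into a character.

```python
def char_from_hex(hex_digits: str) -> str:
    """Return the character whose Unicode code point is given as hex digits.

    Hypothetical helper for illustration: int(..., 16) plays the role of
    HexToNumber and chr() plays the role of CharFromInt.
    """
    return chr(int(hex_digits, 16))


if __name__ == "__main__":
    print(char_from_hex("6d77"))  # 海
```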

To apply it to the whole string we can use a RegEx parse tool and then a replace tool to substitute the Unicode characters back into the original string.
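
As a rough cross-check of the whole-string approach, the sketch below does the equivalent in Python: a regular expression finds each \uXXXX sequence and a replacement callback swaps in the decoded character. This is not the attached workflow, only an illustration under the assumption that every escape is a backslash, a "u", and exactly four hex digits. The names ESCAPE and decode_escaped_unicode are invented for this example.

```python
import re

# Matches a literal backslash, the letter 'u', and exactly four hex digits.
# Assumption: the input only uses this \uXXXX form.
ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")


def decode_escaped_unicode(text: str) -> str:
    """Replace each \\uXXXX escape in text with the character it encodes.

    Limitation: characters outside the Basic Multilingual Plane (many emoji)
    are escaped as two-part surrogate pairs, which this sketch does not
    recombine.
    """
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)


# Raw string so that Python keeps the backslashes exactly as they are in the CSV.
tweet = (
    r"I'm at \u6d77\u9bae\u51e6\u3044\u308f\u3044 in "
    r"\u6211\u5b6b\u5b50\u5e02, \u5343\u8449\u770c https://t.co/yLOdRigcY0"
)

if __name__ == "__main__":
    print(decode_escaped_unicode(tweet))
```

Text without any escapes, such as the URL at the end of the tweet, passes through unchanged because the pattern only touches the \uXXXX sequences.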

I have attached an example workflow.

Adam

ConvertFromEscapedUnicode.yxmd