Unrecognized Chinese Character (IBM DB2)
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello,
I am working on one task in which I need to fetch the data from the IBM DB2 table and in some of the columns there is a Chinese character available with the English character.
While fetching the character the English character is easily readable but for the Chinese character, some unrecognized characters are showing.
I need to store both the values in the database but on an Alteryx designer side, I am getting the unrecognized character for the Chinese character.
Kindly help me out with this issue and if possible then send a sample workflow as well.
Thanks & Regards,
Jim Hsieh
- Labels:
- Text Mining
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello @Jim0620,
While I don't have any experience connecting to an IBM DB2 database using Alteryx, could you please tell me if the Input Data tool provides you with a Code Page selection? A screenshot of the Input Tool configuration pane would be enough, although please make sure to redact any confidential information present such as the connection name, however please do not redact the name of the parameters (i.e. Record Limit, File Format, Connection Name etc.) You can drag down the Preview pane to show the entirety of Configuration pane in a single screenshot.
If there is indeed a Code Page option, then perhaps the Interface tools in the sample workflow I am going to provide via the post link may be of help, provided the necessary Code Page option is added from this page and it is a Code Page value that Alteryx Designer actually supports.
This is the post related to extending the Code Page options present in Alteryx Designer, please read it to see if it can help solve your problem:
Is there any way to read traditional Chinese Characters?
Additionally, after replying to your post, I noticed that on the right pane there is a link to a Knowledge Base article mentioning Display issues when reading data with Cyrillic characters from DB2 and it suggests that the column is cast as GRAPHIC datatype to make it readable in Alteryx Designer, hopefully it might solve the problem you are having.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello again @Jim0620,
Were you able to figure this out or did the solution above help you? I would like to know if this helped solve your problem.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Neo,
I still can't fix Chinese words issue, please check attached file.
the ODBC setting user ID code page is *sysval, when I use QV loading the same file data is correct.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @Jim0620,
In your query, could you change the query:
select CCUST, CNME from BPCSF6.RCM where CCUST in (624000)
to this:
select CCUST, Cast(CNME as GRAPHIC(32)) as New_CNME from BPCSF6.RCM where CCUST in (624000)
And we can see if the instructions in the knowledge base works for this scenario. I wrote it as GRAPHIC(32) but I'm assuming it can be increased and 32 was written as an example in the KB article.
If this doesn't work, I could recommend changing the ODBC settings to force W_String / Unicode support but I don't know if that is possible with IBM DB2 unfortunately.
You could also try all these recommendations while having AMP Engine turned off first and if you see that your problem is solved, you can turn on AMP Engine and run the workflow again to verify that AMP Engine causes no issues regarding the encoding and you can then keep using it.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Neo,
1. The new SQL script is incorrect.
2. IBM ODBC setting does not support "W_String "
3. The Traditional Chinese can fixed, I use the function ConvertFromCodePage([CNME], 950), the Traditional Chinese OK
4. The Alteryx code page for Simplified Chinese is 936, or I use code page 65001, Simplified Chinese characters cannot be displayed correctly
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
To understand better (since the query you posted doesn't match the query I posted as it includes additional column definitions):
Your current query is:
select CCUST,CNME as ORI_CNME, trim(CAST(CNME as Char(30) CCSID 935)) as CNME, Cast(CNME as GRAPHIC(32)) as New_CNME from BPCSF6.RCML01 where CCOMP =88
and it contains a new part that uses CCSID setting, which I am assuming you added to change the codepage of the extracted column. Upon reading the error message, I understand that the GRAPHIC(32) method I provided is what resulted in an error. If you change this query to the following:
select CCUST, CNME as ORI_CNME, Trim(CAST(CNME as Char(30) CCSID 935)) as CNME from BPCSF6.RCML01 where CCOMP = 88
Does it work without errors? If it works without errors, do you only need to convert the codepage of the new CNME column you obtained from the query to 950? If that solves your problem , I would say that you can keep using this method as the GRAPHIC method doesn't seem to be working as expected for Traditional Chinese characters, which might be due to the encoding that is used in the database.
Additional question:
Does the following change in query output exactly the character set you want (by changing the CCSID value from 935 to 950)? If so, you wouldn't need the ConvertFromCodePage formula in Alteryx:
select CCUST, CNME as ORI_CNME, Trim(CAST(CNME as Char(30) CCSID 950)) as CNME from BPCSF6.RCML01 where CCOMP = 88
