
Alteryx Designer Desktop Discussions

SOLVED

Text to Columns tool modifies special characters (German Umlaute)

Hanspeter
7 - Meteor

Hello, I'm still new to Alteryx. I tried to import an HTML file and actually succeeded in transforming it properly. Except that the German Umlaute (and other special characters) were transformed after I used the Text to Columns tool. Is there a way to keep the original characters? Please see the example below.

3 REPLIES
danilang
19 - Altair

Hi @Hanspeter 

 

The umlauts aren't actually dropped by the Text to Columns tool, because they are never actually in the HTM file. HTML files can contain non-ASCII characters directly, but this one uses only standard ASCII characters. Any other character is represented by a character code (a numeric character reference). Here's the HTM file when I open it in Chrome:

 

Web.png

 

 

This is the corresponding HTML (open it in Notepad and search for 87074407):

 

<tr>

   <td >

       <font face="courier new" size="2" color=#014c7f>

            <nobr id=l0014003>87074407&nbsp;&nbsp;</nobr>

       </font>

   </td>

   <td style= background:#E8EAD8 >

        <font face="courier new" size="2" color=#0273bc>

          <nobr id=l0014014>G&#xf6;rgen&nbsp;&amp;&nbsp;H&#xfc;bscher&nbsp;Media&nbsp;und&nbsp;&nbsp;&nbsp;</nobr>

 

The second <nobr> element contains the two names Görgen & Hübscher, with a code in place of each umlaut character: &#xf6; = ö and &#xfc; = ü. This is the corresponding line after your Multi-Field tool (line 7):

 

87074407 G&#xf6;rgen &amp; H&#xfc;bscher Media und &#x5b;...&#x5d;
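
To check that this line really holds no umlauts, only ASCII codes for them, here is a small Python sketch. Python is used purely for illustration and the variable names are made up; the only input is the line shown above.

```python
# The line as it appears after the Multi-Field tool (copied from above).
raw_line = "87074407 G&#xf6;rgen &amp; H&#xfc;bscher Media und &#x5b;...&#x5d;"

# Every character in the raw text is plain ASCII: no ö or ü is stored in it.
only_ascii = raw_line.isascii()

# The hex digits after "&#x" are the Unicode code point of the character.
o_umlaut = chr(int("f6", 16))  # code point 0xF6
u_umlaut = chr(int("fc", 16))  # code point 0xFC

print(only_ascii, o_umlaut, u_umlaut)
```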

 

When the HTML is displayed in a browser, the browser translates &#xf6; and displays "ö". Since you're reading the raw HTML, it's up to your workflow to do the conversion.
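
As an illustration of the conversion a browser performs, Python's standard library can decode all references in one call. This is only a sketch of the idea, assuming you are able to run Python on the field (for example in a Python tool or outside Alteryx); it is not part of the original workflow.

```python
import html

# The raw line from the workflow, still containing character references.
raw_line = "87074407 G&#xf6;rgen &amp; H&#xfc;bscher Media und &#x5b;...&#x5d;"

# html.unescape() converts numeric references (&#xf6;) and named ones (&amp;).
decoded = html.unescape(raw_line)

print(decoded)  # 87074407 Görgen & Hübscher Media und [...]
```

Note that this also decodes references other than umlauts, such as &amp; and the square brackets.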

 

A simple Formula tool with Replace([Data22],"&#xf6;","ö") will convert the o-umlaut, giving "Görgen &amp; H&#xfc;bscher Media und ..."

 

For a bulk update, you'll want to use something like a Find Replace tool with a list of codes and their corresponding characters. You can find a list of codes here, in the Character Code column. That list has leading zeros ("fc" is shown as "00fc"), but you can drop the 00 since your HTM file doesn't include them.
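
The lookup-table idea can be sketched in Python as follows. The table here is a short, hypothetical excerpt with a few German characters, not the full list, and the function name is made up for illustration; in Alteryx the same table would feed the Find Replace tool.

```python
# Hypothetical excerpt of the code list: hex code with leading zeros -> character.
CODE_TABLE = {"00e4": "ä", "00f6": "ö", "00fc": "ü", "00df": "ß"}

# Drop the leading zeros so each key matches the file's form, e.g. "&#xf6;".
REPLACEMENTS = {
    f"&#x{code.lstrip('0')};": char for code, char in CODE_TABLE.items()
}

def replace_codes(text: str) -> str:
    """Replace every known character reference in text with its character."""
    for reference, char in REPLACEMENTS.items():
        text = text.replace(reference, char)
    return text

print(replace_codes("G&#xf6;rgen &amp; H&#xfc;bscher Media und"))
```

References that are not in the table, such as &amp;, are left untouched, so the table needs one row for every code you expect to see.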

 

Dan

 

Hanspeter
7 - Meteor

Thanks, Dan. Great explanations. Now I understand much more about the HTM file and the characters there!! 

snahta
5 - Atom

Hi,

I was also having the same issue with German characters, and now I have the answer.
