Alteryx Designer Discussions


Text to Columns tool modifies special characters (German Umlaute)

6 - Meteoroid

Hello, I'm still new to Alteryx. I tried to import an HTML file and actually succeeded in transforming it properly. Except that the German Umlaute (and other special characters) were transformed after I used the Text to Columns tool. Is there a way to keep the original characters? Please see the example below.

17 - Castor

Hi @Hanspeter 


The umlauts aren't actually altered by the Text to Columns tool, because they were never in the HTM file as literal characters. This file uses only standard ASCII characters; any other character is represented as a character code (an HTML character reference). Here's the HTM file when I open it in Chrome:





This is the corresponding HTML (open it in Notepad and search for 87074407):



   <td >

       <font face="courier new" size="2" color=#014c7f>

            <nobr id=l0014003>87074407&nbsp;&nbsp;</nobr>



   <td style= background:#E8EAD8 >

        <font face="courier new" size="2" color=#0273bc>

          <nobr id=l0014014>G&#xf6;rgen&nbsp;&amp;&nbsp;H&#xfc;bscher&nbsp;Media&nbsp;und&nbsp;&nbsp;&nbsp;</nobr>


The text inside the second nobr element corresponds to the two names Görgen & Hübscher, and the &#x...; sequences are the codes for each of the umlaut characters: &#xf6; = ö and &#xfc; = ü. This is the corresponding line after your Multi-Field tool (line 7):


87074407 G&#xf6;rgen &amp; H&#xfc;bscher Media und &#x5b;...&#x5d;
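To make the mapping concrete, here is a minimal Python sketch (not an Alteryx formula) of how a hexadecimal code such as &#xf6; resolves to a character: the hex digits between "&#x" and ";" are the character's Unicode code point. The helper name decode_hex_reference is made up for this illustration.

```python
def decode_hex_reference(ref: str) -> str:
    """Turn a hexadecimal character reference such as '&#xf6;' into its character.

    Hypothetical helper for illustration only.
    """
    if not (ref.startswith("&#x") and ref.endswith(";")):
        raise ValueError(f"not a hexadecimal character reference: {ref!r}")
    # The part between '&#x' and ';' is the code point written in base 16.
    code_point = int(ref[3:-1], 16)
    return chr(code_point)


o_umlaut = decode_hex_reference("&#xf6;")  # f6 hex = 246 decimal
u_umlaut = decode_hex_reference("&#xfc;")  # fc hex = 252 decimal
print(o_umlaut, u_umlaut)
```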


When the HTML is displayed in a browser, the browser translates &#xf6; and displays "ö". Since you're reading the raw HTML, it's up to your workflow to do the conversion.
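If a Python step is an option in your workflow (that is an assumption on my part, for example via the Python tool), the standard library can do the same translation a browser does. This is only a sketch using the sample line from above; html.unescape decodes numeric codes like &#xf6; as well as named ones like &amp;.

```python
import html

# The sample line as it appears in the raw HTML.
raw = "87074407 G&#xf6;rgen &amp; H&#xfc;bscher Media und &#x5b;...&#x5d;"

# html.unescape converts every character reference to its character.
decoded = html.unescape(raw)
print(decoded)  # 87074407 Görgen & Hübscher Media und [...]
```

Note that &nbsp; would decode to a non-breaking space (U+00A0), not a regular space, so you may want to replace that character separately before splitting on spaces.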


A simple Formula tool with Replace([Data22],"&#xf6;","ö") will convert the o-umlaut, giving "Görgen &amp; H&#xfc;bscher Media und ...". Note that it only replaces that one code; the &amp; and &#xfc; are left as they are.


For a bulk update, you'll want to use something like a Find Replace tool with a list of codes and their corresponding characters. You can find a list of codes here, in the Character Code column. That list shows the codes with leading zeros ("fc" is shown as "00fc"), but you can drop the 00 since your HTM file doesn't include them.
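The lookup-table idea can be sketched in Python as follows. The CODE_LIST entries are a small hypothetical excerpt of such a code list, and build_lookup and replace_all are names invented for this example; in Alteryx, the Find Replace tool would play the role of replace_all.

```python
# Hypothetical excerpt of a code list: (character code with leading zeros, character).
CODE_LIST = [
    ("00e4", "ä"),
    ("00f6", "ö"),
    ("00fc", "ü"),
    ("00df", "ß"),
]


def build_lookup(code_list):
    """Map '&#x<code>;' (leading zeros dropped, lowercase) to its character."""
    lookup = {}
    for code, char in code_list:
        short = code.lstrip("0").lower()  # "00fc" -> "fc", matching the HTM file
        lookup[f"&#x{short};"] = char
    return lookup


def replace_all(text, lookup):
    """Replace every code found in the lookup table with its character."""
    for entity, char in lookup.items():
        text = text.replace(entity, char)
    return text


LOOKUP = build_lookup(CODE_LIST)
cleaned = replace_all("G&#xf6;rgen &amp; H&#xfc;bscher", LOOKUP)
print(cleaned)  # Görgen &amp; Hübscher
```

Only codes present in the table are replaced, so &amp; stays untouched here until a row for it is added. The match is also case-sensitive, so a file that writes &#xFC; would need uppercase entries as well.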




6 - Meteoroid

Thanks, Dan. Great explanations. Now I understand much more about the HTM file and the characters there!!