Hello, I'm still new to Alteryx. I tried to import an HTML file and actually succeeded in transforming it properly, except that the German umlauts (and other special characters) were changed after I used the Text to Columns tool. Is there a way to keep the original characters? Please see the example below.
Hi @Hanspeter
The umlauts aren't actually dropped by the Text to Columns tool, because they were never in the HTM file in the first place. This file uses only standard ASCII characters; any other character is represented as a character code. Here's the HTM file when I open it in Chrome
This is the corresponding HTML (open it in Notepad and search for 87074407):
<tr>
<td >
<font face="courier new" size="2" color=#014c7f>
<nobr id=l0014003>87074407 </nobr>
</font>
</td>
<td style= background:#E8EAD8 >
<font face="courier new" size="2" color=#0273bc>
<nobr id=l0014014>G&#xf6;rgen & H&#xfc;bscher Media und </nobr>
The second <nobr> element holds the two names Görgen & Hübscher, and the &#x...; sequences are the codes for the umlaut characters: &#xf6; = ö and &#xfc; = ü. This is the corresponding line after your Multi-Field tool (line 7):
87074407 G&#xf6;rgen & H&#xfc;bscher Media und [...]
When the HTML is displayed in a browser, the browser translates &#xf6; and displays "ö". Since you're reading the raw HTML, it's up to your workflow to do the conversion.
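For illustration only, and outside Alteryx: the decoding step a browser performs can be sketched in Python with the standard library's html.unescape. The sample string is taken from the row above; the variable names are just made up for this example.

```python
import html

# Raw text as it appears in the HTM file: plain ASCII with character codes
raw = "G&#xf6;rgen & H&#xfc;bscher Media und"

# html.unescape turns numeric (and named) character references into characters
decoded = html.unescape(raw)
print(decoded)
```

This only shows what "translating the code" means; inside an Alteryx workflow the same conversion has to be done with the workflow's own tools.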
A simple Formula tool with Replace([Data22],"&#xf6;","ö") will convert the o-umlaut, giving "Görgen & H&#xfc;bscher Media und". The ü needs its own replacement.
For a bulk update, you'll want to use something like a Find Replace tool with a list of codes and their corresponding characters. You can find a list of codes here, in the Character Code column. That list has leading 0s ("fc" is shown as "00fc"), but you can drop the 00 since your HTM file doesn't include them.
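The lookup-table idea can be sketched in Python as well. This is not the Find Replace tool itself, just the same approach under some assumptions: the CODES table, the function name replace_codes, and the handful of German characters in it are hypothetical choices for this example, and the pattern tolerates optional leading zeros so that both "fc" and "00fc" match.

```python
import re

# Hypothetical lookup table: hex code without leading zeros -> character
CODES = {
    "e4": "ä", "f6": "ö", "fc": "ü",
    "c4": "Ä", "d6": "Ö", "dc": "Ü",
    "df": "ß",
}

# Matches hex character codes such as &#xf6; or &#x00fc;
PATTERN = re.compile(r"&#x0*([0-9a-fA-F]+);")

def replace_codes(text: str) -> str:
    """Replace every known hex character code in text with its character."""
    def lookup(match: re.Match) -> str:
        code = match.group(1).lower()
        # Codes missing from the table are left exactly as they were
        return CODES.get(code, match.group(0))
    return PATTERN.sub(lookup, text)

print(replace_codes("G&#xf6;rgen & H&#xfc;bscher Media und"))
```

Leaving unknown codes untouched makes it easy to spot which entries still need to be added to the list.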
Dan
Thanks, Dan. Great explanations. Now I understand much more about the HTM file and the characters there!!
Hi,
I was also having the same issue with German characters, and now I have the answer.