Hello, I'm still new to Alteryx. I tried to import an HTML file and actually succeeded in transforming it properly. Except that the German Umlaute (and other special characters) were transformed after I used the Text to Columns tool. Is there a way to keep the original characters? Please see the example below.
The umlauts aren't actually dropped by the Text to Columns tool, because they were never in the HTM file as literal characters. This file contains only standard ASCII characters; any other character is represented by a character code. Here's the HTM file when I open it in Chrome:
This is the corresponding HTML (open it in Notepad and search for 87074407):
The bold part corresponds to the two names Görgen & Hübscher, and the part in green is the code for each of the umlaut characters: &#xf6; = ö and &#xfc; = ü. This is the corresponding line after your Multi-Field tool (line 7):
87074407 G&#xf6;rgen & H&#xfc;bscher Media und [...]
When the HTML is displayed in a browser, the browser translates &#xf6; and displays "ö". Since you're reading the raw HTML, it's up to your workflow to do the conversion.
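To see that decoding step outside a browser, here is a minimal Python sketch. It is not part of the Alteryx workflow; it just shows what "translating the character code" means, using the standard library's html.unescape. The sample string is typed in by hand from the line above.

```python
import html

# Raw text as it appears in the HTM file (sample copied from the line above)
raw = "G&#xf6;rgen & H&#xfc;bscher Media und"

# html.unescape resolves numeric character references such as &#xf6;
decoded = html.unescape(raw)
print(decoded)
```

The decoded string contains the actual ö and ü characters, which is what the browser shows.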
A simple Formula tool with Replace([Data22],"&#xf6;","ö") will convert the o-umlaut, giving "Görgen & H&#xfc;bscher Media und". The ü stays encoded until you replace &#xfc; as well.
For a bulk update, you'll want to use something like a Find Replace tool with a list of codes and their corresponding characters. You can find a list of codes here, in the Character Code column. That list has leading zeros ("fc" is shown as "00fc"), but you can drop the 00 since your HTM file doesn't include them.
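The lookup-list idea can be sketched in Python as follows. This only illustrates the logic a Find Replace tool would apply, not how Alteryx implements it. The names CHART_CODES, build_lookup and replace_all are made up for this example, the code list is a small hand-picked set of German characters, and it assumes the file writes its codes in lowercase hex without leading zeros, as in the sample line.

```python
# Hypothetical subset of a character chart, written with leading zeros
# the way such charts usually list them: ä ö ü Ä Ö Ü ß
CHART_CODES = ["00e4", "00f6", "00fc", "00c4", "00d6", "00dc", "00df"]


def build_lookup(chart_codes):
    """Map each code as written in the file (e.g. '&#xfc;') to its character."""
    lookup = {}
    for code in chart_codes:
        short = code.lstrip("0").lower()   # "00fc" -> "fc"
        lookup[f"&#x{short};"] = chr(int(code, 16))
    return lookup


def replace_all(text, lookup):
    """Apply every find/replace pair in turn, like a find/replace list."""
    for find, replacement in lookup.items():
        text = text.replace(find, replacement)
    return text


lookup = build_lookup(CHART_CODES)
cleaned = replace_all("G&#xf6;rgen & H&#xfc;bscher Media und", lookup)
print(cleaned)
```

Any code that is not in the list is left untouched, so the list needs one row per special character that occurs in your data.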