Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

XML Parse containing 'escaped version characters'

Pingu
10 - Fireball

Hello community,

 

I have an XML that I need to parse (can't share due to NDA) and the file contains all the escaped characters:

 

"

'

<

>

&

 

I could not get the XML parse tool to work with this XML. I found a solution, but I feel it is very ugly and also it increases the running time a lot. I used the Find Replace tool with the above fields to be replaced by respectively <, &, >, " and ,. Following it with the XML parse tool then it worked.

 

Then by googling the community I found out about the EscapeXMLMetacharacters(String) formula: https://help.alteryx.com/10.1/index.htm#Reference/Functions.htm#Specialized

 

But I need the exact opposite of this formula? Does this also exist?

 

Or more in general, what is the best way to parse XMLs in Alteryx that still have all those weird characters?

 

&lt;expedicion&gt;123456&lt;/expedicion&gt;&#13;
&lt;referencia&gt;123456&lt;/referencia&gt;&#13;
&lt;fecha&gt;09/09/2020&lt;/fecha&gt;&#13;
&lt;tipo&gt;ABCDEFG&lt;/tipo&gt;&#13;
&lt;descripcion&gt;ABCDEFG&lt;/descripcion&gt;&#13;

....

 

Thank you.

4 REPLIES 4
jdunkerley79
ACE Emeritus
ACE Emeritus

No the function doesn't exist but nice idea will add it as an extension!

 

I would suggest doing:

jdunkerley79_0-1599645372315.png

First extract all the entities then group them to make a dynamic find and replace table

 

The function I used to reverse an entity to plain character was:

if LowerCase(left([Raw],3)) = '&#x' then
	CharFromInt(HexToNumber(Substring([Raw],3,Length([Raw])-4)))
elseif LowerCase(left([Raw],2)) = '&#' then
	CharFromInt(Substring([Raw],2,Length([Raw])-3))
else
	switch(LowerCase([Raw]),
           [Raw],
		   '&gt;','>',
		   '&lt;','<',
           '&quot;', '"',
           '&amp;', '&',
           '&apos;', "'")
endif

 

Quick sample attached

Pingu
10 - Fireball

Thats a pretty neat solution as well thanks I will ook into it.

 

I currently solved it as this as described in the main post. 

Pingu_0-1599654136245.png

 

Thanks! And who knows it will be a function in a later Alteryx version.

Yongcan
8 - Asteroid

I encounter same issue today and get more special chars beyond standard 5 char list under 

https://help.alteryx.com/current/designer/specialized-functions

 

 

eg: 

23&#170; S&#233;rie - 5&#170; Emiss&#227;o de CRI(Subordinada)

 

Such special char can be find under , they are decimal value of unicode.

https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references

 

I notice load such file with input tool as XML works perfect under alteryx. However other ETL tool can't convert them and can only load as it is from raw data. so i create a Dummy XML Data section and use XML Parse tool to get it's actual value. It works and of cause can handle the 5 standard escape chars.

 

Find replace solution should work for standard escape chars (note find replace have to run multiple times and may slow down the process). haven't try the XML Parse solution on large volume data.

XML Parse Tool solution.png

 

 

Pingu
10 - Fireball

Actually a very good solution I think it would also be slightly quicker to run.

Labels
Top Solution Authors