Hello community,
I have an XML that I need to parse (can't share due to NDA) and the file contains all the escaped characters:
" |
' |
< |
> |
& |
I could not get the XML parse tool to work with this XML. I found a solution, but I feel it is very ugly and also it increases the running time a lot. I used the Find Replace tool with the above fields to be replaced by respectively <, &, >, " and ,. Following it with the XML parse tool then it worked.
Then by googling the community I found out about the EscapeXMLMetacharacters(String) formula: https://help.alteryx.com/10.1/index.htm#Reference/Functions.htm#Specialized
But I need the exact opposite of this formula? Does this also exist?
Or more in general, what is the best way to parse XMLs in Alteryx that still have all those weird characters?
<expedicion>123456</expedicion>
<referencia>123456</referencia>
<fecha>09/09/2020</fecha>
<tipo>ABCDEFG</tipo>
<descripcion>ABCDEFG</descripcion>
....
Thank you.
Solved! Go to Solution.
No the function doesn't exist but nice idea will add it as an extension!
I would suggest doing:
First extract all the entities then group them to make a dynamic find and replace table
The function I used to reverse an entity to plain character was:
if LowerCase(left([Raw],3)) = '&#x' then
CharFromInt(HexToNumber(Substring([Raw],3,Length([Raw])-4)))
elseif LowerCase(left([Raw],2)) = '&#' then
CharFromInt(Substring([Raw],2,Length([Raw])-3))
else
switch(LowerCase([Raw]),
[Raw],
'>','>',
'<','<',
'"', '"',
'&', '&',
''', "'")
endif
Quick sample attached
Thats a pretty neat solution as well thanks I will ook into it.
I currently solved it as this as described in the main post.
Thanks! And who knows it will be a function in a later Alteryx version.
I encounter same issue today and get more special chars beyond standard 5 char list under
https://help.alteryx.com/current/designer/specialized-functions
eg:
23ª Série - 5ª Emissão de CRI(Subordinada)
Such special char can be find under , they are decimal value of unicode.
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
I notice load such file with input tool as XML works perfect under alteryx. However other ETL tool can't convert them and can only load as it is from raw data. so i create a Dummy XML Data section and use XML Parse tool to get it's actual value. It works and of cause can handle the 5 standard escape chars.
Find replace solution should work for standard escape chars (note find replace have to run multiple times and may slow down the process). haven't try the XML Parse solution on large volume data.
Actually a very good solution I think it would also be slightly quicker to run.