Extract data from cell in spreadsheet containing large amount of XML
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Good morning everyone,
There's a column in my spreadsheet containing a large chunk of XML code. Is there any way to extract the data from it? I know the XML Parse tool exists, however I noticed the formatting for my data is different. It is my first time running into any XML so I am a bit confused. The workflow attached has the data in a text input, so no need to download the excel unless you want to see the entirety of the data.
Thanks!
Solved! Go to Solution.
- Labels:
- Parse
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Had a look at the sample text field in VScode.
I manually split it into separate rows based on my best guess of hierarchy. i've highlighted a row that makes little sense to me. The color value never receives a closing tag. We then get two closing tags for tags that were never opened. It's also worth noting that there is a lot of information in this cell.
I would suggest the next steps are working out what information you want out of this cell along with what the structure of your desired output looks like. In my experience that is better than parsing for parsing sake without a plan of what you want.
If you realize you only need key parts of this - i would consider using REGEX to parse the specific pieces of information you want. E.g. You might want vehicle color and price. Regex can help you get that specific information from the string
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Lots of issues with the XML string... Start tags missing end tags and vice versa. Did you by chance alter the XML in preparation for posting here or is that string as is from the source and maybe scrubbed out more than intended?
Problem I'm seeing is on the "Color" (missing "type=" and end tag) and "FrameSize" (missing start tag) attribute.
If this is direct for the source, then yeah... you won't be able to use the XML Parse tool. To answer your question though, yes. you can extract the data from it, but will have to all be manual, in that you may need to create an expression to extract the value for each attribute. Likely some heavy REGEX, which could impact the integrity of your data should something change in the XML that your expression/design isn't expecting.
You may want notify the owner/vendor for the system the XML came from and let them know to see if they can correct.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @jonvesguerra. Did my workflow solve your issue?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello! I think your solution did very well extracting the important information considering the poor formatting of the XML. I appreciate your work!
