Input a directory of XML files and output as XLSX
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Community,
I have a directory, with sub-directories, of XML files. I am trying to convert the XML files to Excel XLSX format.
I am using the Directory tool, *.xml, and include sub-directories option. I've also successfully managed to get the file name and path to write the files to. I know my final tool needs to be the Output data tool to save as .XLSX, but I'm not sure what I need to have in the middle to do the actual processing/conversion.
Any help would be appreciated.
Thanks,
--sue
Solved! Go to Solution.
- Labels:
- Developer Tools
- Dynamic Processing
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
The steps in between would be the transformations to convert the XML into tabular format that can be stored in an Excel.
Can you please share a sample XML and the expected excel.
Best,
Jagdeesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @JagdeeshN,
Thanks for your assistance. I've attached a sample Excel output. Alteryx will not let me upload XML, even if I change the file type to TXT. Do you have any suggestions on how I can get the XML to you? Below is a literal copy/paste. Sorry.
This is a dead easy thing to do in Excel. All you have to do is open the XML file in Excel, select the 'As an XML table' option, then OK and, voila, lovely little table. The unfortunate part is that I have hundreds of these and it is very repetitive, so any recommendations you have would be greatly appreciated.
<?xml version='1.0'?>
<Start>
<NCSR9053_HDR>
<UserID>JDOE</UserID><RunControl>2020F-2021W</RunControl><Institution>NCAAT</Institution><AcadCareer>PSEC</AcadCareer><TermFrom>1204</TermFrom><TermTo>1211</TermTo><AcadProgram>0801</AcadProgram><ProgName>General Arts & Science WC</ProgName><AcadPlan>P0801</AcadPlan><ACADEMIC_LOAD>F</ACADEMIC_LOAD><PlanName>General Arts & Science - WC</PlanName><AcadLevel>01</AcadLevel><CoopLevel>N</CoopLevel><MTCUCode>54701</MTCUCode><APSCode>01059</APSCode><RESIDENCY>INTL</RESIDENCY><CountContinue>12</CountContinue><CountSwitch>1</CountSwitch><CountLeave>5</CountLeave><CountOther>0</CountOther><CountTotal>18</CountTotal><CountFailedCourse>0</CountFailedCourse><CountCohort>18</CountCohort><CountNonCohort>0</CountNonCohort><Degree>ONT DIPL</Degree><AcadOrg>8615</AcadOrg><OrgName>School of Academic Studies</OrgName><DeanName>John Smith</DeanName><DivisionName>Jedi College</DivisionName><KeyDateFrom>09/22/2020</KeyDateFrom><KeyDateTo>02/05/2021</KeyDateTo><KDFrom>10TH</KDFrom>
</NCSR9053_HDR>
<NCSR9053_HDR>
<UserID>JDOE</UserID><RunControl>2020F-2021W</RunControl><Institution>NCAAT</Institution><AcadCareer>PSEC</AcadCareer><TermFrom>1204</TermFrom><TermTo>1211</TermTo><AcadProgram>0801</AcadProgram><ProgName>General Arts & Science WC</ProgName><AcadPlan>P0801</AcadPlan><ACADEMIC_LOAD>F</ACADEMIC_LOAD><PlanName>General Arts & Science - WC</PlanName><AcadLevel>03</AcadLevel><CoopLevel>N</CoopLevel><MTCUCode>54701</MTCUCode><APSCode>01059</APSCode><RESIDENCY>INTL</RESIDENCY><CountContinue>27</CountContinue><CountSwitch>0</CountSwitch><CountLeave>1</CountLeave><CountOther>0</CountOther><CountTotal>28</CountTotal><CountFailedCourse>0</CountFailedCourse><CountCohort>0</CountCohort><CountNonCohort>28</CountNonCohort><Degree>ONT DIPL</Degree><AcadOrg>8615</AcadOrg><OrgName>School of Academic Studies</OrgName><DeanName>John Smith</DeanName><DivisionName>Jedi College</DivisionName><KeyDateFrom>09/22/2020</KeyDateFrom><KeyDateTo>02/05/2021</KeyDateTo><KDFrom>10TH</KDFrom>
</NCSR9053_HDR>
<NCSR9053_HDR>
<UserID>JDOE</UserID><RunControl>2020F-2021W</RunControl><Institution>NCAAT</Institution><AcadCareer>PSEC</AcadCareer><TermFrom>1204</TermFrom><TermTo>1211</TermTo><AcadProgram>0801</AcadProgram><ProgName>General Arts & Science WC</ProgName><AcadPlan>P0801</AcadPlan><ACADEMIC_LOAD>F</ACADEMIC_LOAD><PlanName>General Arts & Science - WC</PlanName><AcadLevel>04</AcadLevel><CoopLevel>N</CoopLevel><MTCUCode>54701</MTCUCode><APSCode>01059</APSCode><RESIDENCY>INTL</RESIDENCY><CountContinue>2</CountContinue><CountSwitch>1</CountSwitch><CountLeave>12</CountLeave><CountOther>0</CountOther><CountTotal>15</CountTotal><CountFailedCourse>0</CountFailedCourse><CountCohort>0</CountCohort><CountNonCohort>15</CountNonCohort><Degree>ONT DIPL</Degree><AcadOrg>8615</AcadOrg><OrgName>School of Academic Studies</OrgName><DeanName>John Smith</DeanName><DivisionName>Jedi College</DivisionName><KeyDateFrom>09/22/2020</KeyDateFrom><KeyDateTo>02/05/2021</KeyDateTo><KDFrom>10TH</KDFrom>
</NCSR9053_HDR>
<NCSR9053_HDR>
<UserID>JDOE</UserID><RunControl>2020F-2021W</RunControl><Institution>NCAAT</Institution><AcadCareer>PSEC</AcadCareer><TermFrom>1204</TermFrom><TermTo>1211</TermTo><AcadProgram>0850</AcadProgram><ProgName>Pre-Health Sci Pathway CertDip</ProgName><AcadPlan>P0850</AcadPlan><ACADEMIC_LOAD>F</ACADEMIC_LOAD><PlanName>Pre-Health Sci Pathway CertDip</PlanName><AcadLevel>01</AcadLevel><CoopLevel>N</CoopLevel><MTCUCode>41598</MTCUCode><APSCode>01239</APSCode><RESIDENCY>OUTPR</RESIDENCY><CountContinue>2</CountContinue><CountSwitch>0</CountSwitch><CountLeave>0</CountLeave><CountOther>0</CountOther><CountTotal>2</CountTotal><CountFailedCourse>0</CountFailedCourse><CountCohort>1</CountCohort><CountNonCohort>1</CountNonCohort><Degree>ONT CERT</Degree><AcadOrg>8615</AcadOrg><OrgName>School of Academic Studies</OrgName><DeanName>John Smith</DeanName><DivisionName>Jedi College</DivisionName><KeyDateFrom>09/22/2020</KeyDateFrom><KeyDateTo>02/05/2021</KeyDateTo><KDFrom>10TH</KDFrom>
</NCSR9053_HDR>
</Start>
Thank you,
--sue
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @YeahMan ,
If you read all these files as an xml in the input too, it automatically shows up as a table. SImilar to what you described in excel.
Please find attached a sample workflow.
Do let me know if this helps.
Best,
Jagdeesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @JagdeeshN,
Thank you. Yes, I was able to get this far with a single XML file, but how do I do this using the Directory tool? I want to loop through a directory and the subsequent sub-directories, input the XML files and output as XLSX using the same filename and location?
E.g.,
C:\XML Directory\Sample.XML would be saved to C:\XML Directory\Sample.XLSX,
C:\XML Directory\sub-directory\Sample2.XML would be saved to C:\XML Directory\sub-directory\Sample2.XLSX, etc.
And, just to keep things interesting, there are 4 different XML configurations, so Sample.XML & Sample2.XML could have a different structure.
Thanks,
--sue
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Please find attached a sample workflow that uses the directory tool to read multiple xml files and convert them into a tabular format.
Do let me know if this helped.
Best,
Jagdeesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @JagdeeshN,
You are helping me get closer to a solution! Based on what I have learned so far (from your last sample), I have to set up 4 different templates, one for each XML file. Ok, this is not horrible and I can do that, but I still am not able to use the input file name and input directory to save the output as Excel. I can use the filename in the Filename column created in the output of the Dynamic Input tool, but I no longer have the Directory information available. I've tried changing the Action parameter, but no luck.
Thanks again,
--sue
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Please refer to the attached sample.
In this workflow, the output from the join tool will give you the full path, directory within the workflow. You can use these values to create the output URL/Path/Name which can then be used within the output tool creating excels.
Do let me know if this helps.
Best,
Jagdeesh
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @JagdeeshN
Thank you for all of your assistance!
I learned about the:
- Directory tool (and how to use the file information the tool provides. I would never have thought of joining this information as columns to the rest of the data. I tried to create a variable, but couldn't get that to work)
- The Text to Columns tool
- Using a column as a path & filename (don't forget to put the '|||Sheet1' as part of the filename AND fill in a dummy value in the 'Write to File or Database' parameter, e.g., C:\temp\dummy.xlsx|||sheet1)
--sue
