This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
We're actively looking for ideas on how to improve Weekly Challenges and would love to hear what you think!Submit Feedback
Different results from the original, but that seems to be a common theme. I join the resentment in working with XML as the data size snowballs without any way to put in countermeasures.
Here's my solution. I used a combination of the Python tool and traditional Alteryx tools. For this challenge, I don't think there's anything within Alteryx better than the Python tool.
The entire challenge can be done within the Python tool. Anyway, the Python tool also allows for (by using pandas and the ElementTree XML API) the easiest understanding of the problem. I believe my output file is significantly more complete than the sample output file.
from ayx import Alteryx import pandas as pd import xml.etree.ElementTree as ET def parse_xml(root: str) -> list: programs = root.findall('program') out_list = list() # Loop over programs for program in programs: program_id = program.find('programID').text orchestra = program.find('orchestra').text season = program.find('season').text # Can have multiple concerts concerts = program.findall('concertInfo') works_info = program.find('worksInfo') # Can have multiple works for work in works_info.findall('work'): # If the work is an intermission, go to the next work if work.find('interval') is not None: continue # E.g., 5733* work_id = work.attrib['ID'] # E.g., One Sweet Morning if work.find('movement') is not None: movement = work.find('movement').text else: movement = None # E.g., Bernstein, Leonard if work.find('composerName') is not None: composer = work.find('composerName').text else: composer = None # E.g., WEST SIDE STORY (WITH FILM) if work.find('workTitle') is not None: work_title = work.find('workTitle').text else: work_title = None # E.g., Newman, David if work.find('conductorName') is not None: conductor_name = work.find('conductorName').text else: conductor_name = None # Initialize as an empty string -> e.g., Duke, Cherry (Mezzo-Soprano) Gosling, Stephen (Piano) soloists = '' # Test for soloists if work.find('soloists') is not None: # Can have multiple soloists for soloist in work.find('soloists'): if soloist.find('soloistName').text != None: soloist_name = soloist.find('soloistName').text else: soloist_name = 'NONE' if soloist.find('soloistInstrument').text != None: soloist_instrument = soloist.find('soloistInstrument').text else: soloist_instrument = 'NONE' concat_soloist = soloist_name + " (" + soloist_instrument + ")" soloists += concat_soloist + ' ' # Not using soloist roles in the output # soloist_roles = soloist.find('soloistRoles').text # Trim the extra space on the right soloists = soloists.rstrip() # For each concert, add the relevant work data for concert in concerts: concert_location = concert.find('Location').text event_type = concert.find('eventType').text venue = concert.find('Venue').text concert_date = concert.find('Date').text concert_time = concert.find('Time').text out_list.append( (program_id, work_id, orchestra, season, concert_location, event_type, venue, concert_date, concert_time, conductor_name, composer, work_title, movement, soloists ) ) return out_list if __name__ == '__main__': df_out = pd.DataFrame( data=parse_xml(root=ET.fromstring(Alteryx.read('#1')['DownloadData'])), columns=['Program ID', 'Piece ID', 'Ensemble', 'Season', 'Location', 'Venue', 'Event Type', 'Performance Date', 'Time', 'Conductor', 'Composer', 'Work', 'Movement', 'Soloists', ] ) Alteryx.write(df_out, 1)
Not perfectly matching output like some others and took me a hell of a lot longer than I expected. But I'm happy with what I learned with this one
LOTs of trial and error involved with this one. I still don't quite understand the XML tree structure and why some parts need to be parsed as Roots and some as Child, when they all look like Child...
Got more results then the sample output, however my information seems perfectly valid. There are even ID#s now in the output that dont show up at all in the initial raw download, so I am guessing something changed further upstream than we can access. Or at least that is my excuse. :)