Hi Everyone,
I am trying to understand how best to create a workflow to flatten a hierarchical data structure. I initially thought cross-tab and a join would work but I think it needs more, Any hints or suggestions would be much appreciated. I have attached a file with sample input and desire output.
Many thanks,
Arnab
Solved! Go to Solution.
Hello @ardutta ,
I looked at this and I thought it is a good scenario. When I looked at the data, there is one thing missing. The hierarchy link. There is nothing in there to identify what element belong to what level or item. That is the issue. If you just want the records to appear the way you in your output with no logic, that becomes easier.
The question here is this, How do you know the branches and which belong to which?
Please let us know.
Hi @ImadZidan, Thanks for your reply.
The sequence of records and the ID value defines the hierarchy link. For example, all [ID-n] valued subsequent rows are related to a [ID] value and when it encounters a row with same [ID] or [ID+x] value, it marks the beginining of a new relationship set and the pattern continues. Hence I solved it using multi-row operations so that I can represent this relationship in a format that can be used for table operations.
Thanks,
Arnab
Hello @ardutta ,
I think I understand.
Please see attached. Let me know if you need clarification.
At least this will give you an idea.
Thank you. I will go through your solution and let you know if I have any questions.
Hi @ImadZidan ,
Sorry for the late reply. I tested your solution and it worked for the sample file so accepted it as the solution. However, I will need to make changes to it to work for my real scenario, specifically because the min and max would not work as all sets do not have the same min value. But I liked the thought process in your solution 🙂 so will try to extend upon it.
Thank you once again
Thank you @ardutta ,
Please feel free to involve me if you want to brainstorm. I will be happy to help further even with the real scenario.
Hi @ImadZidan , Just wanted to share that I have developed a Python script in my real world scenario. The code to flatten is below.
Your suggested solution approach or the initial multi-row approach would also work with some tweaking but thought I'd share this in the post in case it helps someone in the future. This can also be replicated using a batch macro but I am still new to macros in Alteryx so haven't it tried it. I will give it a try to help my understanding of macros and upload the solution once ready.
Thanks
df = Alteryx.read("#1")
df2 = pd.DataFrame()
df.head
#################################
for index, row in df.iterrows():
if row["PROJECT_Level"] == 1:
Level1 = row["PROJECT_Name"]
elif row["PROJECT_Level"] == 2:
Level2 = row["PROJECT_Name"]
elif row["PROJECT_Level"] == 3:
Level3 = row["PROJECT_Name"]
elif row["PROJECT_Level"] == 4:
Level4 = row["PROJECT_Name"]
elif row["PROJECT_Level"] == 5:
df2.at[index, "PROJECT_Code"] = row["PROJECT_Code"]
df2.at[index, "PROJECT_Name"] = row["PROJECT_Name"]
df2.at[index, "Level 1"] = Level1
df2.at[index, "Level 2"] = Level2
df2.at[index, "Level 3"] = Level3
df2.at[index, "Level 4"] = Level4
df2.at[index, "2019_Capex_Total"] = row["2019_CAPEX_TOTAL"]
df2.at[index, "2019_Opex_Total"] = row["2019_OPEX_TOTAL"]
Alteryx.write(df2, 1)