Hierarchical Tree issue

Emil_Kos
17 - Castor

Hi,

 

I am searching for inspiration. I received a really strange file describing relationships between one or more companies.

 

I have data in this structure:

 

└─┬─Company A
  ├───Company AB
  ├───Company AC
  ├───Company AD
  ├───Company AE
  ├───Company AF
  ├───Company AG
  ├─┬─Company B
  │ └───Company BA
  ├───Company CA
  ├───Company CB
  ├─┬─Company DE
  │ └───Company DF

 

From this I need to prepare something in this format:

 

Parent        Child
N/A           Company A
Company A     Company AB
Company A     Company AC
Company A     Company AD
Company A     Company AE
Company A     Company AF
Company A     Company AG
Company A     Company B
Company B     Company BA
Company A     Company CA
Company A     Company CB
Company A     Company DE
Company DE    Company DF

 

Later it becomes even more complicated. I removed the company names, but it should give an idea:

 

Emil_Kos_0-1615973694508.png

 

Do you remember whether there was a similar issue in the community that I could check for inspiration?

 

9 REPLIES
danilang
18 - Pollux

Hi @Emil_Kos 

 

Try something like this

 

danilang_1-1615990109851.png

 

Find the nesting level by counting the number of next-level box characters ("└", "├" and "│") in the line. Then build the entire company hierarchy for each company and extract the immediate parent from the current hierarchy.
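For anyone who wants to prototype the same idea outside Alteryx, here is a minimal Python sketch. It is not the workflow from the screenshot: the names `parent_child_pairs`, `LINE` and `TREE` are made up for illustration, and the depth is taken from the width of the box-drawing prefix rather than from a count of "└", "├" and "│", because in the sample both the root line and its direct children contain exactly one of those characters. A stack of ancestors then yields the immediate parent.

```python
import re

# Sample lines from the question, shortened. Joined explicitly so the
# leading spaces of each line are preserved.
TREE = "\n".join([
    "└─┬─Company A",
    "  ├───Company AB",
    "  ├─┬─Company B",
    "  │ └───Company BA",
    "  ├───Company CA",
    "  ├─┬─Company DE",
    "  │ └───Company DF",
])

# Group 1: leading run of whitespace and box-drawing characters.
# Group 2: the company name.
LINE = re.compile(r"^([\s└├│─┬]*)(.+)$")


def parent_child_pairs(tree_text):
    """Return (parent, child) tuples; the root gets the parent 'N/A'."""
    pairs = []
    stack = []  # (depth, name) for each ancestor of the current line
    for line in tree_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        prefix, name = LINE.match(line).groups()
        name = name.strip()
        depth = len(prefix)  # assumption: deeper lines have wider prefixes
        # Drop siblings and deeper nodes so the top of the stack is the parent.
        while stack and stack[-1][0] >= depth:
            stack.pop()
        parent = stack[-1][1] if stack else "N/A"
        pairs.append((parent, name))
        stack.append((depth, name))
    return pairs


for parent, child in parent_child_pairs(TREE):
    print(parent, "|", child)
```

Joining the names currently on the stack with "|" would give the full hierarchy string per company, which is the other half of the approach described above.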

danilang_0-1615990941000.png

 

 

It works for your initial set, but you may have to tweak it for the full data.

 

Dan

 

  

 

 

Emil_Kos
17 - Castor

Hi @danilang,


I think this is the right course of action 🙂

Definitely, it is very helpful. I will accept your post as a solution, but I want to see if other people have other ideas.

Emil_Kos
17 - Castor

Hi @danilang,


Your solution gave me the necessary inspiration to finish my task. Your idea of counting specific signs was brilliant.

 

Thank you for your help, you are awesome! 😀

Emil_Kos
17 - Castor

Hi @danilang,


I have a question as I am quite inexperienced in regex:

 

Could you explain how this one works?

 

(.*?)(\<.+)

danilang
18 - Pollux

Hi @Emil_Kos 

 

(.*?)(\<.+)  The pieces between the () are capturing groups and represent the bits you're trying to match in the string.  

 

(.*?) The period matches any character and the * represents 0 or more times.  The ? means stop the first time you hit the next character in the match string. In your case it's the first < in the string. Without the ? the match is what's known as greedy and will match all characters until the last <     

 

(\<.+)  \ is the escape character and signifies that the next character is to be treated literally.  In this case < is not a reserved character, so the \ is not required, but if you wanted to match * you would need to escape it, since * is a reserved character as mentioned above.   + means one or more.  So this whole part matches any string like "<x" where x is any string with at least one character.
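The lazy versus greedy difference is easy to see in a small Python sketch. The sample string and the helper name `split_at_marker` are invented for illustration. Python's `re` module reads `\<` as a literal "<", which matches the description above; other regex flavors may treat `\<` differently (for example as a start-of-word anchor), so it is worth checking the documentation of the engine you are using.

```python
import re

SAMPLE = "abc<def<ghi"  # made-up string with two "<" characters


def split_at_marker(pattern, text):
    """Return the two captured groups, or None if the pattern does not match."""
    match = re.match(pattern, text)
    return match.groups() if match else None


# Lazy: (.*?) stops at the first "<" that lets the rest of the pattern match.
lazy = split_at_marker(r"(.*?)(\<.+)", SAMPLE)

# Greedy: (.*) grabs as much as possible, so the split lands on the last "<".
greedy = split_at_marker(r"(.*)(\<.+)", SAMPLE)

print(lazy)    # ('abc', '<def<ghi')
print(greedy)  # ('abc<def', '<ghi')
```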

 

Dan

Emil_Kos
17 - Castor

Hi @danilang,

 

Thank you very much for a detailed explanation. It is very helpful!

Emil_Kos
17 - Castor

Hi @danilang,


Apologies for bothering you again, but I want to create documentation in which I explain in detail how my whole workflow works.

 

I have one last piece missing. When you have a moment, could you kindly explain to me how the regex below works?

 

REGEX_Replace(REGEX_Replace([Concat_Value], ".*\|(.*?\|.*)", "$1"),"(.*)\|.*","$1")

danilang
18 - Pollux

Hi @Emil_Kos 

 

What we're doing in this part is trying to return the direct parent, which sits between the second last "|" and the last "|".

 

For the inner REGEX_Replace, ".*\|(.*?\|.*)", let's assume that we're parsing the string "Company A|Company B|Company BA".

 

Regex can scan through the string multiple times and can back up (backtrack) at various times to make the entire regex expression match the target string.  This is one of those cases.  The initial ".*\|" matches any sequence of characters up to the last "|" since it doesn't include the "?" (see previous post).  Using the target string above, this would normally match up to the second "|", but since the engine is trying to match the entire expression, it then starts backing up to try to match the next part as well.  The next part is "(.*?\|.*)".  Starting from where the first part finished, and giving back one character at a time, the engine can match "Company B|Company BA" to the second part. The section before this, "Company A|", can still match the first part, so the last position at which the entire regex expression matches the target string is the second last "|".

 

The actual effect of this is that the first part ends at the second last "|", so if the target has multiple "|" characters the expression will always return the text after the second last one.

 

The outer REGEX_Replace, "(.*)\|.*", takes the output of the first, "Company B|Company BA", and matches everything up to but not including the last "|", returning the parent "Company B".
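To check the two steps by hand, the nested expression can be imitated in Python. This is only a sketch: `re.sub` stands in for Alteryx's REGEX_Replace, "\1" plays the role of "$1", and the function name `direct_parent` is hypothetical. Edge-case behaviour of the two engines may differ, so treat it as an illustration of the backtracking rather than a drop-in replacement.

```python
import re


def direct_parent(concat_value):
    """Imitate the nested REGEX_Replace on a '|'-separated hierarchy string."""
    # Inner step: keep only the last two segments ("parent|child").
    # If the string has fewer than two "|", nothing matches and it is unchanged.
    last_two = re.sub(r".*\|(.*?\|.*)", r"\1", concat_value)
    # Outer step: drop the last "|" and everything after it.
    return re.sub(r"(.*)\|.*", r"\1", last_two)


print(direct_parent("Company A|Company B|Company BA"))  # Company B
print(direct_parent("Company A|Company AB"))            # Company A
print(direct_parent("Company A"))                       # Company A
```

Note that a value with no "|" at all (the root) comes back unchanged in this sketch, so the "N/A" parent for the top-level company has to be handled separately.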

 

 

Dan

Emil_Kos
17 - Castor

Hi @danilang,

 

Thank you for the detailed explanation! It is very helpful! 
