Alteryx Designer Desktop Discussions

Emil_Kos · ‎03-17-2021

Hi,

I am searching for inspiration. I received a really strange file with a relationship between one or more companies.

I have data in this structure:

└─┬─Company A
├───Company AB
├───Company AC
├───Company AD
├───Company AE
├───Company AF
├───Company AG
├─┬─Company B
│ └───Company BA
├───Company CA
├───Company CB
├─┬─Company DE
│ └───Company DF

From this I need to prepare something in this format:

Parent	Child
N/A	Company A
Company A	Company AB
Company A	Company AC
Company A	Company AD
Company A	Company AE
Company A	Company AF
Company A	Company AG
Company A	Company B
Company B	Company BA
Company A	Company CA
Company A	Company CB
Company A	Company DE
Company DE	Company DF

Later it becomes even more complicated. I removed the company names, but it should give an idea:

Do you remember a similar issue in the community that I could check for inspiration?

danilang · ‎03-17-2021

Hi @Emil_Kos

Try something like this

Find the nesting level by counting the number of next-level box characters ("└", "├" and "│") in the line. Then build the entire company hierarchy for each company and extract the immediate parent from the current hierarchy.

It works for your initial set, but you may have to tweak it for the full data.
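For readers outside Alteryx, the counting idea can be sketched in Python. This is only an illustration, not the attached workflow: the function name parent_child_pairs and the SAMPLE constant are made up for the example, and the sketch assumes that the first line is the root. In the sample as posted, the root line also contains one counted character, so it cannot be told apart from its children by the count alone.

```python
BOX_CHARS = "└├│"  # the characters whose count gives the nesting level

SAMPLE = """\
└─┬─Company A
├───Company AB
├───Company AC
├───Company AD
├───Company AE
├───Company AF
├───Company AG
├─┬─Company B
│ └───Company BA
├───Company CA
├───Company CB
├─┬─Company DE
│ └───Company DF
"""


def parent_child_pairs(text):
    """Return (parent, child) tuples for a box-drawn company tree."""
    pairs = []
    path = []  # path[i] is the most recent company seen at level i
    for line in text.splitlines():
        if not line.strip():
            continue
        # Drop the leading box-drawing characters to get the company name.
        name = line.lstrip("└├│┬─ ").strip()
        # Assumption: the first line is the root (level 0). For every other
        # line the level is the number of "└", "├" and "│" characters.
        level = 0 if not path else sum(line.count(c) for c in BOX_CHARS)
        level = min(level, len(path))  # guard against skipped levels
        del path[level:]  # forget companies at this level or deeper
        parent = path[-1] if path else "N/A"
        path.append(name)  # "|".join(path) would be the full hierarchy
        pairs.append((parent, name))
    return pairs


for parent, child in parent_child_pairs(SAMPLE):
    print(f"{parent}\t{child}")
```

On the sample from the first post this prints the same Parent/Child rows as the requested table. Real data with deeper nesting or several roots may need a different rule for the level.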

Dan

Emil_Kos · ‎03-17-2021

Hi @danilang,

I think this is the right course of action 🙂

Definitely, it is very helpful. I will accept your post as a solution, but I want to see if other people have other ideas.

Emil_Kos · ‎03-18-2021

Hi @danilang,

Your solution gave me the necessary inspiration to finish my task. Your idea of counting specific signs was brilliant.

Thank you for your help, you are awesome! 😀

Emil_Kos · ‎03-18-2021

Hi @danilang,

I have a question as I am quite inexperienced in regex:

Could you explain how this one works?

(.*?)(\<.+)

danilang · ‎03-18-2021

Hi @Emil_Kos

(.*?)(\<.+) The pieces between the () are capturing groups and represent the bits you're trying to match in the string.

(.*?) The period matches any character and the * represents 0 or more times. The ? makes the match lazy: it stops at the first position where the next part of the pattern can match. In your case that is the start of the first word in the string, i.e. where the company name begins. Without the ? the match is what's known as greedy and will match as many characters as possible, up to the start of the last word.

(\<.+) \ is the escape character. In front of a reserved character it signifies that the character is to be treated literally, so if you wanted to match * you need to write \* since * is a reserved character as mentioned above. In front of < it works differently: in the Perl-style (Boost) regex syntax that Alteryx uses, \< matches the start of a word rather than a literal <. + means one or more. So this whole part matches everything from the start of a word to the end of the string, which here is the company name.
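The difference between the lazy and the greedy version can be tried out in Python. One caveat worth checking against the Alteryx documentation: in Boost's Perl-style syntax \< is a start-of-word anchor, not an escaped literal <. Python's re module has no such escape, so this sketch uses \b(?=\w) as a stand-in; that substitution, and the sample line, are assumptions made for the illustration.

```python
import re

line = "├───Company AB"

# Assumption: \b(?=\w) plays the role of the start-of-word anchor \<,
# which Python's re module does not provide.
WORD_START = r"\b(?=\w)"

# Lazy: group 1 stops at the first place where a word starts.
lazy = re.match(rf"(.*?)({WORD_START}.+)", line)

# Greedy: group 1 runs on to the last place where a word starts.
greedy = re.match(rf"(.*)({WORD_START}.+)", line)

print(lazy.groups())    # ('├───', 'Company AB')
print(greedy.groups())  # ('├───Company ', 'AB')
```

With the lazy version the second group holds the whole company name; with the greedy version it holds only the last word.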

Dan

Emil_Kos · ‎03-18-2021

Hi @danilang,

Thank you very much for a detailed explanation. It is very helpful!

Emil_Kos · ‎03-19-2021

Hi @danilang,

Apologies for bothering you again, but I want to create documentation in which I explain in detail how my whole workflow works.

I have one last piece missing. When you have a moment, could you kindly explain to me how the expression below works?

REGEX_Replace(REGEX_Replace([Concat_Value], ".*\|(.*?\|.*)", "$1"),"(.*)\|.*","$1")

danilang · ‎03-19-2021

Hi @Emil_Kos

What we're doing in this part is trying to return the direct parent, which is between the second last "|" and the last "|".

For the inner REGEX_Replace, ".*\|(.*?\|.*)", let's assume that we're parsing the string "Company A|Company B|Company BA".

A regex engine does not scan the string just once: when a later part of the pattern fails to match, it backtracks, giving back characters that an earlier part already consumed, until the entire expression matches the target string. This is one of those cases. The initial ".*\|" is greedy, since it doesn't include the "?" (see previous post), so on its own it matches everything up to and including the last "|". Using the target string above, that is the second "|". The next part is "(.*?\|.*)", which needs a "|" of its own, and only "Company BA" is left, so the match fails at that point. The engine then backtracks, giving back one character at a time, until the first part ends at the previous "|", i.e. it matches "Company A|". From there the second part can match "Company B|Company BA", and the whole expression succeeds.

The actual effect of this is that the first part always ends at the second last "|", so if the target has multiple "|" the capture group will always return the text after the second last one.

The outer REGEX_Replace, "(.*)\|.*", takes the output of the first, "Company B|Company BA", and matches everything up to but not including the last "|", returning the parent "Company B".
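The two nested replacements can be reproduced in Python to check the walkthrough. This is a sketch only: the function name direct_parent is made up for the example, and Python writes the replacement as \1 where the Alteryx expression uses $1.

```python
import re


def direct_parent(concat_value):
    """Mimic the nested REGEX_Replace calls on a "|"-separated hierarchy."""
    # Inner step: keep only the text after the second last "|".
    inner = re.sub(r".*\|(.*?\|.*)", r"\1", concat_value)
    # Outer step: keep everything before the last "|".
    return re.sub(r"(.*)\|.*", r"\1", inner)


print(direct_parent("Company A|Company B|Company BA"))  # Company B
print(direct_parent("Company A|Company AB"))            # Company A
```

In this Python sketch, a value with no "|" at all (the root company) comes back unchanged, so the "N/A" parent for the root has to be handled separately.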

Dan

Emil_Kos · ‎03-19-2021

Hi @danilang,

Thank you for the detailed explanation! It is very helpful!
