Find the nesting level of each line by counting the box-drawing characters that mark a level ("└", "├" and "│"). Then build the full company hierarchy for each company and extract the immediate parent from that hierarchy.
It works for your initial set, but you may have to tweak it for the full data.
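A minimal Python sketch of that approach. It assumes the tree uses one box-drawing level character per ancestor level (as in the sample tree below), and that stripping those characters plus "─" dashes and spaces leaves the company name. The function names are hypothetical and not part of any Alteryx tool.

```python
# Characters that mark one level of nesting (taken from the post).
LEVEL_CHARS = ("└", "├", "│")
# Characters assumed to surround the name: the markers, "─" dashes and spaces.
TRIM_CHARS = " └├│─"


def nesting_level(line: str) -> int:
    """Depth of a line = number of box-drawing level characters in it."""
    return sum(line.count(ch) for ch in LEVEL_CHARS)


def company_name(line: str) -> str:
    """Strip the tree drawing to leave just the company name (assumed layout)."""
    return line.strip(TRIM_CHARS)


def build_hierarchies(lines):
    """Return (hierarchy, immediate_parent) per line; hierarchy is '|'-joined."""
    path = []  # path[d] is the most recent company seen at depth d
    rows = []
    for line in lines:
        level = nesting_level(line)
        del path[level:]  # forget siblings and anything deeper
        path.append(company_name(line))
        parent = path[-2] if len(path) > 1 else None
        rows.append(("|".join(path), parent))
    return rows


tree = [
    "Company A",
    "├── Company B",
    "│   └── Company BA",
    "└── Company C",
]
for hierarchy, parent in build_hierarchies(tree):
    print(hierarchy, "->", parent)
```

If a branch under a last child is drawn with spaces instead of "│", counting characters undercounts the depth; that is the kind of case where the approach may need tweaking for the full data.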
(.*?)(\<.+) The pieces between the parentheses are capturing groups; each one holds a part of the string you want to extract.
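To try the pattern outside Alteryx, here is a small Python sketch. The sample strings are made up, and `fullmatch` is used on the assumption that the parse should consume the whole field.

```python
import re

# Group 1: everything before the first "<"; group 2: that "<" plus the rest.
PATTERN = re.compile(r"(.*?)(\<.+)")


def split_at_first_angle(text: str):
    """Return both capturing groups, or None if the text has no '<x' part."""
    match = PATTERN.fullmatch(text)
    return match.groups() if match else None


print(split_at_first_angle("Company B <a> <b>"))  # ('Company B ', '<a> <b>')
print(split_at_first_angle("no angle bracket"))   # None
```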
(.*?) The period matches any character and the * means zero or more times. The ? makes the * lazy: it matches as few characters as possible, so it stops at the first point where the rest of the pattern can match. In your case that is the first < in the string. Without the ? the match is what's known as greedy and takes every character up to the last <.
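A quick Python comparison of the lazy and greedy forms on a made-up string, written with a plain "<" (equivalent to "\<" in Python's `re`):

```python
import re

text = "Company B <a> <b>"

# Lazy: group 1 stops at the first "<".
lazy = re.fullmatch(r"(.*?)(<.+)", text).groups()
# Greedy: group 1 runs on to the last "<" that still leaves ".+" something to match.
greedy = re.fullmatch(r"(.*)(<.+)", text).groups()

print(lazy)    # ('Company B ', '<a> <b>')
print(greedy)  # ('Company B <a> ', '<b>')
```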
(\<.+) \ is the escape character and means the next character is to be treated literally. Here < is not a reserved character, so the \ is not required, but if you wanted to match a literal * you would need to escape it, since * is a reserved character as described above. + means one or more, so this whole part matches any string like "<x", where x is any string of at least one character.
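A short Python check of the escaping rules. In Python's `re`, "\<" is just a literal "<"; be aware that some other engines (GNU grep, for example) read "\<" as a start-of-word anchor, so it is worth checking your tool's regex flavor before relying on the escape being harmless. The `matches` helper is hypothetical.

```python
import re


def matches(pattern: str, text: str) -> bool:
    """True if the whole of text matches pattern."""
    return re.fullmatch(pattern, text) is not None


# "<" has no special meaning in Python's re, so the escape is optional here.
print(matches(r"\<.+", "<x"), matches(r"<.+", "<x"))    # True True
# "+" needs at least one character after "<".
print(matches(r"<.+", "<"))                             # False
# "*" is a quantifier, so a literal asterisk must be escaped.
print(matches(r"a\*b", "a*b"), matches(r"a*b", "a*b"))  # True False
```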
What this part does is return the direct parent, which is the text between the second-last "|" and the last "|".
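For comparison, the same "text between the second-last and last |" idea can be written without regex. This is a hypothetical Python helper, not what the workflow uses:

```python
def direct_parent(hierarchy: str):
    """Text between the second-last and last '|', or None for a top-level company."""
    parts = hierarchy.split("|")
    return parts[-2] if len(parts) >= 2 else None


print(direct_parent("Company A|Company B|Company BA"))  # Company B
print(direct_parent("Company A"))                        # None
```

Splitting on "|" also handles a hierarchy with a single "|", which the two-step regex below needs at least two of.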
For the inner regex_parse, ".*\|(.*?\|.*)", let's assume we're parsing the string "Company A|Company B|Company BA".
The regex engine can backtrack: it gives characters back and retries until the entire expression matches the target string. This is one of those cases. The initial ".*\|" is greedy because it has no "?" (see previous post), so .* first consumes the whole string and then gives characters back until \| can match the last "|". In the target string above that is the second "|", but the next part, "(.*?\|.*)", would then have to match "Company BA", which contains no "|", so that attempt fails. The engine backtracks further until \| matches the first "|". Now "Company B|Company BA" matches the second part and "Company A|" matches the first, so the entire expression matches with the split at the second-last "|".
The net effect is that the expression splits at the second-last "|", so if the target has multiple "|" it always returns the text after the second-last one. If the target has fewer than two "|", it does not match at all.
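The inner pattern can be checked in Python with a hypothetical helper; `fullmatch` is assumed to mirror a whole-field parse:

```python
import re

INNER = re.compile(r".*\|(.*?\|.*)")


def after_second_last_pipe(text: str):
    """Group 1 of the inner pattern, or None when text has fewer than two '|'."""
    match = INNER.fullmatch(text)
    return match.group(1) if match else None


print(after_second_last_pipe("Company A|Company B|Company BA"))  # Company B|Company BA
print(after_second_last_pipe("A|B|C|D"))                         # C|D
print(after_second_last_pipe("Company A|Company B"))             # None
```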
The outer regex_parse, "(.*)\|.*", takes the output of the first, "Company B|Company BA", and matches everything up to but not including the last "|", returning the parent "Company B".
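Chaining the two patterns in Python gives a small end-to-end sketch. The helper name is hypothetical, and `fullmatch` is assumed to mirror a whole-field parse:

```python
import re

INNER = re.compile(r".*\|(.*?\|.*)")  # keeps the text after the second-last "|"
OUTER = re.compile(r"(.*)\|.*")       # keeps the text before the last "|"


def immediate_parent(hierarchy: str):
    """Apply the two parses in sequence; None if there are fewer than two '|'."""
    inner = INNER.fullmatch(hierarchy)
    if inner is None:
        return None
    # group(1) always contains a "|", so the outer pattern always matches here.
    return OUTER.fullmatch(inner.group(1)).group(1)


print(immediate_parent("Company A|Company B|Company BA"))  # Company B
print(immediate_parent("A|B|C|D"))                         # C
```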