Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Help with a formula tool to create identify if first string of text matches before _

binsell
8 - Asteroid

I'm hoping someone could kindly help me figure out a formula to return the result I want. I have attached by input data and output data so you can see what I'm trying to achieve. I believe it can be done with a formula tool but it might not be as simple as that! I have a string of text in 'SESSION ID' and some of my data has a duplicate 'SESSION ID' BUT it has an underscore after it and some other unique numbers. First of all I'm trying to identify the data where the session ID's match at the start (so if one session id has an underscore in, want the two to match BEFORE the underscore) and if that is a match, then return then the SCSFEE, INVFEE and Comment to the session ID that has the underscore in. So it basically matched these columns with the session ID without the underscore and is the same numbers for the first characters before the underscore. I didn't want to put the amount of characters to match as they can vary sometimes. So as example, session ID have 8 characters at the start say 12345678 and the other session ID is 12345678_202312356 BUT then another row in my data could be 1234 and other session id is 1234_2023123 so I couldn't say match the first 4 characters say as the other example has 8 characters at the start. 

 

I hope that makes sense and I know might sound a little confusing but hopefully the input and output I have attached will make it clear. A massive thank you in advance for your help! 

18 REPLIES 18
binsell
8 - Asteroid

@OllieClarke Having looked at my data again, there are always 7 characters before the underscore. So as example, 1234567_2023123 is one session ID and I would like it to match with session ID 1234567 (this one is without the underscore) so maybe in the first instance if I can do a formula or some other tool that will identify if the first 7 characters of the session ID are the same. I was looking for a solution but so much out there and never exactly the same as your issue! I'm sure this is an easy one to figure out! Thanks again for all your help!

OllieClarke
15 - Aurora
15 - Aurora

Hi @binsell 

 

The text to columns tool will split the I’d on the underscore into 2 columns

 

the filter tool splits out the ones which have an underscore with the ones that dont

 

the join tool ‘matches’ the first part of the underscored ids with the ids that have no underscore

 

the union then brings this all together. 

the test tool is just to throw an error if an underscored id doesn’t match. 

Does that help you understand?

binsell
8 - Asteroid

@OllieClarke Yes it does, I was just thinking of the text to columns tool and the rest makes perfect sense. So will work on that. Thanks so much

binsell
8 - Asteroid

@OllieClarke thanks for all your help. I'm nearly there but I can't seem to get it to work with my data (I just provided a simple example) for some reason on the join, it's not pulling through the session ID that had the underscore in it as I need to keep it. Very strange as followed your workflow and I can see it works!! Thank you for taking the time to help me. I'm going to have to revert back to excel for this as I know how to solve it in there! Thanks so much again, I really appreciate it

OllieClarke
15 - Aurora
15 - Aurora

@binsell 

If you share how you've configured the join, I may be able to help. You need the SCSESSIONID from the Right input (assuming same orientation as my workflow)

image.png

binsell
8 - Asteroid

@OllieClarke Thank you, very kind of you! So here is I have configured the join and basically session ID 1, is just the start of the session ID with the numbers after the underscore taken away, so it will join on the other session id that doesn't have the underscore after it. Hope that makes sense! But when I union the data back together, the session ID with the underscore is in two sperate columns still and in your workflow, they aren't and I can seem to figure where I have gone wrong. 

 

 

Also, how did you paste an image into here? Mine just keeps saying 'image upload failed' so have to attach it instead! 

 

Thanks again! 

 

 

 

binsell
8 - Asteroid

@OllieClarke I have managed to figure it out! I have done a formula after so bring the session ID back together and worked! Thank you again for all your help, very much appreciated. 

AmitMiller
Alteryx
Alteryx

Hi @binsell, I've attached a workflow that aims to solve your scenario, using the Multi-row Formula Tool

fill missing values.png

 

 

 

 

 

 

 

 

 

 

 

as Aatalai suggested, I recommend experimenting a bit with the Multi-row Formula tool =]

  • An excellent place to start would be the one-tool example! its very comprehensive for this one.
  • While you'll need 3 copies of this tool (1 for each column), it still presents an elegant solution, to my opinion.

Do note, that you can separate\extract the relevant portion of the sessionID column either via the 

  1. Text-to-columns tool (underscore as the delimiter)
  2. Regex tool
    1. Tokenize method - which will output 2 new columns (I used the regex [^_]+ which aims to match a sequence of characters that aren't underscore)
    2. Parse method - which will output 1 new column per group (here I used the regex (\d+) which aims to find a sequence of digits)
      1. The parenthesis defines a group, and since there's only 1 - only 1 new column would be created
      2. Both regexes will work - just showing 2 ways to approach the same pattern

While Regex usually takes more time to run, it can sometimes shorten other processes

  • For example - the parse method of the Regex tool extracts only the relevant column

Now - for the Multi-row tool, you'll want to condition if the previous record's session first sequence of digits is identical to the current one - and the current record's value is empty - then insert the value from the previous record.

 

if
[Row-1:sessionID1]==[sessionID1] and isnull([SCSFEE (gross)])
then
[Row-1:SCSFEE (gross)]
else 
[SCSFEE (gross)]
endif

 

 The same needs to be done for all 3 columns.

Do use the variable references, rather than manually writing them:

variable reference.png

 

 

 

 

 

 

 

 

I've only continued the workflow from the text-to-columns as it is the same no matter what starting point you choose

 

P.S

You could skip the first step of extracting the different session_IDs, and integrate a Regex_replace function as part of each of the Multi-Row tools. However, I think there's a Diminishing return to minimizing the tool count, but making the other tools purpose harder to read

 

Hope that helps!

Amit.

binsell
8 - Asteroid

Thank you for taking the time to look at this! As per my comment to Ollie, I have managed to resolve this and did it only use parse, filter, join and formula tool and worked perfectly! This is also really helpful as gives me exposure to tools that I haven't seen before so really good insight for me! 

Labels