Help with separation of numerical data from semi-structured text input

Hi community, hope everyone is doing well.

I have got stuck with a formatting and data extraction flow. The usual input is an outlook email, but for the purpose of the issue I attach a mockup data formatted in similar fashion to what Alteryx reads when connected to the outlook tool.

The data that comes in contains company names, machine numbers, depot id, stock id... all combined, and they may come or not in the same order (as you may see in the body column of the attached excel dataset). when it comes to this data tickets, they can contain 1 or more machine numbers that need to be extracted along with the depot id and the stock id and arranged as columns so it can be used as an input to other process.

The attached flow does first the trimming of the left and right text portion that I do not need and after that continues with the tools to read and post in columns the machine numbers. The machine numbers that are valid to extract are the ones with 6-digit codes (the ones with 5-digit codes are not valid...some will not have as they are other type of tickets) and they are needed to enrich a number of fields in a SharePoint list. My key data item is the stock id, as this will be the item to look out in the SharePoint list to match and enrich the required fields (this is another flow).

Part of this flow uses regex formulas and tool which I am not too familiar with. The issue I am having is that the regex tool formula to read all available 6-digit code machine numbers also reads the invalid 5-digit codes or the stock id or even the depot id and it does not provide an accurate read and extraction of the needed 6-digit machine number codes (they start with 19 when you look at the attached excel dataset). Tickets can contain a single 6-digit machine number code, or it can come with more than 1 (on the range between 2 and 15), so part of the formula in the regex tool states that, so if the case shows up, I am able to read the listed 6-digit machine number codes provided. Part of what I did was to use the tokenize option in the regex tool and use the formula \b\d{6}\b, but the result in most cases would always show null, with a few that would bring one or some of the 6-digit machine number codes.

The goal is to be able to read, format and extract all 6-digit machine number codes whether 1 or many (up to 15 if possible), and have the stock id also listed to be able to find the SharePoint list record id and update the required fields in the other build flow (not attached). Any advice or suggestion to improve the flow and be able to accomplish the above goal, is welcomed. Thanks.

Ananias

KLCdatatest.xlsx

KLC data test.yxmd

Regex

Parse

Transformation

Help

Accepted answers

Matt_D

Hi @adlossc

Not sure if this helps, I've just deconstructed the body string with the 3 constants, depot, stock and tracker to leave you with the remaining numbers, which if 6 digits, are extracted.

Screenshot 2024-10-01 102848.png

Matt

KLC data test.yxzp

All comments

OTrieger

@adlossc

You are using incorrect REGEX code. If you set it to:
19\d{4} you will get all the machine numbers. Select Tokenize

OTrieger

@adlossc
Your Stock ID has 7 digits, \d{7} will get you the needed code.

Qiu

@adlossc
I made a quick sample with Regex, but not yet consider the case of many 6-digit machine number .

Can you provide more sample data?

KLC data test-r1.yxmd

Matt_D

Hi @adlossc

Not sure if this helps, I've just deconstructed the body string with the 3 constants, depot, stock and tracker to leave you with the remaining numbers, which if 6 digits, are extracted.

Screenshot 2024-10-01 102848.png

Matt

KLC data test.yxzp

adlossc

Hi @Qiu

Thank you for your help in the case presented. Attached you will find expanded data to 40 tickets with a verity of cases with: 1 or many 6-digit machine number codes, 1 or many 5-digit machine number codes along with 6-digit machine number codes, many 6-digit machine number codes, and cases without any type of machine codes, hope it helps provide more data to work with...I will be testing your solution this afternoon. Let me know if more info is needed! Thanks

KLCdatatest.xlsx

adlossc

Hi @Matt_D,

Thank you for your input for solution, will try it and see how that goes, and get back to you... I just replied to user Qui with another file with expanded data in case you want to check it up with your flow design. Thanks!

adlossc

One thing I have to mention is there is ongoing effort to standardize how the data comes in to be separated and extracted. I am re uploading the previously shared file with an additional Sheet2 that contains 11 records in the format that is being worked for standardization, where after each 6-digits machine number code there is a ";" as a separator, which in theory would make the extraction of numbers easier...my question here would be if using Regex for separation would still be useful or rather use the formula or text to columns tools @OTrieger @Qui @Matt_D?

KLCdatatest.xlsx

OTrieger

@adlossc
It is really depends on your preference as you could do it in any of these ways

adlossc

Thank @OTrieger you for your input, I tested your suggestion for the Stock ID and it does the trick for when it comes as number digit series, but it does not when I get cases that get an R added at the end of the number (meaning there might be an original case and this R may be the sub sequent case under the customer profile) or it can come alphanumeric (maybe 5 % of cases comes like this as it other region).. so, these records for Stock ID could look like this: 0052723R / GCAN5199...do you think the formula you sent can be modified to adapt these additional letters if they happen, or would a formula tool or text to columns would help too?

Image attached shows the result using the \d{7} formula

Capture.PNG

Quick Links

This months top contributors

atcodedog05 19458

Qiu 15866

binu_acs 15708

MarqueeCrew 13708

apathetichell 13703