Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Extracting specific data in string for multiple fields or columns in table

AlYap
7 - Meteor

Hi, 

 

I have a dataset of ledger title (string) that is somewhat messy but needs to be converted to a title that is tidier and easier to read. To achieve this, I need to break this down into a table with multiple fields or columns. We operate globally so different people write $values differently e.g. commas instead of dot, that's another issue that needs addressing but not important (lowest priority) as we just need to tell people the "right way" to spell out values in future.

 

Current dataset:

EUR Approved: POR 2020 APT - BNS8 - (Frankfurt, Germany) Real Estate - ($ 193,8MM )

NAM Approved: 2019 TXD -TDU2 (Charlotte, NC) Full Real Estate & Initial Budget: - RE:$138.8MM Initial: $183.2MM

NAM Approved: 2021 BRS - JBD6 (Quebec, CA) Full Real Estate - (Project Total- $738.3MM)

 

For context of above:

Region Field = EUR/NAM needs to change to two letters.

Building Type = APT/TXD/BRS

Site ID = BNS8, TDU2, JBD6

Budget Type = Full Real Estate = RE ; Initial Budget = IB 

Total Budget needs to be combined total where two numbers are available e.g. 138.8+183.2= 322.0

Description to be adjusted in following pattern: [Site ID] [Building Type] [Budget Type]

 

 

Expected output:

 

YearRegionBudget TypeTotal Budget ($)Description
2020EURE193.8BNS8 APT Real Estate
2019NARE & IB322.0TDU2  TXD Full Real Estate and Initial Budget
2021NARE738.3JBD6 BRS Full Real Estate

 

Am new to Alteryx, so any help welcome!!! Thank you!!!!

 

8 REPLIES 8
mpennington
11 - Bolide

The budget amount is tricky. Used RegEx to nail it down. Let me know if you have questions on how it works.

 

Edit, went ahead and added the last couple of steps and updated the workflow. Lots of RegEx for the strings.

 

Extract2.png

AlYap
7 - Meteor

Thank you so much .@mpennington this helped tremendously. Some follow up:

 

REGEX formula: (NA|EU)\u.*(?:POR)*\s(\d{4})\s(\w{3})\s*-\s*(\w{4})

 

1. YEAR - For some reasons, not all years pulled up 

2. REGION - just spotted other regions JPN, IND, INA in the dataset - these would need to be grouped as OT (as in other countries)

 

Below is the output:

 RegionYearBuilding TypeSite ID
NAM NME Approver: 2021 TNS - ILT1 (New Haven, DE) Construction Supplemental -$22.5MM (Project Total: $188.4MM)NA2021TNSILG1
IND  NME Approver: 2022 TRS450G+3 - BRQ2 (Prague, Czech Republic) - $25,613,196[Null][Null][Null][Null]
EUR  NME Approver: 2020 S&A -EGN1 (DE) EGN1 S&A expansion - $ 3.47MM[Null][Null][Null][Null]
NAM  NME Approver: 2022 Custom BRS - TUX1 (Vancouver, BC) Full Real Estate - $294.8MM[Null][Null][Null][Null]
EUR  NME Approver:: POR 2022  YNS80 - QEJ9 (Berlin, Germany) Real Estate - ($ 193,8MM )[Null][Null][Null][Null]

 

If you can help that would be greatly appreciated.

mpennington
11 - Bolide

Yes, I will see what I can do with this. But one thing to note, working with unstructured strings is challenging. If your data formats and structure changes, it can cause problems. The 5 strings you note above are structured differently than the 3 examples you provided in the first case. RegEx is looking for a very precise set of strings and what precedes and follows it. To use RegEx or any comparable method you need consistensy in the data. Regardless, I will follow up after I reevaluate the new information to see if this is an easy fix.

 

Qiu
21 - Polaris
21 - Polaris

@AlYap 
I have to agree with @mpennington that it is very challenging to parse the unstructured data.

With a bit of change of your input data, the parse statement will not be working most likely.

 

AlYap
7 - Meteor

.@mpennington 

Thank you so much for helping. This is has kept me spending more time on reporting than I should, including weekends!  So anything to improve that would be greatly appreciated. 

mpennington
11 - Bolide

No problem, glad to help. Just be aware that this solution only works for finite parameters, if your data set gets too big or too variable, this could break. Inconsistent treatment on the dollar amounts is a prime example. They use millions in some cases, commas, periods, the full amount and the abbreviated amounts. Can get real tricky, real fast. I commented up the formulas and broke them into more easily consumable segments. This worked for both the original set and the new data. I had to guess on some of the Site IDs and building types, but it should be easy for you to modify, just look at the comments. Let me know if you have any further questions. Good luck.

 

OutputOutputCommented out formulae for expansionCommented out formulae for expansion

mpennington
11 - Bolide

@AlYap Did the secondary solution to your updated data set solve your issue?  

AlYap
7 - Meteor

.@mpennington thanks for checking in. It worked for the examples I provided. Unfortunately it didn't for the entire population of data set and it's my fault to select only a small sample size. You are right, the unstructured data set is difficult to automate, so there are some things we have to live with until a solution is available for the data input process. Thanks a lot though!!

Labels
Top Solution Authors