
SOLVED

Extract text strings from HTML language in Excel worksheet

tbuenaflor
7 - Meteor

I need to extract the text strings highlighted in yellow in the attached Excel file into the Output format below. The strings I need are embedded in HTML markup. The pattern is that a text string ending in a "%" symbol is followed by a text string in brackets [ ].

 

I've tried a combination of RegEx and Text to Rows, but I keep getting stuck.

 

Input

tbuenaflor_0-1653334434200.png

 

Output

Display Field              Field
Management Fee %           [Management Fee]
Trustee Fee %              [TrusteeFee]
Service Provider Fee %     [ServiceProviderFee]
Other Expenses %           [OtherFee]
Underlying Funds Fees %    [AcquiredFee]
Gross Ratio%               [TotalExpRatio]
Fee Waiver %               [WaiverFee]
Net Expense Ratio %        [NetExpRatio]
IraWatt
17 - Castor

Hey @tbuenaflor,

This workflow seems to work:

IraWatt_0-1653337888626.png

Let me know if you run into any issues.

 

IraWatt
17 - Castor

Had another go at it, @tbuenaflor. I followed the logic you described: a word (or words) ending in %, then a [word] in brackets.

 

Regex used: 

([\w\s]+%|\[[\w]+\])
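
For anyone who wants to try the pattern outside Alteryx, here is a minimal sketch in Python. It is not the workflow from the screenshots: the sample HTML string and the helper name `extract_pairs` are made up for illustration, and Python's `re` module may not behave identically to the regex engine Alteryx uses. It pulls out every token that matches the pattern, then pairs each "%" label with the bracketed name that follows it.

```python
import re

# Same pattern as above: either words/spaces ending in "%", or "[word]".
PATTERN = re.compile(r"([\w\s]+%|\[[\w]+\])")


def extract_pairs(html):
    """Pair each '... %' label with the next '[Field]' token (hypothetical helper)."""
    tokens = [tok.strip() for tok in PATTERN.findall(html)]
    pairs = []
    label = None
    for tok in tokens:
        if tok.endswith("%"):
            # Remember the most recent display label.
            label = tok
        elif label is not None:
            # A bracketed token closes the pair.
            pairs.append((label, tok))
            label = None
    return pairs


# Made-up sample, not the actual Excel content from the thread.
sample = (
    "<td><span>Trustee Fee %</span></td><td><b>[TrusteeFee]</b></td>"
    "<td>Other Expenses %</td><td>[OtherFee]</td>"
)

for display, field in extract_pairs(sample):
    print(display, "->", field)
```

One thing to watch: `\[[\w]+\]` does not allow spaces inside the brackets, so a field written as [Management Fee] would not be captured by this pattern as it stands.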

IraWatt_0-1653339314453.png

 

However, this logic does not seem to hold here:

IraWatt_1-1653339406651.png

Hope the workflow helps. Please ask if you have any questions.

 

PhilipMannering
16 - Nebula

This gets you most of the way:

 

PhilipMannering_0-1653341299933.png

 

tbuenaflor
7 - Meteor

@IraWatt and @PhilipMannering Thanks for the replies and solutions! Really appreciate it, as I was spinning my wheels.

IraWatt
17 - Castor

No worries @tbuenaflor, glad it helped!
