Alteryx Designer Desktop Discussions

Parsing local HTML file to extract data from specific headers

aluthra
8 - Asteroid

Hello,

Help needed!

I have a set of offline HTML files, and I need to extract the information under some specific headers in each file.

How can I do this?

When I use the Download tool, I get an error because it is a local file.

I tried opening the file as a CSV in the Input Data tool, which shows the HTML code, but I am not sure what to do next.

KilianL
Alteryx Alumni (Retired)

Hello @aluthra,

You are on the right path by reading the HTML as a CSV.

You can now use text functions or RegEx to identify and extract the information you need.

To locate the specific headers, try the Filter tool with the Contains() function to search for a keyword in each row. Once you know which row holds the information, you can extract and clean it.
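To make the filter-then-clean idea concrete, here is a minimal Python sketch of the same logic (plain Python, not Alteryx itself). It assumes each line of the HTML file arrives as one row, as it does when the file is read as a CSV, and that the value you want sits on the row right after the matching header. The sample HTML, the keyword `Revenue`, and the helper `extract_after_keyword` are all hypothetical.

```python
from typing import List

# Hypothetical sample file: each line becomes one row, similar to reading
# the HTML as a CSV with one field per line.
SAMPLE_HTML = """<html>
<body>
<h2>Summary</h2>
<p>Intro text</p>
<h2>Revenue</h2>
<p>1,250</p>
</body>
</html>"""


def extract_after_keyword(html_text: str, keyword: str) -> List[str]:
    """Return the cleaned row that follows every row containing `keyword`."""
    rows = html_text.splitlines()
    values = []
    for i, row in enumerate(rows):
        # Filter step: keep rows that mention the keyword (case-insensitive here).
        if keyword.lower() in row.lower() and i + 1 < len(rows):
            # Assumption: the value is on the next row, wrapped in <p> tags.
            value_row = rows[i + 1].strip()
            values.append(value_row.replace("<p>", "").replace("</p>", "").strip())
    return values


print(extract_after_keyword(SAMPLE_HTML, "Revenue"))
# ['1,250']
```

If the value sits on the same row as the header, or several rows below it, only the index in the cleaning step needs to change.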

If you can share an example HTML file and what you want to extract, the community might be able to help you further.

aluthra
8 - Asteroid

Hello @KilianL,

Thanks for the suggestion.

Do you know if I can use the Python tool to parse the information rather than RegEx?

KilianL
Alteryx Alumni (Retired)

Hello @aluthra,

You might be able to avoid RegEx, depending on how complex your parsing problem is.

Text functions like Left(), Right(), and Contains(), or the Text To Columns tool, can handle a lot of basic parsing.
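As a rough illustration of regex-free parsing, the Python sketch below slices out the text between two tags using plain string searches, the same idea as combining position and substring functions in a formula. The helper name `text_between` and the example tags are hypothetical, and it only handles the first tag pair in a row.

```python
from typing import Optional


def text_between(row: str, start_tag: str, end_tag: str) -> Optional[str]:
    """Return the trimmed text between start_tag and end_tag, or None if either is missing."""
    start = row.find(start_tag)  # position of the opening tag, or -1 if absent
    if start == -1:
        return None
    start += len(start_tag)  # move past the opening tag itself
    end = row.find(end_tag, start)  # search for the closing tag after it
    if end == -1:
        return None
    return row[start:end].strip()


print(text_between("<h2>Revenue</h2>", "<h2>", "</h2>"))
# Revenue
print(text_between("<p>no header here</p>", "<h2>", "</h2>"))
# None
```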

You can always use Python, and then your possibilities are nearly endless. But for text parsing, RegEx is the gold standard across tools and languages, including Python.
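If you do go the Python route, the standard library's `html.parser` module can pull out header text without any RegEx. This is a minimal sketch that collects every `<h1>` to `<h6>` element; the class `HeaderCollector`, the function `extract_headers`, and the file name `report.html` are hypothetical, and wiring it to the Python tool's inputs and outputs is not shown.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeaderCollector(HTMLParser):
    """Collect (tag, text) pairs for every <h1>..<h6> element in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.headers: List[Tuple[str, str]] = []
        self._current: Optional[str] = None
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase.
        if tag in HEADER_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text inside a header, including text in nested tags such as <b>.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            self.headers.append((tag, "".join(self._buffer).strip()))
            self._current = None


def extract_headers(html_text: str) -> List[Tuple[str, str]]:
    parser = HeaderCollector()
    parser.feed(html_text)
    parser.close()
    return parser.headers


# For a local file (hypothetical path):
# with open("report.html", encoding="utf-8") as f:
#     print(extract_headers(f.read()))
print(extract_headers("<h1>Title</h1><p>x</p><h2>Revenue <b>2021</b></h2>"))
# [('h1', 'Title'), ('h2', 'Revenue 2021')]
```

A parser copes with attributes, uppercase tags, and nested markup without any extra pattern work, which is the main reason to prefer it over string slicing once the HTML gets messy.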

aluthra
8 - Asteroid

Thanks @KilianL 

I think the time has come to explore RegEx :-). I will give it a go.

KilianL
Alteryx Alumni (Retired)

Great to hear!

To wrap it up: in Alteryx, you can use RegEx with the RegEx tool, and the Formula tool also offers RegEx functions such as REGEX_Match(), REGEX_Replace(), and REGEX_CountMatches().

If you need an example with your data, I am sure the community can help you 🙂
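For comparison, here is a rough Python sketch of the RegEx approach using the standard `re` module. The pattern assumes headers are ordinary `<h1>` to `<h6>` tags whose closing tag matches the opening one; the names `HEADER_RE`, `TAG_RE`, and `regex_headers` are hypothetical. RegEx works well on simple, consistently formatted HTML, while deeply nested or malformed markup is where a real parser is safer. A similar pattern might be adaptable to the RegEx tool's Parse method, but that has not been tested here.

```python
import re
from typing import List, Tuple

# <h1>..<h6> with optional attributes; \1 makes the closing tag match the opening level.
HEADER_RE = re.compile(r"<(h[1-6])\b[^>]*>(.*?)</\1\s*>", re.IGNORECASE | re.DOTALL)
# Any remaining inner tag, e.g. <b> or </span>.
TAG_RE = re.compile(r"<[^>]+>")


def regex_headers(html_text: str) -> List[Tuple[str, str]]:
    """Return (level, text) for each header, with inner tags stripped."""
    results = []
    for match in HEADER_RE.finditer(html_text):
        level = match.group(1).lower()
        text = TAG_RE.sub("", match.group(2)).strip()
        results.append((level, text))
    return results


print(regex_headers('<h2 class="kpi">Revenue <b>2021</b></h2>\n<h3>Notes</h3>'))
# [('h2', 'Revenue 2021'), ('h3', 'Notes')]
```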
