Alteryx Designer Desktop Discussions

aluthra · ‎06-18-2021

Hello,

Help needed!

I have a set of offline HTML files, and I need to extract information from specific headers in each file.

How can I do this?

When I use the Download tool, I get an error because it is a local file.

I tried opening the file as a CSV in the Input tool, which shows the HTML code, but I am not sure what to do next.

KilianL · ‎06-18-2021

Hello @aluthra ,

You are on the right track by reading the HTML as a CSV.

You can now use text functions or RegEx to identify and extract the information you need.

To locate the specific headers, try the Filter tool with the contains() function to search for keywords in a row. Once you know the row the information is in, you can extract and clean the information needed.
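
To illustrate the idea outside of Alteryx, here is a minimal Python sketch of the same two steps: first keep only the rows that contain a keyword (what the Filter tool with contains() does), then pull the header text out of those rows with a regular expression. The sample lines, the keyword, and the names `HEADER_PATTERN` and `extract_headers` are made up for the example, and it assumes each header sits on a single line.

```python
import re

# Hypothetical sample standing in for the rows you get when reading a
# local HTML file line by line (for example as a one-column CSV).
html_lines = [
    "<html><body>",
    "<h1>Quarterly Report</h1>",
    "<p>Some introductory text.</p>",
    "<h2>Revenue Summary</h2>",
    "<p>Revenue grew this quarter.</p>",
    "</body></html>",
]

# Matches <h1>...</h1> through <h6>...</h6> on a single line.
# Group 1 is the header level, group 2 is the text between the tags.
HEADER_PATTERN = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE)


def extract_headers(lines, keyword):
    """Return (level, text) for headers on rows that contain the keyword."""
    results = []
    for line in lines:
        # Step 1: row filter, a case-insensitive "contains" check.
        if keyword.lower() not in line.lower():
            continue
        # Step 2: extract and clean the header text from the matching row.
        for level, text in HEADER_PATTERN.findall(line):
            results.append((int(level), text.strip()))
    return results


print(extract_headers(html_lines, "Revenue"))
```

The row with the `<p>` tag also contains the keyword, but it is dropped in the second step because it is not a header. If a header spans several lines in your files, a line-by-line approach like this would miss it.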

If you can share an example HTML file and what you want to extract, the community might be able to help you even further.

aluthra · ‎06-22-2021

Hello @KilianL,

Thanks for the suggestion.

Do you know if I can use the Python tool to parse the information rather than regex?

KilianL · ‎06-22-2021

Hello @aluthra,

You might be able to avoid RegEx depending on how complex your parsing problem is.

Text functions like left(), right(), and contains(), or the Text To Columns tool, can do a lot of basic parsing.

You can always use Python, and then your possibilities are nearly endless. But for text parsing, RegEx is the gold standard across programs, including Python.
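
If you do want to try Python without RegEx, one possible route is the `html.parser` module from the standard library. The sketch below is only an illustration under assumptions: the class name `HeaderCollector`, the helper `collect_headers`, and the sample HTML are invented here, and it only looks at `<h1>` to `<h6>` tags.

```python
from html.parser import HTMLParser


class HeaderCollector(HTMLParser):
    """Collects the text of every <h1>..<h6> element it sees."""

    HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__()
        self.headers = []         # finished (tag, text) pairs
        self._current_tag = None  # header tag we are currently inside
        self._buffer = []         # text pieces of the current header

    def handle_starttag(self, tag, attrs):
        # The parser passes tag names in lower case.
        if tag in self.HEADER_TAGS:
            self._current_tag = tag
            self._buffer = []

    def handle_data(self, data):
        # Keep text only while inside a header, including nested tags.
        if self._current_tag is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current_tag:
            self.headers.append((tag, "".join(self._buffer).strip()))
            self._current_tag = None


def collect_headers(html_text):
    """Parse an HTML string and return a list of (tag, text) pairs."""
    parser = HeaderCollector()
    parser.feed(html_text)
    parser.close()
    return parser.headers


# Hypothetical sample; for a real file you would pass its contents instead.
sample = (
    "<html><body><h1>Quarterly <em>Report</em></h1><p>Intro</p>"
    "<H2 class='x'>Revenue Summary</H2></body></html>"
)
print(collect_headers(sample))
```

Unlike a line-by-line pattern, this approach does not care whether a header is split across lines or contains nested tags such as `<em>`.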

aluthra · ‎06-22-2021

Thanks @KilianL

I think the time has come to explore RegEx :-). I will give it a go.

KilianL · ‎06-22-2021

Great to hear!

To wrap it up, in Alteryx, you can use RegEx with the RegEx Tool, and we also have a couple of RegEx functions to use in formulas.

If you need an example with your data, I am sure the community can help you 🙂

Parsing local HTML file to extract data from specific headers