Hello,
Help needed!
I have a set of offline HTML files, and I need to extract the information under some specific headers in each file.
How can I do this?
When I use the Download tool, I get an error because it is a local file.
I tried opening the file as a CSV in the Input tool, which shows the HTML code, but I am not sure what to do next.
Hello @aluthra,
You are on the right path by reading the HTML as a CSV.
You can now use text functions or RegEx to identify and extract the information you need.
To locate the specific headers, try the Filter tool with the contains() function to search for keywords in a row. Once you know the row the information is in, you can extract and clean the information needed.
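To make the "filter, then extract" idea concrete, here is a minimal Python sketch of the same logic. It assumes each header sits on a single line (for example `<h2>Invoice Number</h2>`), which matches reading the file with one line per row. The keyword and the sample rows are made up for illustration; in Alteryx the keyword check would be the Filter tool and the extraction would be the RegEx tool.

```python
import re

# Assumption: each header is on one line, e.g. <h2 class="x">Invoice Number</h2>.
# Group 1 is the header level, group 2 is the text; \1 makes the closing tag match the opening one.
HEADER_RE = re.compile(r"<h([1-6])[^>]*>(.*?)</h\1>", re.IGNORECASE)

def extract_headers(lines, keyword):
    """Keep only rows that contain `keyword` (case-insensitive),
    then pull the header text out of those rows with a capture group."""
    results = []
    for line in lines:
        if keyword.lower() not in line.lower():
            continue  # row filtered out, like a Filter tool's False branch
        for match in HEADER_RE.finditer(line):
            results.append(match.group(2).strip())
    return results

# Hypothetical rows, as they might come out of the Input tool
sample = [
    "<html><body>",
    "<h2 class='title'>Invoice Number</h2>",
    "<p>12345</p>",
    "<h2>Invoice Date</h2>",
    "</body></html>",
]
print(extract_headers(sample, "invoice"))  # ['Invoice Number', 'Invoice Date']
```

If a header spans several lines in your files, a line-by-line approach like this will miss it, and a real HTML parser is the safer option.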
If you can share an example HTML file and what you want to extract, the community can help you further.
Hello @KilianL,
Thanks for the suggestion.
Do you know if I can use the Python tool to parse the information rather than regex?
Hello @aluthra,
You might be able to avoid RegEx depending on how complex your parsing problem is.
Text functions like Left(), Right(), and Contains(), or the Text to Columns tool, can handle a lot of basic parsing.
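For comparison, this is the "split on a delimiter" idea without any RegEx, written in Python. It is a sketch that assumes the opening and closing tags are known exact strings (here `<h2>` and `</h2>`, chosen for illustration); it splits once on the opening tag and once on the closing tag, much like splitting a field with Text to Columns.

```python
def text_between(line, start_marker, end_marker):
    """Return the stripped text between two markers, or None if either marker is missing."""
    # Split once on the opening marker and keep everything to its right...
    _, found, rest = line.partition(start_marker)
    if not found:
        return None
    # ...then split once on the closing marker and keep everything to its left.
    value, found, _ = rest.partition(end_marker)
    return value.strip() if found else None

row = "<h2>Invoice Number</h2>"
print(text_between(row, "<h2>", "</h2>"))  # Invoice Number
```

This breaks as soon as a tag carries attributes (for example `<h2 class="x">`), which is exactly where RegEx becomes the better tool.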
You can always use Python, and then your possibilities are nearly endless. But for text parsing, RegEx is the gold standard across programs, including Python.
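If you do go the Python route, HTML can also be parsed without RegEx by using the standard library's `html.parser` module. The sketch below collects the text of every `h1` to `h6` element, including text inside nested tags such as `<b>`. The folder path and the `*.html` pattern are assumptions; sending the results back to the workflow from the Python tool is not shown, so check the Python tool documentation for how to write to an output anchor.

```python
from html.parser import HTMLParser
from pathlib import Path

HEADER_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}

class HeaderCollector(HTMLParser):
    """Collects (tag, text) pairs for every h1-h6 element in a document."""

    def __init__(self):
        super().__init__()
        self.headers = []
        self._current = None  # name of the header tag we are inside, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase, so <H2> also matches.
        if tag in HEADER_TAGS:
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace and newlines into single spaces.
            text = " ".join("".join(self._buffer).split())
            self.headers.append((tag, text))
            self._current = None

def headers_from_html(html_text):
    parser = HeaderCollector()
    parser.feed(html_text)
    parser.close()
    return parser.headers

def headers_from_folder(folder):
    """Map each .html file name in `folder` (hypothetical path) to its headers."""
    results = {}
    for path in sorted(Path(folder).glob("*.html")):
        html_text = path.read_text(encoding="utf-8", errors="replace")
        results[path.name] = headers_from_html(html_text)
    return results

print(headers_from_html("<h1>Report</h1><p>x</p><h2>Invoice <b>Date</b></h2>"))
# [('h1', 'Report'), ('h2', 'Invoice Date')]
```

Unlike a line-based RegEx, this does not care how the HTML is split across lines, which matters for files that were not hand-formatted.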
Great to hear!
To wrap it up, in Alteryx, you can use RegEx with the RegEx Tool, and we also have a couple of RegEx functions to use in formulas.
If you need an example with your data, I am sure the community can help you 🙂