I've read many articles about HTML parsing that say NOT to use REGEX. That is how I am doing it now, and it works with a high level of accuracy, but not 100%. Has anyone used a Python HTML parsing package within the Python Tool? I am parsing many fields of HTML/CLOB data with REGEX, and I am looking for a better way. Thank you
While you wait for someone more knowledgeable than me to reply, I'll suggest the Python library Beautiful Soup. Without knowing more about your specific needs and use case I can't say for sure whether it's the right solution for you, but it's a solid HTML parsing tool.
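To give a feel for what a real parser buys you over REGEX, here is a minimal sketch. It uses Python's built-in `html.parser` module instead of Beautiful Soup, so it runs even where no third-party packages are installed. The class and function names are my own, not from any library. It walks the HTML as a parser would, keeps only the text nodes, skips `<script>`/`<style>` content and decodes entities such as `&amp;`.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML fragment, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Join text nodes with spaces, then collapse runs of whitespace
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered text
    return parser.text()


print(html_to_text("<p>Hello <b>world</b></p><script>x = 1</script>"))
```

With Beautiful Soup installed, `BeautifulSoup(html, "html.parser").get_text(" ", strip=True)` does roughly the same job in one line. Joining text nodes with a space keeps words from separate `<p>` or `<td>` elements apart, at the cost of occasionally splitting a word that is broken across inline tags.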
My company unfortunately only has Alteryx 11.7 installed, so I can't create an example workflow for you (the Python tool was only added in 2018.1/2). For how to install 3rd-party libraries, see this. For help with the Python/Beautiful Soup code for parsing, see this.
Finally, everything is set up. Now my question is: how do I apply this to one column in a database table? For some reason, the HTML is stored in an Oracle table alongside other columns, and of course there is no column with just the clean text extracted from the HTML.
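One way to approach this, sketched under some assumptions: the Python tool hands incoming records to your code as a pandas DataFrame, so you can apply a text-extraction function to just the HTML column with `Series.map` and write the result back as a new column, leaving the other columns untouched. The column names `HTML_CLOB` and `CLEAN_TEXT` are hypothetical placeholders for your own. The `ayx` read/write lines are left as comments and assume the table arrives on input anchor #1. The extractor here is a bare-bones one built on Python's standard `html.parser`; Beautiful Soup could be swapped in.

```python
from html.parser import HTMLParser

import pandas as pd


class _TextOnly(HTMLParser):
    """Bare-bones extractor: keep every text node, drop all tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html):
    """Return the text of an HTML string; pass missing values (empty CLOBs) through."""
    if not isinstance(html, str):
        return html
    parser = _TextOnly()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def add_clean_text(df, html_col="HTML_CLOB", out_col="CLEAN_TEXT"):
    """Copy df and add out_col holding the parsed text of html_col (hypothetical names)."""
    out = df.copy()
    out[out_col] = out[html_col].map(html_to_text)
    return out


# Inside the Alteryx Python tool (assumed wiring: the Oracle table on input anchor #1):
# from ayx import Alteryx
# df = Alteryx.read("#1")
# Alteryx.write(add_clean_text(df), 1)
```

Because only the one column is transformed and everything else is copied through, the rest of the Oracle fields stay available to downstream tools in the workflow.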
Thank you for your assistance. I am hoping this will finally get me over the hump.