How to extract data from HTML code.
Hello, I'm looking for a solution to extract the strings inside HTML code. Can you please share the code or an Alteryx workflow that produces a final output like the following:
Input : <span style="color: rgb(255, 153, 0); font-size: 14px;"><span style="font-size:20px;color:rgb(255,153,0);">My name is Ram </span></span>
Output : My name is Ram
Input : <b style="color: rgb(255, 153, 0);">Please Enter Name </b><div><b style="color: rgb(255, 153, 0);">Ram</b></div>
Output : Please Enter Name Ram
- Labels:
- Data Science
Hello @BihaniBhavesh.
You can parse HTML inside of Designer in multiple ways. One way is to use the RegEx tool (see the attached workflow).
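The attached workflow isn't reproduced here, so the following is only a minimal sketch of the kind of pattern a RegEx-based approach typically uses, written in Python for illustration. It assumes that no attribute value contains a literal ">" character. It replaces every tag with a space (rather than deleting it, so that text from adjacent elements doesn't get glued together), decodes entities such as &amp;, and collapses the leftover whitespace. In Designer, the same pattern with the RegEx tool's Replace method and a single space as the replacement should behave similarly; strip_tags is a hypothetical helper name.

```python
import html
import re

def strip_tags(markup: str) -> str:
    """Replace every tag with a space, then collapse runs of whitespace."""
    # Assumption: tags never contain a literal '>' inside attribute values.
    text = re.sub(r"<[^>]+>", " ", markup)
    # Decode entities such as &amp; or &nbsp; that may remain in the text.
    text = html.unescape(text)
    # Collapse the spaces left behind by adjacent tags and trim the ends.
    return re.sub(r"\s+", " ", text).strip()

samples = [
    '<span style="color: rgb(255, 153, 0); font-size: 14px;"><span style="font-size:20px;color:rgb(255,153,0);">My name is Ram </span></span>',
    '<b style="color: rgb(255, 153, 0);">Please Enter Name </b><div><b style="color: rgb(255, 153, 0);">Ram</b></div>',
]
for s in samples:
    print(strip_tags(s))
```

For the two inputs in the question, this prints "My name is Ram" and "Please Enter Name Ram".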
For more complicated HTML, it may be best to use a programming language such as Python. The Python library Beautiful Soup (package beautifulsoup4) is well suited to parsing HTML.
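As a dependency-free illustration of the parser-based approach, here is a sketch using Python's built-in html.parser module instead of Beautiful Soup (a swapped-in technique, not the library suggested above). It collects every text node and ignores the tags entirely, so it does not depend on regex assumptions about attribute contents. TextExtractor and extract_text are hypothetical names. Note that the contents of script or style elements would also be collected as text.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, ignoring all tags."""

    def __init__(self) -> None:
        # convert_charrefs=True (the default) decodes entities such as &amp;
        super().__init__(convert_charrefs=True)
        self.chunks: list[str] = []

    def handle_data(self, data: str) -> None:
        # Called for each run of text between tags.
        self.chunks.append(data)

def extract_text(markup: str) -> str:
    parser = TextExtractor()
    parser.feed(markup)
    parser.close()  # flush any buffered text
    # Join with spaces so text from separate elements doesn't run together,
    # then collapse repeated whitespace.
    return " ".join(" ".join(parser.chunks).split())

print(extract_text('<b style="color: rgb(255, 153, 0);">Please Enter Name </b><div><b style="color: rgb(255, 153, 0);">Ram</b></div>'))
```

If Beautiful Soup is installed, BeautifulSoup(markup, "html.parser").get_text(" ", strip=True) gives a comparable result in one line.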
Here is a fun challenge for parsing HTML
You can look at some of the solutions and practice parsing yourself, but generally @acarter881 hit the nail on the head: you will want to look for patterns in the HTML and use RegEx to parse them.
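One such pattern is "text that sits between a closing > and the next opening <". The sketch below shows it in Python with re.findall, which returns one token per match, roughly what a tokenizing RegEx step would split out. This is an illustration under stated assumptions, not the attached workflow. It assumes attribute values contain no < or > characters, and it will miss any text that appears before the first tag. tokenize_text is a hypothetical name.

```python
import re

# Each match is one run of text sitting between a '>' and the next '<'.
TEXT_BETWEEN_TAGS = re.compile(r">([^<>]+)<")

def tokenize_text(markup: str) -> list[str]:
    # Keep only non-blank tokens, trimmed of surrounding whitespace.
    return [t.strip() for t in TEXT_BETWEEN_TAGS.findall(markup) if t.strip()]

tokens = tokenize_text('<b style="color: rgb(255, 153, 0);">Please Enter Name </b><div><b style="color: rgb(255, 153, 0);">Ram</b></div>')
print(tokens)
print(" ".join(tokens))
```

Keeping the tokens separate (one per row or column) rather than joining them immediately makes it easier to inspect or validate each piece before concatenating.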
Best of luck!
Tristan
Thank you so much. I tried using this regex with the Tokenize method and got the data in two columns, json name and jsonname_value,
where I see the column names first, followed by the data that belongs to those columns. I have to take this to the next level by validating
