SOLVED
Replace Function Help
bernardo_roschke
5 - Atom
‎04-28-2014
06:29 PM
I'm trying to scrape web data and clean up the HTML. Using the Replace function, I want to isolate the area of data I need by replacing the surrounding strings with pipes ("|").
My first Replace formula works but the second does not. Perhaps it is because of special characters; I'm not sure.
Link to module: https://dl.dropboxusercontent.com/u/60455118/BBQ%20Events.yxmd
The data I want to isolate and eventually turn into a table is about three-quarters of the way down in the Download Data field.
Thanks.
Labels:
- Parse
- Preparation
3 REPLIES
ChadM
Alteryx Alumni (Retired)
‎04-29-2014
09:03 PM
Hi Bernardo,
The link is not working for me. Can you please post the REPLACE() function you are trying to use, along with an example of the data?
Thanks!
Chad
‎04-30-2014
01:08 PM
Try this link.
https://www.dropbox.com/s/2odc64x6k7gtibl/BBQ%20Events.yxmd
Here is the Replace formula:
replace([DownloadData],'</td></tr></table>»</td>¶','|')
Thank you.
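One possible reading of why the literal replace finds nothing, offered as an assumption rather than something confirmed in the thread: if `»` and `¶` are how a tab and a newline are displayed, rather than characters actually present in [DownloadData], a literal search for them never matches. A minimal Python sketch of that mismatch, using made-up sample HTML:

```python
# Hypothetical sample: the real DownloadData is not shown in the thread.
# Assumption: the field really contains a tab and a newline, which a viewer
# might display as the glyphs » and ¶.
download_data = "<h1>Event A</h1></td></tr></table>\t</td>\nnext row"

# Literal search for the display glyphs: nothing matches, text is unchanged.
as_displayed = download_data.replace("</td></tr></table>»</td>¶", "|")

# Literal search for the actual whitespace characters: the span is replaced.
as_stored = download_data.replace("</td></tr></table>\t</td>\n", "|")

print(as_displayed == download_data)  # True
print(as_stored)                      # <h1>Event A</h1>|next row
```

If that assumption holds, matching the whitespace itself (or using a pattern that tolerates it) would be the thing to try.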
ChadM
Alteryx Alumni (Retired)
‎05-08-2014
08:55 PM
Hi Bernardo,
Because the returned data contains newline characters and a few other things, a RegEx expression is probably your best bet. Try this in a Formula Tool:
REGEX_Replace([DownloadData], '.*?(?:<h1>)(.*?)(?:</td> ).*', '$1')
If you want to also keep the <H1> tag, try this in your Formula Tool expression:
REGEX_Replace([DownloadData], '.*?((?:<h1>).*?)(?:</td> ).*', '$1')
Huge thanks to Garth Miles for his help with this!
Chad
Follow me on Twitter! @AlteryxChad
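For readers who want to try the two patterns outside Alteryx, here is a rough Python equivalent. It is a sketch under several assumptions, not the Alteryx implementation: the sample HTML is invented, `\s` stands in for the whitespace character that follows `</td>` in the posted expressions, and the `re.DOTALL` and `re.IGNORECASE` flags are used on the assumption that REGEX_Replace lets `.` match newlines and ignores case by default. Python writes the backreference as `\1` where the formula uses `$1`.

```python
import re

# Hypothetical stand-in for [DownloadData]; the real page is not shown.
download_data = (
    "<html>\n<body>\n<table><tr><td><h1>BBQ Events</h1>\n"
    "April 2014</td>\n</tr></table>\n</body>"
)

# Assumed to mirror Alteryx defaults: dot matches newline, case-insensitive.
FLAGS = re.DOTALL | re.IGNORECASE

# Keep only the text between <h1> and the first "</td>" followed by whitespace.
inner = re.sub(r".*?(?:<h1>)(.*?)(?:</td>\s).*", r"\1", download_data, flags=FLAGS)

# Same idea, but the capture group also includes the <h1> tag itself.
with_tag = re.sub(r".*?((?:<h1>).*?)(?:</td>\s).*", r"\1", download_data, flags=FLAGS)

print(repr(inner))     # 'BBQ Events</h1>\nApril 2014'
print(repr(with_tag))  # '<h1>BBQ Events</h1>\nApril 2014'
```

Both patterns consume the whole input and put back only the captured group, which is why everything before `<h1>` and after the closing `</td>` disappears.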
