SOLVED
Parsing out a structured item within an unstructured field
michael
6 - Meteoroid
‎05-19-2015
07:14 PM
I am trying to parse out Twitter handles from numerous fields. The column header is titled "Twitter Body" and it captures the text of someone's tweet. As you can imagine, when people tweet they sometimes include someone else's Twitter handle, sometimes they don't, and sometimes they mention multiple Twitter handles.
So my data is structured in a way that if a Twitter handle is mentioned, it starts with the "@" symbol. However, this @ symbol can appear anywhere in the body of the text, so in a way it is unstructured.
I would like to parse the different Twitter handles that are mentioned out into an output field. I've utilized the RegEx ([@]/<w+>) and this returns one Twitter handle for each cell. However, if the cell contains multiple Twitter handles, I would like to output those into additional output fields as well.
Thank you in advance for your help.
Labels:
- Parse
2 REPLIES
kane_glendenning
10 - Fireball
‎05-19-2015
07:34 PM
Hi Michael,
The way I would do this is to use the Text To Columns tool and choose Split to rows. Before doing this, you want to impose a delimiter so that you don't have to split on the @ symbol itself. So the solution would be a Record ID tool, then a Formula tool with Replace([Twitter Body],"@","|@"), and then a split to rows on the pipe delimiter. You can then use your RegEx formula to extract the username, and the Record ID will allow you to roll the rows back together (using Summarize or something similar).
Kane
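For readers who want to see the logic outside of Alteryx, the steps above can be roughly sketched in Python. This is only an illustration, not the workflow itself: the function name `split_handles`, the `(record_id, body)` input shape, and the pattern `@\w+` are assumptions made for the example, and the pattern treats a handle as "@" followed by letters, digits, or underscores.

```python
import re
from collections import defaultdict

# Assumed handle pattern: "@" followed by word characters (letters, digits, underscore).
HANDLE = re.compile(r"@\w+")


def split_handles(rows):
    """Group the handles found in each tweet body by record ID.

    `rows` is an iterable of (record_id, twitter_body) pairs; the record ID
    plays the role of the Record ID tool in the workflow described above.
    """
    grouped = defaultdict(list)
    for record_id, body in rows:
        # Step 1: impose a pipe delimiter in front of every "@".
        marked = body.replace("@", "|@")
        # Step 2: split on the pipe, giving one piece per potential handle.
        for piece in marked.split("|"):
            # Step 3: keep only pieces that start with a handle.
            match = HANDLE.match(piece)
            if match:
                # Step 4: roll the handles back together by record ID.
                grouped[record_id].append(match.group())
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        (1, "Thanks @alice and @bob_1!"),
        (2, "no mentions here"),
        (3, "@carol hi"),
    ]
    print(split_handles(sample))
```

Records without any handle simply do not appear in the result. Note that this sketch would also pick up the part after "@" in an email address, so real data may need a stricter pattern.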
‎05-19-2015
07:49 PM
Hi Kane. Thank you very much. That makes sense!
