Hello,
I'm having trouble cleaning this data into the format shown below (second table).
Entity;"Alternative Name";"Story";"Year";"Records Lost";"Sector";"Method of Leak";"1st source";"2nd source";"3rd source";"Source name" |
River City Media;"Spam operator";"A dodgy backup has allegedly resulted in over a billion leaked email addresses |
Could someone please help me clean this kind of dirty data?
Also attached is the file I'm trying to clean.
The output should look like this:
Entity | Alternative Name | Story | Year | Records Lost | Sector | Method of Leak | 1st source | 2nd source | 3rd source | Source Name |
River City Media | Spam Operator | A dodgy backup has allegedly resulted in over a billion leaked email addresses | 2017 | 1.37e+09 | Web | Accidentally Published | https://betanews.com/2017/03/06/river-city-media-spam-database-leak/ | Beta News |
Thanks in advance.
Hi @Esha_Reddy
What I'd do is grab your input data and read it as a CSV, using "\0" as the delimiter, and untick "First Row Contains Field Names".
Once the 278 records (including the headers) are in Alteryx, just use a Cleanse tool to get rid of the punctuation in the values. (If you want, you can keep Field8 out of this cleaning step, since all the ":" and "/" characters in the URLs would otherwise be removed; you can strip the quotes from that field later with a Formula tool: ReplaceChar([Field], "char to find", "char to replace with").)
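As an aside, the ReplaceChar quote-stripping idea can be sketched with a plain string replacement in Python (the field value below is taken from your sample; this is an illustration, not Alteryx syntax):

```python
# Strip the quote characters from a single field value,
# analogous to ReplaceChar([Field], '"', "") in the Formula tool.
field_value = '"https://betanews.com/2017/03/06/river-city-media-spam-database-leak/"'
cleaned = field_value.replace('"', "")
print(cleaned)  # the URL with the surrounding quotes removed
```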
Next, use a Text to Columns tool to separate the 11 columns using ";" as the delimiter.
Finally, with the Dynamic Rename Tool, connect your data to the top input and select Take Field Names from the First Row of Data.
Your data will probably need further cleaning, but this should help you get started. I'm attaching the previous steps in a workflow to give you a better idea of the whole process.
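If it helps to see the idea outside Alteryx, the same steps can be sketched in Python (a rough sketch only, assuming semicolon-delimited input with quoted fields as in your sample; the RAW string below is a trimmed stand-in for the real file):

```python
import csv
import io

# Trimmed stand-in for the raw file (semicolon-delimited, quoted fields).
RAW = '''Entity;"Alternative Name";"Year"
River City Media;"Spam operator";"2017"
'''

def clean_rows(text):
    """Split semicolon-delimited text into rows, stripping the quotes --
    roughly what Text to Columns plus the quote cleanup do in the workflow."""
    reader = csv.reader(io.StringIO(text), delimiter=";", quotechar='"')
    return [row for row in reader if row]

rows = clean_rows(RAW)
# Dynamic Rename equivalent: promote the first row to field names.
header, records = rows[0], rows[1:]
print(header)   # ['Entity', 'Alternative Name', 'Year']
print(records)  # [['River City Media', 'Spam operator', '2017']]
```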
Best,
_AG_
PS: As a best practice, use a Select tool after the Text to Columns and remove the Original field.