I have a text file with the following data:
var_a var_b var_c xx xx var_d
12 6/0 25 28
12 6/0 25 28
12 23 6/0 25 28
12 23 6/0 28
The first line contains column headings. Note that "xx xx" belongs to the same column and not all cells contain a data point.
How do I cleanse this data?
Solved! Go to Solution.
You can input your text file like so:
Then, arrange your layout as such (By clicking, you will set the boundary in red of where the column starts and ends - if you get it wrong, just cancel and re-do):
Your file looks like this:
Once that is settled, make a text file or use an Excel file if you'd like, to create Headers (since xx xx is a column in itself, it is better to declare it):
From there, connect them together:
This will take some time and effort to play around with.
For more ideas, refer here:
Hope this helps give you some ideas & experimentation!
-Cal
you could also try using regex to parse out the text
@binuacs Thanks for asking. That solution solved the issue, but I'm not sure how accurate would that manual divider works. The sample is contained in the prompt. The original file is not shareable as it contains sensitive information. If possible, I would appreciate a solution done in RegEx with example.
RegEx requires a pattern for you to match. Because you can have null data in some columns, your RegEx may not be consistent and you will run into warnings or even errors.
I would suggest to find out the fixed length of each column, and give a value to each column when you use the delimiter method I showed above. Otherwise you'll need to account for each column scenario - which is tedious if you have a lot of columns.