Hi,
I am trying to create a workflow to properly import txt data. My data looks like this and does not contain field names:
These are the options I select in the input tool:
This is the output I get:
Is there any way to build a workflow to properly import the data, or will I need to fix these errors manually in the txt file? Example data file attached.
Appreciate any guidance. Thank you.
You're going to have to read it in with no delimiter and use RegEx afterwards to parse it, in my opinion. Having quotes to group data together when there are also quotes inside the values makes this a challenge to read in normally.
Please provide what the results should look like because it looks like some rows have more columns than the others.
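To illustrate the "no delimiter, then RegEx" idea, here is a minimal Python sketch, assuming each record arrives as one raw string (the sample record below is invented for illustration). The lookahead splits only on commas that are followed by an even number of quote characters, i.e. commas sitting inside a balanced quoted value are left alone:

```python
import re

# Split on commas followed by an even number of quotes to the end of the
# line, i.e. only commas that sit outside any balanced quoted value.
SPLIT_OUTSIDE_QUOTES = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def parse_record(line: str) -> list[str]:
    """Split one raw record, then strip surrounding whitespace and quotes."""
    return [f.strip().strip('"') for f in SPLIT_OUTSIDE_QUOTES.split(line)]

# Hypothetical record in the style of the sample data
print(parse_record('1001,"PIPE, 4",STEEL'))  # → ['1001', 'PIPE, 4', 'STEEL']
```

Note this assumes balanced quotes per record; a stray quote like the one in the third record of the sample would still throw the grouping off and need separate handling.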
Hi @MasoB
If you read it in with no delimiter as @alexnajm suggested you can use a Data Cleansing tool to remove leading & trailing whitespace, plus tabs, line breaks and duplicate whitespace. Then using a Text to Columns tool you can split on comma, and set it to ignore delimiters in quotes.
This works for the first and second rows, but not the third, as there is either a missing comma or one comma too many around the "PIPE, 4 " part. If you remove the quote after the 4, it works in line with the other rows.
Of course if that is the exact file layout then it makes it more tricky, and Regex parsing will have to be the way forward.
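For anyone wanting to check the "ignore delimiters in quotes" behaviour outside Alteryx, Python's stdlib csv module does the equivalent quote-aware split (the row below is made up): a comma inside a balanced quoted field is treated as data, which is exactly why the unbalanced quote in the third record breaks the grouping.

```python
import csv
import io

# csv.reader honours quoting: the comma inside "PIPE, 4" is data,
# not a delimiter, so the row splits into three fields.
clean_row = next(csv.reader(io.StringIO('123,"PIPE, 4",456')))
print(clean_row)  # → ['123', 'PIPE, 4', '456']
```

With an unbalanced quote, the opening quote swallows everything up to the next quote (or the end of the line), so the field count no longer matches the other rows.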
This is what it should look like ideally.
You can use the following workflow to create the output that you're looking for.
Use the python tool and update the file_path.
from ayx import Alteryx
import re
import pandas as pd
import csv

# Update this to point at your copy of the sample file
file_path = r"C:\Users\Warcry\Downloads\EXAMPLE UPDATED.txt"
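The body of the script wasn't captured above, so here is only a sketch of what the parsing step might look like, assuming one record per line and the split-on-commas-outside-quotes approach (the sample records are invented, and `Alteryx.write` is only available inside the Python tool itself):

```python
import re
import pandas as pd

# Commas outside balanced quotes: followed by an even number of quotes.
SPLIT = re.compile(r',(?=(?:[^"]*"[^"]*")*[^"]*$)')

def parse(text: str) -> pd.DataFrame:
    """Parse raw text, one record per line, into a DataFrame."""
    rows = [[f.strip().strip('"') for f in SPLIT.split(ln)]
            for ln in text.splitlines() if ln.strip()]
    return pd.DataFrame(rows)

# Inside the Python tool you would then read file_path and write out:
#   with open(file_path) as fh:
#       Alteryx.write(parse(fh.read()), 1)
df = parse('1001,"PIPE, 4",STEEL\n1002,VALVE,BRASS')
print(df.shape)  # → (2, 3)
```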
@MasoB if one of these solutions works for you, please mark it as accepted so that others can find and use it.
Thank you for the simple and easy to follow solution.
Your file has consistent spacing and so it's just a matter of working out how many characters per field.
This is the character numbers for where commas occur in your sample data.
Record 1 has an erroneous comma at position 35 and Record 3 has one at position 58.
So, your fields are defined by their size, and these formulas will get you there:
You can wrap them all in Trim(formula, '"') to trim the quotes off the start/end, but that will also remove any ending quotes that belong in the data. Depending on what the full data looks like, you may choose to just change the substrings to 1,5 instead of 0,7, for instance.
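As a quick illustration of the substring-plus-trim idea, here is a Python sketch (the field widths are invented, since they depend on your exact file): slice each field at its fixed positions, then strip whitespace and the surrounding quotes, which is the Substring/Trim combination described above.

```python
# Hypothetical fixed-width layout: chars 0-6, 7-20, 21-30 per record.
WIDTHS = [(0, 7), (7, 21), (21, 31)]

def split_fixed(line: str) -> list[str]:
    # Equivalent of Substring(...) per field plus Trim(field, '"')
    return [line[a:b].strip().strip('"') for a, b in WIDTHS]

record = '"1001" "PIPE, 4"     STEEL     '
print(split_fixed(record))  # → ['1001', 'PIPE, 4', 'STEEL']
```

Narrowing a slice (e.g. 1,5 instead of 0,7) instead of trimming is the safer option when a quote at the end of a field is real data rather than a delimiter.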
I've attached a workflow that has more than I intended but shows my investigation as well. The Dynamic Formula is not essential, as you can copy/paste each of the formulas into a formula tool if you like.