Let’s talk Alteryx Copilot. Join the live AMA event to connect with the Alteryx team, ask questions, and hear how others are exploring what Copilot can do. Have Copilot questions? Ask here!
Start Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Importing TXT delimited by comma with quotes & commas in data

MasoB
6 - Meteoroid

Hi,

 

I am trying to create a workflow to properly import txt data. My data looks like this and does not contain field names:

 

Screenshot 2025-03-04 110011.png

 

These are the options I select in the input tool:

Screenshot 2025-03-04 105715.png

 

 

 This is the output I get:

Screenshot 2025-03-04 105805.png

 

Is there any way to build a workflow to properly import the data or will I ned to fix these errors manually in the txt file? Example data file attached.

 

Appreciate any guidance. Thank you.

10 REPLIES 10
alexnajm
18 - Pollux
18 - Pollux

You are gonna have to read it in with no delimiter and use RegEx afterwards to parse through it in my opinion...having quotes to group data together when there are quotes in the values makes this a challenge to read in normally

Warcry
9 - Comet

Please provide what the results should look like because it looks like some rows have more columns than the others.

davidskaife
14 - Magnetar

Hi @MasoB 

 

If you read it in with no delimiter as @alexnajm suggested you can use a Data Cleansing tool to remove leading & trailing whitespace, plus tabs, line breaks and duplicate whitespace. Then using a Text to Columns tool you can split on comma, and set it to ignore delimiters in quotes.

 

This works for the first and second rows, but not the third as there is either a missing comma, or a comma too many around the point of the "PIPE, 4 " part. If you remove the quote after 4 it works in line with the other rows.

 

Of course if that is the exact file layout then it makes it more tricky, and Regex parsing will have to be the way forward.

MasoB
6 - Meteoroid

Screenshot 2025-03-04 131953.png

 

This is what it should look like ideally. 

 

chuckleswk
11 - Bolide

You can use the following workflow to create the output that you're looking for.

 

  • You read the text in with a delimiter of \n
  • Use the Formula Tool to clean up the data and get a proper delimiter

2025-03-04_14-33-15.jpg

  • Use the Text to Column tool to split it up based on your new delimiter
  • Use the Data Cleanse tool to get rid of the leading and trailing whitespace

 

2025-03-04_14-31-11.jpg

Warcry
9 - Comet

Use the python tool and update the file_path.

 

from ayx import Alteryx
import re
import pandas as pd
import csv


file_path = r"C:\Users\Warcry\Downloads\EXAMPLE UPDATED.txt"

 

parsed_data = []
 
# Open and read the file using the CSV reader
with open(file_path, "r", encoding="utf-8") as file:
    reader = csv.reader(file, quotechar='"', delimiter=',', skipinitialspace=True)
    for row in reader:
        parsed_data.append(row)
 
# Determine the maximum number of columns dynamically
max_columns = max(len(row) for row in parsed_data)
 
# Define column headers based on the maximum number of columns
columns = [
    "A", "B", "C", "D", "E", "F", "G"
][:max_columns]  # Truncate or pad headers to match the actual data
 
# Create a Pandas DataFrame
df = pd.DataFrame(parsed_data, columns=columns)
 
# Clean the data
df["D"] = df["D"].str.strip().replace(r'"', '', regex=True)
df["F"] = df["F"].str.strip().replace(r'"', '', regex=True)
df["G"] = pd.to_numeric(df["G"], errors="coerce")  # Ensure 'Amount' is numeric
 
print(df) # for testing
Alteryx.write(df, 1)
 
Screenshot 2025-03-04 164254.png

chuckleswk
11 - Bolide

@MasoB if one of these solutions works for you, please mark it as accepted so that other's can find / use it.

MasoB
6 - Meteoroid

Thank you for the simple and easy to follow solution.

KGT
13 - Pulsar

Your file has consistent spacing and so it's just a matter of working out how many characters per field.

 

This is the character numbers for where commas occur in your sample data.

 

AlteryxGui_9cS6tc48Y1.png

 

Record 1 has an erroneous comma at position 35 and Record 3 has one at position 58.

So, your fields are defined by the size and these formulas will get you there:

  • Format: Substring(fieldname,startChar,numChar)
  • Substring([Field],0,7)
  • Substring([Field],8,8)
  • Substring([Field],17,34)
  • Substring([Field],52,30)
  • Substring([Field],83,12)
  • Substring([Field],96,3)

You can wrap them all in Trim(formula,'"') to trim all the quotes off the end/start but that will also remove any ending quotes in the data. Depending on what the full data looks like you may choose to just change the substrings to 1,5 instead of 0,7 for instance.

 

I've attached a workflow that has more than I intended but shows my investigation as well. The Dynamic Formula is not essential, as you can copy/paste each of the formulas into a formula tool if you like.

Labels
Top Solution Authors