Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
解決済み

How to check file format and delimiter for few files in a folder

Marcin_Drabik
アトム

Before loading files to another system I need to run few checks, one of them is to ensure all files in a given folder are in .csv or .txt format.
Second one is delimiters, how to check if I have | , or ; without manually opening files (there's a lot of them in the folder).

Third case is DateTime format check, in my files I have for example 03/06/2024 17:23:41.217000 and I need to ensure that time is always in milliseconds format (6 digits after .). As far as I know Alteryx can't read such format, do you have any idea how to run this check? I'd like to bulletproof it against random values after '.', only numbers are allowed there, so I need to highlight records with for example 03/06/2024 17:23:41.217ABC.


Ideally I'd like to have a workflow that would check above and put comment in new column what delimiter and file format is.



I'd really appreciate you help on this one, I am new Alteryx user and have no clue how to fix it.

6件の返信6
Marcin_Drabik
アトム

Ok I've found that file format (extension) can be extracted using Directory tool, that's one fixed. 
But with delimiters I still need help, looking forward if anyone has idea how to do it.

Thanks,

Marcin

apathetichell
アルタイル

@Marcin_Drabik You cannot do this natively in Alteryx - I'd recommend https://docs.python.org/3/library/csv.html#csv.Sniffer

 

More resources:

https://stackoverflow.com/questions/3952132/how-do-you-dynamically-identify-unknown-delimiters-in-a-...

 

import csv


def get_delimiter(file_path: str) -> str:
    with open(file_path, 'r') as csvfile:
        delimiter = str(csv.Sniffer().sniff(csvfile.read()).delimiter)
        return delimiter
Marcin_Drabik
アトム

Actually I managed to solve my problem and here's how, for future generations :)

To get file extension I used Directory function followed by formulas in screenshot 1.

 

Issue with delimiters I fixed by changing option in Input Data; Delimiters I set to \0 which means no delimiter and 'Ignore Delimiters in' I set to None (screenshot 2).
By doing that I received string for each row of data including delimiters, then I had to only count how many different delimiters are in the string with REGEX_CountMatches function. Separately for each delimiter: , . | tab and ;.

Syntax thing I resolved by performing multiple formula actions, having date format 03/06/2024 17:23:41.217000 I checked if 3rd position from the left is '/', the same for 6th position and so on, using simple right(left,x) formula.


It may be not the most efficient way of doing that but at least is working for me needs.

NeoInfiniTech
ファイアボール

Hello @Marcin_Drabik,

 

While being aware that the problem is already solved and this might be a bit off topic in that it reads in the file rather than reporting the delimiter info (the macro would have to be modified to append the delimiter info to each row), I would still like to share a macro that can automatically read in the CSV/TXT files (alongside other formats) in a provided folder given they all have the same encoding: CSV Friendly Multi-Input (Inspired by CReW Wildcard Input)

 

The stackoverflow link here also helped me refine the macro so thanks @apathetichell for linking the discussion.

 

Hope this will be of help to you or anyone else that might encounter a similar situation in the future.

apathetichell
アルタイル

@NeoInfiniTech - yeah - I guess I over complicated thing. 100% the brute force - read in and determine the delimiter may work - but for large files that would be a nighmare.

 

If you try to build this - and hit any issues - let me know and I can build out a macro for you. This seems like something we should have floating around marketplace/community (like the inspect files in .zip files).

apathetichell
アルタイル

If anyone is curious about this - I took a stab at it here:

https://github.com/apathetichell/2024_AlteryxMacros

ラベル