This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Hi,
I am trying to parse a flat file by different, known-in-advance lengths (ie, characters 1-25, 25-50, 50-65, etc.) with each entry ~600 characters long. I am getting Null outputs and the input field cells show that they have truncated characters. This is shown below.
Can anyone help me figure out what is going wrong here?
Thanks!
Hi @kmshannon
Have you tried to put a ".*" at the end of the RegEx, after your parsing groups?
Cheers,
Hi Thableaus,
Can you provide an example of what you mean? I have tried adding ".*" in a few different place in the Regex with no success. Also if you can direct me to somewhere that has all-encompassing Regex syntax, that would be helpful as well. I haven't been able to find a good source yet that explains what all the syntax items (\d, ., *, etc) mean.
Thanks!
Hi @kmshannon
When I read a .txt fixed file, I like to import a file layout. Here's what a flat file layout looks like:
<flatfile version="1"> <file path="C:\Users\MFrisch\Desktop\temp.asc" eoltype="crlf" /> <fields> <field name="Field1" type="String" length="1"/> <field name="Field2" type="Byte" length="3"/> <field name="Field3" type="Date" length="10"/> </fields> </flatfile>
I'll use a tool like Excel or Notepad++ to help me. If I construct the <Fields> values and paste them into a document named MyLayout.FLAT I can import that as part of the INPUT tool and create the right NAMES and filetypes.
Using excel's concatenate function, I can read an input list of fields and sizes and quickly construct the xml. Yes, i could do that in Alteryx. too.
Cheers,
Mark
Each row probably has a newline character, which is not covered by (.) itself. Dot means any character, but if you want to include "\n" (newline), you need to add a "?s" modifier (DOTALL mode).
I recommend this website and also the book "Mastering Regular Expressions" that covers the entire engine behind it.
Try to put ?s before your Regex and see what happens.
Cheers,
I converted the .dat file I was using to .xlsx, but am still getting the character truncation error. Can anyone explain what is happening here? I am trying to read in string records with whitespaces that are up to ~600 characters in length.
Hi @kmshannon
Could you please provide 1 row of your data?
It looks like your Regex is wrong.
it is trying to match .(any character) and then multiple spaces "\s"{n} in a sequence, which does not seem to be the case.
Cheers,
Thableaus,
A cleaned sample file is attached with two rows of data. The file has 24 columns of different, but known lengths. The lengths of each field (including empties) that I am trying to extract is as follows:
25
25
15
100
15
22
240
35
8
5
4
7
6
6
5
8
5
6
2
3
16
6
20
22
Let me know what you think the best approach is. I can't believe that Alteryx doesn't have a simpler way to parse .dat files based on field lengths.
Thanks