Hi, everyone! I'm a new user of Alteryx and especially new to transforming .csv files.
I have these files (6 in total, ~15 million lines) that contain unwanted line breaks inside records. My team and I have a script that detects these breaks using Text to Columns. The first column indicates which records have errors: if a record does not follow the file pattern, it contains a line break.
By filtering on any column that should never contain a [Null] record, we can also see all the line breaks. On closer inspection, some records follow the pattern of the first column but are incomplete (they don't have all the delimiters), and the rest of the record sits in the next row or is divided across the next few rows (as far as I know, the maximum is 5 rows).
Is there any way for me to move these records up to their rightful place while updating the fields in the process? If it helps, the "rightful place" always starts with a zero.
Thanks in advance!
Are you able to share some example data, even if it's dummy data? If this can include the expected output as well, that would be great!
Thanks for the detailed info! Try this solution; it's just one of the ways this issue can be handled:
Use a Multi-Row Formula tool after your Input tool to update Field_1: if the field starts with 0, leave it unchanged; if it doesn't, join the row above to that row with ' - ' in between.
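In the Multi-Row Formula tool, the expression could look something like the sketch below (assuming the parsed column is named Field_1; adjust the name to match your data):

    // keep rows that already start with 0; otherwise prepend the previous row's value
    IF StartsWith([Field_1], "0") THEN
        [Field_1]
    ELSE
        [Row-1:Field_1] + ' - ' + [Field_1]
    ENDIF

Keep in mind that the tool's Num Rows setting controls how many rows up or down the expression can reference, which matters when a record is split across more than two rows.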
This can then flow through the rest of your workflow, and any rows that start with [Null] can be filtered out.
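For that last step, the Filter tool condition can be as simple as this (again assuming the column is Field_1):

    // keep only rows whose first column is populated
    !IsNull([Field_1])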
Workflow attached; let me know if this doesn't work or if it needs any tweaks!
After coming across this issue of correcting CSV files maybe more than a thousand times, I decided it was necessary to share what may be considered an ultimate workflow for Alteryx Designer (an enhanced, more dynamic version of a workflow I previously shared here), along with some additional notes based on my past experience. I admit the workflow itself is not as well structured and documented as it ideally should be.
Please take the time to carefully read all of the notes in the attached workflow before configuring and running it. Without any configuration, the workflow will look for all the CSV files in the folder where it is located. I highly advise, however, that you watch out for any lines that contain additional delimiter characters (such as an extra ";" inside a column value), something this workflow cannot automatically address and that therefore needs to be corrected manually.
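As a quick way to spot such lines, a Formula (or Filter) expression like the sketch below can flag rows whose delimiter count is off; the expected count of 10 and the column name Field1 are assumptions to replace with your file's actual layout:

    // flag rows containing more ";" delimiters than the layout allows (10 assumed)
    IIF(REGEX_CountMatches([Field1], ";") > 10, "Check manually", "OK")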
As for the solution @davidskaife shared: it is a method I used before, and it may still be helpful in specific scenarios where the offset is consistent, but there are cases where the Multi-Row Formula has to be adjusted manually (the Num Rows option), since the offset can vary within a file and across files.
Hi @LucasPereira4 ,
If the pattern of the start and end of a line can be logically defined, I would concatenate all the rows into one record and then split it back into rows with RegEx.
Here I use the regular expression below in the RegEx tool:
0012025\d{10}(?:.*?;){10}\d{4}
I am assuming that:
- the first column starts with "0012025" followed by 10 digits,
- it is then followed by 10 columns delimited with ";", and
- the last column is always 4 digits.
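Piece by piece, the pattern maps to those assumptions (annotated here in free-spacing form, with comments):

    0012025\d{10}    # first column: literal "0012025" followed by 10 digits
    (?:.*?;){10}     # ten non-greedy segments, each ending in ";"
    \d{4}            # last column: exactly 4 digits

In the RegEx tool, this kind of split is done with the Tokenize output method, which turns each full match of the pattern into its own row.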
You can modify the pattern to make it more robust.
I hope this helps. Good luck.