I have a CSV report format where the first X rows are metadata: report name, layout name, who ran it, when they ran it, which filters were applied. Then there's a blank row, and then the data actually starts. The report can be >1M lines (and Excel "helpfully" modifies my data), so saving it as .xlsx isn't an option.
I can import this CSV into Alteryx by setting it to start the data on row 10. However, I'm trying to make the workflow more dynamic, so that if someone adds an extra filter (which would push the data down) or removes one (which would pull it up), Alteryx automatically adjusts without me needing to modify the "start import on row" setting. The number of columns may also change.
I can import the data by using the "\0" delimiter and then (theoretically) figure out the RegEx to parse it out.
I saw an answer about using an Analytic App for this, but because of how we're currently running our workflows, it's not an option (happy to elaborate).
I'm wondering if there's a simpler way. I want Alteryx to just look at it and recognize that the data doesn't start until row 10 and just happily consume the report starting on row 10
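For what it's worth, the "find where the data starts" logic is easy to express outside Alteryx. Here's a minimal Python sketch (the sample report is invented for illustration) that locates the first row after the blank separator row and reads from there:

```python
import csv

# hypothetical report: metadata rows, a blank row, then the real data
sample = """Report Name,Sales Summary
Run By,jdoe
Filter,Region = East

Id,Name,Amount
1,"Smith, Ann",100
2,Bob,200"""

lines = sample.splitlines()

# the data starts on the first row after the blank separator row
start = next(i for i, line in enumerate(lines) if line.strip() == "") + 1

rows = list(csv.reader(lines[start:]))
header, data = rows[0], rows[1:]
```

Because the blank row is located dynamically, adding or removing filter rows above it doesn't break anything.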
Is every row in column B populated after the header?
If so, you could use a Filter tool to remove any row where B is empty, and then a Dynamic Rename to add the appropriate headers.
I would read in the data, use a Filter to remove those extra rows (something like IsNull([Field_2]) AND IsNull([Field_3]), or whatever logic works well), then use a Dynamic Rename to promote that first row (which presumably you want to be your header) to the header row!
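The Filter + Dynamic Rename approach amounts to something like this Python sketch (the sample rows and column positions are invented assumptions, not the actual report):

```python
# hypothetical rows as read in: metadata rows only fill the first column,
# while the header row and data rows fill several columns
raw = [
    ["Report Name", "", ""],
    ["Run By", "", ""],
    ["", "", ""],
    ["Id", "Name", "Amount"],
    ["1", "Ann", "100"],
    ["2", "Bob", "200"],
]

# Filter step: drop any row where the later columns are all empty
kept = [r for r in raw if any(c.strip() for c in r[1:])]

# Dynamic Rename step: promote the first surviving row to the header
header, data = kept[0], kept[1:]
```

The same logic works no matter how many metadata rows precede the data.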
Thanks for the reply.
I don't know if every row in Column B will be populated and don't want to assume. But I figure I can use a Filter to exclude rows where Col D and Col E and Col F are all null, or some version of that, like @alexnajm described.
My issue with just reading everything in and then ignoring the top rows is that I can't get the CSV to import when the start-import row is set to 1.
There are Input Data settings you can adjust to avoid those errors. To best help, though, can you provide a sample dataset?
If I do an import with the delimiter set to "\0", the data comes in fine, and a Multi-Row Formula + Filter excludes the header rows. That works, but then I'm stuck trying to parse it out: I need to ignore commas inside double quotes, fields sometimes have double quotes and sometimes don't, and I haven't figured out how to do that with RegEx yet.
There's an option in Text to Columns to ignore delimiters in double quotes!
OMG THERE IS!! THANK YOU!!!!
Happy to help!
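For anyone landing here later: the "ignore delimiters in quotes" behavior in Text to Columns matches standard CSV quoting, which Python's csv module also implements. A quick check:

```python
import csv

# a field containing the delimiter is protected by double quotes,
# so the quoted comma is not treated as a field separator
line = '1,"Smith, Ann",100'
parsed = next(csv.reader([line]))
```

No RegEx needed for the quoted-comma case once a quote-aware parser is in play.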