CSV import where the starting row changes
I have a CSV report format where the first several rows are metadata: report name, layout name, who ran it, when they ran it, which filters they used. Then there's a blank row, and then the data actually starts. The report can be >1M lines (and Excel "helpfully" modifies my data), so Save As-ing to .xlsx isn't an option.
I can import this CSV into Alteryx by setting it to start the data on row 10. However, I'm trying to make the workflow more dynamic: if someone adds an extra filter (which would push the data down) or removes one (which would pull the data up), Alteryx should adjust automatically without me needing to modify the "start import on row" setting. The number of columns may also change.
I can import the data by using the "\0" delimiter and then (theoretically) figuring out the RegEx to parse it out.
I saw an answer about using an Analytics App for this, but because of how we're currently running our workflows, it's not an option (happy to elaborate)
I'm wondering if there's a simpler way. I want Alteryx to just look at the file, recognize that the data doesn't start until row 10, and happily consume the report from there.
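For what it's worth, the "find where the data starts" logic is easy to sketch outside Alteryx. This is a minimal Python illustration, not the Alteryx solution; it assumes (as the post describes) that a fully blank row separates the metadata block from the data, and the file name is made up:

```python
import csv

def find_data_start(path):
    """Return the 0-based row index where the data begins, assuming the
    metadata block is separated from the data by a single blank row."""
    with open(path, newline="") as f:
        for i, row in enumerate(csv.reader(f)):
            # A fully empty row marks the end of the metadata block;
            # the data (its header row) starts on the next line.
            if not any(cell.strip() for cell in row):
                return i + 1
    raise ValueError("No blank separator row found")
```

Because the blank row is located by content rather than by a hard-coded row number, adding or removing a filter line in the metadata block shifts the result automatically.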
Solved!
- Labels:
- Data Investigation
- Datasets
- Input
Is every row in Column B populated after the header?
If so, you could use a Filter tool to remove any row where B is empty, and then a Dynamic Rename to add the appropriate headers.
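The Filter + Dynamic Rename pattern suggested here can be sketched in plain Python for illustration. The key-column position and sample rows below are assumptions, not from the actual report:

```python
def drop_preamble(rows, key_idx=1):
    """Mimic the Filter + Dynamic Rename pattern: drop rows whose key
    column is empty (the metadata rows), then promote the first
    surviving row to be the header."""
    kept = [r for r in rows if len(r) > key_idx and r[key_idx].strip()]
    header, data = kept[0], kept[1:]
    # Pair each remaining row with the promoted header.
    return [dict(zip(header, row)) for row in data]
```

The point of the pattern is that nothing depends on how many metadata rows there are, so added or removed filter lines don't break the workflow.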
I would read in the data, use a Filter to remove those extra rows (something like IsNull([Field_2]) AND IsNull([Field_3]), or whatever logic would work well), then use a Dynamic Rename to promote that first row (which presumably you want to be your header) to the header row!
Thanks for the reply.
I don't know if every row in Column B will be populated and don't want to assume. But I figure I can do a Filter to exclude rows where Col D and Col E and Col F are all null, or some version of what alexnajm described.
My issue with just reading everything in and then ignoring the top rows is that I can't get the CSV to import when the start row is set to 1.
There are Input Data settings you can adjust to avoid those errors; however, to best help, can you provide a sample dataset?
If I do an import with the delimiter set to "\0", I can get the data in fine, and then a Multi-Row Formula + Filter excludes the header rows. That works, but then I'm stuck trying to figure out how to parse the lines out, since I need to ignore commas inside double quotes, and fields sometimes have double quotes and sometimes don't; I haven't figured out how to do that with RegEx yet.
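The quoted-comma problem described here is exactly what a standard CSV parser (per RFC 4180) handles, which is why a hand-rolled RegEx gets painful. As a sanity check outside Alteryx, a one-line Python sketch using the stdlib csv module:

```python
import csv

def split_quoted(line):
    """Split one raw CSV line on commas, honoring standard quoting:
    commas inside double quotes are preserved as data, and doubled
    quotes ("") inside a quoted field unescape to a single quote."""
    return next(csv.reader([line]))
```

A proper parser also covers the "sometimes quoted, sometimes not" case automatically, since quoting is optional per field.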
 
There’s an option in Text to Columns to ignore delimiter in double quotes!
OMG THERE IS!! THANK YOU!!!!
Happy to help!
