Hi,
I'm pulling an SAP report that gives an output like what I have attached. All the data is stored in a single column. The hard part is it's formatted with some information listed above that pertains to the rows below. I need to get the material number (the 8 digit number at the top left starting with a 2), the plant (the 4 digit number) that comes after the material name. and then I need a sum of the quantities listed below that pertain to that material and plant. The inventory quantities below the material may be one or more lines, it varies.
How can I extract this information?
Thanks,
James
Solved! Go to Solution.
a trick i use to parse really dirty data is to bring it in as csv with "none" delimiter - you can change the delimiter in the tool by defining it as "\0"
then from there you can filter the rows and treat the header rows differently than the "data" rows. if you have a sample file i can show you how i'd do it.
so i think i might not have the correct pattern down (looks like it's either tab delimited or fixed width), so i guessed and just split any time there were 2 or more spaces. essentially, i just used a multi-row tool to identify the row type and then split into fields, transpose join, then you'll roll up when you have the correct data. or you can just simply rename the fields as you see fit manually. let me know if this doesn't get you all the way there and if you have a sample of the data.
Thanks Jarrod,
It's really close! The multi-row formula is genius. I'm not very good with those. All I need is another multi-row formula that takes the volumes Sample 4 column after the Text-to-Columns tool and sums them up for every material listed Sample 2. Something like "if RowType = header, sum each row of Sample 4 for data underneath it until you reach the next header" but I don't know how to do that. Can you help me? I've listed the excel output of the data.
Thanks,
James
yeah, you pretty much have it, except i would flip that logic.
Create a new field called Sum_Sample4 (to keep it clean)
iif([RowType]='Header',0,tonumber(replace([Sample4],',',''))+[Row-1:Sum_Sample4])
there are two pieces i added in for the formula above though tonumber() and Replace(). the inner most formula is replacing the commas for nothing then we are turning the field into a number. If you don't replace the commas, Alteryx thinks it's another field and cuts off everything after the first comma, so 19,000 becomes 19. then tonumber() to translate to a number so you can add it to the Sum_Sample4.