Hello Community,
I'm sorry to bother you, but I can't find a solution to my problem.
I'm pulling data from an API that returns ~209 million records, and this works well until I use the Cross Tab tool.
In the Cross Tab output I get only 152 records, and it looks as if this is based on only part of the results from the previous step; I don't know why. I've checked the file after exporting the results, and it contains only those 152 records.
So it seems that no matter how many records I pass to the Cross Tab tool, it always gives me no more than 152 records. Maybe it's a memory issue.
I'm sorry, but I can't provide a reproducible workflow: a small portion of the data doesn't reflect the problem, and this is sensitive business data.
Any idea what the cause could be?
If this is a memory issue, how can I get ~209 million rows processed correctly in the Cross Tab tool?
@Rafal_Pietrak what is most likely happening is that values with the same column and group are being aggregated.
To stop this, you will need to either group by more fields or differentiate each column more.
Hi @Rafal_Pietrak,
What's the configuration of your Cross Tab tool? You should be grouping on the groupID field, using column_name as the column headers and JSON_ValueString as the values. The aggregation method should probably be Concatenate rather than something like First, as it will show whether you're merging multiple values into one field.
It's also worth checking the data type of your groupID field and doing some simple sense checks, such as grouping by groupID in a Summarize tool to see how many unique records should come out of your Cross Tab tool.
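For anyone who wants to run the same sense check outside Alteryx, here is a minimal Python sketch. It is not an Alteryx feature; the sample rows are made up, and only the field names (groupID, column_name, JSON_ValueString) come from this thread.

```python
from collections import Counter

# Hypothetical sample rows; only the field names are taken from the thread.
rows = [
    {"groupID": 1, "column_name": "id", "JSON_ValueString": "A1"},
    {"groupID": 1, "column_name": "name", "JSON_ValueString": "Alpha"},
    {"groupID": 1, "column_name": "id", "JSON_ValueString": "B2"},
    {"groupID": 1, "column_name": "name", "JSON_ValueString": "Beta"},
    {"groupID": 2, "column_name": "id", "JSON_ValueString": "C3"},
]

# A cross tab grouped on groupID can return at most one row per distinct groupID.
expected_output_rows = len({r["groupID"] for r in rows})

# Any (groupID, column_name) pair that occurs more than once ends up in a single cell.
pair_counts = Counter((r["groupID"], r["column_name"]) for r in rows)
duplicate_pairs = {pair: n for pair, n in pair_counts.items() if n > 1}

print("expected cross tab rows:", expected_output_rows)
print("pairs that will be aggregated:", duplicate_pairs)
```

If the first number is far below the input row count and the second result is not empty, the cross tab is aggregating rows rather than losing them.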
Kind regards,
Jonathan
Thank you @Jonathan-Sherman and @IraWatt for the quick reaction. It looks like the Cross Tab tool is configured properly and the column names differ, so it should be OK.
I've also checked the aggregation method, and concatenation doesn't work for me, as it needs to be First or Last. I also tried limiting the URLs I'm pulling data from to just one, and I still get 152 records as the Cross Tab output.
Hi @Jonathan-Sherman, here is the workflow with sample data. I hope it helps in understanding the issue.
Hi @Rafal_Pietrak,
I'd start by taking a look at the data you're bringing in: it doesn't appear to be unique at the groupID and Column_name level. The way the Cross Tab is set up, it would keep only the first record in this example set of 5.
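To illustrate the effect, here is a small Python sketch of a pivot that keeps the first value per group and column. It is only a simplified stand-in for the Cross Tab tool's First method, not its actual implementation, and the five rows are hypothetical.

```python
def cross_tab_first(rows):
    """Pivot rows into one dict per groupID, keeping the first value seen per column."""
    result = {}
    for r in rows:
        out = result.setdefault(r["groupID"], {})
        # setdefault keeps the existing value, so later duplicates are ignored.
        out.setdefault(r["column_name"], r["JSON_ValueString"])
    return result

# Five hypothetical rows sharing the same groupID and column_name.
rows = [
    {"groupID": 1, "column_name": "value", "JSON_ValueString": v}
    for v in ["a", "b", "c", "d", "e"]
]

pivoted = cross_tab_first(rows)
print(len(rows), "input rows ->", len(pivoted), "output row(s)")
print(pivoted)
```

All five input rows collapse into a single output row holding only "a", which is the same pattern as millions of input rows shrinking to 152.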
Kind regards,
Jonathan
You're absolutely right, @Jonathan-Sherman. The API structure doesn't seem to be fully correct.
Thank you for spotting this, and sorry to bother you; I thought it was purely an Alteryx-related issue.
@Rafal_Pietrak, as your data is not unique at any column level, I would suggest creating an identity column and then cross tabbing the data on the basis of that identity column, so that your data is not aggregated by groupID.
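Here is a rough Python sketch of one possible reading of this suggestion. The RecordID field and both helper functions are hypothetical, and the identity used here is an occurrence counter within each groupID/column_name pair, which is an assumption. A plain running row number would instead give one sparse output row per input row.

```python
from collections import defaultdict

def add_occurrence_id(rows):
    """Tag each row with how many times its (groupID, column_name) pair has appeared so far."""
    seen = defaultdict(int)
    tagged = []
    for r in rows:
        key = (r["groupID"], r["column_name"])
        seen[key] += 1
        tagged.append({**r, "RecordID": seen[key]})  # RecordID is a hypothetical field name
    return tagged

def cross_tab_first(rows, group_keys):
    """Pivot rows, grouping on group_keys and keeping the first value per column."""
    result = {}
    for r in rows:
        key = tuple(r[k] for k in group_keys)
        result.setdefault(key, {}).setdefault(r["column_name"], r["JSON_ValueString"])
    return result

# Hypothetical data: two logical records that share one groupID.
rows = [
    {"groupID": 1, "column_name": "id", "JSON_ValueString": "A1"},
    {"groupID": 1, "column_name": "name", "JSON_ValueString": "Alpha"},
    {"groupID": 1, "column_name": "id", "JSON_ValueString": "B2"},
    {"groupID": 1, "column_name": "name", "JSON_ValueString": "Beta"},
]

without_id = cross_tab_first(rows, ["groupID"])
with_id = cross_tab_first(add_occurrence_id(rows), ["groupID", "RecordID"])

print(len(without_id), "row without the identity column")
print(len(with_id), "rows with the identity column")
```

Grouping on groupID alone keeps only the first record, while grouping on groupID plus the identity column keeps both. This relies on the values of each record arriving in a consistent order, which should be checked against the real API output.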