Hi,
I am trying to read from a website using its API. The data comes in CSV format, in chunks of 10 MB. The first issue is that the data cuts off right at that 10 MB boundary, and the next chunk continues where the last one left off. So I need to find a way to concatenate all the chunks together and use the result to do some data manipulation. I want it to be dynamic so I can use the result in the same workflow and it can work with any number of chunks.
Based on my experience with Alteryx, I don't think this is possible. I just wanted to take a chance by asking the question here.
I have attached the end of one chunk and the beginning of the next, as read with the Browse tool.
I need some help.
Thanks
Could you use a Summarize tool to concatenate the column without a delimiter? That would turn the entire column into one string. You could then use a Text to Columns tool to split it back out to rows, use a Multi-Row Formula tool to create a record ID that restarts at 1 each time it reaches the total number of desired columns, and then Cross Tab the data with this record ID as the headers. Finally, a Dynamic Rename tool afterwards can promote the first row of data, which contains your headers, to be the actual column headers.
This example should work for your data. You may need to change the number 5 in the first Multi-Row Formula tool if you have a different number of columns, but that is a quick fix. This approach will accommodate any number of chunks. The workflow is attached.
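Outside of Alteryx, the same reassembly idea can be sketched in a few lines of Python. This is only an illustration of the concept, using hypothetical chunk data; it assumes the chunks arrive as raw strings and that a chunk boundary can fall in the middle of a row:

```python
import csv
import io

def reassemble_chunks(chunks):
    """Join raw chunks (which may split a row mid-line) back into one
    CSV string, then parse it normally."""
    full_text = "".join(chunks)       # like Summarize: concatenate, no delimiter
    reader = csv.reader(io.StringIO(full_text))
    rows = list(reader)               # rows of fields, like the Cross Tab output
    header, data = rows[0], rows[1:]  # like Dynamic Rename: first row -> headers
    return header, data

# Hypothetical chunks whose boundary falls mid-row:
chunks = ["id,name\n1,al", "ice\n2,bob\n"]
header, data = reassemble_chunks(chunks)
print(header)  # ['id', 'name']
print(data)    # [['1', 'alice'], ['2', 'bob']]
```

The key point is the same as in the workflow: once the chunks are glued back into one continuous string, the row boundaries are intact again and the data can be parsed as ordinary CSV.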
Data in "chunks" as you mentioned:
Hi @fnkankeu
Can you provide this in Excel, and can you also show us what the 2nd chunk continuing from the 1st chunk looks like?
We will be happy to help : )
Hi Brandon,
Thank you for your reply. It helped me a bit. I will share a picture of part of the new workflow I built based on your answer.
As you can see in my workflow, I need to change the chunk number in the URL in the Formula tool to make the call that retrieves each chunk of data. My understanding of your answer is that after each call I should append the output of each chunk to a file, and once all the chunks have been appended, read the file back to do the data manipulation you explained in your first answer.
How can I append each chunk to a file, since Alteryx only allows overwriting a CSV file?
Assuming I am able to append each chunk, how can I make sure the appending process has finished before I read the file? I want to read the file in the same workflow that does the appending.
I hope I was clear.
Thanks for your help
Hi @fnkankeu, how do you know how many chunks there are in a file?
If there's a way of knowing before the Download tool, then you could use a Generate Rows tool to create the requisite number of URLs, which would then pass through the Download tool. After this you can do as @BrandonB suggested and concatenate all the DownloadData fields (grouped by file name), so you get the full data per file.
If you can only find out how many chunks there are by incrementing the chunk value until no data comes back, then you will probably have to build an iterative macro, so you can poll the API until the data runs out. After that point you can apply the concatenate method as before.
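As a rough sketch of that iterative logic (in Python rather than an Alteryx iterative macro, with a hypothetical `fetch_chunk` function standing in for the API call):

```python
def fetch_all_chunks(fetch_chunk):
    """Poll chunk 0, 1, 2, ... until the API returns no data.
    fetch_chunk(n) returns the raw text of chunk n, or "" when exhausted."""
    chunks = []
    n = 0
    while True:
        body = fetch_chunk(n)
        if not body:            # data ran out: stop polling
            break
        chunks.append(body)
        n += 1
    return "".join(chunks)      # concatenate the chunks as before

# Example with a fake API that serves three chunks then runs dry:
fake = {0: "a,b\n1,", 1: "2\n3,4", 2: "\n"}
result = fetch_all_chunks(lambda n: fake.get(n, ""))
print(result)  # a,b
               # 1,2
               # 3,4
```

The iterative macro plays the role of the `while` loop here: each iteration makes one call with the next chunk number, and the loop exits on the first empty response.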
Hope that helps,
Ollie
@fnkankeu sorry, I would also add that you should sort your data by file name and chunk value before you concatenate it, as the Download tool can output values in a slightly different order from its input (due to varying response times combined with the ability to have concurrent connections).
Hi @ollieClarke,
Yes, there is a way to know how many chunks I have. You said, "then you could use a generate rows tool to make the requisite number of urls which would then pass through the download tool."
Can you give me an example please of how to do that?
Thanks
Hi @fnkankeu
As I can see, you are mentioning the chunk number.
You can generate the chunk numbers using a Generate Rows tool as shown below and pass them all through the Download tool at once; it can process multiple rows of request URLs, like below.
Hope this helps : )
Hey @fnkankeu sure 🙂
So here I started with a field telling me how many chunks there are (Num_Chunks).
Since your chunks start from 0, I've assumed that 5 chunks would cover the range 0-4. I set up the Generate Rows tool to create a new column (Chunk) like this:
which will generate every number between 0 and Num_Chunks-1
We can then create the url with a formula tool as you did
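For illustration only, the Generate Rows + Formula combination behaves like this Python sketch (the base URL and `chunk` parameter name are hypothetical placeholders, not your actual API):

```python
def build_urls(base_url, num_chunks):
    """Generate Rows: every chunk number from 0 to Num_Chunks - 1,
    then Formula: append the chunk number to the URL."""
    return [f"{base_url}&chunk={i}" for i in range(num_chunks)]

urls = build_urls("https://example.com/api?data=file1", 5)
print(urls[0])   # https://example.com/api?data=file1&chunk=0
print(urls[-1])  # https://example.com/api?data=file1&chunk=4
```

All of these URL rows then flow through the Download tool together, so each row produces one chunk of the file.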
Hope that helps,
Ollie