I am trying to read data from a website using its API. The data comes in CSV format, in chunks of 10 MB. The first issue is that the data cuts off right at the 10 MB boundary, and the next chunk continues where the last one left off. So I need to find a way to concatenate all the chunks together and use the result for some data manipulation. I want it to be dynamic, so I can use the result in the same workflow and it will work with any number of chunks.
From my experience with Alteryx, I don't think this is possible, but I wanted to take a chance and ask the question here.
As an attachment, I added the end of one chunk and the beginning of the next, as read with a Browse tool.
Could you use a Summarize tool to concatenate the column without a delimiter? That would turn the entire column into one string. From there:

1. Use a Text to Columns tool to split the string to rows.
2. Use a Multi-Row Formula tool to create something like a record ID that resets to 1 each time it increments up to the total number of desired columns.
3. Cross Tab the data with this record ID as the headers.
4. Use a Dynamic Rename tool afterwards to promote the first row of data, which contains your headers, to the actual column headers.
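A minimal sketch of the same reassembly idea in Python, outside Alteryx (the chunk contents and column count here are made up for illustration):

```python
# Two hypothetical chunks, cut mid-row at an arbitrary byte boundary.
chunk1 = "id,name,score\n1,alice,90\n2,bo"
chunk2 = "b,85\n3,carol,78\n"

# Step 1: concatenate the chunks with no delimiter (the Summarize tool idea).
full_text = chunk1 + chunk2

# Step 2: split the single string back into rows (the Text to Columns idea).
rows = [r for r in full_text.split("\n") if r]

# Step 3: split each row into columns (the Multi-Row Formula + Cross Tab idea).
table = [r.split(",") for r in rows]

# Step 4: promote the first row to headers (the Dynamic Rename idea).
header, *data = table
print(header)   # the original CSV header row
print(data[1])  # the row that was cut across the two chunks, now whole
```

The key point is that once everything is one string, the row that was split across the 10 MB boundary is automatically stitched back together before the rows are re-split.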
This example should work for your data. You may need to change the first Multi-Row Formula tool to a number other than 5 if you have a different number of columns, but that is a quick fix. This approach will accommodate any number of chunks. The workflow is attached.
Thank you for your reply; it helped a bit. I will share a picture of part of the new workflow I built based on your answer.
As you can see in my workflow, I need to change the chunk number in the URL in the Formula tool to make the call that retrieves each chunk of data. What I understand from your answer is that after each call, I should append the output of each chunk to a file, and once all the chunks have been appended, read the file back and do the data manipulation you explained in your first answer.
How can I append each chunk to a file, since Alteryx only allows overwriting a CSV file?
Assuming I was able to append each chunk, how can I make sure the appending process is finished before I read the file? I want to read the file in the same workflow as the appending process.
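To clarify what I am trying to do, here is a rough Python sketch of the append-then-read sequence I have in mind (the file path and chunk contents are made up):

```python
import os
import tempfile

# Hypothetical chunk payloads, cut mid-row as they come from the API.
chunks = ["id,name\n1,al", "ice\n2,bob\n"]

path = os.path.join(tempfile.mkdtemp(), "combined.csv")

# Append phase: each chunk is added to the end of the file ("a" mode).
for c in chunks:
    with open(path, "a") as f:
        f.write(c)

# Read phase: runs only after the loop ends, so appending is guaranteed done.
with open(path) as f:
    print(f.read())
```

Outside Alteryx this sequencing is trivial because the read happens after the loop; inside one Alteryx workflow, forcing that ordering is exactly my problem.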
Hi @fnkankeu, how do you know how many chunks there are in a file?
If there's a way of knowing before the Download tool, then you could use a Generate Rows tool to create the requisite number of URLs, which would then pass through the Download tool. After this you can do as @BrandonB suggested and concatenate all the DownloadData fields (grouped by file name), so you get the full data per file.
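A quick sketch of the Generate Rows idea in Python (the URL pattern and chunk count here are assumptions, not your actual API):

```python
# When the chunk count is known up front, build one URL per chunk.
n_chunks = 4  # hypothetical known count
base = "https://api.example.com/data?chunk={}"

urls = [base.format(i) for i in range(1, n_chunks + 1)]
print(urls[0])  # first request URL
print(urls[-1]) # last request URL
```

Each URL then corresponds to one Download tool call, and the chunk number embedded in it gives you the sort key you need later.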
If you can only find out how many chunks there are by incrementing the chunk value until no data comes back, then you will probably need to build an iterative macro, so you can poll the API until the data runs out. After that point you can apply the concatenation method as before.
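Roughly, the iterative-macro logic looks like this sketch (`fetch_chunk` is a hypothetical stand-in for the Download tool, returning canned data):

```python
# Hypothetical stand-in for the Download tool: returns the i-th chunk,
# or an empty string once the data runs out.
def fetch_chunk(i, _chunks=("id,v\n1,a", "b\n2,cd\n")):
    return _chunks[i - 1] if i <= len(_chunks) else ""

parts = []
i = 1
while True:
    data = fetch_chunk(i)
    if not data:        # empty response: no more chunks, stop iterating
        break
    parts.append(data)
    i += 1              # increment the chunk value for the next call

print("".join(parts))   # full CSV reassembled from all chunks
```

The macro's exit condition is the empty response, which is what makes the workflow dynamic with respect to the number of chunks.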
@fnkankeu Sorry, I would also add that you should sort your data by file name and chunk value before you concatenate it, as the Download tool can output records in a slightly different order than its input (due to differing response times combined with the ability to use concurrent connections).
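In Python terms, the sort-then-concatenate step might look like this (the file names and chunk contents are made up to show chunks arriving out of order):

```python
from itertools import groupby

# (file name, chunk number, chunk text) as it might come back from
# concurrent downloads: note fileA's chunk 2 arrived before chunk 1.
records = [
    ("fileA", 2, "b,85\n"),
    ("fileB", 1, "id,x\n1,9\n"),
    ("fileA", 1, "id,name,score\n1,alice,90\n2,bo"),
]

# Sort by file name, then chunk number, before concatenating.
records.sort(key=lambda r: (r[0], r[1]))

# Concatenate chunk text grouped by file name.
full = {name: "".join(chunk for _, _, chunk in grp)
        for name, grp in groupby(records, key=lambda r: r[0])}

print(full["fileA"])  # complete CSV for fileA, row stitched back together
```

Without the sort, the out-of-order chunks would be glued together in arrival order and the split row would be corrupted.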