Hi all,
I have the following scenario and want to check how I can build a macro to perform the task:
I have a dataset of 300-800k rows (varies daily depending on various factors). I want to build a solution that can split this data into chunks with an upper limit of 100k as the API can only process up to 100k rows at a time. Furthermore, I want this data to go through 3 different tools before it gets stitched together and is pushed as a single Excel output file.
So I am looking for something dynamic that can divide the data based on the total rows in the input file and then pass all the sets of data through the same 3 tools to process and append so there is only one output file.
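To make the intent concrete, here is a minimal sketch of that split-process-append flow in Python. The `tools` list and the identity functions are hypothetical placeholders for the three processing tools; the 100k limit comes from the post.

```python
CHUNK_LIMIT = 100_000  # API's per-call row ceiling (from the post)

def run_pipeline(rows, tools):
    """Split rows into chunks of at most CHUNK_LIMIT, run each chunk
    through the same sequence of tools, and append the results so a
    single combined output comes out the other end.
    `tools` is a list of functions standing in for the three tools."""
    output = []
    for start in range(0, len(rows), CHUNK_LIMIT):
        chunk = rows[start:start + CHUNK_LIMIT]
        for tool in tools:          # same 3 tools applied to every chunk
            chunk = tool(chunk)
        output.extend(chunk)        # stitch chunks back together
    return output
```

With 250k input rows and three pass-through tools, this produces one 250k-row output from three chunked passes.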
I have not been able to make it work so far. I would appreciate it if anyone has an idea of how this can be achieved.
We need a bit more info here. Is the 100K limit on the input side or on the output side?
If it's on the input side then you need to build a macro that can read all the data. Build an iterative macro where you pass in the URL of the first API call as it's input. The loop exit logic depends on the format of the meta info returned by the API.
Most APIs return, on each page of data, the total number of pages/rows and the current page/row number for your entire dataset. Compare the total pages to engine.iterationnumber + 1 (engine.iterationnumber is zero-based): if it's less than the total, modify the URL to get the next page of data and pass it back to the input. If it's equal, remove the URL record from the input, which will cause the iterative macro to stop and return all the data.
Some APIs don't provide the total pages, but instead return no rows once the data has been completely transferred. In this case, use a Count on the number of returned rows: if it's > 0, pass the modified URL back to the input; if it's 0, remove the URL record.
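The loop logic above can be sketched in plain Python. This is a simulation, not Alteryx: `get_page` is a hypothetical stand-in for one API call that reports `total_pages` in its metadata, and the page counter plays the role of engine.iterationnumber + 1. Adapt the exit check to whatever metadata your API actually returns.

```python
DATA = list(range(250_000))   # pretend full dataset behind the API
PAGE_SIZE = 100_000           # API's per-call row limit

def get_page(page):
    """Hypothetical API call: return (rows, total_pages) for a 1-based page."""
    start = (page - 1) * PAGE_SIZE
    rows = DATA[start:start + PAGE_SIZE]
    total_pages = -(-len(DATA) // PAGE_SIZE)   # ceiling division
    return rows, total_pages

def fetch_all():
    """Mirror the iterative macro: keep requesting the next page until
    the current page number reaches the reported total."""
    all_rows, page = [], 1
    while True:
        rows, total_pages = get_page(page)
        all_rows.extend(rows)
        # For APIs with no total_pages, you'd instead break when
        # a page comes back with zero rows.
        if page >= total_pages:
            break
        page += 1
    return all_rows
```

Here `fetch_all()` makes three calls (100k + 100k + 50k rows) and returns the full 250k-row dataset.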
If the API limit is on the output side, you can use a simple batch macro that sends one batch of records at a time, and set up the number of batches in the main workflow.
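For the output-side case, the batching itself is just slicing the input into groups no larger than the limit. A minimal sketch, assuming a 100k per-call ceiling (the batch size and function name are illustrative):

```python
def batches(rows, batch_size=100_000):
    """Yield successive slices of at most batch_size rows, mirroring a
    batch macro that sends one group of records per call."""
    for start in range(0, len(rows), batch_size):
        yield rows[start:start + batch_size]

# 250,000 input rows split into three sends: 100k, 100k, 50k
sizes = [len(b) for b in batches(list(range(250_000)))]
```

The number of batches falls out of the input size automatically, which is what makes the approach dynamic for a daily-varying row count.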
Dan
So the API is one of the tools that I need once the data has been pulled. Basically, the initial data comes from an Excel file, and the API pulls in another set of data that needs to be joined using a unique key. The data coming from the API can only be pulled 100k rows at a time, which is the core issue.
I hope that makes a bit more sense.
Use an iterative macro to pull your paged API data. Here's an excellent example to get you started. Of course, you'll have to modify it to handle the details of the specific API that you're calling.
Dan.
Thanks Dan. I will take a look.