

Speeding up a macro with API calls

m_v
8 - Asteroid

Hello clever people,

 

I've built a macro that calls a 3rd-party API and fetches a considerable amount of data (we're talking millions of rows and dozens of columns). It is very stable and has delivered great value to a lot of people. Users enter their API key, the model ID, and the table ID, and click Run.

 

It is, however, slow. It takes about 9-10 minutes to fetch roughly 1M rows, and it's not uncommon for me to pull as many as 10M rows. The data changes frequently, so I do need to refresh the whole dataset every time.

 

The 3rd party has a page limit of 10K rows and 10 MB, and they can't change that. However, I have recently learnt two new pieces of information:

1) the 3rd party developed a new API endpoint

2) the 3rd party can issue multiple API keys for service accounts

 

So with that new info: can I finally make my macro faster?

 

Now I will explain how the new version of the endpoint works.

 

This new endpoint returns a unique page key given the target page size, table ID, and model ID. So if we have a table with 15,000 rows and we set the page size to 5,000, we can call the data using these three URLs (a rough code sketch of chaining these calls follows below):

 

https://api.blablabla/pageSize=5000&key=[###]

https://api.blablabla/pageSize=5000&key=[###1]

https://api.blablabla/pageSize=5000&key=[###C]
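To make the chaining concrete, here is a minimal Python sketch of how these key-chained calls could be walked outside Alteryx. The base URL, parameter names, and JSON field names (rows, nextKey) are placeholders I've made up for illustration, not the vendor's real API:

import requests

BASE_URL = "https://api.blablabla"   # placeholder, not the real endpoint
API_KEY = "your-api-key"             # placeholder
PAGE_SIZE = 5000

def fetch_all_pages():
    """Follow the chain of page keys until the API stops returning a next key."""
    rows = []
    page_key = None  # assume the first call without a key returns the first page
    while True:
        params = {"pageSize": PAGE_SIZE}
        if page_key:
            params["key"] = page_key
        resp = requests.get(BASE_URL, params=params,
                            headers={"Authorization": f"Bearer {API_KEY}"},
                            timeout=60)
        resp.raise_for_status()
        payload = resp.json()
        rows.extend(payload.get("rows", []))   # assumed field name
        page_key = payload.get("nextKey")      # assumed field name
        if not page_key:
            break
    return rows

The key point is that each page can only be requested after the previous response has handed back the next key, which is why this part is inherently sequential.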

 

My macro is an iterative macro nested inside a standard macro. The iterative macro fetches all the URLs needed to get the data from the target dataset. The standard macro takes those URLs, feeds them into a Download tool, and then parses out the JSON.

 

[Attachment: iterative_macro.JPG]

[Attachment: standard_macro.JPG]

 

I'm looking for creative ideas on how to make this whole thing run faster, considering that I can use several different API keys. I want a substantial speed improvement (at least 2x).

 

What I have tried so far (and nothing produced results):

1) In the standard macro piece, tiling the output of the iterative macro and running the downloads in parallel

2) Increasing the number of connections in the Download tool - it cannot go over 32
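Putting the multiple-API-keys idea into concrete terms: one option outside Alteryx would be to spread the pages across the service-account keys and download them concurrently. A rough Python sketch, assuming an index/size-style endpoint and invented parameter names (pageindex, pagesize), might look like this:

import math
import requests
from concurrent.futures import ThreadPoolExecutor

# Placeholders: not the vendor's real endpoint or parameter names.
BASE_URL = "https://api.blablabla"
API_KEYS = ["key-A", "key-B", "key-C"]   # multiple service-account keys
PAGE_SIZE = 9000

def fetch_page(args):
    """Download one page, using whichever API key was assigned to it."""
    page_index, api_key = args
    resp = requests.get(
        BASE_URL,
        params={"pageindex": page_index, "pagesize": PAGE_SIZE},
        headers={"Authorization": f"Bearer {api_key}"},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json().get("rows", [])   # assumed field name

def fetch_all(total_rows):
    """Spread the pages round-robin across the keys and download in parallel."""
    pages = math.ceil(total_rows / PAGE_SIZE)
    work = [(i + 1, API_KEYS[i % len(API_KEYS)]) for i in range(pages)]
    rows = []
    with ThreadPoolExecutor(max_workers=8) as pool:
        for page_rows in pool.map(fetch_page, work):
            rows.extend(page_rows)
    return rows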

 

Many thanks in advance to this wonderful community for the help!

apathetichell
18 - Pollux

Probably depends upon how you'd write it. In Alteryx, I'd recommend feeding multiple rows into a single Download tool rather than using an iterative macro or something like that. A batch macro would be useful if core memory is an issue.

 

A Python script (outside of Alteryx) could be faster depending upon how you code it. You are still going to face memory and bandwidth issues. If it's written poorly, it will not be faster. I don't love gsheets' native support for Python and expect this might be an issue for you down the line. Python in Alteryx will not work.

 

If this were my project, I'd dump the macros and do this entirely in a standard workflow. I do not believe an iterative macro can ever help performance. A batch macro can, because of memory allocation (at least on some versions of Alteryx).

 

Your current performance sounds decent. My next step would be to do this via Python in a containerized cloud environment on a schedule (i.e. ECS/EC2) and then push the data into my DB/S3.
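A bare-bones skeleton of that pattern could look like the sketch below: a plain Python script that pulls each page and lands it in S3 as JSON. The endpoint, parameters, and bucket name are all placeholders, and scheduling/containerization are left out:

import json
import math
import boto3
import requests

# All names below are placeholders, not the vendor's real API or a real bucket.
BASE_URL = "https://api.blablabla"
API_KEY = "your-api-key"
PAGE_SIZE = 9000
BUCKET = "my-data-lake"

s3 = boto3.client("s3")

def run(total_rows):
    """Fetch every page and land it in S3 as one JSON object per page."""
    pages = math.ceil(total_rows / PAGE_SIZE)
    for page in range(1, pages + 1):
        resp = requests.get(
            BASE_URL,
            params={"pageindex": page, "pagesize": PAGE_SIZE},
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=60,
        )
        resp.raise_for_status()
        s3.put_object(
            Bucket=BUCKET,
            Key=f"exports/table/page={page:05d}.json",
            Body=json.dumps(resp.json()),
        )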

m_v
8 - Asteroid

The bottleneck of my process is getting the page keys, and I really don't see a way to do that with anything besides an iterative macro. The API call returns the current key and the next key, which is why I'm looping through the list one by one. Perhaps there is a better way? I've never used batch macros before.

apathetichell
18 - Pollux

Many of these APIs use a rowcount/offset system, with call 1 returning the total count. That lets you use Generate Rows and then floor/ceil division to create the offsets needed to build the syntax for the successive calls. If you do not have that row count, you will not be able to do that and you would have to use an iterative macro. Is the API documentation public, and if so, can you link to it?
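In code terms, the idea is roughly this: call 1 returns the total row count, and ceiling division then produces every page URL up front, so there is nothing left to iterate over. In Alteryx that is a Generate Rows tool plus a Formula tool; the Python below just mirrors the same logic with invented parameter names (limit, offset):

import math

BASE_URL = "https://api.blablabla/data"   # placeholder endpoint
PAGE_SIZE = 10000                         # the vendor's 10K-row page limit

def build_offset_urls(total_rows):
    """Turn the total row count from call 1 into the full list of paged URLs."""
    pages = math.ceil(total_rows / PAGE_SIZE)
    return [f"{BASE_URL}?limit={PAGE_SIZE}&offset={page * PAGE_SIZE}"
            for page in range(pages)]

# Example: a 15,000-row table at a 10,000-row page size needs two calls.
print(build_offset_urls(15000))
# ['https://api.blablabla/data?limit=10000&offset=0',
#  'https://api.blablabla/data?limit=10000&offset=10000']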

 

 

m_v
8 - Asteroid

The documentation isn't public, but I used this exact approach in the previous version of my macro: getting the total count of rows and splitting it into "batches". Then for every batch (record ID) I would run the same iterative macro (green in this screenshot) to get the data. The batches could run in parallel, and the outputs would then get unioned together.

Are you saying that I can use a batch macro instead of an iterative macro here?

 

[Attachment: v3_macro.JPG]

m_v
8 - Asteroid

I figured it out - many thanks, @apathetichell !!

 

All this time I did not realize that the Download tool can ingest multiple URLs at the same time.

So I completely removed the iterative piece and instead created a formula that assembles the URLs based on the total page count (a rough sketch of the same logic is further below):

 

api.blablabla/pageindex=1&pagesize=9000
api.blablabla/pageindex=2&pagesize=9000
api.blablabla/pageindex=3&pagesize=9000

etc

 

And this whole thing goes into a single Download tool, no problem.
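For anyone following along, the URL-assembly step amounts to the logic below. This is just the Formula/Generate Rows logic re-expressed in Python, and the endpoint is still the placeholder from my earlier posts:

import math

PAGE_SIZE = 9000

def build_page_urls(total_rows):
    """One URL per page index; the whole list goes into a single Download tool."""
    pages = math.ceil(total_rows / PAGE_SIZE)
    return [f"api.blablabla/pageindex={i}&pagesize={PAGE_SIZE}"
            for i in range(1, pages + 1)]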

 

I was able to decrease my macro run time by almost 5x!!

apathetichell
18 - Pollux

Great to hear, @m_v - can you mark my suggestion as a solution? I'm a big fan of the multiple call/offset/limit approach.
