

Speeding up a macro with API calls

m_v
8 - Asteroid

Hello clever people,

 

I've built a macro that goes to a 3rd party website and fetches a considerable amount of data (we're talking millions of rows, dozens of columns). It is very stable and has delivered great value to a lot of people. The users input their API key, the model ID, the table ID, and click the run button. 

 

It is, however, slow. It takes about 9-10 minutes to fetch about 1M rows, and it's not uncommon for me to get as much as 10M rows. The data changes frequently, so I do need to update the whole thing every time.

 

The 3rd party has a page limit of 10K rows and 10 MB, and they can't change that. However, there are two new pieces of information that I recently learnt:

1) the 3rd party developed a new API endpoint

2) the 3rd party can issue multiple API keys for service accounts

 

So with that new info: can I finally make my macro faster?

 

Now I will explain how the new endpoint works.

 

This new endpoint returns a unique page key given the target page size, table ID and model ID. So if we have a table with 15,000 rows and we set the page size to 5000, we can call the data using these three URLs:

 

https://api.blablabla/pageSize=5000&key=[###]

https://api.blablabla/pageSize=5000&key=[###1]

https://api.blablabla/pageSize=5000&key=[###C]
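
To make the mechanics concrete, here is a minimal Python sketch of walking that key chain. The query-parameter names, the empty key meaning "first page", and the nextKey response field are all assumptions, since the real endpoint details aren't shown:

```python
import requests

# Placeholder base URL from the post; real parameter and field names may differ.
BASE_URL = "https://api.blablabla/"

def build_page_urls(api_key, model_id, table_id, page_size=5000):
    """Walk the page-key chain. Each call reveals the next page's key,
    so the URL list can only be discovered one page at a time."""
    urls, page_key = [], ""  # assumption: an empty key returns the first page
    while True:
        url = (f"{BASE_URL}?model={model_id}&table={table_id}"
               f"&pageSize={page_size}&key={page_key}")
        urls.append(url)
        resp = requests.get(url, headers={"Authorization": api_key}, timeout=60)
        resp.raise_for_status()
        page_key = resp.json().get("nextKey")  # assumed field name
        if not page_key:  # no next key means this was the last page
            return urls
```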

 

My macro is an iterative macro within a standard macro. The iterative macro fetches all the URLs that are needed to get the data from the target dataset. The standard macro takes those URLs, sticks them into a Download tool, then parses out the JSON.

 

[Screenshot: iterative_macro.JPG]

[Screenshot: standard_macro.JPG]
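
In Python terms, the standard-macro half is essentially "download each URL, then flatten the JSON". A minimal sketch for anyone following along outside Designer; the Authorization header and the rows field name are assumptions, since the real response shape isn't shown:

```python
import pandas as pd
import requests

def pages_to_table(urls, api_key):
    """Mirror of the standard macro: download each page URL, then flatten
    the JSON rows into one DataFrame (field names are assumptions)."""
    pages = []
    for url in urls:
        resp = requests.get(url, headers={"Authorization": api_key}, timeout=120)
        resp.raise_for_status()
        pages.append(pd.json_normalize(resp.json().get("rows", [])))
    return pd.concat(pages, ignore_index=True)
```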

 

I'm looking for creative ideas on how to make this whole thing run faster, considering that I can use several different API keys. I want to get a substantial speed improvement (2x at least).

 

What I have tried so far (neither produced results):

1) In the standard macro piece, tiling the output of the iterative macro and parallelizing the downloads

2) Increasing the number of connections in the Download tool - it cannot go over 32
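
For illustration, here is one hedged sketch of what idea 1) could look like once several service-account keys are available, written in Python (e.g. for the Python tool) rather than as a native Alteryx fix. The Authorization header and rows field name are assumptions:

```python
import itertools
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch_page(url, api_key):
    """Download one page and return its parsed rows (assumed field name)."""
    resp = requests.get(url, headers={"Authorization": api_key}, timeout=120)
    resp.raise_for_status()
    return resp.json().get("rows", [])

def fetch_all(urls, api_keys, max_workers=32):
    """Round-robin the page URLs over the service-account keys, so each key
    stays under its own rate limit while the downloads run in parallel."""
    key_cycle = itertools.cycle(api_keys)
    assigned = [next(key_cycle) for _ in urls]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pages = list(pool.map(fetch_page, urls, assigned))
    return [row for page in pages for row in page]
```

Note this only parallelizes the downloads; it assumes the URL list already exists, which (per the timings below) is the slower half of the problem.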

 

Many thanks in advance to this wonderful community for the help!

15 REPLIES
simonaubert_bd
13 - Pulsar

Hello @m_v. Very interesting topic. Question: have you identified exactly where it is slow?

m_v
8 - Asteroid

It's the iterative piece (URL fetching) that takes a long time.

m_v
8 - Asteroid

Specifically (I just ran a test): the current version, which I describe in the post, took 7 min for 1M records, versus 2 min 18 sec with a static list of URLs - so the URL generation itself accounts for roughly two thirds of the runtime.

 

[Screenshot: urls_static.JPG]

KGT
8 - Asteroid

I don't have a clear answer for you, but I'm hoping my commentary helps you see another avenue.

 

Looks to me like there are 3 specific avenues:

  • Speed up each call in the iteration
    • This could be checking whether multiple calls can be made at once, or testing what the fastest method is to work with this particular API.
  • Speed up each iteration
    • The workflow looks pretty simple; I don't think there's anything after the call that you can move outside the iterative macro... You might just want to check that there's no throttling happening. Try it with 100k and see if the time is roughly linear. Also, try feeding it 9-10k, waiting 3 seconds, then another 9-10k, just to check for throttling at 10k requests.
  • Change the iteration
    • I'm not sure exactly how, and maybe the only options are related to the above.
    • Can you construct the first call of each group into one iteration (assuming there are groups, and it's not that every call except the first one needs the previous response to generate the next row), and then all the calls for the 2nd pages into the next iteration, etc., rather than one call, one response, one iteration? (See the sketch after this list.)

Basically, each iteration is going to run in series rather than in parallel, so the number of iterations will affect the run time; but APIs differ in the way they process and respond, so there may be ways to get quicker responses.
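
To make the "change the iteration" avenue concrete: if the dataset splits into independent groups (say, one key chain per table), each iteration could advance every group's chain by one page at once. A rough Python sketch under those assumptions; fetch_page_and_next_key is a hypothetical stand-in for the real call, with invented URL and field names:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch_page_and_next_key(group_id, page_key, api_key, page_size=5000):
    """Hypothetical call shape: returns (rows, next_key) for one page of one
    group. URL, parameter and field names are assumptions, not the real API."""
    resp = requests.get(f"https://api.blablabla/?table={group_id}"
                        f"&pageSize={page_size}&key={page_key}",
                        headers={"Authorization": api_key}, timeout=120)
    resp.raise_for_status()
    body = resp.json()
    return body.get("rows", []), body.get("nextKey")

def advance_groups(chains, pool, api_key):
    """One 'iteration': fire the current call for every still-active group in
    parallel, then record each group's next key for the following iteration.
    `chains` maps group id -> current page key (None once finished)."""
    live = {g: k for g, k in chains.items() if k is not None}
    futures = {g: pool.submit(fetch_page_and_next_key, g, k, api_key)
               for g, k in live.items()}
    rows = []
    for g, fut in futures.items():
        page_rows, next_key = fut.result()
        rows.extend(page_rows)
        chains[g] = next_key  # becomes None when the chain is exhausted
    return rows

# Usage sketch: loop until every group's chain runs out of page keys.
# with ThreadPoolExecutor(max_workers=8) as pool:
#     chains = {t: "" for t in table_ids}  # "" assumed to mean "first page"
#     while any(k is not None for k in chains.values()):
#         all_rows.extend(advance_groups(chains, pool, api_key))
```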

m_v
8 - Asteroid

Thanks a lot for the detailed response. 

 

1) Yes, multiple calls can be made at once. I just don't know how to use this to my benefit. 

 

2) I never do 10K rows, as there is always a danger of exceeding 10 MB. I use 9000, and 5000 for certain tables (the logic is embedded in the macro). I experimented with throttling and did not see any difference.

 

3) Unfortunately, I can only get the page keys in sequence. The "generic" request returns only the current key and the next key, so I construct the next URL to include the next key in the call. Perhaps there are ways around it, but I'm not aware of them.
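
Given that constraint, one possibility worth testing (sketched here in Python, with the /keys and /data paths and all field names invented for illustration) is to pipeline: keep the key walk strictly sequential, but hand each page download to a worker pool the moment its key is known, so the slow downloads overlap with the key fetches:

```python
from concurrent.futures import ThreadPoolExecutor

import requests

def fetch_pipelined(base_url, api_key, model_id, table_id, page_size=9000):
    """Walk the key chain one key at a time (the API only reveals the next
    key), but start each page download as soon as its key arrives. The 9000
    page size mirrors the safety margin under the 10K-row/10MB page limit."""
    headers = {"Authorization": api_key}  # assumed auth scheme
    futures, page_key = [], ""            # "" assumed to mean "first page"
    with ThreadPoolExecutor(max_workers=16) as pool:
        while True:
            # Cheap "generic" call: returns the current and next page keys.
            meta = requests.get(f"{base_url}/keys",
                                params={"model": model_id, "table": table_id,
                                        "pageSize": page_size, "key": page_key},
                                headers=headers, timeout=60).json()
            # Expensive data call: dispatched without waiting on the response.
            data_url = (f"{base_url}/data?model={model_id}&table={table_id}"
                        f"&pageSize={page_size}&key={page_key}")
            futures.append(pool.submit(requests.get, data_url,
                                       headers=headers, timeout=120))
            page_key = meta.get("nextKey")
            if not page_key:  # last page reached
                break
        return [f.result().json() for f in futures]
```

Since the timings above suggest the key walk itself dominates, pipelining can at best cut the total down to the key-walk time, so it's worth measuring the key call on its own before investing in this.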

KGT
8 - Asteroid

I might be at my limit then... hopefully someone else has another idea to try.

 

By multiple calls at once, I meant asking for multiple returns in one call. Some APIs have a way to request multiple customer IDs in one request, for instance. It doesn't sound like that's what this is, though, as all your calls have to be made one after the next.

apathetichell
18 - Pollux

A few things:

 

1) As long as your process is stateful (i.e. on your personal machine), you will be limited by your personal machine's resources. That means memory and bandwidth are limiting you here more than Alteryx performance is - I would expect that these (particularly bandwidth) would keep you from seeing a 2x improvement.

2) What is your primary goal here: to improve this process, or to improve this process in Alteryx? Can you tell us: a) Is the API endpoint you are hitting behind a VPN? Do you need any special access beyond credentials for it? b) What are you doing with your data after processing?

 

 

m_v
8 - Asteroid

1) I understand this. Moving the process to the server is possible, but it is not in the scope of what I'm focusing on right now.

2) My goal is to improve the process in Alteryx since I have no other means of getting the data in an automated manner. 

a) I am not aware of any VPN in front of the endpoint. I only need an API key to access the data.

b) There are multiple workflows where I join the data with other data sources, summarize, output to Google Sheets, etc.

m_v
8 - Asteroid

Does anybody know if scripting this in Python would make the process faster?
