Hi everyone,
I’m currently using Alteryx Designer 2023.1 to create a workflow that pulls daily step count and heart rate data from a wearable device API. The REST API call works fine for smaller date ranges, but when I try to fetch a full month’s worth of records, the workflow slows down drastically and sometimes times out.
I’ve tried breaking the request into weekly intervals and using the Download tool with pagination, but I’m not sure if that’s the most efficient approach. Storing the data in a staging table before processing also crossed my mind, but that adds extra steps I was hoping to avoid.
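For context, here's roughly what my chunking approach looks like when I prototype it in the Python tool. The endpoint, parameter names, and auth handling below are placeholders, not the real API:

```python
import requests
from datetime import date, timedelta

BASE_URL = "https://api.example-wearable.com/v1/activity"  # placeholder endpoint
HEADERS = {"Authorization": "Bearer <token>"}              # placeholder auth

def fetch_range(start, end, page_size=1000):
    """Pull all records between start and end, following page offsets."""
    records, offset = [], 0
    while True:
        resp = requests.get(
            BASE_URL,
            headers=HEADERS,
            params={"start": start.isoformat(), "end": end.isoformat(),
                    "limit": page_size, "offset": offset},
            timeout=60,
        )
        resp.raise_for_status()
        batch = resp.json().get("records", [])
        records.extend(batch)
        if len(batch) < page_size:  # last page reached
            break
        offset += page_size
    return records

# Break the month into weekly windows instead of one big call
start = date(2023, 5, 1)
month_end = date(2023, 5, 31)
all_records = []
while start <= month_end:
    end = min(start + timedelta(days=6), month_end)
    all_records.extend(fetch_range(start, end))
    start = end + timedelta(days=1)
```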
I recall a discussion by Wellness Hub on data optimization in analytics tools, and I’m wondering if similar strategies could help improve API ingestion performance in Alteryx.
Would appreciate any advice or example workflows from the community. Thanks!
Hey --- without knowing the API, the quantity of data you are pulling, your system specs --- and the other tools in your workflow --- it's difficult to identify why you are seeing this behavior.
The most likely cause is ---> your API is trying to download too much data and it's either being throttled by the server or your network is still transmitting the data. If this is, say, 10,000,000+ events --- each with 20 JSON fields --- this would make a ton of sense. That's a lot of data --- and it takes a long time to retrieve and process it.
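One quick way to check whether you're being throttled is to log the status codes and back off when the server tells you to. A rough sketch below --- the endpoint and retry policy are just assumptions, adjust for whatever your API actually returns:

```python
import time
import requests

def get_with_backoff(url, params=None, max_retries=5):
    """Retry on 429/5xx responses, honoring Retry-After when the server sends it."""
    for attempt in range(max_retries):
        resp = requests.get(url, params=params, timeout=60)
        if resp.status_code == 429 or resp.status_code >= 500:
            # Fall back to exponential backoff if no Retry-After header is present
            wait = int(resp.headers.get("Retry-After", 2 ** attempt))
            print(f"Got {resp.status_code}, waiting {wait}s before retrying")
            time.sleep(wait)
            continue
        resp.raise_for_status()
        return resp
    raise RuntimeError(f"Gave up on {url} after {max_retries} attempts")
```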
On the Alteryx side --- 1) you have to be able to store your data in memory, and 2) certain tools (looking at you, Data Cleanse) can be terrible with memory handling. On a 16GB machine I've been able to process tens of millions of records --- but it can take time. If your API allows field limiting (this would be in the API specs), make sure you are limiting the API call to only the fields you need.
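If the API does support field limiting, the request usually ends up looking something like this. The `fields` parameter name is a common convention, not a guarantee --- check your API's docs for the exact syntax:

```python
import requests

# Only ask for the columns you actually need downstream.
# "fields" is a common convention, but the real parameter name
# depends entirely on your API's specs.
resp = requests.get(
    "https://api.example-wearable.com/v1/activity",  # placeholder endpoint
    params={
        "start": "2023-05-01",
        "end": "2023-05-07",
        "fields": "timestamp,steps,heart_rate",      # skip the other JSON fields
    },
    timeout=60,
)
resp.raise_for_status()
records = resp.json().get("records", [])
```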
If you are using a Join AFTER you load your data --- make sure you are not duplicating records in the join. Use a Summarize tool on your smaller data set to make sure only unique values are flowing through. 10,000,000 records is one thing ---> duplicate joins creating 100,000,000 is another.
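In Alteryx that's just a Summarize (or Unique) tool in front of the Join, but the idea in pandas terms looks like this (column names are made up to show the effect):

```python
import pandas as pd

# Hypothetical frames: one row per reading vs. a lookup of device metadata
readings = pd.DataFrame({"device_id": [1, 1, 2], "steps": [900, 1200, 400]})
devices  = pd.DataFrame({"device_id": [1, 1, 2], "owner": ["a", "a", "b"]})

# Joining on the raw lookup duplicates every reading that matches twice
bad = readings.merge(devices, on="device_id")  # 5 rows instead of 3

# Deduplicate the smaller side first so each reading matches exactly once
good = readings.merge(devices.drop_duplicates("device_id"), on="device_id")  # 3 rows
```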
Run it on a schedule and store the data daily in a repository. Then you can just pull the data from there whenever you need it.
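A minimal sketch of that idea, assuming a scheduled daily run writing into a local SQLite staging table --- swap in whatever database or repository you actually use:

```python
import sqlite3
from datetime import date, timedelta

def store_daily(records, db_path="wearable_staging.db"):
    """Append one day's worth of records; downstream workflows read from here."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS activity (
               reading_date TEXT, steps INTEGER, heart_rate INTEGER
           )"""
    )
    conn.executemany(
        "INSERT INTO activity (reading_date, steps, heart_rate) VALUES (?, ?, ?)",
        records,
    )
    conn.commit()
    conn.close()

# Scheduled run: only pull yesterday's data instead of re-fetching a full month
yesterday = (date.today() - timedelta(days=1)).isoformat()
# records = fetch_range(yesterday, yesterday)  # reuse whatever API call you already have
store_daily([(yesterday, 10432, 72)])          # dummy row just to show the shape
```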