Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Filtering Service Data for AS_Results MongoDB

Inactive User
Not applicable

Hi, Looking to filter the MongoDB input tool for the collection AS_Results. Have 20 GB of data and it takes forever without filtering. Looking to filter the __ServiceData blob prior to inputting; specifically ExecutionStartTime:

 

Tried this {"__ServiceData.ExecutionStartTime": {"$gte":"2019-11-01T00:00:00.000Z"}} in the Criteria but no luck.

7 REPLIES 7
TrevorS
Alteryx Alumni (Retired)

Hello @altysuser987

Can you please provide some additional information? Please provide a copy of your workflow and more details of what you have attempted so far.

 

Thanks,
TrevorS

Community Moderator
rej
7 - Meteor

I realize this is an older thread, but I also have the same need, and I don't see this was ever resolved.  Here is more detail:

 

In my Mongo, the AS_Results collection contains 831K records, which takes 15 mins to query.  Given the volume of workflows run on our servers, this collection will continue to grow at a rapid pace going forward.  Needless to say, as the volume continues to increase, so will the retrieval time.

 

It seems pointless to retrieve all 831K records each time I need the most recent data, so I am looking for a way to filter the output from this collection, based on a datetime field.

 

When I query the collection unfiltered, the only fields available in the collection output are:

 

  • AS_Queue__ID
  • UserName
  • __ServiceData
  • __Version

rej_0-1611243558196.png

 

So, there is no datetime field exposed upon which I can filter.

 

However, when I view the parsed __ServiceData field, there is indeed a datetime field that would be helpful for filtering, ExecutionStartTime:

 

rej_1-1611243656940.png

 

Unfortunately, when I try to filter the collection on this field, it doesn't return any results:

 

{"__ServiceData.ExecutionStartTime": { "$gte": "2021-01-01 00:00:00"}}

 

rej_2-1611243765539.png

 

My questions are:

 

  1. Is it possible to query on the ExecutionStartTime field? 
  2. If so, can someone please provide assistance with the filter criteria syntax? 
  3. If not, is there any other way to effectively limit the output from this collection?
Alekh
9 - Comet

I've needed a solution like this for a while now. Looked into it today. What I needed was to be able to filter the MongoDB results for the day prior to read the logs of scheduled workflows. The steps I took:

1) Create a batch macro to filter the AS_Queue collection for yesterday.

2) Use the Job_ID and use the in filter query in the criteria for filtering the AS_Results collection. Created a batch macro to dynamically update the AS_Results criteria. 

I've attached a workflow that does this.

rej
7 - Meteor

This is what I've ultimately done as well.  It works, but doesn't seem to be an optimally efficient design (in my humble opinion).

dwalker3rd_
5 - Atom

Although a very large AS_Results collection might take a while to query, the primary bottleneck is the time it takes to parse the service data from the AS_Results collection.  So the solution is to filter the data BEFORE we parse the service data.  

 

Instead of using the filter tool after querying the collection, we could use the criteria field in the MongoDB Input tool. But since the AS_Results collection has no datetime field available prior to parsing the service data, we have no choice but to pull the entire collection. However ...

 

While there is no datetime field with which we could filter the AS_Results collection from the MongoDB Input tool, there are datetime fields on the AS_Queue collection. Adding the criteria in the MongoDB Input tool for the CreationDateTime field allows us to filter the AS_Queue collection. The _id field, once parsed (renamed JobId by the Alteryx Server Usage Report macro), can then be joined to the AS_Queue__ID field from the AS_Results collection to limit the volume of AS_Results service data that must be parsed.

 

Here's a snip of my mod to the Alteryx Server Usage Report macro:

 

Filter Criteria for Job Analysis Dataset.png

 

JSON criteria for a datetime field depends on (1) whether the MongoDB Input tool's mode is Manual (criteria is specified by element) or Automatic (criteria is specified for the collection), and (2) whether the field is stored as a string or as a datetime.

 

The CreationDateTime field in the AS_Queue collection is stored as a string. The criteria to filter the AS_Queue collection where the CreationDateTime field is greater than a specified date would be:

 

  • Manual mode: { "$gt" : "2021-07-01T00:00:00.000+0000" }
  • Automatic mode: { "CreationDateTime" : { "$gt" : "2021-07-01T00:00:00.000+0000" } }

The CreationDate field in the sessions collection [in the AlteryxGallery database] is stored as a datetime. The criteria to filter the sessions collection where the CreationDate field is greater than a specified date would be:

 

  • Manual mode: { "$gt" : { "$date" : "2021-07-01T00:00:00.000+0000" } }
  • Automatic mode: { "CreationDate" : { "$gt" : { "$date" : "2021-07-01T00:00:00.000+0000" } } }

Hope this helps folks with the same performance issues and/or folks looking for examples of using criteria in the MongoDB Input tool to filter datetime fields.

MarqueeCrew
20 - Arcturus
20 - Arcturus

Hi ,

As a leader in the Alteryx Community, I have the ability to identify & mark accepted solutions on behalf of community members - and recently did so on this thread. If you have any questions or concerns with the solution(s) I selected please let me know by replying to this post.

As the original author, you also have the ability to mark replies as solutions! Going forward, I’d encourage you to identify the solution or solutions that helped you solve your problem, as it's a big help to other community members. Learn more about Accepted Solutions here.

Thank you!

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
TonyAdam
8 - Asteroid

Thanks for this description. That helped me a lot!

 

I even took it one step further:

I used the JobId from the AS_Queue collection as Criteria for querying the AS_Result collection. To do so I grouped and concatenated the JobId and used it as a Control Parameter for a Batch macro that contains the connection to the AS_Result collection. In the macro each batch replaces the AS_Queue__ID. So I can query only the service data for the jobs of the defined timeframe. I grouped it for 1000 JobIds to avoid that the Criteria gets too long and that works fine. I changed the Icons of the two macros to the MongoDB Input because it is basically that tool with added parameters.

TonyAdam_0-1675172170755.png

TonyAdam_2-1675172891917.png

 

 

 

Labels