Hi there, I'm facing an error on Alteryx Server and want to know whether any of you have run into the same issue and how you overcame it.
The steps I took are as follows:
1 - Create a workflow that reads data from a table in Google BigQuery, using DCM Quick Connect.
2 - Write data to a table in Google BigQuery, using DCM Bulk.
3 - Publish the workflow to the Server.
4 - Synchronize your credentials in Connection Manager to the Server.
5 - Share your workflow in a collection that a colleague has access to.
6 - Share your credentials in DCM Data Sources with that colleague.
7 - Ask them to schedule the workflow.
When doing this, the workflow scheduled by my colleague ran once with no errors, but from the second attempt onward I got an error that translates to "Failed to complete write to Google BigQuery: Missing object."
I've run tests and it only happens in this specific scenario.
I was following this article to set up the connection: What is Data Connection Manager (DCM)?
My Alteryx Server version is 2022.3.1.430.
I'm having the exact same issue on 22.3. This is probably a bug.
Slightly different circumstance though:
1. Opened a workflow from a Server Collection that is owned by a colleague and has an existing schedule.
2. Changed the output to Bulk - DCM with a Connection that both my colleague and I have OAuth access to. Synced the Connections successfully.
3. Saved the workflow back to the Collection.
4. Shared the Connection via Gallery > DCM Data Sources.
5. Let the schedule run with the new workflow.
When I save the workflow to my own Workspace and do a manual or scheduled run, it always succeeds, but it fails when my colleague owns the Workflow and the Schedule.
But given that the only guidance in the Alteryx documentation is to "share the connection", this should work.
So I'm thinking the issue is on the Google Cloud Storage side, since it's complaining about a missing "object": either a Python issue or the GCS staging object is not being created.
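For what it's worth, one way to check the GCS side is to list the staging bucket around the time of the failure and see whether the bulk staging object was ever created. This is only a sketch with a hypothetical bucket name and prefix, since I don't know what the bulk loader actually names its staging objects.

```python
from google.cloud import storage

# Hypothetical bucket/prefix - replace with whatever the BigQuery bulk
# connection is configured to use as its staging location.
client = storage.Client()
for blob in client.list_blobs("my-bulk-staging-bucket", prefix="alteryx/"):
    print(blob.name, blob.size, blob.updated)
```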
Can anyone else look into this?
Alteryx 2023.1.1.281 Patch 3
We had a user who had a similar issue:
Error - "Missing Object", meaning the DCM tool couldn't find a Data Source or Data Credential object in DCM.
This seems like erratic behavior on the Server side and in the DCM technology.
Below is a matrix of the scenarios we have tested on our team. If someone could please verify our results with their own tests, that would be great.
Failures are "Missing Object" errors where it's as if the DCM instance of the user running the workflow can't find the DCM Connection object that was shared with them.
Conditions:
| Workflow Owner | Data Connection (DCM) Owner | Schedule Owner | Manual Run (User 1) | Manual Run (User 2) | Scheduled Run (User 1) | Scheduled Run (User 2) |
| --- | --- | --- | --- | --- | --- | --- |
| User 1 | User 1 | User 1 | Yes | No | Yes* | No |
| User 2 | User 1 | User 1 | Yes | No | Yes* | No |
| User 2 | User 2 | User 1 | No | Yes | No | Yes |
| User 1 | User 2 | User 1 | No | Yes | Maybe** | Yes |

* May get failures on occasion.
** Has failed when creating a new schedule; has also failed on the second scheduled run after a successful first run.
Is your data in a GCS bucket - or just in a BQ table? Does your Server have an underlying GCP identity (i.e., is it a compute instance) or is it otherwise synced to a GCP project (i.e., can it be granted access to BQ via a GCP IAM service account)? If so - I'd recommend skipping the BQ tool/connection and scripting this as a GCP CLI transfer to GCS - and let BQ internally do the heavy work.
Hey @apathetichell thanks for your comments!
Is your data in a GCS bucket - or just in a BQ table? The Google - DCM - Bulk output uses a GCS bucket as its staging method. I'm pretty sure the error occurs before that is instantiated, since it says "Missing Object", which led me to believe it refers to the DCM object(s) (see the documentation). GCS also uses "objects", but I don't think it would give the same error.
Does your Server have an underlying GCP identity (i.e., is it a compute instance) or is it otherwise synced to a GCP project (i.e., can it be granted access to BQ via a GCP IAM service account)? I don't know; I don't think so. In that case the Server would be using my user to run the workflow rather than a service account. Our org does have a service account for GCP, but they don't hand it out readily, so I think this avenue is a non-starter at this point.
If so - I'd recommend skipping the BQ tool/connection and scripting this as a GCP CLI transfer to GCS - and let BQ internally do the heavy work. What do you mean by scripting this as a GCP CLI transfer? Do you mean running the whole workflow as a script in the GCP CLI? If so, that's not advantageous, since we need to leverage Alteryx for the transformation layer - but again, I'm not quite sure what you mean.
I'm open to any other ideas you have.
Hey - so in GCP a service account isn't exactly a service account in common parlance. A service account in GCP is both an entity and an IAM permission (i.e., you have permission to use a service account - which is mostly an internal construct, but it determines what you can access and what your level of access is). This is incredibly confusing if you use both GCP and AWS...
You auth into GCP via OAuth - right? (i.e., a browser window pops up and you click through) - that would not be recreatable on Server. Access via JWT would be. If your InfoSec won't let you - your InfoSec won't let you. If your Server is really an EC2 or GCP Compute instance, it probably has a native role which you can build on - and which your InfoSec may be happy with - because all access would be via a role/federated identity (at least if they're like my InfoSec/CloudSec, they'd be happier).
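As a rough sketch of what the non-interactive (JWT/key-based) flow looks like with the Google client libraries - assuming your InfoSec will hand over a service-account key file, which is exactly the part they may refuse:

```python
from google.oauth2 import service_account
from google.cloud import bigquery

# Placeholder key path and project - no browser pop-up here; the access token
# is minted from the service-account key via the JWT flow.
creds = service_account.Credentials.from_service_account_file(
    "/secure/path/alteryx-sa-key.json",
    scopes=["https://www.googleapis.com/auth/cloud-platform"],
)
bq = bigquery.Client(credentials=creds, project="your-gcp-project")
print([d.dataset_id for d in bq.list_datasets()])  # simple sanity check
```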
In my vision you'd avoid the silliness of Alteryx's BQ interaction by scripting the last part (i.e., putting the data in BQ) - via a transfer to GCS and having BQ build the table off of GCS.
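Purely as a sketch (the bucket, file, and table names below are made up): Alteryx still does the transformation and just writes a flat file, the script ships it to GCS, and a BigQuery load job builds the table from the GCS object.

```python
from google.cloud import bigquery, storage

# Made-up names - swap in your own staging bucket, export file, and table.
bucket_name = "my-staging-bucket"
blob_name = "exports/output.csv"
table_id = "your-gcp-project.your_dataset.your_table"

# 1) Stage the Alteryx export in GCS.
storage.Client().bucket(bucket_name).blob(blob_name).upload_from_filename("output.csv")

# 2) Have BigQuery ingest it server-side via a load job.
bq = bigquery.Client()
job = bq.load_table_from_uri(
    f"gs://{bucket_name}/{blob_name}",
    table_id,
    job_config=bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    ),
)
job.result()  # block until the load job finishes
```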
And yes - object is probably a DCM object or other internal object - not a GCS object.