As far as I know, the current error logs for a failed Trifacta job do not tell the user which recipe, which recipe step, and on what data the error was thrown.
This lack of basic information at the Trifacta level makes it hard for a normal user to debug Trifacta jobs. Typically, I have to work backwards through the flow, attaching and running an output for each recipe until I find the recipe causing the issue. Then I have to disable steps one by one until I find the step that makes the recipe fail. This consumes both time and resources.
As for the offending data triggering the problem, I still don't know how to get that, and that's actually crucially important for an ongoing issue we're having with Spark execution.
Therefore, I suggest that improved and simplified error logging would be very helpful in fixing problems in the future. Thank you for your consideration.
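Until the logs name the failing step, the one-by-one search described above can be shortened by bisecting the recipe. The following is only a minimal Python sketch of that idea. `runs_ok` is a hypothetical callable standing in for "run the recipe with only these steps enabled and report whether the job succeeded"; it is not a Trifacta API. The sketch also assumes that once the failing step is included, every longer prefix of steps fails too.

```python
from typing import Callable, Sequence


def first_failing_step(
    steps: Sequence[str],
    runs_ok: Callable[[Sequence[str]], bool],
) -> int:
    """Return the index of the first step whose inclusion makes the run fail.

    Returns -1 if the full recipe succeeds. `runs_ok` is a hypothetical
    stand-in for running the recipe with only the given steps enabled.
    """
    if runs_ok(steps):
        return -1
    # Invariant: the prefix of length `lo` passes, the prefix of length `hi` fails.
    lo, hi = 0, len(steps)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if runs_ok(steps[:mid]):
            lo = mid
        else:
            hi = mid
    return hi - 1


if __name__ == "__main__":
    # Fake recipe in which the step named "split" is the culprit.
    recipe = ["rename", "filter", "derive", "split", "join"]
    print(first_failing_step(recipe, lambda enabled: "split" not in enabled))
```

For a recipe with 100 steps this needs about 7 test runs instead of up to 100, although it still does not reveal which rows of data triggered the error.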
My use case: when looking at the data in a target table, I need a column that indicates which flow loaded the data into that table. This would be useful for fixing bugs and tracing data issues back to a flow.
Right now we hardcode this value as a new column in a recipe step, but if a developer changes the flow name, they have to manually update the recipe step to match. It would be useful to have a dynamic flow name variable instead, similar to $Filepath for the file path.
When I export a flow that contains a reference dataset, the name of the JSON file downloaded doesn't match the name of the flow that was exported. Instead it matches the name of the reference dataset inside the flow. I would like to change this so that a flow always keeps its original name when exported from Trifacta.
We need the full steps, on both GCP Dataprep and GCP, for running scheduled jobs as a true service account (not a user account) without requiring authentication of the owning user account (which times out overnight because of the 16-hour policy we have for users).
So when we schedule a job we should be able to choose a true technical account to "run the job as".
Our issue is that our AD users are synchronised from on-premises and a 16-hour timeout policy is applied to each user, so any job scheduled under a user account fails after 16 hours and the job is disabled. Our company has no way to sync AD users from on-premises to GCP IAM without this policy, so we need to be able to run jobs with a service account.
Please redirect the user back to the page they were on when the session expired, instead of redirecting to the home page once the user re-authenticates.
Currently, after the session expires (after the time set in the config, 30 minutes in my case), the user is redirected to the home page instead of the page they were on. This matters because a user sometimes goes to a meeting, forgets which page they were working on, and has to re-open everything from the home page after re-authenticating.
Add an option when scheduling jobs to automatically restart a failed job after X minutes. Most of the time, when a job fails and I rerun it, it completes fine.
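To make the requested behaviour concrete, here is a minimal Python sketch of "retry a failed job after X minutes". Everything in it is illustrative: `run_job` is a hypothetical callable that returns True on success, and `max_retries` and `delay_minutes` are assumed names for the settings such a scheduling option might expose. It does not call any Trifacta API.

```python
import time
from typing import Callable


def run_with_retries(
    run_job: Callable[[], bool],
    max_retries: int = 3,
    delay_minutes: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Run a job and rerun it after a fixed delay each time it fails.

    `run_job` is a hypothetical stand-in for launching the scheduled job.
    Returns True as soon as one attempt succeeds, False once the initial
    attempt and all `max_retries` reruns have failed.
    """
    retries_used = 0
    while True:
        if run_job():
            return True
        if retries_used >= max_retries:
            return False
        retries_used += 1
        # Wait X minutes before the next attempt.
        sleep(delay_minutes * 60)
```

The `sleep` parameter exists only so the waiting can be replaced in a test; a real scheduler would re-queue the job rather than block.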
Allow a connection to a geocoding system, like USPS or Google, so that you can join a demographics dataset against it and have longitude and latitude added to the output for mapping. I can see a lot of uses for this, especially in the marketing and advertising sector.
We often use hashing functions like fingerprint in SQL (BigQuery) to mark or identify rows that match on specific attributes, or to generate UUIDs. I know it's possible to do so by adding UDFs, but it would be more convenient to have a native function.
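As an illustration of what such a function would do, the sketch below computes a fingerprint over selected attributes of a row and derives a deterministic UUID from it. It is a Python stand-in under stated assumptions, not BigQuery's fingerprint implementation: SHA-256 is swapped in as the hash, and the function names and the separator choice are hypothetical.

```python
import hashlib
import uuid
from typing import Mapping, Sequence


def row_fingerprint(row: Mapping[str, object], columns: Sequence[str]) -> str:
    """Hash the chosen attributes of a row into a hex digest.

    SHA-256 is an assumption made for this sketch; it does not produce the
    same values as BigQuery's fingerprint function.
    """
    # A unit-separator character keeps ("ab", "c") distinct from ("a", "bc").
    parts = ("" if row.get(c) is None else str(row.get(c)) for c in columns)
    payload = "\x1f".join(parts)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def row_uuid(row: Mapping[str, object], columns: Sequence[str]) -> uuid.UUID:
    """Derive a deterministic UUID (version 5) from the row fingerprint."""
    return uuid.uuid5(uuid.NAMESPACE_OID, row_fingerprint(row, columns))
```

Two rows that agree on the listed columns get the same fingerprint and the same UUID, which is what makes the value usable as a match key.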
When using a SQL statement with a WITH query expression, I get the following error: No select statement found. I was told that WITH statements are currently not supported.
Why this should be changed:
Best regards
Marcel
Details about the syntax:
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
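For context, a non-recursive WITH expression can usually be rewritten as a subquery in the FROM clause, which may serve as a workaround while WITH is unsupported. The sketch below shows the equivalence using Python's built-in sqlite3 module as a stand-in engine; the `orders` table and its data are invented for the example, and I have not verified that the rewritten form is accepted by Trifacta's custom SQL or behaves identically in BigQuery.

```python
import sqlite3

# In-memory SQLite database used as a stand-in; table and data are made up.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("a", 10), ("a", 5), ("b", 7)],
)

# Form that triggers the reported error: the statement starts with WITH.
with_query = """
WITH totals AS (
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
)
SELECT customer, total FROM totals WHERE total > 8 ORDER BY customer
"""

# Equivalent form that starts with SELECT and inlines the expression.
subquery = """
SELECT customer, total FROM (
    SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer
) AS totals WHERE total > 8 ORDER BY customer
"""

with_rows = conn.execute(with_query).fetchall()
sub_rows = conn.execute(subquery).fetchall()
print(with_rows == sub_rows)
```

The rewrite gets unwieldy when the same WITH expression is referenced several times, which is one reason native support would still be preferable.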
Hi,
I use the import-by-folder option for GCS files to import many files at once (both current files and future files dropped into the same folder). Sometimes the data schema is not exactly the same for all the files, but the column names are always the same! I would like to use "union by name" for the initial union of the files in the folder that I've imported. With this function, if the data schema changes in the future, my import will still work!
We could have a screen like the recipe union screen for the "import with union" step (for imports by folder) to select the columns to import and the type of matching, for example...
This is a real issue for me because when the data schema of one file changes, the scheduled runs fail...
Sorry for my bad English, I'm French :-)
Thanks !
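To show what "union by name" would mean for files in one folder, here is a minimal Python sketch. It assumes each file has already been parsed into a list of rows keyed by column name; the function name and the choice to fill missing columns with None are assumptions for illustration, not Trifacta behaviour.

```python
from typing import Dict, List, Optional

Row = Dict[str, Optional[str]]


def union_by_name(files: List[List[Row]]) -> List[Row]:
    """Stack rows from several files, aligning columns by name, not position.

    Columns missing from a file are filled with None, so a file with an
    extra, missing, or reordered column does not break the union.
    """
    # Collect every column name in order of first appearance.
    columns: List[str] = []
    for rows in files:
        for row in rows:
            for name in row:
                if name not in columns:
                    columns.append(name)
    # Re-emit every row with the full, consistently ordered set of columns.
    return [
        {name: row.get(name) for name in columns}
        for rows in files
        for row in rows
    ]
```

A union by position would instead pair the first column of one file with the first column of the next, which is what goes wrong when a file's column order or count changes.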
When you select columns to apply a function or transformation, the methods to select columns are:
But it is not possible to select columns with a "RegEx match" method on the column names.
It would be much easier!
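As a small illustration of the requested "RegEx match" selection, this Python sketch picks columns whose names match a pattern. The function name and the sample column names are made up for the example.

```python
import re
from typing import Iterable, List


def select_columns(names: Iterable[str], pattern: str) -> List[str]:
    """Return the column names that match the regular expression, in order."""
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


if __name__ == "__main__":
    columns = ["sales_2021", "sales_2022", "region", "cost_2022"]
    # Select every yearly sales column without listing each one by hand.
    print(select_columns(columns, r"^sales_\d{4}$"))
```

Because the pattern is evaluated against whatever columns exist at run time, a new column such as `sales_2023` would be picked up without editing the step.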
Why must I open every recipe to reload each sample?
For example:
I build flows with many recipes (between 60 and 100; this is a real case for me).
On Monday, I make a lot of modifications to the data cleaning at the start of the data wrangling chain!
On Tuesday, when I try to open other recipes, I get a warning message: "your sample need to be updated"!!!
=> If I had a button to "update all samples of the flow", I would run it on Monday before going to sleep, and on Tuesday I could work with a smile!
PS : Sorry for my bad English, I'm a French user :-)
I often receive datasets that have rows above the column headers that I don't need. When importing a dataset, there is a dropdown on the edit menu to "make the first row a column header". However, I would like this dropdown to include an option to, for example, "make row 20 the column header and delete all preceding rows". This would allow me to import the data with the column headers already in place. When dealing with one dataset, I can always choose any row to be the column header, but when I have to join 20 similar datasets, it is not possible to do the same. Not sure if my idea is clear (lol), but it seems like something that could be easily incorporated into the tool. Thanks!
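In case the idea is easier to follow as code, here is a minimal Python sketch of "make row N the column header and delete all preceding rows" for CSV text. The function name and the 1-based `header_row` parameter are assumptions for illustration; this is not how Trifacta imports files.

```python
import csv
import io
from typing import Dict, List


def read_with_header_row(text: str, header_row: int) -> List[Dict[str, str]]:
    """Parse CSV text, using the given 1-based row as the column header.

    Rows above the header row are dropped; rows below it become records.
    """
    rows = list(csv.reader(io.StringIO(text)))
    header = rows[header_row - 1]
    return [dict(zip(header, row)) for row in rows[header_row:]]


if __name__ == "__main__":
    sample = "Monthly report\nGenerated by finance\nid,name\n1,a\n2,b\n"
    # Row 3 holds the real column names; rows 1 and 2 are discarded.
    print(read_with_header_row(sample, header_row=3))
```

Applying the same `header_row` setting at import time to all 20 similar files would make their columns line up before any join or union.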
We would strongly like the ability to edit datasets created with custom SQL that have been shared with us. We think of Trifacta in part as a shared development space, so if a user needs to update a dataset but isn't its original owner, this slows down our workflow considerably.
The ability to apply various interpolation methods (cspline, linear, etc.) between sorted columns of integers.
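One reading of this request is filling gaps in a sorted numeric column from the known values around them. Under that assumption, the Python sketch below implements only the linear method; the function name is hypothetical, and spline methods such as cspline are not covered.

```python
from typing import List, Optional


def interpolate_linear(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill None gaps that lie between two known values by linear interpolation.

    Positions are treated as evenly spaced. Gaps before the first or after
    the last known value are left as None (no extrapolation).
    """
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for left, right in zip(known, known[1:]):
        # Constant increment per position between the two known neighbours.
        step = (values[right] - values[left]) / (right - left)
        for i in range(left + 1, right):
            result[i] = values[left] + step * (i - left)
    return result


if __name__ == "__main__":
    print(interpolate_linear([10, None, None, 40, None]))
```

Interpolated values come back as floats even when the inputs are integers, so a native function would need a rule for rounding if integer output is expected.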
Use a linked dataset created by GCP Analytics Hub as a data source in Dataprep. Detailed information is in the link below:
Can I use linked dataset (created by Analytic Hub in GCP) to build flows in DataPrep? (trifacta.com)
Case: 00027615. We created this case for our issue, but learned that the functionality does not exist.
We had an OAuth login issue when trying to set up a Snowflake connector, as we use Okta as our IdP for Snowflake.
We want our users to create their own Snowflake connectors using their personal credentials through the IdP, which will enforce their Snowflake roles so that they can see only the schemas they are allowed to see.
We cannot create a generic connector because it would grant more data access than users need, including PII, so we want to use their Snowflake functional roles to restrict access.
It's a really good use case for anyone using Snowflake with an IdP and RBAC set up in Snowflake.
Allow customizing the support page in the app so that users can contact our team when there is an issue with the application. The page should show our email address, not the Trifacta support email address.