We need a custom viewer role that lets a user use the connections shared with them but not re-share those connections with others. In our case, an admin will set up the connections for users, and they will just use them. Users should not be able to create or share connections. This will improve connection security and control over access to data.
It would be nice if Trifacta could export files in CDM (Common Data Model) format to ADLS Gen2 so that they are fed automatically into Power BI for reporting purposes.
Please allow connections to be created from Trifacta to SharePoint online using SSO authentication, just like for Azure SQL/DWH.
Being able to Publish outputs directly to Google Sheets would be a major benefit for Sheets users.
It would be great if you could expand the metadata selection beyond the current 2 elements (row number and file path), for example by adding a date timestamp (e.g. $datecreated) that can be used in recipes.
We need the ability to create folders underneath plans. We can create folders underneath flows, but not underneath plans. Additionally, we need the ability to create subfolders inside these parent flow and plan folders. It is hard to organize flows and plans without being able to put them in categories (folders) and subcategories (subfolders) once you approach hundreds of plans and flows.
To monitor the status of a plan that runs several different flows (around 300 in my case), I send an HTTP request to Datadog to display the failed and successful results on a dashboard. The problem is that Datadog understands only epoch timestamps, not datetime values, and right now we cannot convert the timestamp into epoch. I was thinking of approaching this problem in the following ways:
1) Having a pre-request script
2) Creating dynamic parameters in Dataprep, instead of fixed values, that can then be used in the HTTP request body
3) This is just a workaround: creating a table that stores the flow name and timestamp, which we would have to use in a plan every time we run a flow. This is not the right way. It would work, but it is a waste of time because we would end up creating a separate table like this one for each flow.
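The conversion being asked for can be sketched in Python as follows. This is only an illustration of the datetime-to-epoch step, not something Dataprep provides today; the function name `to_epoch_seconds` is hypothetical, and the sketch assumes ISO 8601 input and treats timestamps without an offset as UTC.

```python
from datetime import datetime, timezone


def to_epoch_seconds(timestamp: str) -> int:
    """Convert an ISO 8601 datetime string to Unix epoch seconds."""
    parsed = datetime.fromisoformat(timestamp)
    # Assumption: a timestamp without a UTC offset is interpreted as UTC.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return int(parsed.timestamp())


if __name__ == "__main__":
    # Hypothetical flow finish time; the result is what would go in the request body.
    print(to_epoch_seconds("2021-01-01T00:00:00"))
```

A pre-request script or a dynamic parameter (options 1 and 2) would only need to perform this one step before the HTTP request is sent.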
I'm looking for a way to discover which datasets, recipes, or outputs are taking up the most time and resources.
It would also be nice if we were able to view this over time.
An example would be something like the Unity3D profiler.
https://docs.unity3d.com/uploads/Main/profiler-window-layout.png
This is for a video game engine, but I hope the system can be similar.
In this profiler you can see which resource (RAM, CPU, GPU) is being used and by which character/object in your video game.
Similarly, it would be nice to see which database is being used by which flow in Trifacta.
The current syntax for the WORKDAY function is workday(date1,numDays,[array_holiday]), and array_holiday can't be a column of a table. For example, when there are unpredictable non-trading days, such as typhoon weather, we always need to go and change the public holidays in the recipe. We would prefer that the holidays could come from a column in a table, so that we can just import the table and update it when needed.
Allow more than 1 job to be deleted at a time.
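The requested behaviour can be sketched in Python as below. This is not how Trifacta implements WORKDAY; it is a stand-in that assumes Saturday and Sunday are non-working days, and the table rows, the `holiday_date` column name, and the function name `workday` are hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def workday(start: date, num_days: int, holidays: Iterable[date] = ()) -> date:
    """Return the date num_days working days from start, skipping weekends and holidays."""
    holiday_set = set(holidays)
    step = 1 if num_days >= 0 else -1
    remaining = abs(num_days)
    current = start
    while remaining > 0:
        current += timedelta(days=step)
        # Assumption: Monday-Friday are working days unless listed as a holiday.
        if current.weekday() < 5 and current not in holiday_set:
            remaining -= 1
    return current


# Hypothetical imported table; holidays come from one of its columns.
holiday_table = [
    {"holiday_date": "2023-09-01", "reason": "Typhoon"},
    {"holiday_date": "2023-10-02", "reason": "Public holiday"},
]
holidays = [date.fromisoformat(row["holiday_date"]) for row in holiday_table]

if __name__ == "__main__":
    print(workday(date(2023, 8, 31), 1, holidays))
```

With this shape, an unplanned non-trading day is handled by adding a row to the table rather than editing the recipe.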
Current NIST/NSA standard is SHA-2.
As a data wrangler, I would like to be able to hash a column's data using the SHA-256 hashing algorithm.
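As an illustration of the requested transformation, the sketch below hashes each value of a column with SHA-256 using Python's standard `hashlib` module. The helper name `sha256_hex`, the UTF-8 encoding, the sample values, and the choice to leave missing values untouched are assumptions made for the example.

```python
import hashlib
from typing import List, Optional


def sha256_hex(value: Optional[str]) -> Optional[str]:
    """Return the SHA-256 digest of value as a 64-character hex string."""
    if value is None:
        # Assumption: missing values stay missing instead of being hashed.
        return None
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hash_column(values: List[Optional[str]]) -> List[Optional[str]]:
    """Apply SHA-256 to every value of a column."""
    return [sha256_hex(v) for v in values]


if __name__ == "__main__":
    # Hypothetical column values.
    for digest in hash_column(["abc", "", None]):
        print(digest)
```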
I would like the ability to specify a billing project for BigQuery as part of run options. Currently, data queried from BigQuery is associated with the project from which a Dataprep flow is run, with no way to change it. Customers we work with in multi-project environments need the flexibility to align queries to specific projects for purposes of cost and usage attribution.
Additionally, for customers on flat-rate BigQuery pricing, a selectable billing project will allow users to move queries to projects under different reservations for workload balancing and/or performance tuning.
We can migrate flows from one environment to another using the Trifacta APIs.
Export and Import the flow from source to target.
Rename the flow.
Share flow with appropriate user according to environment.
Change the input and output of the flow.
We at Grupo Boticário, who currently have 13k Dataprep licenses and are close to the official internal launch, have noticed a recurring request for a translation of the tool. Bearing in mind that a translation will enable more users to use the tool in their day-to-day work, I would like to formalize and reinforce the importance of our request for translation into Brazilian Portuguese, and to ask for a forecast of when this improvement will be available.
Currently, if a use case requires data to be brought from tables residing in different databases of the same cluster, we have to create n connections for n databases.
Since the databases are in the same cluster, one should be able to access them with a single connection; otherwise the connection list gets long and messy.
Currently there is support for parameterizing variables in custom SQL dataset in Dataprep. However it requires that the tables using this feature have the same table structure. This request is to allow this same functionality but with tables that have different table structures.
Example:
Table A
dev.animals.dogs
name | height | weight
Table B
dev.animals.cats
name | isFriendly
We would like to have 1 custom SQL dataset whose query just says
SELECT * FROM dev.animals.[typeOfAnimal]
typeOfAnimal being the parameterized variable with a default of dogs.
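The substitution described above can be sketched in Python as follows. This does not reflect how Dataprep resolves parameters; it only shows the table name being filled in with a default of dogs, and the allow-list `ALLOWED_ANIMALS` and the function name `build_query` are hypothetical additions.

```python
# Assumption: only known table names may be substituted into the query.
ALLOWED_ANIMALS = {"dogs", "cats"}


def build_query(type_of_animal: str = "dogs") -> str:
    """Build the custom SQL query for the given table, defaulting to dogs."""
    if type_of_animal not in ALLOWED_ANIMALS:
        raise ValueError(f"Unknown table: {type_of_animal!r}")
    return f"SELECT * FROM dev.animals.{type_of_animal}"


if __name__ == "__main__":
    print(build_query())        # table with columns name, height, weight
    print(build_query("cats"))  # table with columns name, isFriendly
```

Because the two tables have different columns, the dataset's schema would have to be resolved at run time from whichever table the parameter selects.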
Users should be able to create multiple connections to the data lake. Currently, a user needs to add a new data lake path and browse it in order to import data.
In our organization, we would like the export path to the data lake to be enforced, so that users cannot export to any location other than the one designated by the application admins.
Users onboarded to Trifacta cannot be deleted from the GUI, only through the API. In the GUI, users can only be disabled, but they still count toward the licensed users. Please allow users to be deleted from the GUI.