It would be great if you could expand the metadata selection beyond the current 2 elements (row number and file path) and add the date timestamp (e.g. $datecreated) so that it can be used in recipes.
I would like the ability to specify a billing project for BigQuery as part of the run options. Currently, data queried from BigQuery is associated with the project from which a Dataprep flow is run, with no way to change it. Customers we work with in multi-project environments need the flexibility to align queries to specific projects for cost and usage attribution.
Additionally, for customers on flat-rate BigQuery pricing, a selectable billing project will allow users to move queries to projects under different reservations for workload balancing and/or performance tuning.
When I use a SQL statement with a WITH query expression, I get the following error: "No select statement found". I was told that WITH statements are currently not supported.
Why this should be changed:
Best regards
Marcel
Details about the syntax:
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
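To illustrate the kind of query involved, here is a minimal sketch in Python. It uses the standard-library sqlite3 module as a local stand-in for BigQuery (an assumption made only so the example runs anywhere), and the table `sales` and its columns are hypothetical. It runs a WITH query and an equivalent rewrite as a subquery in FROM. Whether Dataprep accepts the subquery form is not confirmed here; it is only a possible workaround, since that form starts with SELECT.

```python
import sqlite3

# In-memory database used as a stand-in for BigQuery (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")  # hypothetical table
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("north", 10), ("north", 5), ("south", 7)],
)

# The query as one would like to write it, using a WITH query expression.
with_query = """
WITH totals AS (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
)
SELECT region, total FROM totals WHERE total > 8 ORDER BY region
"""

# Equivalent rewrite: the WITH body becomes a subquery in FROM.
subquery_form = """
SELECT region, total
FROM (
    SELECT region, SUM(amount) AS total
    FROM sales
    GROUP BY region
) AS totals
WHERE total > 8
ORDER BY region
"""

with_rows = conn.execute(with_query).fetchall()
subquery_rows = conn.execute(subquery_form).fetchall()
print(with_rows == subquery_rows)
```

Both forms return the same rows here, but the rewrite gets hard to read once several WITH expressions are chained, which is why native support would help.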
The current syntax for the WORKDAY function is workday(date1,numDays,[array_holiday]), and array_holiday can't be a column of a table. For example, when there are unpredictable non-trading days, such as typhoon weather, we always need to go and change the public holidays in the recipe. We would prefer that the holidays could come from a column in a table, so that we can just import the table and update it when needed.
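The requested behaviour can be sketched in Python as follows. This is not Dataprep's implementation: the `workday` function below is a hypothetical re-implementation that assumes spreadsheet-style semantics (weekends and listed holidays are skipped), and `holiday_table` is a made-up imported table whose "holiday" column supplies the holiday list.

```python
from datetime import date, timedelta

def workday(start, num_days, holidays=()):
    """Return the date num_days working days from start.

    Saturdays, Sundays and any date in `holidays` are not counted.
    A negative num_days counts backwards.
    """
    holiday_set = set(holidays)
    step = 1 if num_days >= 0 else -1
    remaining = abs(num_days)
    current = start
    while remaining > 0:
        current += timedelta(days=step)
        # weekday() is 0..4 for Monday..Friday
        if current.weekday() < 5 and current not in holiday_set:
            remaining -= 1
    return current

# Hypothetical imported table; only the "holiday" column is used.
holiday_table = [
    {"holiday": date(2024, 9, 6), "reason": "typhoon"},
    {"holiday": date(2024, 9, 18), "reason": "public holiday"},
]
holiday_column = [row["holiday"] for row in holiday_table]

# Two working days after Thursday 2024-09-05, with and without the table.
with_holidays = workday(date(2024, 9, 5), 2, holiday_column)
without_holidays = workday(date(2024, 9, 5), 2)
print(with_holidays, without_holidays)
```

With the holidays read from a table column, adding an unexpected non-trading day means updating the table only, and the recipe stays unchanged.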
When I use Dataprep all day, I would prefer to have a dark theme to protect my eyes! If we could switch between a dark and a light theme, that would be the best solution!
:-)
PS : Sorry for my bad English, I'm a French user.
When our users request that a flow be productionized, we require that they share their flow with our Data Engineering and Data Operations teams. Sharing with individuals currently works, but you have to remember to add everyone's name. It would be nice if you could enter a GCP Group name that contains the users, so that they only have to remember one name.
When I'm in the Flow view and I click on a recipe, its steps are displayed in the floating panel on the right.
Unfortunately, I can't select and copy steps :-/
I'm forced to open the recipe to copy steps before pasting them into another recipe. I think everyone would save time if we could copy steps directly from the Flow view!
:-)
PS : Sorry for my bad English, I'm a French user.
Why must I open every recipe to reload each sample?
For example:
I build flows with many recipes (between 60 and 100; this is a real case for me).
On Monday, I make a lot of modifications to "data cleaning" at the start of the data wrangling chain!
On Tuesday, when I try to open other recipes, I get a warning message "your sample need to be updated"!!!
=> If I had a button "update all samples of the flow", I would run it on Monday before going to bed, and on Tuesday I could work with a smile!
PS : Sorry for my bad English, I'm a French user :-)
When you select columns to apply a function or transformation, the methods to select columns are:
But it is not possible to select columns with a "RegEx match" method on the column names.
It would be much easier!
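As a rough illustration of what a "RegEx match" selection on column names could look like, here is a small Python sketch. The function `select_columns` and the column names are hypothetical and are not part of Dataprep.

```python
import re

def select_columns(column_names, pattern):
    """Return the column names that match the regular expression, in order."""
    regex = re.compile(pattern)
    return [name for name in column_names if regex.search(name)]

# Hypothetical column names of a dataset.
columns = ["id", "sales_2021", "sales_2022", "cost_2022", "notes"]

# Select every yearly sales column with a single pattern.
selected = select_columns(columns, r"^sales_\d{4}$")
print(selected)
```

With one pattern, a transformation could target all the "sales_YYYY" columns at once, including ones added later, without listing them one by one.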
The current NIST/NSA standard is SHA-2.
As a data wrangler, I would like to be able to hash a column's data using the SHA-256 hashing algorithm.
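For reference, this is roughly what the requested transformation would compute, sketched in Python with the standard-library hashlib module. The helper `sha256_hex`, the UTF-8 encoding, the hexadecimal output and the handling of missing values are assumptions for illustration, not a description of how Dataprep would implement it.

```python
import hashlib

def sha256_hex(value):
    """Return the SHA-256 digest of a value as a 64-character hex string.

    Missing values (None) are passed through unchanged (assumption).
    """
    if value is None:
        return None
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()

# Hypothetical column of values to hash.
email_column = ["alice@example.com", "bob@example.com", None]
hashed_column = [sha256_hex(value) for value in email_column]

for original, hashed in zip(email_column, hashed_column):
    print(original, "->", hashed)
```

The same input always produces the same digest, so hashed columns can still be used as join keys without exposing the original values.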
Users have asked for the ability to create new versions of recipes so that they can collaborate safely. There is also a need to keep an audit history of changes. Trifacta has recipe-level history, but that does not fulfil the whole use case of version control.