Hello community,
I am struggling with a task and have yet to find a simple solution to my problem. Maybe one of you is able to help me.
So, I have a header file and a CSV file containing the data as input. (The data file already has the field names as a header.)
What I want to do is: for every column in the data, look up the corresponding data type by field name in the header file, and then perform a function on the entire column depending on the result.
For example: the data has a column named "Notes" where users have manually entered notes. The header file specifies that the field "Notes" has the data type "manual input". Now I want to create a function that removes unwanted characters from all columns with the data type "manual input". In Excel this would be very easy with a VLOOKUP formula like this: IF(VLOOKUP("Notes", "header table", 2, TRUE) = "manual input", REPLACE(...))
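The same lookup-then-clean idea can be sketched in plain Python. This is only an illustration of the logic, not part of any actual workflow; the `header` dict and the choice of double quotes as the "unwanted character" are assumptions based on the example:

```python
# Hypothetical sketch: look up a field's data type, then clean conditionally.
# header plays the role of the header file (field name -> data type).
header = {"Customer": "characters", "Notes": "manual_input"}

def clean(field_name, value):
    """Remove unwanted characters (here: double quotes) from manual-input fields."""
    if header.get(field_name) == "manual_input":
        return value.replace('"', '')
    return value

print(clean("Notes", 'bought book "alteryx for dummies"'))
# bought book alteryx for dummies
```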
I hope you understand what my goal is, if you need a clearer example please let me know!
regards hasi82
Hi @hasi82, I think we could do with sample inputs for the two data sources, and then exactly what the output from this would be. Building this out will help us truly understand the end-to-end flow, and then we can build a workflow that connects the dots!
@benMoss thanks for your reply! I hope I can make it a little clearer:
Sample header:
Field Name | Data Type | Length | Decimals |
Customer | characters | 28 | 0 |
Order Value | currency | 12 | 2 |
Date | date | 8 | 0 |
Notes | manual_input | 36 | 0 |
Sample data:
Customer | Order Value | Date | Notes |
Company 1 | 1234.00 | 20190101 | |
Company 2 | 8764.45 | 20190102 | bought book "alteryx for dummies" |
Company 3 | 8952.00 | 20190103 | |
Company 4 | 5423.23 | 20190104 | |
Company 5 | 863123.00 | 20190105 | |
Company 6 | 87533.67 | 20190106 | |
Company 7 | 432.88 | 20190107 | is an **bleep** |
Company 8 | 2723.00 | 20190108 | |
Now what I want to do is: for every column in the data file, Alteryx performs different functions and tests, depending on the data type, length and decimals in the header file.
My example above: for every column in the data, the corresponding data type etc. is looked up in the header table. In the case of Notes, that would be "manual_input". Then, for all columns with data type "manual_input", the quotes are removed from the string.
Another example: for all column headers in the data matching the data type "date" in the header table, the value is checked for conformance with the format YYYYMMDD. Where it doesn't conform, the string gets replaced with 00000000.
Or, for the Order Value, the value is checked for conformance with the 2 decimals and length of 12.
I hope I could make myself clearer. Note that my data tables have hundreds of columns, and I need a solution that is easily transferable to a different data set with a different header file, as I can't define functions manually for every single column.
Edit: Basically my overall goal is to make sure that every column in the data is complete, has a certain format and so on
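To make the per-type rules concrete, here is a Python sketch of the checks described above. The rule details are assumptions drawn from the sample header (e.g. the currency pattern enforces up to 12 characters total with exactly 2 decimals):

```python
import re

def apply_rule(data_type, value):
    """Apply an illustrative correction/validation based on the header's data type."""
    if data_type == "manual_input":
        # strip quotes from free-text columns
        return value.replace('"', '')
    if data_type == "date":
        # keep only values matching YYYYMMDD, else replace with 00000000
        return value if re.fullmatch(r"\d{8}", value) else "00000000"
    if data_type == "currency":
        # exactly 2 decimals, total length at most 12; None flags a bad value
        return value if re.fullmatch(r"\d{1,9}\.\d{2}", value) else None
    return value
```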
Hi @hasi82,
Here's the general idea of how you would go about something like this:
1. Add a rowID to use as a key in your final crosstab
2. Transpose, giving a set of name/value rows
3. Add a RowValueID to sort your results into the proper order before the crosstab
4. Join with your header table
5. For each data type ("manual_input", "date"):
a. Filter the records for that type
b. Apply the corrections according to the type
6. Union all the results
7. Sort according to RowValueID
8. Crosstab
Add in whatever rules you need to each of the data type streams. The Transpose/Process/Crosstab pattern allows you to handle any number of columns as efficiently as possible.
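For reference, the Transpose/Join/fix/Crosstab pattern above can be sketched in plain Python. The toy data, the `fix` helper, and the rules inside it are illustrative assumptions, not the actual workflow:

```python
import re

# Toy stand-ins for the header file and the data file
header = {"Date": "date", "Notes": "manual_input"}
rows = [
    {"Date": "2019-01-01", "Notes": 'bad "note"'},
    {"Date": "20190102", "Notes": "ok"},
]

# Steps 1-3: transpose into (row_id, field, value) records, keyed for re-assembly
long_form = [(i, field, value)
             for i, row in enumerate(rows)
             for field, value in row.items()]

# Steps 4-5: join each record with its data type and apply that type's correction
def fix(dtype, value):
    if dtype == "date" and not re.fullmatch(r"\d{8}", value):
        return "00000000"
    if dtype == "manual_input":
        return value.replace('"', '')
    return value

fixed = [(i, field, fix(header[field], value)) for i, field, value in long_form]

# Steps 6-8: union, sort by the row key, and cross-tab back into a wide table
result = {}
for i, field, value in sorted(fixed):
    result.setdefault(i, {})[field] = value
```

The row ID added up front is what makes the final cross-tab step possible: without it, the corrected name/value records could not be grouped back into their original rows.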
Note that I changed one of the dates in the data to trigger the 00000000 rule and that I didn't enter all the possible rules.
Dan
Thank you @danilang, this helped me a lot. I had already tried it with the Transpose and Cross Tab tools, but I think I failed because I didn't add any IDs to group and sort by.