Hi all,
I am working with a dataset that tracks employees' budgeted and actual hours and dollars, on a week by week basis. My issue is that all weekly columns have embedded newlines within the header. See below for an example.
Employee Name | Budgeted Total Hours | Actual Hours + Forecasted | Variance | 1/1/2019 Actual | 1/7/2019 Actual | 1/14/2019 Actual | 1/21/2019 Forecasted | 1/28/2019 Forecasted |
Bob | 100 | 125 | -25 | 25 | 25 | 25 | 25 | 25 |
Susan | 125 | 125 | 0 | 25 | 25 | 25 | 25 | 25 |
I am trying to track employee hours by week, for ultimate comparison to another dataset that has similar information - however I can't find a way to remove the embedded newlines from those Weekly columns so that I can ultimate compare to true data formats. I've considered using a formula to strip out the words "Actual" or "Forecasted" - however can't do that because other fields in my file also contain those same words. Also, this the various weeks labeled as actual vs forecasted will change on a weekly basis, depending on when I will download an updated file - so anything that involves individual selection of these fields won't be too helpful.
I've also tried using a Transpose / RegEx to do this and replace the newlines with a space, which works, but I cannot then "untranspose" my file and get it back to my original format/layout for further scrubbing.
Any insight into this would be greatly appreciated! - Mike
Solved! Go to Solution.
Could you read in the data using the "First row contains data" option in the Input Data tool, use the Multi-Field Formula tool to cleanse newlines from all the columns in the header row, Union the cleansed header back with the data, and then use Dynamic Rename tool to rename the columns?
To further simplify you can probably even skip the Sample and Union steps and cleanse newlines from every row in the dataset. Add a Select tool after the Dynamic Rename to change the fields to the correct data types and you;re done.
The renaming of the column can be achieved by dynamic rename tool using formula to select only columns that have character "/" and for rest you can strip the newline character and replace by space . The dyanmic rename tool also allows you to choose any new columns that will come through .
Thanks for the help guys. Both are helpful. I'm attempting something a little bit different in line with Avinash's suggestion. Your expression in the Drynamic Rename tool removes the words Actual and Forecasted appropriately, but doesn't quite remove the embedded newlines. I can take those out with a DataCleanse - but is there someway to bake that into the formula?
IF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Actual") THEN Trim([_CurrentField_], " Actual")
ELSEIF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Forecasted") THEN Trim([_CurrentField_], "Forecasted")
Else Regex_Replace( [_CurrentField_], "\n", "")
ENDIF
I've tried using Trim and the Regex within both If expressions - but then it ends up stipping out the weeks from the header entirely. Here is what I am trying.
IF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Actual") THEN Trim([_CurrentField_], " Actual") and Regex_Replace( [_CurrentField_], "\n", "")
ELSEIF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Forecasted") THEN Trim([_CurrentField_], "Forecasted") and Regex_Replace( [_CurrentField_], "\n", "")
Else Regex_Replace( [_CurrentField_], "\n", "") Not sure what to do with this final "else" piece of the formula.
ENDIF
I know this is probably basic - still learning the language/logic of these formulas.
Hi,
You were super close with your second formula, but instead of doing "AND" to apply two functions (trim and replace newlines), you will want to wrap one inside the other.
Here's a revised formula, and a breakdown of just the piece i modified.
IF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Actual") THEN Regex_Replace(Trim([_CurrentField_], " Actual"), "\n", "") ELSEIF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Forecasted") THEN Regex_Replace(Trim([_CurrentField_], "Forecasted") , "\n", "") Else Regex_Replace( [_CurrentField_], "\n", "") Not sure what to do with this final "else" piece of the formula. ENDIF
Here's the first piece I modified:
Regex_Replace(Trim([_CurrentField_], " Actual"), "\n", "")
Basically, this uses an order of operations and runs the innermost function first. So it will trim the current field of the word "Actual". Then, it will run the Regex_Replace function on our new string (The current field without the word "Actual").
That way you will end up with the appropriate value in all cases.
Hope this helps!
Awesome, thanks Claje! Worked perfectly now!
To play a safer game to ensure no leading or trailing spaces in the column names (which will cause some other issues), suggest to use below formular.
Trim(Regex_Replace(Trim([_CurrentField_], " Actual"), "\n", ""))