Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Removing embedded newlines from multiple field headers

mready
5 - Atom

Hi all,

 

I am working with a dataset that tracks employees' budgeted and actual hours and dollars, on a week by week basis.  My issue is that all weekly columns have embedded newlines within the header. See below for an example.

Employee NameBudgeted Total HoursActual Hours + ForecastedVariance1/1/2019 Actual1/7/2019 Actual1/14/2019 Actual1/21/2019 Forecasted1/28/2019 Forecasted
Bob100125-252525252525
Susan12512502525252525

 

I am trying to track employee hours by week, for ultimate comparison to another dataset that has similar information - however I can't find a way to remove the embedded newlines from those Weekly columns so that I can ultimate compare to true data formats.  I've considered using a formula to strip out the words "Actual" or "Forecasted" - however can't do that because other fields in my file also contain those same words.  Also, this the various weeks labeled as actual vs forecasted will change on a weekly basis, depending on when I will download an updated file - so anything that involves individual selection of these fields won't be too helpful. 

 

I've also tried using a Transpose / RegEx to do this and replace the newlines with a space, which works, but I cannot then "untranspose" my file and get it back to my original format/layout for further scrubbing. 

 

Any insight into this would be greatly appreciated! - Mike

6 REPLIES 6
Garrett
11 - Bolide

Could you read in the data using the "First row contains data" option in the Input Data tool, use the Multi-Field Formula tool to cleanse newlines from all the columns in the header row, Union the cleansed header back with the data, and then use Dynamic Rename tool to rename the columns?

 

Newlines.PNG

To further simplify you can probably even skip the Sample and Union steps and cleanse newlines from every row in the dataset. Add a Select tool after the Dynamic Rename to change the fields to the correct data types and you;re done.

Avinash_K
8 - Asteroid

The renaming of the column can be achieved by dynamic rename tool using formula to select only columns that have character "/" and for rest you can strip the newline character and replace by space . The dyanmic rename tool also allows you to choose any new columns that will come through .

Capture.PNG

mready
5 - Atom

Thanks for the help guys.  Both are helpful.  I'm attempting something a little bit different in line with Avinash's suggestion. Your expression in the Drynamic Rename tool removes the words Actual and Forecasted appropriately, but doesn't quite remove the embedded newlines. I can take those out with a DataCleanse - but is there someway to bake that into the formula? 

 

 

IF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Actual") THEN Trim([_CurrentField_], " Actual")
ELSEIF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Forecasted") THEN Trim([_CurrentField_], "Forecasted")
Else Regex_Replace( [_CurrentField_], "\n", "")
ENDIF

 

I've tried using Trim and the Regex within both If expressions - but then it ends up stipping out the weeks from the header entirely. Here is what I am trying.

 

IF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Actual") THEN Trim([_CurrentField_], " Actual") and Regex_Replace( [_CurrentField_], "\n", "")
ELSEIF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Forecasted") THEN Trim([_CurrentField_], "Forecasted")  and Regex_Replace( [_CurrentField_], "\n", "")
Else Regex_Replace( [_CurrentField_], "\n", "") Not sure what to do with this final "else" piece of the formula. 
ENDIF

 

I know this is probably basic - still learning the language/logic of these formulas.

 

 

 

 

Claje
14 - Magnetar

Hi,


You were super close with your second formula, but instead of doing "AND" to apply two functions (trim and replace newlines), you will want to wrap one inside the other.

 

Here's a revised formula, and a breakdown of just the piece i modified.

 

 

IF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Actual") THEN Regex_Replace(Trim([_CurrentField_], " Actual"), "\n", "")
ELSEIF Contains([_CurrentField_], "/") and Contains([_CurrentField_], "Forecasted") THEN Regex_Replace(Trim([_CurrentField_], "Forecasted") , "\n", "")
Else Regex_Replace( [_CurrentField_], "\n", "") Not sure what to do with this final "else" piece of the formula. 
ENDIF

 

Here's the first piece I modified:

Regex_Replace(Trim([_CurrentField_], " Actual"), "\n", "")

Basically, this uses an order of operations and runs the innermost function first.  So it will trim the current field of the word "Actual".  Then, it will run the Regex_Replace function on our new string (The current field without the word "Actual").

That way you will end up with the appropriate value in all cases.

 

Hope this helps!

mready
5 - Atom

Awesome, thanks Claje!  Worked perfectly now!

tonyzhu
6 - Meteoroid

To play a safer game to ensure no leading or trailing spaces in the column names (which will cause some other issues), suggest to use below formular.

 

Trim(Regex_Replace(Trim([_CurrentField_], " Actual"), "\n", ""))
Labels