Alteryx Designer Desktop Discussions

rnewby · ‎08-22-2018

I am attempting to turn an R script that I have into an Alteryx flow. Unfortunately, I can't share the data behind this, but I changed it a bit so I can share the following R code. My issue is that this code doesn't do the correct things in Alteryx. It works perfectly fine in R Studio, however.

Note: I think stringr and Hmisc are the only packages I use in this section of code, although I have about seven packages read in altogether for this script.

Going into this step of the code, a sample of the column headers look like this:

[1] "ABCD.ns1:awardID.ns1:awardContractID.ns1:companyID"

[2] "ABCD.ns1:awardID.ns1:awardContractID.ns1:modNumber"

[3] "ABCD.ns1:typeID.ns1:referenceID.ns1:companyID"
[4] "ABCD.ns1:typeID.ns1:referenceID.ns1:companyCity"
[5] "ABCD.ns1:purchaserInformation.ns1:shippingCompanyID"
[6] "ABCD.ns1:purchaserInformation.ns1:shippingCompanyID.departmentID"

After running this code in R Studio, the columns look like:

[1] "AwardContractID_companyID(ABCD)"

[2] "AwardContractID_modNumber(ABCD)"

[3] "ReferenceID_companyID(ABCD)"

[4] "ReferenceID_companyCity(ABCD)"

[5] "ShippingCompanyID(ABCD)"

[6] "ShippingCompanyID.departmentID(ABCD)"

However, after running the same code in the R tool in Alteryx, the columns look like:

[1] "ABCD.Ns1.Awardid.Ns1:Awardcontractid.Ns1.Companyid(ABCD)"

[2] "ABCD.Ns1.Awardid.Ns1:Awardcontractid.Ns1.Modnumber(ABCD)"

[3] "ABCD.Ns1.Typeid.Ns1.Referenceid.Ns1.Companyid(ABCD)"
[4] "ABCD.Ns1.Typeid.Ns1.Referenceid.Ns1.Companycity(ABCD)"
[5] "ABCD.Ns1.Purchaserinformation.Ns1.Shippingcompanyid(ABCD)"
[6] "ABCD.Ns1.Purchaserinformation.Ns1.Shippingcompanyid.Departmentid(ABCD)"

There must be some reason why Alteryx isn't running this correctly, but I can't figure it out. The rest of my R script seems to work fine. It is only this section that is causing me problems.

i = 0
ind = c()
name_list <- c()
for (name in colnames(Main_Data)) {
  i = i + 1
  x <- str_detect(name, "^ABCD")
  if (x == TRUE) {
    a <- str_detect(name, "awardContractID")
    b <- str_detect(name, "referenceID")
    if (a == TRUE) {
      ind <- c(ind,i)
      y <- unlist(strsplit(name, split = ":", fixed = TRUE))
      z <- y[length(y)]
      name2 <- paste0("awardContractID_",z,"(ABCD)")
      name_list <- c(name_list, name2)
    } else if (b == TRUE) {
      ind <- c(ind,i)
      y <- unlist(strsplit(name, split = ":", fixed = TRUE))
      z <- y[length(y)]
      name2 <- paste0("referenceID_",z,"(ABCD)")
      name_list <- c(name_list, name2)
    } else {
      ind <- c(ind,i)
      y <- unlist(strsplit(name, split = ":", fixed = TRUE))
      z <- y[length(y)]
      name2 <- paste0(z,"(ABCD)")
      name_list <- c(name_list, name2)
    }
  }
}
i <- 1
for (j in ind) {
  colnames(Main_Data)[j] <- name_list[i]
  i = i + 1
}

colnames(Main_Data) <- capitalize(colnames(Main_Data))

Any help would be greatly appreciated. Thanks!

JohnJPS · ‎08-22-2018

A wild guess is to assess the data types coming in: I like v_string or v_wstring of sufficient max length, trimmed of excess whitespace. Perhaps sone sanity checks within R too, to ensure it's not the data

rnewby · ‎08-23-2018

It looks like everything going in is v_string and everything coming out of the R code is v_wstring. I'm not sure if that is normal behavior, as I'm not too familiar with Alteryx data types, but it seems OK. The data check in R seems fine. I just can't figure out why the code doesn't do the exact same thing in both places.

JohnJPS · ‎08-23-2018

Since these are columns names of pretty intense complexity, could you try simply reading in your dataset in R, then outputting it again, literally doing nothing to it.

You may find that the column names are altered as part of the hand-off from Alteryx to R. Doing this may help to understand what's happening there and allow you to work around it somehow.

PS, you could also do a Field Info to get the column names, then use normal Alteryx tools to develop the new names, then a Dynamic Rename to swap the new names back in. :-)

Hope that helps!

rnewby · ‎08-23-2018

Good catch! I tried outputting the data with no changes, and apparently, Alteryx-to-R replaces all colons with periods. So when my code tries to split on the colon, it is not splitting on anything whatsoever.

Out of curiosity, what would you recommend to do complex renaming? RegEx is the first thing that comes to mind, but that is brutal to use for all the odd text changes I need.

JohnJPS · ‎08-23-2018

In Alteryx, I would probably do the Field Info / Formulas / Dynamic Rename approach for modifying complex names. The Formula tool has some pretty good string manipulation functions, and also has RegEx functions (similar to the RegEx tool. The nice thing about doing it this way is the intermediate view that Alteryx always gives you at each tool in the flow... (don't have to code in print statements in R to spit out intermediate data or whatever have you.)

DavidxL · ‎08-27-2018

I've also run into this issue before and my solution was to read in the column names as a second data source. The original data will go to the R tool as input #1, and I also pass it through a Field Info tool which will give the column names as input #2.

(The Select tool is there to deselect all the other Field Info which aren't needed, but you don't necessarily have to do this). Then in the R tool you can read in the names from the Field Info side and then assign them to the column names of your original data, like so:

x <- read.Alteryx("#1", mode="data.frame")
xnames <- read.Alteryx("#2", mode="data.frame")
colnames(x) <- xnames$Name

rnewby · ‎08-28-2018

Thank you. This worked perfectly.

It's strange -- I wonder why reading the column names in separately doesn't affect them in the same way as reading the entire data frame. An Alteryx mystery...

DavidxL · ‎08-29-2018

I suspect this is more of an R issue, because similarly if you try to read in a CSV directly in R (i.e. in RStudio using read.csv() ) it will also try to convert the column names into "valid" names. See the R help for ?make.names for more info. I think this is because in R you can attach and call columns of a dataframe as "bare name" R variables, so it's generally good practice that the column names of your data frame are actually legal R variable names.

Reading in the column names separately is passing them in as data rather than column names, and after they're read in you can assign them to the data frame even if it can cause potential issues.

rnewby · ‎08-30-2018

Interesting! I had no idea that R changed the data columns to fit with any particular standard. I always use the readr package to bring in CSVs to R, so I never noticed this. A made up a few lines of test data with "-" and ":" in the column names. And then I read the first in using the default read.csv() and the second using the readr read_csv():

Apologies for blaming Alteryx for this behavior!

Are you aware of any alternative way of reading data into Alteryx besides read.Alteryx("#1", mode="data.frame")? That's the only way I've seen data read into the R Tool.

Alteryx Designer Desktop Discussions

R Code running differently in Alteryx