Alteryx Designer Desktop Discussions

afreseman · ‎03-01-2018

I'm banging my head against the wall here.

I have a very simply flow. I take in a CSV (granted its 24 million rows) and put it into an R script tool. No matter what I do I keep getting:

R (2) Caught an Error in WriteYXDBStreaming()
R (2) Unknown field type: Unknown

I have put a select tool infront of the R tool and unselected the "Unknown" field. I have also limited the CSV to only 10k rows to make sure it wasn't a volume issue.

My R script is as follows:

library(lubridate)
library(plyr)

#What did UTAS say i months ago about j months from today


#AD <- read_csv("~/Documents/Demand/AggregatedDemand.csv", 
#               col_types = cols(DEMAND_DATE = col_date(format = "%Y-%m-%d"), 
#                                MATL_ID = col_character(), SNPSHT_RUN_ID = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

AD <- read.Alteryx("#1", mode="data.frame")

AD$PLANT_ID <- as.character(AD$PLANT_ID)
AD$MATL_ID <- as.character(AD$MATL_ID)
AD$DEMAND_DATE <- as.Date(AD$DEMAND_DATE)
AD$SNPSHT_RUN_ID <- as.POSIXct(AD$SNPSHT_RUN_ID)
AD$SUM_DEMAND_QTY <- as.numeric(AD$SUM_DEMAND_QTY)
AD$DEMAND_TYPE_CD <- as.character(AD$DEMAND_TYPE_CD)
#AD$IS_PAST_DUE_IND <- as.factor(AD$IS_PAST_DUE_IND)

#format SNPSHT_RUN_ID into a date for alteryx sillyness
#AD$SNPSHT_RUN_ID = ymd(AD$SNPSHT_RUN_ID)

#Would current date need to be SNPSHT_RUN_ID - 1 day?
current_date <- max(AD$SNPSHT_RUN_ID)
#i months ago ## 0 indicates current forecast from today 
months_ago_list <- c(0,1) #,3,6) you cant do beyond 3 at this point , we dont have the data 
#j months ahead
months_ahead_list <- c(1,3,6,12)


final = NULL
 for (i in months_ago_list) {
   
   
   month_ago = current_date - months(i)
   
   for (j in months_ahead_list){
     month_ahead_end = current_date + months(j)
#     
#     #i month ago UTAS said the demand for today  plus j month isthe sum of the demand qty by each material
#     #Previous Forecast
     PF<- subset(AD, SNPSHT_RUN_ID == month_ago)
     PF<- subset(PF, DEMAND_DATE < month_ahead_end)
     PF<- subset(PF, DEMAND_DATE >= current_date)
     PF<- ddply(PF, "MATL_ID", summarise,i_month_ago_j_month_of_demand = sum(SUM_DEMAND_QTY))
#     
#     
#     ## Current Forecast , need this for 1, 3, 6, 12 months
#     #i=0
     CF<- subset(AD, SNPSHT_RUN_ID == current_date)
     CF<- subset(CF, DEMAND_DATE < month_ahead_end)
     CF<- subset(CF, DEMAND_DATE >= current_date)
     CF<- ddply(CF, "MATL_ID", summarise,zero_month_ago_j_month_of_demand = sum(SUM_DEMAND_QTY))
#     
#     
#     
#     
#     #Merge the current forecast with the previous forecast 
#     ## What happens when the current forecast did not have this material in the previous forecast 
     merge<-join(CF, PF, by = 'MATL_ID')
#		 mergedf <- as.data.frame(merge)
#			mergedf["i"] <- NA
#			mergedf$i <- i
#		 mergedf[,j] <- i
#		mutate(merge,newi = i)
#     merge$i <- i
#     merge$j <- j
#     merge$SNP_SHT_DAY <- current_date
     merge$i_month_ago_j_month_of_demand[is.na(merge$i_month_ago_j_month_of_demand)] <- 0
     merge$zero_month_ago_j_month_of_demand[is.na(merge$zero_month_ago_j_month_of_demand)] <- 0
#     
     final<- rbind(final, merge)
#     
   }
 }
# 
# ############################################################## oNLY USE CODE ABOVE, NEED TO CREATE REST IN aLTERYX
# 
# 
# 
 final$churn <- with(final, (zero_month_ago_j_month_of_demand - i_month_ago_j_month_of_demand)/i_month_ago_j_month_of_demand)
 final$churn2 <- with(final, (pmax(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand)-pmin(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand))/pmax(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand))




#merge<-  mutate(merge, churn = ((zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand))
#merge<- mutate(merge, churn2 = (pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))



#merge2<- ddply(merge2, summarise,churn.sum = sum(churn))


#             zero.sum = sum(zero.month.ago.j.month.of.demand),
#            i.j.sum = sum(i.month.ago.j.month.of.demand),
#            total.churn.pct = ((zero.sum - i.j.sum)/i.j.sum)*100)



#I think we need to build out a table like this. My thought is also, DT could eventually store a table 
#like this then discard snapshots older than a certain amount of time. 

#Plant ID / i / j / material.id / demand.type / snapsht.day / CF(0,j) / PF(i,j) / Churn = ((CF -PF)/PF) /
# In the future another column that can be added to this is Daily.Demade (Items that were actually shipped)
#Apurvas notes says to use Firm Sales Orders (Not Delivery)
# this table could thenbe used to forecast accuracy

#final[] <- lapply(final, as.factor)

final$RecordID <- as.character(final$RecordID)
final$PLANT_ID <- as.character(final$PLANT_ID)
final$MATL_ID <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND <- as.factor(final$IS_PAST_DUE_IND)
final$RecordID2 <- as.character(final$RecordID)
final$PLANT_ID2 <- as.character(final$PLANT_ID)
final$MATL_ID2 <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID2 <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE2 <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY2 <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD2 <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND2 <- as.factor(final$IS_PAST_DUE_IND)
final$i_month_ago_j_month_of_demand <- as.numeric(final$i_month_ago_j_month_of_demand)
final$zero_month_ago_j_month_of_demand <- as.numeric(final$zero_month_ago_j_month_of_demand)
final$churn <- as.numeric(final$churn)
final$churn2 <- as.numeric(final$churn2)

reallyfinal <- as.data.frame(final)


write.Alteryx(reallyfinal, 1)

It's not a terribly complicated script so I'm lost as to why Alteryx is behaving like this.

BenMoss · ‎03-02-2018

This is going to be hard to debug without the .csv file you are using. Could you perhaps replicate the issue with a non-sensitve dataset, or perhaps a more simple option, anonymise your current dataset and attach it to this post.

Ben

BenMoss · ‎03-02-2018

You could also check out this post which seems to have a similar issue.

Ben

https://community.alteryx.com/t5/Alteryx-Connect-Gallery/WriteYXDBStreaming-Error-using-R-node/td-p/...

afreseman · ‎03-02-2018

@BenMosswrote:
You could also check out this post which seems to have a similar issue.

Ben

https://community.alteryx.com/t5/Alteryx-Connect-Gallery/WriteYXDBStreaming-Error-using-R-node/td-p/...

Yeah I tried that stuff there to no avail. I'll work on trying to sanitize my dataset and get it down to a smaller chunk.

afreseman · ‎03-02-2018

@BenMosswrote:
This is going to be hard to debug without the .csv file you are using. Could you perhaps replicate the issue with a non-sensitve dataset, or perhaps a more simple option, anonymise your current dataset and attach it to this post.

Ben

Here's the sanitized dataset. I limited it to 10k rows and still get the error on my end. BTW, I'm also on Alteryx 10.6 which is R 3.2.3. Unfortunately, we can't upgrade yet.

DylanB · ‎03-02-2018

Are you getting the YXDBStreaming errors when running on the data you sent? I'm getting errors, but they're not yxdb streaming errors. When I get to

final$RecordID <- as.character(final$RecordID)
final$PLANT_ID <- as.character(final$PLANT_ID)
final$MATL_ID <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND <- as.factor(final$IS_PAST_DUE_IND)
final$RecordID2 <- as.character(final$RecordID)
final$PLANT_ID2 <- as.character(final$PLANT_ID)
final$MATL_ID2 <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID2 <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE2 <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY2 <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD2 <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND2 <- as.factor(final$IS_PAST_DUE_IND)
final$i_month_ago_j_month_of_demand <- as.numeric(final$i_month_ago_j_month_of_demand)
final$zero_month_ago_j_month_of_demand <- as.numeric(final$zero_month_ago_j_month_of_demand)
final$churn <- as.numeric(final$churn)
final$churn2 <- as.numeric(final$churn2)

there are only 5 columns in final prior to that part of the script:

MAIL_ID
zero_month_ago_j_month_of_demand
i_month_ago_j_month_of_demand
churn
churn2

So the as.[type](column) is returning NULLs in a lot of cases, and there'll be errors about "replacement has zero rows, data has [some higher number]"

My guess is that the YXDBStreaming error is popping up from within write.Alteryx, but I can't replicate the issue on my end, as I can't get that far and confirm or troubleshoot.

Here's what might work (a bit of a shot in the dark since I can't replicate) - instead of doing the type coercions (the lines I posted above), would it work okay for your purpose to write out 'final' before, and then perform the type coercions in Alteryx with a select tool? My guess is that there is some R type used in 'final' that isn't supported by the Alteryx-R interface. Perhaps one of those type coercions results in such an invalid type.

afreseman · ‎03-02-2018

Thanks. I tried that, but still got errors. A colleague of mine worked on getting this script to 3.2.3 and working and the only thing failing is the output.

library(lubridate)
library(plyr)

#What did UTAS say i months ago about j months from today


#AD <- read_csv("~/Documents/Demand/AggregatedDemand.csv", 
#               col_types = cols(DEMAND_DATE = col_date(format = "%Y-%m-%d"), 
#                                MATL_ID = col_character(), SNPSHT_RUN_ID = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

AD <- read.Alteryx("#1", mode="data.frame")

AD$DEMAND_DATE<-as.Date(AD$DEMAND_DATE)
AD$DEMAND_DATE<-as.POSIXct(AD$DEMAND_DATE)
AD$SNPSHT_RUN_ID <-as.POSIXct(AD$SNPSHT_RUN_ID)


#Would current date need to be SNPSHT_RUN_ID - 1 day?
current.date <- (max(AD$SNPSHT_RUN_ID))
#i months ago ## 0 indicates current forecast from today 
months.ago.list <- c(0,1) #,3,6) you cant do beyond 3 at this point , we dont have the data 
#j months ahead
months.ahead.list <- c(1,3,6,12)


#newdate <- current.date - 1*60*60*24*30

final = NULL
for (i in months.ago.list) {
  
  
  month.ago = current.date - (i*60*60*24*30)
  
  for (j in months.ahead.list){
    month.ahead.end = current.date + (j*60*60*24*30)
    
    #i month ago UTAS said the demand for today  plus j month isthe sum of the demand qty by each material
    #Previous Forecast
    PF<- subset(AD, SNPSHT_RUN_ID == month.ago)
    PF<- subset(PF, DEMAND_DATE < month.ahead.end)
    PF<- subset(PF, DEMAND_DATE >= current.date)
    PF<- ddply(PF, "MATL_ID", summarise,i.month.ago.j.month.of.demand = sum(SUM_DEMAND_QTY))
    
    
    ## Current Forecast , need this for 1, 3, 6, 12 months
    #i=0
    CF<- subset(AD, SNPSHT_RUN_ID == current.date)
    CF<- subset(CF, DEMAND_DATE < month.ahead.end)
    CF<- subset(CF, DEMAND_DATE >= current.date)
    CF<- ddply(CF, "MATL_ID", summarise,zero.month.ago.j.month.of.demand = sum(SUM_DEMAND_QTY))
    
    
    
    
    #Merge the current forecast with the previous forecast 
    ## What happens when the current forecast did not have this material in the previous forecast 
    merge<-join(CF, PF, by = 'MATL_ID')
    merge$i <- i
    merge$j <- j
    merge$SNP_SHT_DAY <- current.date
    merge$i.month.ago.j.month.of.demand[is.na(merge$i.month.ago.j.month.of.demand)] <- 0
    merge$zero.month.ago.j.month.of.demand[is.na(merge$zero.month.ago.j.month.of.demand)] <- 0
    
    final<- rbind(final, merge)
    
  }
}

############################################################## oNLY USE CODE ABOVE, NEED TO CREATE REST IN aLTERYX



final$churn <- with(final, (zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand)
final$stable <- with(final, (1-(pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)))
final$unstable <- with(final, ((pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)))



final$churn[is.na(final$churn)] <- 0
final$stable[is.na(final$stable)] <- 0
final$unstable[is.na(final$unstable)] <- 0
final$impact <- with(final, (zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand))



#write.csv(final, "C:/Users/KORMANJJ/Documents/My Tableau Repository/Workbooks/stability.csv")
#merge<-  mutate(merge, churn = ((zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand))
#merge<- mutate(merge, churn2 = (pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))



#merge2<- ddply(merge2, summarise,churn.sum = sum(churn))


#             zero.sum = sum(zero.month.ago.j.month.of.demand),
#            i.j.sum = sum(i.month.ago.j.month.of.demand),
#            total.churn.pct = ((zero.sum - i.j.sum)/i.j.sum)*100)



#I think we need to build out a table like this. My thought is also, DT could eventually store a table 
#like this then discard snapshots older than a certain amount of time. 

#Plant ID / i / j / material.id / demand.type / snapsht.day / CF(0,j) / PF(i,j) / Churn = ((CF -PF)/PF) /
# In the future another column that can be added to this is Daily.Demade (Items that were actually shipped)
#Apurvas notes says to use Firm Sales Orders (Not Delivery)
# this table could then be used to forecast accuracy

write.Alteryx(final, 1)

At this point. I can't for the life of me understand why Alteryx would have a problem.

DylanB · ‎03-02-2018

I see. I can reproduce now. The issue is with the I/O between Alteryx and R on POSIXct types. A minimal example to reproduce would be

vec <- as.POSIXct(c("2018-03-02"))
df <- data.frame(dates = vec)
write.Alteryx(df)

Which throws the same error for the same reasons.

For your code, I was able to resolve this by adding the line

final["SNP_SHT_DAY"] <- as.character(final["SNP_SHT_DAY"])

before the write.Alteryx

and it will write the dates out as strings.

As a note, for the most recent script you posted, I also had to do

AD$SUM_DEMAND_QTY <- as.numeric(AD$SUM_DEMAND_QTY)

after the read.Alteryx (otherwise, it would be a factor, and you can't take the sum in the for loop)

POSIXct times in R aren't stored in the date formats that you see them printed in. They're stored as more complex objects (with a format for printing and an int representing seconds since 01-01-1970). Convert them to a character or int, then output them. That said, I'd say it's a bug that the I/O doesn't support converting these to datetime Alteryx types. For now, you'll need to do it manually (output as character, use select tool).

afreseman · ‎03-04-2018

I need to buy you a beer or another beverage of your choice. That was it! I can't believe it was that. I might have to put a ticket into Alteryx to get that taken care of. Or at least give a better error message than what it gives. Thank you so much!

afreseman · ‎03-05-2018

Last question: adding that last statement to convert to a character gives us a character vector like:

c(1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000,...)

Seems to me that it's converting EVERY value in the frame and sticking it in a character vector in that part of the frame. Is there a better way to do this?

Alteryx Designer Desktop Discussions

R Output Failure