Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

R Output Failure

afreseman
7 - Meteor

I'm banging my head against the wall here.  

 

I have a very simply flow.  I take in a CSV (granted its 24 million rows) and put it into an R script tool.  No matter what I do I keep getting:

R (2) Caught an Error in WriteYXDBStreaming()
R (2) Unknown field type: Unknown

 

I have put a select tool infront of the R tool and unselected the "Unknown" field. I have also limited the CSV to only 10k rows to make sure it wasn't a volume issue. 

 

My R script is as follows:

library(lubridate)
library(plyr)

#What did UTAS say i months ago about j months from today


#AD <- read_csv("~/Documents/Demand/AggregatedDemand.csv", 
#               col_types = cols(DEMAND_DATE = col_date(format = "%Y-%m-%d"), 
#                                MATL_ID = col_character(), SNPSHT_RUN_ID = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

AD <- read.Alteryx("#1", mode="data.frame")

AD$PLANT_ID <- as.character(AD$PLANT_ID)
AD$MATL_ID <- as.character(AD$MATL_ID)
AD$DEMAND_DATE <- as.Date(AD$DEMAND_DATE)
AD$SNPSHT_RUN_ID <- as.POSIXct(AD$SNPSHT_RUN_ID)
AD$SUM_DEMAND_QTY <- as.numeric(AD$SUM_DEMAND_QTY)
AD$DEMAND_TYPE_CD <- as.character(AD$DEMAND_TYPE_CD)
#AD$IS_PAST_DUE_IND <- as.factor(AD$IS_PAST_DUE_IND)

#format SNPSHT_RUN_ID into a date for alteryx sillyness
#AD$SNPSHT_RUN_ID = ymd(AD$SNPSHT_RUN_ID)

#Would current date need to be SNPSHT_RUN_ID - 1 day?
current_date <- max(AD$SNPSHT_RUN_ID)
#i months ago ## 0 indicates current forecast from today 
months_ago_list <- c(0,1) #,3,6) you cant do beyond 3 at this point , we dont have the data 
#j months ahead
months_ahead_list <- c(1,3,6,12)


final = NULL
 for (i in months_ago_list) {
   
   
   month_ago = current_date - months(i)
   
   for (j in months_ahead_list){
     month_ahead_end = current_date + months(j)
#     
#     #i month ago UTAS said the demand for today  plus j month isthe sum of the demand qty by each material
#     #Previous Forecast
     PF<- subset(AD, SNPSHT_RUN_ID == month_ago)
     PF<- subset(PF, DEMAND_DATE < month_ahead_end)
     PF<- subset(PF, DEMAND_DATE >= current_date)
     PF<- ddply(PF, "MATL_ID", summarise,i_month_ago_j_month_of_demand = sum(SUM_DEMAND_QTY))
#     
#     
#     ## Current Forecast , need this for 1, 3, 6, 12 months
#     #i=0
     CF<- subset(AD, SNPSHT_RUN_ID == current_date)
     CF<- subset(CF, DEMAND_DATE < month_ahead_end)
     CF<- subset(CF, DEMAND_DATE >= current_date)
     CF<- ddply(CF, "MATL_ID", summarise,zero_month_ago_j_month_of_demand = sum(SUM_DEMAND_QTY))
#     
#     
#     
#     
#     #Merge the current forecast with the previous forecast 
#     ## What happens when the current forecast did not have this material in the previous forecast 
     merge<-join(CF, PF, by = 'MATL_ID')
#		 mergedf <- as.data.frame(merge)
#			mergedf["i"] <- NA
#			mergedf$i <- i
#		 mergedf[,j] <- i
#		mutate(merge,newi = i)
#     merge$i <- i
#     merge$j <- j
#     merge$SNP_SHT_DAY <- current_date
     merge$i_month_ago_j_month_of_demand[is.na(merge$i_month_ago_j_month_of_demand)] <- 0
     merge$zero_month_ago_j_month_of_demand[is.na(merge$zero_month_ago_j_month_of_demand)] <- 0
#     
     final<- rbind(final, merge)
#     
   }
 }
# 
# ############################################################## oNLY USE CODE ABOVE, NEED TO CREATE REST IN aLTERYX
# 
# 
# 
 final$churn <- with(final, (zero_month_ago_j_month_of_demand - i_month_ago_j_month_of_demand)/i_month_ago_j_month_of_demand)
 final$churn2 <- with(final, (pmax(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand)-pmin(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand))/pmax(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand))




#merge<-  mutate(merge, churn = ((zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand))
#merge<- mutate(merge, churn2 = (pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))



#merge2<- ddply(merge2, summarise,churn.sum = sum(churn))


#             zero.sum = sum(zero.month.ago.j.month.of.demand),
#            i.j.sum = sum(i.month.ago.j.month.of.demand),
#            total.churn.pct = ((zero.sum - i.j.sum)/i.j.sum)*100)



#I think we need to build out a table like this. My thought is also, DT could eventually store a table 
#like this then discard snapshots older than a certain amount of time. 

#Plant ID / i / j / material.id / demand.type / snapsht.day / CF(0,j) / PF(i,j) / Churn = ((CF -PF)/PF) /
# In the future another column that can be added to this is Daily.Demade (Items that were actually shipped)
#Apurvas notes says to use Firm Sales Orders (Not Delivery)
# this table could thenbe used to forecast accuracy

#final[] <- lapply(final, as.factor)

final$RecordID <- as.character(final$RecordID)
final$PLANT_ID <- as.character(final$PLANT_ID)
final$MATL_ID <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND <- as.factor(final$IS_PAST_DUE_IND)
final$RecordID2 <- as.character(final$RecordID)
final$PLANT_ID2 <- as.character(final$PLANT_ID)
final$MATL_ID2 <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID2 <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE2 <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY2 <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD2 <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND2 <- as.factor(final$IS_PAST_DUE_IND)
final$i_month_ago_j_month_of_demand <- as.numeric(final$i_month_ago_j_month_of_demand)
final$zero_month_ago_j_month_of_demand <- as.numeric(final$zero_month_ago_j_month_of_demand)
final$churn <- as.numeric(final$churn)
final$churn2 <- as.numeric(final$churn2)

reallyfinal <- as.data.frame(final)


write.Alteryx(reallyfinal, 1)


It's not a terribly complicated script so I'm lost as to why Alteryx is behaving like this.

11 REPLIES 11
BenMoss
ACE Emeritus
ACE Emeritus

This is going to be hard to debug without the .csv file you are using. Could you perhaps replicate the issue with a non-sensitve dataset, or perhaps a more simple option, anonymise your current dataset and attach it to this post.

 

Ben

BenMoss
ACE Emeritus
ACE Emeritus

You could also check out this post which seems to have a similar issue.

 

Ben

 

https://community.alteryx.com/t5/Alteryx-Connect-Gallery/WriteYXDBStreaming-Error-using-R-node/td-p/...

afreseman
7 - Meteor

@BenMosswrote:

You could also check out this post which seems to have a similar issue.

 

Ben

 

https://community.alteryx.com/t5/Alteryx-Connect-Gallery/WriteYXDBStreaming-Error-using-R-node/td-p/...


Yeah I tried that stuff there to no avail.  I'll work on trying to sanitize my dataset and get it down to a smaller chunk.

afreseman
7 - Meteor

 


@BenMosswrote:

This is going to be hard to debug without the .csv file you are using. Could you perhaps replicate the issue with a non-sensitve dataset, or perhaps a more simple option, anonymise your current dataset and attach it to this post.

 

Ben


Here's the sanitized dataset.  I limited it to 10k rows and still get the error on my end. BTW, I'm also on Alteryx 10.6 which is R 3.2.3.  Unfortunately, we can't upgrade yet.

DylanB
Alteryx Alumni (Retired)

Are you getting the YXDBStreaming errors when running on the data you sent? I'm getting errors, but they're not yxdb streaming errors. When I get to 

final$RecordID <- as.character(final$RecordID)
final$PLANT_ID <- as.character(final$PLANT_ID)
final$MATL_ID <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND <- as.factor(final$IS_PAST_DUE_IND)
final$RecordID2 <- as.character(final$RecordID)
final$PLANT_ID2 <- as.character(final$PLANT_ID)
final$MATL_ID2 <- as.character(final$MATL_ID)
final$SNPSHT_RUN_ID2 <- as.POSIXct(final$SNPSHT_RUN_ID)
final$DEMAND_DATE2 <- as.Date(final$DEMAND_DATE)
final$SUM_DEMAND_QTY2 <- as.numeric(final$SUM_DEMAND_QTY)
final$DEMAND_TYPE_CD2 <- as.character(final$DEMAND_TYPE_CD)
#final$IS_PAST_DUE_IND2 <- as.factor(final$IS_PAST_DUE_IND)
final$i_month_ago_j_month_of_demand <- as.numeric(final$i_month_ago_j_month_of_demand)
final$zero_month_ago_j_month_of_demand <- as.numeric(final$zero_month_ago_j_month_of_demand)
final$churn <- as.numeric(final$churn)
final$churn2 <- as.numeric(final$churn2)

there are only 5 columns in final prior to that part of the script:

  • MAIL_ID
  • zero_month_ago_j_month_of_demand
  • i_month_ago_j_month_of_demand
  • churn
  • churn2

So the as.[type](column) is returning NULLs in a lot of cases, and there'll be errors about "replacement has zero rows, data has [some higher number]"

 

 

My guess is that the YXDBStreaming error is popping up from within write.Alteryx, but I can't replicate the issue on my end, as I can't get that far and confirm or troubleshoot.

 

 

Here's what might work (a bit of a shot in the dark since I can't replicate) - instead of doing the type coercions (the lines I posted above), would it work okay for your purpose to write out 'final' before, and then perform the type coercions in Alteryx with a select tool? My guess is that there is some R type used in 'final' that isn't supported by the Alteryx-R interface. Perhaps one of those type coercions results in such an invalid type.

afreseman
7 - Meteor

Thanks.  I tried that, but still got errors.  A colleague of mine worked on getting this script to 3.2.3 and working and the only thing failing is the output.

 

library(lubridate)
library(plyr)

#What did UTAS say i months ago about j months from today


#AD <- read_csv("~/Documents/Demand/AggregatedDemand.csv", 
#               col_types = cols(DEMAND_DATE = col_date(format = "%Y-%m-%d"), 
#                                MATL_ID = col_character(), SNPSHT_RUN_ID = col_datetime(format = "%Y-%m-%d %H:%M:%S")))

AD <- read.Alteryx("#1", mode="data.frame")

AD$DEMAND_DATE<-as.Date(AD$DEMAND_DATE)
AD$DEMAND_DATE<-as.POSIXct(AD$DEMAND_DATE)
AD$SNPSHT_RUN_ID <-as.POSIXct(AD$SNPSHT_RUN_ID)


#Would current date need to be SNPSHT_RUN_ID - 1 day?
current.date <- (max(AD$SNPSHT_RUN_ID))
#i months ago ## 0 indicates current forecast from today 
months.ago.list <- c(0,1) #,3,6) you cant do beyond 3 at this point , we dont have the data 
#j months ahead
months.ahead.list <- c(1,3,6,12)


#newdate <- current.date - 1*60*60*24*30

final = NULL
for (i in months.ago.list) {
  
  
  month.ago = current.date - (i*60*60*24*30)
  
  for (j in months.ahead.list){
    month.ahead.end = current.date + (j*60*60*24*30)
    
    #i month ago UTAS said the demand for today  plus j month isthe sum of the demand qty by each material
    #Previous Forecast
    PF<- subset(AD, SNPSHT_RUN_ID == month.ago)
    PF<- subset(PF, DEMAND_DATE < month.ahead.end)
    PF<- subset(PF, DEMAND_DATE >= current.date)
    PF<- ddply(PF, "MATL_ID", summarise,i.month.ago.j.month.of.demand = sum(SUM_DEMAND_QTY))
    
    
    ## Current Forecast , need this for 1, 3, 6, 12 months
    #i=0
    CF<- subset(AD, SNPSHT_RUN_ID == current.date)
    CF<- subset(CF, DEMAND_DATE < month.ahead.end)
    CF<- subset(CF, DEMAND_DATE >= current.date)
    CF<- ddply(CF, "MATL_ID", summarise,zero.month.ago.j.month.of.demand = sum(SUM_DEMAND_QTY))
    
    
    
    
    #Merge the current forecast with the previous forecast 
    ## What happens when the current forecast did not have this material in the previous forecast 
    merge<-join(CF, PF, by = 'MATL_ID')
    merge$i <- i
    merge$j <- j
    merge$SNP_SHT_DAY <- current.date
    merge$i.month.ago.j.month.of.demand[is.na(merge$i.month.ago.j.month.of.demand)] <- 0
    merge$zero.month.ago.j.month.of.demand[is.na(merge$zero.month.ago.j.month.of.demand)] <- 0
    
    final<- rbind(final, merge)
    
  }
}

############################################################## oNLY USE CODE ABOVE, NEED TO CREATE REST IN aLTERYX



final$churn <- with(final, (zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand)
final$stable <- with(final, (1-(pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)))
final$unstable <- with(final, ((pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)))



final$churn[is.na(final$churn)] <- 0
final$stable[is.na(final$stable)] <- 0
final$unstable[is.na(final$unstable)] <- 0
final$impact <- with(final, (zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand))



#write.csv(final, "C:/Users/KORMANJJ/Documents/My Tableau Repository/Workbooks/stability.csv")
#merge<-  mutate(merge, churn = ((zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand))
#merge<- mutate(merge, churn2 = (pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))



#merge2<- ddply(merge2, summarise,churn.sum = sum(churn))


#             zero.sum = sum(zero.month.ago.j.month.of.demand),
#            i.j.sum = sum(i.month.ago.j.month.of.demand),
#            total.churn.pct = ((zero.sum - i.j.sum)/i.j.sum)*100)



#I think we need to build out a table like this. My thought is also, DT could eventually store a table 
#like this then discard snapshots older than a certain amount of time. 

#Plant ID / i / j / material.id / demand.type / snapsht.day / CF(0,j) / PF(i,j) / Churn = ((CF -PF)/PF) /
# In the future another column that can be added to this is Daily.Demade (Items that were actually shipped)
#Apurvas notes says to use Firm Sales Orders (Not Delivery)
# this table could then be used to forecast accuracy

write.Alteryx(final, 1)

At this point. I can't for the life of me understand why Alteryx would have a problem.

 

DylanB
Alteryx Alumni (Retired)

I see. I can reproduce now. The issue is with the I/O between Alteryx and R on POSIXct types. A minimal example to reproduce would be 

vec <- as.POSIXct(c("2018-03-02"))
df <- data.frame(dates = vec)
write.Alteryx(df)

Which throws the same error for the same reasons. 

 

For your code, I was able to resolve this by adding the line

 

final["SNP_SHT_DAY"] <- as.character(final["SNP_SHT_DAY"])

before the write.Alteryx

 

and it will write the dates out as strings.

 

As a note, for the most recent script you posted, I also had to do

AD$SUM_DEMAND_QTY <- as.numeric(AD$SUM_DEMAND_QTY)

after the read.Alteryx (otherwise, it would be a factor, and you can't take the sum in the for loop)

 

 

POSIXct times in R aren't stored  in the date formats that you see them printed in. They're stored as more complex objects (with a format for printing and an int representing seconds since 01-01-1970). Convert them to a character or int, then output them. That said, I'd say it's a bug that the I/O doesn't support converting these to datetime Alteryx types. For now, you'll need to do it manually (output as character, use select tool).

afreseman
7 - Meteor

I need to buy you a beer or another beverage of your choice.  That was it!  I can't believe it was that.  I might have to put a ticket into Alteryx to get that taken care of.  Or at least give a better error message than what it gives. Thank you so much!

afreseman
7 - Meteor

Last question: adding that last statement to convert to a character gives us a character vector like:

c(1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000,...)

 

Seems to me that it's converting EVERY value in the frame and sticking it in a character vector in that part of the frame.  Is there a better way to do this?

Labels