I'm banging my head against the wall here.
I have a very simply flow. I take in a CSV (granted its 24 million rows) and put it into an R script tool. No matter what I do I keep getting:
R (2) Caught an Error in WriteYXDBStreaming()
R (2) Unknown field type: Unknown
I have put a select tool infront of the R tool and unselected the "Unknown" field. I have also limited the CSV to only 10k rows to make sure it wasn't a volume issue.
My R script is as follows:
library(lubridate) library(plyr) #What did UTAS say i months ago about j months from today #AD <- read_csv("~/Documents/Demand/AggregatedDemand.csv", # col_types = cols(DEMAND_DATE = col_date(format = "%Y-%m-%d"), # MATL_ID = col_character(), SNPSHT_RUN_ID = col_datetime(format = "%Y-%m-%d %H:%M:%S"))) AD <- read.Alteryx("#1", mode="data.frame") AD$PLANT_ID <- as.character(AD$PLANT_ID) AD$MATL_ID <- as.character(AD$MATL_ID) AD$DEMAND_DATE <- as.Date(AD$DEMAND_DATE) AD$SNPSHT_RUN_ID <- as.POSIXct(AD$SNPSHT_RUN_ID) AD$SUM_DEMAND_QTY <- as.numeric(AD$SUM_DEMAND_QTY) AD$DEMAND_TYPE_CD <- as.character(AD$DEMAND_TYPE_CD) #AD$IS_PAST_DUE_IND <- as.factor(AD$IS_PAST_DUE_IND) #format SNPSHT_RUN_ID into a date for alteryx sillyness #AD$SNPSHT_RUN_ID = ymd(AD$SNPSHT_RUN_ID) #Would current date need to be SNPSHT_RUN_ID - 1 day? current_date <- max(AD$SNPSHT_RUN_ID) #i months ago ## 0 indicates current forecast from today months_ago_list <- c(0,1) #,3,6) you cant do beyond 3 at this point , we dont have the data #j months ahead months_ahead_list <- c(1,3,6,12) final = NULL for (i in months_ago_list) { month_ago = current_date - months(i) for (j in months_ahead_list){ month_ahead_end = current_date + months(j) # # #i month ago UTAS said the demand for today plus j month isthe sum of the demand qty by each material # #Previous Forecast PF<- subset(AD, SNPSHT_RUN_ID == month_ago) PF<- subset(PF, DEMAND_DATE < month_ahead_end) PF<- subset(PF, DEMAND_DATE >= current_date) PF<- ddply(PF, "MATL_ID", summarise,i_month_ago_j_month_of_demand = sum(SUM_DEMAND_QTY)) # # # ## Current Forecast , need this for 1, 3, 6, 12 months # #i=0 CF<- subset(AD, SNPSHT_RUN_ID == current_date) CF<- subset(CF, DEMAND_DATE < month_ahead_end) CF<- subset(CF, DEMAND_DATE >= current_date) CF<- ddply(CF, "MATL_ID", summarise,zero_month_ago_j_month_of_demand = sum(SUM_DEMAND_QTY)) # # # # # #Merge the current forecast with the previous forecast # ## What happens when the current forecast did not have this material in the previous forecast merge<-join(CF, PF, by = 'MATL_ID') # mergedf <- as.data.frame(merge) # mergedf["i"] <- NA # mergedf$i <- i # mergedf[,j] <- i # mutate(merge,newi = i) # merge$i <- i # merge$j <- j # merge$SNP_SHT_DAY <- current_date merge$i_month_ago_j_month_of_demand[is.na(merge$i_month_ago_j_month_of_demand)] <- 0 merge$zero_month_ago_j_month_of_demand[is.na(merge$zero_month_ago_j_month_of_demand)] <- 0 # final<- rbind(final, merge) # } } # # ############################################################## oNLY USE CODE ABOVE, NEED TO CREATE REST IN aLTERYX # # # final$churn <- with(final, (zero_month_ago_j_month_of_demand - i_month_ago_j_month_of_demand)/i_month_ago_j_month_of_demand) final$churn2 <- with(final, (pmax(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand)-pmin(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand))/pmax(zero_month_ago_j_month_of_demand,i_month_ago_j_month_of_demand)) #merge<- mutate(merge, churn = ((zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand)) #merge<- mutate(merge, churn2 = (pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)) #merge2<- ddply(merge2, summarise,churn.sum = sum(churn)) # zero.sum = sum(zero.month.ago.j.month.of.demand), # i.j.sum = sum(i.month.ago.j.month.of.demand), # total.churn.pct = ((zero.sum - i.j.sum)/i.j.sum)*100) #I think we need to build out a table like this. My thought is also, DT could eventually store a table #like this then discard snapshots older than a certain amount of time. #Plant ID / i / j / material.id / demand.type / snapsht.day / CF(0,j) / PF(i,j) / Churn = ((CF -PF)/PF) / # In the future another column that can be added to this is Daily.Demade (Items that were actually shipped) #Apurvas notes says to use Firm Sales Orders (Not Delivery) # this table could thenbe used to forecast accuracy #final[] <- lapply(final, as.factor) final$RecordID <- as.character(final$RecordID) final$PLANT_ID <- as.character(final$PLANT_ID) final$MATL_ID <- as.character(final$MATL_ID) final$SNPSHT_RUN_ID <- as.POSIXct(final$SNPSHT_RUN_ID) final$DEMAND_DATE <- as.Date(final$DEMAND_DATE) final$SUM_DEMAND_QTY <- as.numeric(final$SUM_DEMAND_QTY) final$DEMAND_TYPE_CD <- as.character(final$DEMAND_TYPE_CD) #final$IS_PAST_DUE_IND <- as.factor(final$IS_PAST_DUE_IND) final$RecordID2 <- as.character(final$RecordID) final$PLANT_ID2 <- as.character(final$PLANT_ID) final$MATL_ID2 <- as.character(final$MATL_ID) final$SNPSHT_RUN_ID2 <- as.POSIXct(final$SNPSHT_RUN_ID) final$DEMAND_DATE2 <- as.Date(final$DEMAND_DATE) final$SUM_DEMAND_QTY2 <- as.numeric(final$SUM_DEMAND_QTY) final$DEMAND_TYPE_CD2 <- as.character(final$DEMAND_TYPE_CD) #final$IS_PAST_DUE_IND2 <- as.factor(final$IS_PAST_DUE_IND) final$i_month_ago_j_month_of_demand <- as.numeric(final$i_month_ago_j_month_of_demand) final$zero_month_ago_j_month_of_demand <- as.numeric(final$zero_month_ago_j_month_of_demand) final$churn <- as.numeric(final$churn) final$churn2 <- as.numeric(final$churn2) reallyfinal <- as.data.frame(final) write.Alteryx(reallyfinal, 1)
It's not a terribly complicated script so I'm lost as to why Alteryx is behaving like this.
Solved! Go to Solution.
This is going to be hard to debug without the .csv file you are using. Could you perhaps replicate the issue with a non-sensitve dataset, or perhaps a more simple option, anonymise your current dataset and attach it to this post.
Ben
You could also check out this post which seems to have a similar issue.
Ben
@BenMosswrote:You could also check out this post which seems to have a similar issue.
Ben
Yeah I tried that stuff there to no avail. I'll work on trying to sanitize my dataset and get it down to a smaller chunk.
@BenMosswrote:This is going to be hard to debug without the .csv file you are using. Could you perhaps replicate the issue with a non-sensitve dataset, or perhaps a more simple option, anonymise your current dataset and attach it to this post.
Ben
Here's the sanitized dataset. I limited it to 10k rows and still get the error on my end. BTW, I'm also on Alteryx 10.6 which is R 3.2.3. Unfortunately, we can't upgrade yet.
Are you getting the YXDBStreaming errors when running on the data you sent? I'm getting errors, but they're not yxdb streaming errors. When I get to
final$RecordID <- as.character(final$RecordID) final$PLANT_ID <- as.character(final$PLANT_ID) final$MATL_ID <- as.character(final$MATL_ID) final$SNPSHT_RUN_ID <- as.POSIXct(final$SNPSHT_RUN_ID) final$DEMAND_DATE <- as.Date(final$DEMAND_DATE) final$SUM_DEMAND_QTY <- as.numeric(final$SUM_DEMAND_QTY) final$DEMAND_TYPE_CD <- as.character(final$DEMAND_TYPE_CD) #final$IS_PAST_DUE_IND <- as.factor(final$IS_PAST_DUE_IND) final$RecordID2 <- as.character(final$RecordID) final$PLANT_ID2 <- as.character(final$PLANT_ID) final$MATL_ID2 <- as.character(final$MATL_ID) final$SNPSHT_RUN_ID2 <- as.POSIXct(final$SNPSHT_RUN_ID) final$DEMAND_DATE2 <- as.Date(final$DEMAND_DATE) final$SUM_DEMAND_QTY2 <- as.numeric(final$SUM_DEMAND_QTY) final$DEMAND_TYPE_CD2 <- as.character(final$DEMAND_TYPE_CD) #final$IS_PAST_DUE_IND2 <- as.factor(final$IS_PAST_DUE_IND) final$i_month_ago_j_month_of_demand <- as.numeric(final$i_month_ago_j_month_of_demand) final$zero_month_ago_j_month_of_demand <- as.numeric(final$zero_month_ago_j_month_of_demand) final$churn <- as.numeric(final$churn) final$churn2 <- as.numeric(final$churn2)
there are only 5 columns in final prior to that part of the script:
So the as.[type](column) is returning NULLs in a lot of cases, and there'll be errors about "replacement has zero rows, data has [some higher number]"
My guess is that the YXDBStreaming error is popping up from within write.Alteryx, but I can't replicate the issue on my end, as I can't get that far and confirm or troubleshoot.
Here's what might work (a bit of a shot in the dark since I can't replicate) - instead of doing the type coercions (the lines I posted above), would it work okay for your purpose to write out 'final' before, and then perform the type coercions in Alteryx with a select tool? My guess is that there is some R type used in 'final' that isn't supported by the Alteryx-R interface. Perhaps one of those type coercions results in such an invalid type.
Thanks. I tried that, but still got errors. A colleague of mine worked on getting this script to 3.2.3 and working and the only thing failing is the output.
library(lubridate) library(plyr) #What did UTAS say i months ago about j months from today #AD <- read_csv("~/Documents/Demand/AggregatedDemand.csv", # col_types = cols(DEMAND_DATE = col_date(format = "%Y-%m-%d"), # MATL_ID = col_character(), SNPSHT_RUN_ID = col_datetime(format = "%Y-%m-%d %H:%M:%S"))) AD <- read.Alteryx("#1", mode="data.frame") AD$DEMAND_DATE<-as.Date(AD$DEMAND_DATE) AD$DEMAND_DATE<-as.POSIXct(AD$DEMAND_DATE) AD$SNPSHT_RUN_ID <-as.POSIXct(AD$SNPSHT_RUN_ID) #Would current date need to be SNPSHT_RUN_ID - 1 day? current.date <- (max(AD$SNPSHT_RUN_ID)) #i months ago ## 0 indicates current forecast from today months.ago.list <- c(0,1) #,3,6) you cant do beyond 3 at this point , we dont have the data #j months ahead months.ahead.list <- c(1,3,6,12) #newdate <- current.date - 1*60*60*24*30 final = NULL for (i in months.ago.list) { month.ago = current.date - (i*60*60*24*30) for (j in months.ahead.list){ month.ahead.end = current.date + (j*60*60*24*30) #i month ago UTAS said the demand for today plus j month isthe sum of the demand qty by each material #Previous Forecast PF<- subset(AD, SNPSHT_RUN_ID == month.ago) PF<- subset(PF, DEMAND_DATE < month.ahead.end) PF<- subset(PF, DEMAND_DATE >= current.date) PF<- ddply(PF, "MATL_ID", summarise,i.month.ago.j.month.of.demand = sum(SUM_DEMAND_QTY)) ## Current Forecast , need this for 1, 3, 6, 12 months #i=0 CF<- subset(AD, SNPSHT_RUN_ID == current.date) CF<- subset(CF, DEMAND_DATE < month.ahead.end) CF<- subset(CF, DEMAND_DATE >= current.date) CF<- ddply(CF, "MATL_ID", summarise,zero.month.ago.j.month.of.demand = sum(SUM_DEMAND_QTY)) #Merge the current forecast with the previous forecast ## What happens when the current forecast did not have this material in the previous forecast merge<-join(CF, PF, by = 'MATL_ID') merge$i <- i merge$j <- j merge$SNP_SHT_DAY <- current.date merge$i.month.ago.j.month.of.demand[is.na(merge$i.month.ago.j.month.of.demand)] <- 0 merge$zero.month.ago.j.month.of.demand[is.na(merge$zero.month.ago.j.month.of.demand)] <- 0 final<- rbind(final, merge) } } ############################################################## oNLY USE CODE ABOVE, NEED TO CREATE REST IN aLTERYX final$churn <- with(final, (zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand) final$stable <- with(final, (1-(pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))) final$unstable <- with(final, ((pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))) final$churn[is.na(final$churn)] <- 0 final$stable[is.na(final$stable)] <- 0 final$unstable[is.na(final$unstable)] <- 0 final$impact <- with(final, (zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)) #write.csv(final, "C:/Users/KORMANJJ/Documents/My Tableau Repository/Workbooks/stability.csv") #merge<- mutate(merge, churn = ((zero.month.ago.j.month.of.demand - i.month.ago.j.month.of.demand)/i.month.ago.j.month.of.demand)) #merge<- mutate(merge, churn2 = (pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)-pmin(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand))/pmax(zero.month.ago.j.month.of.demand,i.month.ago.j.month.of.demand)) #merge2<- ddply(merge2, summarise,churn.sum = sum(churn)) # zero.sum = sum(zero.month.ago.j.month.of.demand), # i.j.sum = sum(i.month.ago.j.month.of.demand), # total.churn.pct = ((zero.sum - i.j.sum)/i.j.sum)*100) #I think we need to build out a table like this. My thought is also, DT could eventually store a table #like this then discard snapshots older than a certain amount of time. #Plant ID / i / j / material.id / demand.type / snapsht.day / CF(0,j) / PF(i,j) / Churn = ((CF -PF)/PF) / # In the future another column that can be added to this is Daily.Demade (Items that were actually shipped) #Apurvas notes says to use Firm Sales Orders (Not Delivery) # this table could then be used to forecast accuracy write.Alteryx(final, 1)
At this point. I can't for the life of me understand why Alteryx would have a problem.
I see. I can reproduce now. The issue is with the I/O between Alteryx and R on POSIXct types. A minimal example to reproduce would be
vec <- as.POSIXct(c("2018-03-02")) df <- data.frame(dates = vec) write.Alteryx(df)
Which throws the same error for the same reasons.
For your code, I was able to resolve this by adding the line
final["SNP_SHT_DAY"] <- as.character(final["SNP_SHT_DAY"])
before the write.Alteryx
and it will write the dates out as strings.
As a note, for the most recent script you posted, I also had to do
AD$SUM_DEMAND_QTY <- as.numeric(AD$SUM_DEMAND_QTY)
after the read.Alteryx (otherwise, it would be a factor, and you can't take the sum in the for loop)
POSIXct times in R aren't stored in the date formats that you see them printed in. They're stored as more complex objects (with a format for printing and an int representing seconds since 01-01-1970). Convert them to a character or int, then output them. That said, I'd say it's a bug that the I/O doesn't support converting these to datetime Alteryx types. For now, you'll need to do it manually (output as character, use select tool).
I need to buy you a beer or another beverage of your choice. That was it! I can't believe it was that. I might have to put a ticket into Alteryx to get that taken care of. Or at least give a better error message than what it gives. Thank you so much!
Last question: adding that last statement to convert to a character gives us a character vector like:
c(1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000, 1520226000,...)
Seems to me that it's converting EVERY value in the frame and sticking it in a character vector in that part of the frame. Is there a better way to do this?