Alteryx Designer Desktop Discussions

florini · ‎07-04-2017

Hi, please see attached macro.

Practically what I want to achieve is to read dates in one table and compare them to intervals in the other table. Make a join based on this.

I found the solution in R here: https://stackoverflow.com/questions/41132081/find-which-interval-row-in-a-data-frame-that-each-eleme...

I works well in R studio but not in Alteryx.

Error: R (7): Error in grep(regexp, the.map$names, ignore.case = TRUE, perl = (length(regions) < :
Info: R (7): invalid regular expression '(^~)|(^.x %[]% data.frame(intervals[, c("start", "end")]))'
Info: R (7): Calls: %>% -> eval -> eval -> map -> map.poly -> grep
Info: R (7): In addition: Warning message:
Info: R (7): In grep(regexp, the.map$names, ignore.case = TRUE, perl = (length(regions) < :
Info: R (7): PCRE pattern compilation error
Info: R (7): 'unmatched parentheses'
Info: R (7): at ')'
Error: R (7): Execution halted

Any idea how to interpret/fix this?

Thank you,

F

florini · ‎07-05-2017

I use 3.3.2 in Alteryx, 3.4 in R studio.... but I am sure this is not the cause... any other idea?

Philip · ‎07-06-2017

@florini, if you can't get the R code to work, here is an alternative Alteryx workflow that splits the data into grouping levels. I ran it against your stated 3.8 millions dates and 30,000 intervals and it ran in less than 3 minutes on my laptop (including creating the data sets). You could optimize the indexing levels to improve performance a bit more, and add in an additional level(s) with smaller tiles groups to improve performance more.

florini · ‎08-14-2017

Thanks guys, at the end I took a different R code approach, I used foverlaps, see below the code. Hope this helps someone one day.

I compared this to 'Alteryx way' of adding one line for each day of the period and... Alteryx way was 3x faster in my case... I assume if I grow the number of intervals and their span, the process might reverse...

#require(data.table)
library(data.table)

access = read.Alteryx("#1", mode="data.frame")
suborders = read.Alteryx("#2", mode="data.frame")

access <-na.omit(access, cols="accesstime_out")
suborders <-na.omit(suborders, cold="startDate", "endDate")

access$accesstime_out2 <- access$accesstime_out

access$accesstime_out <- as.POSIXct(access$accesstime_out)
access$accesstime_out2 <- as.POSIXct(access$accesstime_out2)
suborders$startDate <- as.POSIXct(suborders$startDate)
suborders$endDate <- as.POSIXct(suborders$endDate)

x = data.table(access)
y= data.table(suborders)

setkey(y, cid, startDate, endDate)
w <- foverlaps(x, y, by.x=c("custid", "accesstime_out", "accesstime_out2"),
type="within", mult = "all")

df <- data.frame(w)
df[] <- lapply(df, as.character)

write.Alteryx(df)

Alteryx Designer Desktop Discussions

R code non parsed well