Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Finding the latest version of records

SasiMon
6 - Meteoroid

To give an example scenario.

 

I have a UserInformation in a table with below columns.

UserId,

UserName,

UserAddress,

UpdatedDate.

 

I need to get the latest Updated record for each User.(only one record per user)

 

I am using Hive Query to get the data like below.

 

 

SELECT 
u.UserId,
u.UserName,
u.UserAddress
FROM
( SELECT 
ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY UpdatedDate DESC) AS RowRank,
UserId,
UserName,
UserAddress,
UpdatedDate,
FROM UserInformation 
) u
WHERE u.RowRank = 1;

Is it possible to do the same in Alteryx with In-DB modules?

 

 

 

5 REPLIES 5
JoeS
Alteryx
Alteryx

I am not quite sure how efficient it will be but you could:

 

  1. Use a Summarize in-DB to group by UserID and take the max UpdatedDate.
  2. Join back on User.ID and Max_UpdatedDate
jdunkerley79
ACE Emeritus
ACE Emeritus

I don't have access to a Hive server but connected to a SQL SERVER you can create the RowNumber using a formula in-db and then user a filter in-db to select just the latest record:

 

2017-06-06_16-10-47.jpg

SasiMon
6 - Meteoroid

Both Answers work on Hive, the second one is faster.

This first answer works but doesn't look to be very efficient, The answer by "jdunkerley79" looks better to me.

 

Thanks for the contributing.

 

I have one additional question on the same dataset.  I am trying to capture change of Name or Address. I use HQL to do this and looking to see if this can be done in-DB using Alteryx.

 

Select 
UserId
from 
(
Select
UserId,
UserName,
(LEAD(UserName, 1) OVER  (PARTITION BY UserId    ORDER BY UpdatedDate DESC)) AS prev_UserName,
UserAddress,
(LEAD(UserAddress, 1) OVER  (PARTITION BY UserId    ORDER BY UpdatedDate DESC)) AS prev_UserAddress

From
UserInformation
) u 

Where 

u.UserName <> u.prev_UserName
or
u.UserAddress <> u.prev_UserAddress

 

 

SasiMon
6 - Meteoroid

A formula-InDB can solve the LAG and LEAD window function as well.

 

LEAD(UserAddress, 1) OVER  (PARTITION BY UserAddress ORDER BY   UpdatedDate DESC).

 

Thanks

 

vaishnavika
5 - Atom

@JoeS thank you, this suggestion helped me in my use case and was pretty effective considering i have a large data set to work with.

Labels