Finding the latest version of records

SasiMon

Here is an example scenario.

I have a UserInformation table with the columns below.

UserId, UserName, UserAddress, UpdatedDate.

I need to get the most recently updated record for each user (only one record per user).

I am using a Hive query like the one below to get the data.

SELECT
    u.UserId,
    u.UserName,
    u.UserAddress
FROM
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY UserId ORDER BY UpdatedDate DESC) AS RowRank,
        UserId,
        UserName,
        UserAddress,
        UpdatedDate
    FROM UserInformation
) u
WHERE u.RowRank = 1;

Is it possible to do the same in Alteryx with In-DB modules?

In-Database

Accepted answers

jdunkerley79

I don't have access to a Hive server, but connected to SQL Server you can create the RowNumber using a Formula In-DB tool and then use a Filter In-DB tool to select just the latest record.
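As a rough illustration of the logic behind this approach (a row-number column followed by a filter on it), here is a minimal sketch that runs the equivalent nested query against an in-memory SQLite database from Python. SQLite is only a stand-in for Hive or SQL Server here, the function name `latest_per_user` and the sample rows are made up for the example, and the SQL that Alteryx actually generates from the In-DB tools may look different. Window functions need SQLite 3.25 or later.

```python
import sqlite3

# Hypothetical sample data: (UserId, UserName, UserAddress, UpdatedDate).
# Dates are ISO strings so that text ordering matches date ordering.
SAMPLE_ROWS = [
    (1, "Ann", "1 Elm St", "2020-01-01"),
    (1, "Ann", "9 Oak Ave", "2021-06-15"),
    (2, "Bob", "5 Pine Rd", "2019-03-10"),
]


def latest_per_user(rows):
    """Return the most recently updated row per UserId (illustrative helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserInformation ("
        "UserId INTEGER, UserName TEXT, UserAddress TEXT, UpdatedDate TEXT)"
    )
    conn.executemany("INSERT INTO UserInformation VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT UserId, UserName, UserAddress
        FROM (
            -- Step 1: the formula step adds a row number per user, newest first.
            SELECT UserId, UserName, UserAddress,
                   ROW_NUMBER() OVER (
                       PARTITION BY UserId ORDER BY UpdatedDate DESC
                   ) AS RowNumber
            FROM UserInformation
        ) AS u
        -- Step 2: the filter step keeps only the newest row per user.
        WHERE RowNumber = 1
        ORDER BY UserId
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_per_user(SAMPLE_ROWS):
        print(row)
```

In the workflow, the `ROW_NUMBER() OVER (...)` expression would go into the formula step and `RowNumber = 1` into the filter step.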

All comments

JoeS

I am not sure how efficient it will be, but you could:

Use a Summarize In-DB tool to group by UserId and take the max UpdatedDate.
Join back to the original table on UserId and UpdatedDate = Max_UpdatedDate.
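The two steps above correspond roughly to a GROUP BY subquery joined back to the table. Below is a minimal sketch of that query, run against an in-memory SQLite database from Python as a stand-in for Hive. The function name `latest_via_max_join` and the sample rows are hypothetical, and the SQL Alteryx generates may differ.

```python
import sqlite3

# Hypothetical sample data: (UserId, UserName, UserAddress, UpdatedDate).
SAMPLE_ROWS_JOIN = [
    (1, "Ann", "1 Elm St", "2020-01-01"),
    (1, "Ann", "9 Oak Ave", "2021-06-15"),
    (2, "Bob", "5 Pine Rd", "2019-03-10"),
]


def latest_via_max_join(rows):
    """Latest row per user via MAX(UpdatedDate) plus a join back (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserInformation ("
        "UserId INTEGER, UserName TEXT, UserAddress TEXT, UpdatedDate TEXT)"
    )
    conn.executemany("INSERT INTO UserInformation VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT i.UserId, i.UserName, i.UserAddress
        FROM UserInformation AS i
        JOIN (
            -- Summarize step: newest UpdatedDate per user.
            SELECT UserId, MAX(UpdatedDate) AS Max_UpdatedDate
            FROM UserInformation
            GROUP BY UserId
        ) AS m
          -- Join step: keep only rows that carry that newest date.
          ON i.UserId = m.UserId
         AND i.UpdatedDate = m.Max_UpdatedDate
        ORDER BY i.UserId, i.UserAddress
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in latest_via_max_join(SAMPLE_ROWS_JOIN):
        print(row)
```

One difference from the ROW_NUMBER approach: if a user has two rows with the same maximum UpdatedDate, this join returns both of them, so it only guarantees one record per user when the dates are unique within each user.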


SasiMon

Both answers work on Hive; the one from jdunkerley79 is faster.

The Summarize-and-join approach from JoeS works but doesn't look very efficient, so the answer by jdunkerley79 looks better to me.

Thanks for contributing.

I have one additional question on the same dataset. I am trying to capture changes of name or address. I use HQL to do this, and I am looking to see whether it can be done in-DB using Alteryx.

SELECT
    UserId
FROM
(
    SELECT
        UserId,
        UserName,
        LEAD(UserName, 1) OVER (PARTITION BY UserId ORDER BY UpdatedDate DESC) AS prev_UserName,
        UserAddress,
        LEAD(UserAddress, 1) OVER (PARTITION BY UserId ORDER BY UpdatedDate DESC) AS prev_UserAddress
    FROM UserInformation
) u
WHERE u.UserName <> u.prev_UserName
   OR u.UserAddress <> u.prev_UserAddress
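To make clear what the query above returns, here is a small sketch that runs it on made-up rows in an in-memory SQLite database from Python (SQLite 3.25 or later, used only as a stand-in for Hive). The function name `changed_user_ids` and the sample data are hypothetical.

```python
import sqlite3

# Hypothetical sample data: user 1 changes address, user 2 changes nothing.
SAMPLE_ROWS_CHANGES = [
    (1, "Ann", "1 Elm St", "2020-01-01"),
    (1, "Ann", "9 Oak Ave", "2021-06-15"),
    (2, "Bob", "5 Pine Rd", "2019-03-10"),
    (2, "Bob", "5 Pine Rd", "2020-08-20"),
]


def changed_user_ids(rows):
    """Return one UserId row per detected name or address change (illustrative)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE UserInformation ("
        "UserId INTEGER, UserName TEXT, UserAddress TEXT, UpdatedDate TEXT)"
    )
    conn.executemany("INSERT INTO UserInformation VALUES (?, ?, ?, ?)", rows)
    query = """
        SELECT UserId
        FROM (
            SELECT UserId,
                   UserName,
                   -- With newest-first ordering, LEAD looks at the next-older row.
                   LEAD(UserName, 1) OVER (
                       PARTITION BY UserId ORDER BY UpdatedDate DESC
                   ) AS prev_UserName,
                   UserAddress,
                   LEAD(UserAddress, 1) OVER (
                       PARTITION BY UserId ORDER BY UpdatedDate DESC
                   ) AS prev_UserAddress
            FROM UserInformation
        ) AS u
        -- The oldest row per user has NULL prev_* values, so both comparisons
        -- evaluate to NULL and that row is dropped.
        WHERE u.UserName <> u.prev_UserName
           OR u.UserAddress <> u.prev_UserAddress
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    print(changed_user_ids(SAMPLE_ROWS_CHANGES))
```

The query returns one row per change, so a user who changes name or address several times appears several times in the output.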

vaishnavika

@JoeS thank you, this suggestion helped me in my use case and was pretty effective, considering I have a large dataset to work with.
