Hi Alteryx Experts,
I am looking for some help joining two different data sets. I am trying to match a user to an IP address, I have the user and source port in one log along with login time and the source port and source IP in a second log. The problem is the time stamps are not an exact match and source ports can be reused - the logs with the source source IP information are not time stamped, only have a time stamp which relates to the time when it was entered into the log which can be up to a few minutes delay. See some fake sample data below.
Application log:
user | LoginTime | port |
abc | 5/26/2025 7:00 | 12345 |
def | 5/26/2025 7:02 | 54321 |
ghi | 5/26/2025 7:03 | 98587 |
jkl | 5/26/2025 7:07 | 12345 |
src_ip | src_port | _time |
10.100.100.1 | 12345 | 5/26/2025 7:05 |
10.200.200.2 | 54321 | 5/26/2025 7:06 |
10.300.300.3 | 98587 | 5/26/2025 7:04 |
10.400.400.4 | 12345 | 5/26/2025 7:08 |
Desired result table:
user | src_ip |
abc | 10.100.100.1 |
def | 10.200.200.2 |
ghi | 10.300.300.3 |
jkl | 10.400.400.4 |
For this sample the tricky part is making sure that user abc is only linked to 10.100.100.1 and jkl to 10.400.400.4 since they share a source port and happened somewhat close together
Any help on how I can achieve this?
Thanks!
Solved! Go to Solution.
my recommendation ---> 1 of 2 ways:
1) convert time stamps to milliseconds --- create buckets based upon your metric (ten seconds, a minute/whatever) --- join on that field and port.
2) join on port --- use datetimediff to find closest time stamp. use sort/tile to choose closest time stamp.
@barnese I made a workflow for you. There would be times when the difference of times would be negative, so you need to remove that as well. Hope this helps.