How to filter for documents with the same ID created within 3 seconds of each other
I have unique files getting revisions. Each file can have many revisions and I only care about the first two.
I have filtered the data so that "docrevnum" is 0 or 1.
What I would like to do from here is count how many of those pairs have "docrevdate" values within 3 seconds of each other.
Thank you,
ArmyGroo
- Labels:
- Fuzzy Match
@ArmyGrooSMH What are the expected output results?
The ideal output for the above is a count of the number of times an "itemnum" has "docrevdate" values within 3 seconds of each other. The example data should show 6.
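For anyone who wants to sanity-check the count outside the workflow, here is a minimal Python sketch of the logic described above. It assumes each row is an (itemnum, docrevnum, docrevdate) tuple, that docrevdate is a "YYYY-MM-DD HH:MM:SS" string, and that "within 3 seconds" includes exactly 3 seconds. The sample rows and the function name count_close_pairs are made up for illustration; they are not the sample data from this thread.

```python
from datetime import datetime, timedelta

# Hypothetical rows: (itemnum, docrevnum, docrevdate). Not the thread's sample data.
rows = [
    (1001, 0, "2023-01-05 10:00:00"),
    (1001, 1, "2023-01-05 10:00:02"),  # 2 seconds after revision 0: counted
    (1002, 0, "2023-01-05 11:00:00"),
    (1002, 1, "2023-01-05 11:00:10"),  # 10 seconds after revision 0: not counted
    (1003, 0, "2023-01-05 12:00:00"),  # no revision 1: not counted
]


def count_close_pairs(rows, window_seconds=3):
    """Count items whose revision 0 and revision 1 fall within the window."""
    first = {}   # itemnum -> timestamp of revision 0
    second = {}  # itemnum -> timestamp of revision 1
    for itemnum, docrevnum, docrevdate in rows:
        ts = datetime.strptime(docrevdate, "%Y-%m-%d %H:%M:%S")
        if docrevnum == 0:
            first[itemnum] = ts
        elif docrevnum == 1:
            second[itemnum] = ts
    window = timedelta(seconds=window_seconds)  # assumption: boundary is inclusive
    return sum(
        1
        for itemnum, ts0 in first.items()
        if itemnum in second and abs(second[itemnum] - ts0) <= window
    )


print(count_close_pairs(rows))
```

The same idea maps onto a self-join on itemnum (revision 0 against revision 1) followed by a filter on the absolute time difference.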
There is one trick in the sample data: 88343237 shows up 3 times, twice with a docrevnum of 0. This has been fixed on my side by adding a unique on docrevnum+itemnum+docrevdate.
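As a rough illustration of that de-duplication step, the sketch below keeps the first row seen for each docrevnum+itemnum+docrevdate combination. The dictionary layout, the sample values, and the function name dedupe are assumptions for the example only.

```python
# Hypothetical rows; one item appears three times, twice as an identical revision 0.
rows = [
    {"itemnum": 1001, "docrevnum": 0, "docrevdate": "2023-01-05 10:00:00"},
    {"itemnum": 1001, "docrevnum": 0, "docrevdate": "2023-01-05 10:00:00"},
    {"itemnum": 1001, "docrevnum": 1, "docrevdate": "2023-01-05 10:00:02"},
]


def dedupe(rows):
    """Keep the first row for each (docrevnum, itemnum, docrevdate) key."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = (row["docrevnum"], row["itemnum"], row["docrevdate"])
        if key not in seen:
            seen.add(key)
            unique_rows.append(row)
    return unique_rows


unique_rows = dedupe(rows)
print(len(unique_rows))
```

Note that this only collapses exact duplicates. Two revision 0 rows with different docrevdate values would both survive and would need a separate rule.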
I have 100 million records and ideally want to know the number of times an item revision comes in within 3 seconds of the original.
I will be adding a group-by on filetypenum after getting the count.
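A possible shape for that grouped count, sketched in Python under a few assumptions: rows are (itemnum, filetypenum, docrevnum, docrevdate) tuples, duplicates have already been removed, the filetypenum is read from revision 0, and the 3-second boundary is inclusive. The sample values and the function name close_pairs_by_filetype are hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta

# Hypothetical rows: (itemnum, filetypenum, docrevnum, docrevdate).
rows = [
    (1001, 7, 0, "2023-01-05 10:00:00"),
    (1001, 7, 1, "2023-01-05 10:00:01"),  # 1 second apart: counted under 7
    (1002, 7, 0, "2023-01-05 11:00:00"),
    (1002, 7, 1, "2023-01-05 11:00:03"),  # exactly 3 seconds: counted under 7
    (1003, 9, 0, "2023-01-05 12:00:00"),
    (1003, 9, 1, "2023-01-05 12:00:01"),  # 1 second apart: counted under 9
    (1004, 9, 0, "2023-01-05 13:00:00"),
    (1004, 9, 1, "2023-01-05 13:00:30"),  # 30 seconds apart: not counted
]


def close_pairs_by_filetype(rows, window_seconds=3):
    """Count close revision 0/1 pairs, grouped by the filetypenum of revision 0."""
    revisions = {}  # itemnum -> {docrevnum: (timestamp, filetypenum)}
    for itemnum, filetypenum, docrevnum, docrevdate in rows:
        ts = datetime.strptime(docrevdate, "%Y-%m-%d %H:%M:%S")
        revisions.setdefault(itemnum, {})[docrevnum] = (ts, filetypenum)
    window = timedelta(seconds=window_seconds)
    counts = Counter()
    for revs in revisions.values():
        if 0 in revs and 1 in revs:
            ts0, filetype = revs[0]
            ts1, _ = revs[1]
            if abs(ts1 - ts0) <= window:
                counts[filetype] += 1
    return counts


result = close_pairs_by_filetype(rows)
print(dict(result))
```

At 100 million records an in-memory dictionary like this may not be practical, so treat it as a way to check the logic on a sample rather than as the production approach.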
