Checking duplicates in table before loading new data in SQL Server
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello all! I'm new to the Alteryx community! There’s a lot of useful information in the discussions but I'm running into issues finding a unique workflow for checking duplicates in a table.
Essentially, I have an excel file I need to check against the table "ORDERS" in a database.
Excel file:
ID_NUMBER | YEAR | CODE | STATUS | RESPONSE | SOURCE | MONTH | SENT DATE |
0000123456 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000548759 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000842169 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000254856 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000205872 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000284567 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000102489 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000854753 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
0000248648 | 2020 | ABC20 | A | 1 | DD | 12 | 12/1/2020 |
I would like to view the results where
ID_NUMBER
YEAR
CODE
STATUS
RESPONSE
SOURCE
MONTH
SENT DATE
are identical in addition to the total number of records. I will exclude all duplicates before adding the new data to the ORDERS table.This is a sample set - the original file has over 150,000 rows.
So far, I have the following workflow but this data already exists in the Orders table; I’m not sure why all rows populate in the results view.
Any help that you may have would be greatly appreciated. Thanks everyone!
- Labels:
- Parse
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @rr112,
I am not sure I completely understand what you are trying to accomplish, but this workflow should help you identify the duplicates. I had to alter your sample to get this to give results.
Hope that helps,
Greg
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks Greg_Murray!
I'm sorry it wasn't clear, still learning the ropes. I want to check the text input against a table that already exists in the database to confirm if duplicate rows exist. The output should return duplicate rows only meaning all field names are identical. Thanks so much for your response!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I am not sure if I am interpreting your challenge correctly either but here is an idea that might help based on what I understand.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi tsturge,
Thanks so much for your response. This workflow is helpful; the challenge I'm facing is checking the excel file against data that already exists in the table. I'd like to create a new table that shows all records that already exist in the Orders table using the excel file. For example, load Database > View Orders table > check excel file for duplicate records in Order table > browse duplicates (meaning show data where rows already exist in orders table - duplicates in this case are all identical fields ). I'm sorry if this still doesn't make sense. Thank you so much for your help, I'm very grateful to have such a helpful and responsive community of experts!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
If you are looking to compare your Excel file to see which records already exist in your SQL table then the join will achieve that purpose. I assume you don't have a single unique ID field that exist on both the excel AND the SQL, assuming no then the join between the two tables would solve this.
For example, if you have 5 fields of data on your excel file and those same 5 fields in SQL table then set up the join tool using all 5 fields. Only records where all 5 fields are a 100% match will join, these are your 'duplicates', these are your records that exist in both sources of data, these records will output from the J in the join tool
The Excel records that don't join (aka outer left) are records that don't have a like record in SQL, these are you new records.
