Handling Duplicates

 1. How to find duplicates records in table 


select first_name, last_name , email, phone_number ,count(*)  from employees

group by first_name, last_name , email, phone_number 

having count(*)>1;


2. How to delete duplicates by keeping one :

 DELETE FROM EMPLOYEES

WHERE ROWID NOT IN (

  SELECT MIN(ROWID)

  FROM EMPLOYEES

  GROUP BY first_name, last_name , email, phone_number

);

3. How to restrict data getting duplicates in EMPLOYEES table ?

Answer :  

Option 1 

- Create Composite unique key or  USER KEY    -(Best Practice)

Create user key with above 4 columns, so that no one can insert record with this combination again in the database.

ALTER TABLE employees 

ADD CONSTRAINT emp_unique_key UNIQUE (first_name, last_name, email, phone_number);


Option 2 -

Create Index -

CREATE UNIQUE INDEX emp_unique_idx
ON employees (first_name, last_name, email, phone_number);


4 : Unique Key Constraint vs Unique Index -





5. Ways to identify duplicates -

Example Student table with below duplicates -


1. Using Distinct


2. Using Group By 
Group by with Min(ROWID)



Add Condition having count >1 

3. Use Rank 


Add Condition Rank =1 will
shows you records after eliminating Duplicates 
Add Condition Rank >1
Will show you Duplicates which are eliminated 

4. Use Union 
Not recommended due to Performance issue
Instead Union with NULL can be used as shown in below


Union with NULL 
( Not Performance Issue )
Here 1=2 Does not returns any row
But UNION operator removes the duplicates in the result .

1=2 can be used with same table as well as it does not returns any data and union does its Job .




5. User Intersect 


6. Use Minus with NULL


Minus with same table 1=2










No comments:

Post a Comment

About This Blog

SQL and PL/SQL are essential for database management, enabling efficient data retrieval, manipulation, and transaction control. SQL (Structu...