Fully managed cloud. 30-day FREE full featured trial. Start Now
cancel
Showing results for 
Search instead for 
Did you mean: 

Applying query from "Verification of the Uniqueness (UNIQUE)" - Selecting duplicated values

3_stripes
Contributor

Hello everyone!

I have 2 questions regarding the query in  Verification of the Uniqueness (UNIQUE) | Exasol Documentation

1) I need to count how many duplicates ID exist in a given table.

My first approach was the more traditional one:

Select COUNT(DISTINCT ID)

FROM TABLE A

HAVING COUNT(ID) > 1

 

But I came across Verification of the Uniqueness (UNIQUE) | Exasol Documentation and I used the following query:

SELECT COUNT(DISTINCT ID) WITH INVALID UNIQUE(ID)

FROM TABLE A


The results I am obtaining are abysmally different! I have the felling I do not fully understand what this UNIQUE query is doing. Shouldn't they be the same?

 

2) I want to add an extra column to the following query:

SELECT COUNT(DISTINCT ID) WITH INVALID UNIQUE(ID), COUNT( DISTINCT COUNTRY)

FROM TABLE A

It doesn't work and it generates the following error:

SQL Error [42000]: syntax error, unexpected ',', expecting INTO_ or FROM_ [line 1, column 61] (Session: 1723181631447242946)

Any idea how to combine those two columns?

 

Thank you very much!

 

1 ACCEPTED SOLUTION

Accepted Solutions

exa-SimonR
Team Exasol
Team Exasol

Regarding 1)

Let's use an example:

create  schema test;
create or replace table test.A (ID int, COUNTRY int);
SELECT COUNT(DISTINCT ID) WITH INVALID UNIQUE(ID), COUNT( DISTINCT COUNTRY)FROM A;
insert into test.A values (1, 1), (2, 1), (3, 1), (3,2);

So the id 3 is duplicated, every other id is unique.

select count(distinct id) from test.a having count(id) > 1;
-- => returns 3

This counts how many distinct ids there are. If there were no ids at all, i.e. on an empty table, it would return zero rows, because the having filters out the result row. If you use an aggregate function and having without group by the result is treated as a single group.

 

SELECT * WITH INVALID UNIQUE(ID) FROM test.A;
-- ID	COUNTRY
-- 3	1
-- 3	2
SELECT count(*) WITH INVALID UNIQUE(ID) FROM test.A;
-- => returns 2

Select with invalid unique, selects all rows that are not unique for the specified columns. You can use COUNT(*) to get a count, or * to get all entries.

 

SELECT count(distinct id) WITH INVALID UNIQUE(ID) FROM test.A;
-- => returns 1

This returns the number of distinct ids that are not unique. In our case there is only one id (the number 3) that is not unique, so the query returns 1. I think this is the query that you want, but I'm not completely sure what your goal is.

View solution in original post

7 REPLIES 7

exa-MathiasHo
Exasol Alumni

Hi @3_stripes and welcome to the community!

I'll see if I can get you the answer to your question internally. Just so you know, it may be that this might take till Monday as the weekend is almost here. 


Maybe in the meantime, other users can help out here. Either a new member (how about you @ysfgrm?) or one of our veterans?

exa-Peggy
Moderator
Moderator

Example 1:

 

CREATE OR REPLACE TABLE A ( ID INT);
INSERT INTO A VALUES (1),(2),(2),(4),(4),(4);

 

1.) Which distinct values do we have?

 

SELECT GROUP_CONCAT(DISTINCT ID), '=>', COUNT(DISTINCT ID) FROM A;

 

Result:

GROUP_CONCAT(A.ID) '=>' Count(DISTINCT A.ID)
1,2,4 => 3

 

2.) Which distinct duplicate values do we have?

 

SELECT COUNT(DISTINCT ID), GROUP_CONCAT(DISTINCT ID), '=>', COUNT( DISTINCT ID)  WITH INVALID UNIQUE(ID) FROM A;

 

Result:

Count(DISTINCT A.ID) GROUP_CONCAT(A.ID) '=>' Count(DISTINCT A.ID)
2 2,4 => 2

 

3.) Without INVALID UNIQUE construct we have to use the following statement:

 

SELECT GROUP_CONCAT("IDs" SEPARATOR ' | '), '=>', count(*)
FROM (
        -- redundant tuples
        SELECT count(ID), GROUP_CONCAT(ID ) as "IDs" FROM A
        GROUP BY ID
        HAVING count(ID)>1
      );

 

 

Result:

GROUP_CONCAT(IDs) '=>' COUNT(*)
2,2 | 4,4,4 => 2

 

exa-Peggy
Moderator
Moderator

Example:

 

CREATE OR REPLACE TABLE
test.A(ID INT,COUNTRY INT);
INSERT INTO test.A VALUES(1,1),(2,2);

 

How many distinct values do we have?

 

SELECT COUNT(DISTINCT ID), COUNT(ID)
FROM TABLE A
--HAVING COUNT(ID) > 1
;

 

Result:

Count(DISTINCT ID) Count(ID) 
------------------ ---------
2 2

Do we have redundant ID values? 

 

SELECT COUNT(DISTINCT ID) WITH INVALID UNIQUE(ID)
FROM  A;

 

Result:

Count(DISTINCT A.ID) 
--------------------
0   

exa-SimonR
Team Exasol
Team Exasol

Regarding 1)

Let's use an example:

create  schema test;
create or replace table test.A (ID int, COUNTRY int);
SELECT COUNT(DISTINCT ID) WITH INVALID UNIQUE(ID), COUNT( DISTINCT COUNTRY)FROM A;
insert into test.A values (1, 1), (2, 1), (3, 1), (3,2);

So the id 3 is duplicated, every other id is unique.

select count(distinct id) from test.a having count(id) > 1;
-- => returns 3

This counts how many distinct ids there are. If there were no ids at all, i.e. on an empty table, it would return zero rows, because the having filters out the result row. If you use an aggregate function and having without group by the result is treated as a single group.

 

SELECT * WITH INVALID UNIQUE(ID) FROM test.A;
-- ID	COUNTRY
-- 3	1
-- 3	2
SELECT count(*) WITH INVALID UNIQUE(ID) FROM test.A;
-- => returns 2

Select with invalid unique, selects all rows that are not unique for the specified columns. You can use COUNT(*) to get a count, or * to get all entries.

 

SELECT count(distinct id) WITH INVALID UNIQUE(ID) FROM test.A;
-- => returns 1

This returns the number of distinct ids that are not unique. In our case there is only one id (the number 3) that is not unique, so the query returns 1. I think this is the query that you want, but I'm not completely sure what your goal is.

exa-SimonR
Team Exasol
Team Exasol

Regarding 2)

You need to specify the additional columns before the WITH INVALID clause.

SELECT COUNT(DISTINCT ID), COUNT( DISTINCT COUNTRY) WITH INVALID UNIQUE(ID)
FROM test.A;
-- Count(DISTINCT A.ID)	Count(DISTINCT A.COUNTRY)
-- 1	2

Basically you specify the columns that you are interested in before the WITH INVALID UNIQUE clause. The list of columns after UNIQUE (in our case just id), specifies according to what columns the rows should be unique.

ysfgrm
Padawan

It looks like INVALID UNIQUE function is filtering table rows based on the columns which you gave for uniqueness check. So if you want to get count(distinct country) for all rows, then it wont work. It could work just for the filtered rows in the way that as @exa-SimonR mentioned in the second post (regarding 2). My suggestion would be get these counts separately and combine them.  

drumcircle
Rising Star

And... use CTAS with SELECT DISTINCT to create a new table without duplicate rows!