Android Question SQLite remove duplicate records.[SOLVED TWICE]

Roger Daley

Well-Known Member
Licensed User
Longtime User
Hi All,

Can anyone point me towards an example of removing duplicate records in a db?
I have searched for methods/commands in SQLite and DBUtils but nothing stood out to me.
Trying to match every record with every other record through iteration does not appeal and I'm sure there is a cleaner way do this.
NOTE: When I say duplicate records, I mean the data in all columns of both records is the same not just matching the "Names" column.

Regards Roger
 

Pendrush

Well-Known Member
Licensed User
Longtime User
Create same table (copy table as new) and then add matching fields as primary key(s), then INSERT IGNORE all records form old table to new table.
 
Upvote 0

Roger Daley

Well-Known Member
Licensed User
Longtime User
Create same table (copy table as new) and then add matching fields as primary key(s), then INSERT IGNORE all records form old table to new table.

Hi Pendrush,

Thanks for the reply, you wouldn't have an example of what you wrote? I should have mentioned I am still learning how to spell SQL. I keep reading the tutorials but they appear to be arcane works of the druids.

Regards Roger
 
Upvote 0

Roger Daley

Well-Known Member
Licensed User
Longtime User
Upvote 0

rraswisak

Active Member
Licensed User
Longtime User
Try this set of query statements:

CREATE TABLE table_temp AS SELECT DISTINCT * FROM table_duplicate;
DELETE FROM table_duplicate;
INSERT INTO table_duplicate SELECT * FROM table_temp;
DROP TABLE table_temp;

or

CREATE TABLE table_temp AS SELECT DISTINCT * FROM table_duplicate;
DROP TABLE table_duplicate;
CREATE TABLE table_duplicate AS SELECT * FROM table_temp;
DROP TABLE table_temp;

or

CREATE TABLE table_temp AS SELECT DISTINCT * FROM table_duplicate;
DROP TABLE table_duplicate;
ALTER TABLE table_temp RENAME TO table_duplicate;

 
Last edited:
Upvote 0

mangojack

Expert
Licensed User
Longtime User
Can anyone point me towards an example of removing duplicate records in a db?


I tried the above code supplied by @rraswisak but could not get it to copy Distinct records ... Maybe I was doing it wrong .

Anyway , as a part of learning new .. I had a play with Unique, Insert Or Ignore.

The attached example is a BASIC example of extracting distinct / unique records from one table to another etc.

It might be of interest to you or others. If so it can be adapted to suit.

I cant emphasize more the need to backup your DB 's before you apply any of this ! I am no expert in this Field.


@Roger Daley .. you mention DBUtils above , I dont know how reliant you are on this in your project/s , But personally when i was new to B4X and knew nothing
of SQLite ect ... I played with DBUtils as well . I found I was spending more time trying to learn the workings of the Util. rather than SQL itself.

I ditched it and concentrated on SQLite directly .. happy that I did. This is my Personal Opinion.

Cheers.
 

Attachments

  • DB Unique example.zip
    10.8 KB · Views: 289
Upvote 0

Roger Daley

Well-Known Member
Licensed User
Longtime User
Hi All

Attempted to use the code below, with the error:
java.lang.ClassCastException: android.database.sqlite.SQLiteCursor cannot be cast to java.util.List
at line:

For i = 0 To Table1.Size - 1
EDIT: 14 Oct. the line:
Table1 = SQL1.ExecQuery("SELECT distinct * FROM sites")
is the real problem. You can't use a LIST [Table1] for a SQL Query.


Most of code in the sub is from the existing code which loads a CSV ito the db. I can't see why it works in the existing code but not in this.
The error line has no meaning to me.

B4X:
Sub SQLDuplicate                
    Private Table1 As List
    Private Table2 As List
    Private Items() As String
    Table1.Initialize
    Table2.Initialize
   
    Table1.Clear
    Table1 = SQL1.ExecQuery("SELECT distinct * FROM sites")
   
    For i = 0 To Table1.Size - 1
        Private Map1 As Map
        Map1.Initialize
        Items = Table1.Get(i)
        Map1.Clear
        Map1.Put("SiteName", Items(0))
        Map1.Put("Longitude", Items(1))
        Map1.Put("Latitude", Items(2))
        Table2.Add(Map1)
    Next
    SQL1.ExecNonQuery("DELETE FROM sites")
    DBUtils.InsertMaps(SQL1, "sites", Table2)
End Sub
Any clues happily received.

Regards Roger
 
Last edited:
Upvote 0

emexes

Expert
Licensed User
Trying to match every record with every other record through iteration does not appeal and I'm sure there is a cleaner way do this.
You don't have to match every record with every other record, ie n^2 operations; the simplest non-bug-prone method would be to go through each record and do a lookup for likely duplicates using keys like name, birthdate, telephone number, and then check just those lookup results for fully-duplicate matches on all other fields (and delete any found). This will be more like a n log n operation.

Another way is to select all records, sorted by all fields, and then all duplicates will be grouped together when you scan through the sorted results in order. eg (psuedocode from another dialect of BASIC):
B4X:
if SortResult.Count Then
    ThisRecord = SortResult.First
    Do Until SortResult.End
        LastRecord = ThisRecord
        ThisRecord = SortResult.Next

        If ThisRecord.FirstName = LastRecord.FirstName Then
        If ThisRecord.LastName.Trim.UpperCase = LastRecord.LastName.Trim.UpperCase
        If ThisRecord.Telephone.Filter("[09]*") = LastRecord.Telephone.Filter("[09]*") Then

            SortResult.Delete    'current record, but leaves cursor in position re: .Next and .End

        End If
        End If
        End If
    Loop
End If
 
Upvote 0

rraswisak

Active Member
Licensed User
Longtime User
could not get it to copy Distinct records
actualy it works for me

NOTE: When I say duplicate records, I mean the data in all columns of both records is the same not just matching the "Names" column.
base on above statement the first my thought is using distinct (*) with all column



I attach working sample project to prove it
 

Attachments

  • remove_duplicate.zip
    10 KB · Views: 296
Upvote 0

mangojack

Expert
Licensed User
Longtime User
I attach working sample project to prove it

Not really necessary ... As stated above, It was most likely something I was doing wrong.

Running this "CREATE TABLE table_temp AS SELECT DISTINCT * FROM MyTable"
for some reason just copied over all records ... Including exact duplicates.

I did not spend much time on it to figure out why.
 
Upvote 0

Roger Daley

Well-Known Member
Licensed User
Longtime User
actualy it works for me


base on above statement the first my thought is using distinct (*) with all column

View attachment 84610

I attach working sample project to prove it


I am not at the computer at the moment but looking at the code as text on the phone it looks brilliant in it's simplicity. I obviously stuffed up the syntax when I tried it.
It will probably be tomorrow before I can try again but will report on the results.

Regards Roger
 
Upvote 0

Roger Daley

Well-Known Member
Licensed User
Longtime User
rraswisak
Many thanks for your working example. For me it was a great tutorial on SQL and I was correct, my first attempts to use your suggestions screwed up the syntax. Working examples are the best teachers, I now have a little bit better understanding of SQL although I don't know what/why the "Transaction" statements do.


From your example I derived the following SUB and it is working perfectly.


Regards Roger



B4X:
Sub SQLDuplicate(SQLD As SQL, TableName As String)   
    SQLD.BeginTransaction
    SQLD.ExecNonQuery("CREATE TABLE table_temp AS SELECT DISTINCT * FROM "& TableName)
    SQLD.ExecNonQuery("DROP TABLE "& TableName)
    SQLD.ExecNonQuery("ALTER TABLE table_temp RENAME TO "& TableName)
    SQLD.TransactionSuccessful
    SQLD.EndTransaction
End Sub
 
Upvote 0

rraswisak

Active Member
Licensed User
Longtime User
no problem, glad that this solution works for you.

When your app dealing with database and had to do some query (CRUD) operation to many table it's better (or exceptionally is a must) using transaction. This will make sure all query statement has been proceed completely without any error or interruption.

Say, on button click - i want to insert data to table A and also insert same data to table B meanwhile update table C base on data inserted in table A.
there are three step that should be execute. if one of this step failed then the other step will be cancelled/rollback. That's how Transaction works.

Any way you can search about this Transaction for better explanation, as far as i know - most of relational database support this feature.
 
Last edited:
Upvote 0

RB Smissaert

Well-Known Member
Licensed User
Longtime User

This can be done with one simple SQL.
Do some testing on a simple table with fields: ID, Name, Age.
Make some duplicate records.
Then run this SQL:
select rowid, count(*) as cnt from duptest2 group by id, name, age order by cnt desc

You will work out from this how to delete the duplicates.

RBS
 
Upvote 0

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Actually you will need to run the delete in a code loop as there can be multiple duplicates for one record.
So, one simple SQL won't do it, unless this is possible with a (recursive) CTE.

RBS
 
Upvote 0

RB Smissaert

Well-Known Member
Licensed User
Longtime User
Actually you will need to run the delete in a code loop as there can be multiple duplicates for one record.
So, one simple SQL won't do it, unless this is possible with a (recursive) CTE.

RBS

Actually, it can indeed be done with one simple SQL:

DELETE
FROM duptest
WHERE rowid NOT IN
(SELECT MAX(rowid)
FROM duptest4
GROUP BY id, name, age)

RBS
 
Upvote 0
Cookies are required to use this site. You must accept them to continue using the site. Learn more…