Placeholder

Customer Forum

A way to identify duplicates

Workbooks Support Posted: 2015-04-15 12:56

A Way to Identify Duplicate Records

Identifying duplicates on your database is as much an art as it is a science. We, as humans, are very good at spotting patterns and similarities between things which makes us good at identifying duplicate records when we see them, but with a large dataset, casting the human eye over every record can be impractical and extremely time consuming. Conversely, computers are very good at searching through entire databases in seconds but they don't have the human element to make them good at identifying patterns or similarities. Instead computers rely on finding exact matches of values in order to identify potential duplicates.

Identifying duplicates in your database requires a balance of the two in order to be most effective.

In Workbooks we can write reports which will identify records where a value in a field is an exact match on 2 or more records - Person Name, for example - but with this approach we'll always have to go in and look at both records to truly determine if the duplicate really exists - there can be a lot of unique John Smiths out there! - and if so, which record we need to merge, delete and so on. So the idea is that the reports will narrow your search guide you into finding potential duplicates which you can then examine in detail to figure out exactly what to do.

Depending on your data, you might want to search for records which are exact matches on other fields. In this example we'll just look for duplicate Person records by finding records with matching names.

Begin by creating a pre-populated report based on People (click to enlarge). 

Add any columns to this view which you may wish to use to check for duplicates, preferably something that should be unique to that Person.

Now a Summary View, then click Add value column > Add Summarised Column.

In the Column section, choose the Column as 'Person Reference' and set Calculate to 'Count'. This will give us a column of the count of entries which counts the number of unique instances of our grouped by column. 

The Grouped by section is where you'll be choosing the field which you'll be looking to find records which have the same value. In this example we've chosen Person name as we'll be looking for records for people with the same name.

So that we only see records where the Person name is not unique - i.e., potential duplicates - we set a calculated criteria to only return records where the Object Reference Count is greater than 1. 

Yielding our list of potential duplicates. 

We can drill into each Person Name and then compare each record to determine the next course of action.

In this example, we looked at identifying matches by Person name, it's likely that you'll actually want to look for exact matches with other fields such as email address. We can easily change the matching criteria by setting the Group by column to the field of choice. 

Which will produce the following report ready for further interrogation. 

Of course this method does not just apply to finding duplicate Person records, you can build your report on any record type as the exact same concepts apply.  

Workbooks Support