Placeholder

Customer Forum

Identifying duplicate Organisation records

Workbooks Support Posted: 2016-03-30 15:07

To help you keep your database clean, Workbooks allows you to merge Organisation records together when it finds two records that it considers to be duplicates. More information about de-duplication and merge is available here. This functionality is really helpful but what if you want to see potential duplicates in a Report to give you an idea of how many duplicates might be on your database and to give someone a starting point if you want to tidy things up. For this, you can build a Report, as described below.  (If you prefer, you can use the template already created in Workbooks for you. To do this, at the point where you're asked how you want to create a new report, select 'Create a report from a template report' and from the list provided select 'Template - Possible duplicate organisations - suffix and prefix'.)

Start by building an Organisation Report. Add a column for the Organisation name and one for the Object Reference. When Workbooks is testing to see if records are duplicates, it strips out prefixes such as 'A' and 'The' as well as suffixes such as 'Plc', 'Ltd', 'Limited', 'Corp', etc, so we need to use a function that has a similar effect. To do this, add a calculated column that uses the following formula:

REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE

(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE

(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(Lower(name) ,' limited',''),' plc',''),

' inc',''),' ltd',''),' lp',''),' llp',''),' lllp', ''),' corp', ''),' gmbh',''),' ag',''),' nv', ''),

' bv',''),' pty',''),' oy',''),'ab',''), 'the ',''),' hr',''),'&',''),'.',''),'-',''),'(',''),')',''),' ','')

In our example, we've called the column 'Cleansed name'. The formula looks more complicated than it really is; in essence it's looking for any instances of the prefixes and suffixes and is removing them (by replacing them with nothing). It also removes all spaces and changes all the text to lower case.  Imagine we had four Organisation records on our database with the following names:

Workbooks

The Workbooks

Workbooks Ltd.

Workbooks Plc

The formula above would result in each one of the four being displayed as 

workbooks

This means we can now compare records and count how many are on the database that may be duplicates. To do this, add a summary view to your report and add a Summarised column, that counts the number of Organisation IDs and groups them by the cleansed name in your details view. It should look like this (click to enlarge):

You're not interested in rows where the ID count is only 1 as this suggests there are no duplicates so you can apply a calculated criterion to only show rows where the ID count is > 1. Now you've got a report that looks similar to this:

You can click on the ID count to drill into it and find out exactly which Organisations look like they might be duplicates. Of course, human intervention is still needed to determine if they really are duplicates and if so, which values from which record(s) should be retained and which records can be deleted, but this report does a lot of the initial hard work for you.