Contactous

DeDupe API: Deduplication of Any Database from Anywhere

DeDupe is an API for data deduplication built on a standard RESTful framework. It uses advanced AI and fuzzy-logic techniques to return high-quality pattern matches. DeDupe API matches strings in any custom or proprietary database of any size and is optimized for performance.

How does it work? - Steps
- Register your project using the API
- Build an index of every important data value (only the encrypted index value is stored by us, not the data)
- Query the index for any exact or probable match

How does it work? - An Example
Say you have a website registration database with the fields Name, Date of Birth, Address and Reference Number. These four fields are of type name, date, address and text respectively. Suppose the existing database holds 1 million records and 1,000 new ones arrive every day.

As a first step, you will need to build an index of all data values that need to be checked. In this example, indexes on text (the Reference Number) and date (the Date of Birth) will need to be created, because the requirements for every new record are as follows:

if new-reference-number has an exact match in the database then
   process-A
else
   if new-reference-number has a probable match then
       if new-record-date has a probable match to an existing record then
           process-B
       fi
   else
       process-C
       create new record
   fi
fi


In this example, you will need to call the MATCH API only once, for the reference number. If control reaches process-C, a CREATE API call must then be made to update the index.
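The decision flow above can be sketched in Python as follows. This is a minimal illustration, not the DeDupe client: the `match` callable, its `(data_type, value, kind)` signature, the `demo_match` stand-in with its single hard-coded record, and the returned labels are all assumptions made for the example. It is written as separate lookups for readability and does not claim how many calls the real service needs.

```python
from typing import Callable, Set

# Assumed signature: match(data_type, value, kind) -> set of reference keys.
MatchFn = Callable[[str, str, str], Set[str]]


def route_record(reference_number: str, record_date: str, match: MatchFn) -> str:
    """Return which process ("A", "B", "C") applies, or "none" if no action is defined."""
    if match("text", reference_number, "exact"):
        return "A"  # the reference number already exists exactly
    if match("text", reference_number, "probable"):
        # Probable reference match: confirm with a probable date match.
        if match("date", record_date, "probable"):
            return "B"
        return "none"  # the flow above defines no action for this branch
    return "C"  # no match at all: the caller creates and indexes the record


def _squash(value: str) -> str:
    """Crude stand-in for probable matching: keep letters and digits only."""
    return "".join(ch for ch in value.casefold() if ch.isalnum())


def demo_match(data_type: str, value: str, kind: str) -> Set[str]:
    """Hypothetical stand-in for a MATCH call, backed by one hard-coded record."""
    known = {"text": "H-144234", "date": "15/11/1984"}
    stored = known.get(data_type)
    if stored is None:
        return set()
    if kind == "exact":
        hit = stored == value
    else:
        hit = _squash(stored) == _squash(value)
    return {"rec-1"} if hit else set()


print(route_record("h144234", "15.11.1984", demo_match))  # prints B
```

Passing the lookup in as a callable keeps the routing logic independent of how the index is actually queried, so the same function can be exercised against a stub or a real client.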

Data Types
DeDupe API returns matches for the following data types:
- name (e.g., "Dr. Albert Einstein")
- company name (e.g., "Kruger Brent Pvt Ltd.")
- address (e.g., "22/7, Pie Blvd.")
- phone (e.g., "+91 98336 90611")
- email (e.g., "[email protected]")
- URL (e.g., "www.contactous.com")
- text (e.g., "H-144234")
- date (e.g., "3rd august of 1968")

Data Matches
DeDupe API maintains two indexes: Exact and Probable. The Exact index returns success only on a 100% match; it is case sensitive and takes any leading or trailing spaces into account. The Probable index uses hundreds of proprietary algorithms, AI and fuzzy-logic approaches to create an intelligent index of the data value.

Consider an example: a phone number is indexed whose original value is "9833690611/". An Exact match returns success only if the queried string is identical to the original value, including the "/" at the end, which is probably a typo. A Probable match, however, returns success for many high-quality matches, such as:

- "9833690611"
- "98336 90611"
- "98 33 69 06 11"
- "+91-98336.90611"
- "9 8 3 3 6 9 0 6 1 1"
- "phone: 9 83   36 90611"
and, of course,
- "9833690611/"

In 9 out of 10 cases the Probable match is used; it performs a high-quality pattern match based on algorithms that have been tested on millions of data points.
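To make the difference between the two kinds of match concrete, here is a small Python sketch using the phone values listed above. The actual Probable algorithms are proprietary and not described here; the rule used below (strip every non-digit and compare the last ten digits) is only an assumption chosen because it happens to cover these examples, and the function names are hypothetical.

```python
import re


def exact_key(value: str) -> str:
    """Exact matching compares the raw string, including case and spaces."""
    return value


def probable_phone_key(value: str, significant_digits: int = 10) -> str:
    """Illustrative normalization: keep digits only, then the trailing digits."""
    digits = re.sub(r"\D", "", value)
    return digits[-significant_digits:]


indexed = "9833690611/"
queries = [
    "9833690611",
    "98336 90611",
    "98 33 69 06 11",
    "+91-98336.90611",
    "9 8 3 3 6 9 0 6 1 1",
    "phone: 9 83   36 90611",
    "9833690611/",
]

for query in queries:
    is_exact = exact_key(query) == exact_key(indexed)
    is_probable = probable_phone_key(query) == probable_phone_key(indexed)
    print(f"{query!r}: exact={is_exact}, probable={is_probable}")
```

With this rule every query is a probable match, while only the last one, which repeats the stray "/", is an exact match.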

Data Types
The following data types are indexed by DeDupe API: 
- Person's Name
- Company Name
- Email Address
- Mobile Phone Number
- Address
- Website URL
- Text
- Date 

Reference Key
A reference key ("reference_id" in the API documentation) is the identifier of a record in your system or database. It is the link between DeDupe and your environment. The key is a unique identifier: each key corresponds to one record, which holds multiple data values (each with its own data type).

DeDupe APIs
Besides the API to create a project, DeDupe API comprises the following three APIs:

CREATE: Takes the data values and the reference key, and creates exact and probable indexes for the project within the DeDupe environment.

DELETE: Deletes all indexes within DeDupe for a reference key. To update a record, you will have to DELETE and then CREATE it.

MATCH: Finds a match for the incoming string within the DeDupe database. The type of match (Exact or Probable) needs to be specified. It returns the set of reference keys that match the string.
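The semantics of the three operations can be modelled with a toy in-memory index in Python. This is a sketch under stated assumptions, not the REST API: the class `ToyDedupeIndex`, its method signatures, the one-value-per-call shape of `create`, and the crude probable key (lowercased letters and digits only) are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple


class ToyDedupeIndex:
    """In-memory model of CREATE, DELETE and MATCH keyed by reference id."""

    def __init__(self) -> None:
        self._exact: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
        self._probable: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    @staticmethod
    def _probable_key(value: str) -> str:
        # Assumed normalization: ignore case, spacing and punctuation.
        return "".join(ch for ch in value.casefold() if ch.isalnum())

    def create(self, reference_id: str, data_type: str, value: str) -> None:
        """Add the value to both the exact and the probable index."""
        self._exact[(data_type, value)].add(reference_id)
        self._probable[(data_type, self._probable_key(value))].add(reference_id)

    def delete(self, reference_id: str) -> None:
        """Remove every index entry that points at the reference id."""
        for index in (self._exact, self._probable):
            for refs in index.values():
                refs.discard(reference_id)

    def match(self, data_type: str, value: str, kind: str = "probable") -> Set[str]:
        """Return the reference ids whose indexed value matches the query."""
        if kind == "exact":
            return set(self._exact.get((data_type, value), set()))
        key = (data_type, self._probable_key(value))
        return set(self._probable.get(key, set()))


index = ToyDedupeIndex()
index.create("rec-1", "text", "H-144234")
print(index.match("text", "h144234"))                # probable match
print(index.match("text", "h144234", kind="exact"))  # no exact match

# An update is a DELETE followed by a CREATE.
index.delete("rec-1")
index.create("rec-1", "text", "H-200000")
print(index.match("text", "H-144234"))               # old value is gone
```

Because MATCH returns reference keys rather than data values, the caller looks the matching records up in its own database.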

Examples of Probable Matches
Each of the following examples is a set of values that match each other as 'probable'.

Name - First Example
Example of Name de-duplication taken from a medical institution in India. Combinations of salutations, qualifications and swapped first names and surnames were considered:
  • Sheela Joshi
  • Dr. Sheela Joshi, PhD
  • Mrs. Joshi, Sheela
  • Sheela Joshi, M.B.B.S.
  • Joshi Sheela
Name - Second Example
Example of Name de-duplication taken from a warranty registration database in the Philippines:
  • Ivy Mathew Griffin
  • Ivy M. Griffin
  • Ivy Matt Griffin
Name - Third Example
A powerful example of the algorithm's capabilities. Example of Name de-duplication taken from a database of a South Asian country:
  • Mohammad Kasim
  • Mohd. Kasim
  • Mhd. Kasim
  • Md. Kasim
  • Muhammad Kasim
Address - First Example
This is one of the best examples of Address de-duplication, highlighted by the algorithm within a massive CRM database in India. Not only are there inconsistent abbreviations and spelling errors, but the old and new official names of the city have also been detected as duplicates:
  • 43/2, Industrial Road, Sector 65, Bangalore
  • sector 65, bangalore - 43/2 (industrial) rd
  • 43\2, sectorr 65 - industrial rd,, BENGALURU
  • 43 2 indl rd sec 65 bangalore 
Address - Second Example
An example of Address de-duplication, from Singapore:
  • #01-33, 92 Whampoa Annexe, Causeway Drive
  • 01 33, 92-whampoa annx, causeway dr
  • 01--33 whampoa anx #92, causeway drv
  • #92. Whampoa Anex. Causeway= Drive, 01,33
Address - Third Example
An example of a duplicate address cluster from Australia. Note the abbreviations and variations of the state name captured in the duplicate cluster:
  • 7th Floor, 43/2 Miller Plaza, Industrial Highway, Sydney, New South Wales
  • 7th fl miller plz, (industrial) hw - 43 2, sydney, nsw
  • Seventh Floor. Indl Hway. 43-2 Plaza. Miller. Sydney. N.S.W.
  • Flr 7th, #43-2, miller pz, sydney indl hwye, ns.w
Phone
The algorithm finds mobile numbers in multiple formats and groups the duplicates together. Here is an example of such a group:
  • +63-906-222-1520
  • 0063 9 06 22 21 520
  • +(906).222.1520
  • 0-906-22-21-520
  • 9 0 6 2 2 2 1 5 2 0
Company Name
An example of Company Name de-duplication, from India:
  • HPE India Private Limited
  • HPE India Private Ltd.
  • HPE Pvt. Ltd. - India
Email
  • [email protected]
  • [email protected]
  • [email protected]
URL
  • contactous.com
  • http://www.contactous.com/index.htm
  • www4.contactous.com/?query=malaysia
  • https://contactous.com:8080/
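One way to see why the four URLs above can be treated as the same site is to reduce each to its host name. The Python sketch below does this with the standard library; the rule it applies (drop scheme, port, path and query, then strip a leading "www" or "www" plus digits label) is an assumption for illustration and not the algorithm DeDupe uses, and `probable_url_key` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit


def probable_url_key(url: str) -> str:
    """Reduce a URL to a bare host name without any leading www label."""
    candidate = url.strip()
    if "://" not in candidate:
        # urlsplit only recognises the host when a scheme or "//" is present.
        candidate = "//" + candidate
    host = urlsplit(candidate).hostname or ""
    return re.sub(r"^www\d*\.", "", host)


urls = [
    "contactous.com",
    "http://www.contactous.com/index.htm",
    "www4.contactous.com/?query=malaysia",
    "https://contactous.com:8080/",
]

for url in urls:
    print(url, "->", probable_url_key(url))
```

All four inputs reduce to the same key, "contactous.com".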
Text
  • H-144234
  • /H 1 4 4 2 34 ./
  • h144234
Date
  • 15/11/1984
  • 11-15-1984
  • 15.11.84
  • 1984, novembr 15

Data Security
While DeDupe API maintains a customized index of your database, it does not store any data values. It creates a unique encrypted index of every data value and stores it to match query strings. For example, a value of the name field, "John Doe", could be stored in our index as "3d95bc532661d5e56f126b28f4634fc8", which cannot be traced back to the original value. We would then match an exact value of "John Doe", or probable matches such as "Dr. John Doe, PHD" or "Mr. Doe, John", against the same index.
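The idea of matching against a stored index value instead of the data itself can be illustrated in Python. Everything below is an assumption made for the sketch: the name normalization (drop a short list of titles, ignore punctuation and word order), the use of a keyed SHA-256 hash (HMAC) as the one-way step, and the secret key are hypothetical and are not taken from DeDupe, whose scheme is not described in this document.

```python
import hashlib
import hmac
import re

# Illustrative list of salutations and qualifications to ignore.
TITLES = {"dr", "mr", "mrs", "ms", "phd", "mbbs"}


def probable_name_key(name: str) -> str:
    """Normalize a name: lowercase, no punctuation, no titles, sorted tokens."""
    cleaned = re.sub(r"[^\w\s]", "", name.casefold())
    tokens = sorted(t for t in cleaned.split() if t not in TITLES)
    return " ".join(tokens)


def index_digest(key: str, secret: bytes) -> str:
    """One-way digest of the normalized key; only this value would be stored."""
    return hmac.new(secret, key.encode("utf-8"), hashlib.sha256).hexdigest()


secret = b"example-project-secret"  # hypothetical per-project key
stored = index_digest(probable_name_key("John Doe"), secret)

for query in ["John Doe", "Dr. John Doe, PHD", "Mr. Doe, John"]:
    same = index_digest(probable_name_key(query), secret) == stored
    print(f"{query!r} matches stored index: {same}")
```

Normalizing before hashing is what lets different spellings share one stored value, since a digest can only be compared for equality.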

Audience
DeDupe API is used by software developers and the IT departments of organizations. ISVs and partners of large software vendors create extensions that add deduplication functionality to standard software from vendors such as Salesforce, Microsoft and Zoho. Organizations use it to enhance the quality of their own systems and to check incoming data in real time.


© 2025 CONTACTOUS PTE LTD | ALL RIGHTS RESERVED

Address

24 Raffles Place, #25-02A
Singapore 048621.