Contactous

On-premise de-duplication of large datasets

DataStitch: Secure, On-Premise DeDuplication

DataStitch, a Contactous product, is intended for organizations whose large datasets, running to millions of records, need to be cleaned through high-quality deduplication without uploading the data to any external system or cloud.

On-Premise Execution
DataStitch is designed to run on premises, so the data never leaves the organization's systems. DataStitch does not communicate with any external system and can therefore run standalone, without network connectivity.

Data Volume
DataStitch has been tested and used on tens of millions of records. Even so, it does not necessarily require a server and performs well on a fast desktop machine. DataStitch is supported on both Windows and popular Linux platforms.

Customization
By default, DataStitch finds patterns across dozens of dimensions and requires no setup. It takes CSV files as input and outputs the records in the same format, grouped into clusters of similar records. If required, customized algorithms can be added to DataStitch, and DataStitch can extract data from external data sources. Because DataStitch accepts data in standard CSV format, such customization can also be decoupled from DataStitch: the organization's IT department or its preferred partners can extract the data to CSV and then process it with DataStitch.
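
The CSV-in, clustered-CSV-out flow described above can be pictured with a minimal sketch. This is not DataStitch's implementation or interface: the function name `cluster_csv`, the `cluster_id` column, the sample rows, and the trivial normalization rule are all assumptions made for illustration.

```python
import csv
import io
from collections import defaultdict


def cluster_csv(text, key_field):
    """Group CSV rows whose key_field matches after a trivial normalization."""
    reader = csv.DictReader(io.StringIO(text))
    clusters = defaultdict(list)
    for row in reader:
        # Hypothetical rule: lowercase and keep only letters and digits.
        key = "".join(ch for ch in row[key_field].lower() if ch.isalnum())
        clusters[key].append(row)

    out = io.StringIO()
    fieldnames = ["cluster_id"] + list(reader.fieldnames)
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    # Rows of the same cluster are written next to each other.
    for cluster_id, rows in enumerate(clusters.values(), start=1):
        for row in rows:
            writer.writerow({"cluster_id": cluster_id, **row})
    return out.getvalue()


# Made-up sample input, not real data.
SAMPLE = "name,city\nHPE Inc,Manila\nH.P.E. Inc.,Manila\nAcme,Cebu\n"
```

Calling `cluster_csv(SAMPLE, "name")` returns the same rows with an extra leading column, where the two HPE spellings share one cluster number and the third row gets its own.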

DataQual Evaluation 
A fully functional evaluation version of the DataStitch application is available.

DeDuplication Quality
The algorithms used in DataStitch are exactly the same as those used in our other products, which are hosted in the cloud for CRM Data Quality and Enterprise Contact Management or offered as plugins for standard CRMs such as Zoho. They have been tested on hundreds of millions of records over the past decade, and the lessons learned are built into the executable software.

Here are some examples of the results that can be expected. The majority are close to real cases of duplicate data found by the algorithms in use. The real data has been changed for confidentiality reasons, but the discovered patterns are intact.

Full Name - First Example
An example of Name de-duplication taken from a medical institution in India. Combinations of salutations, qualifications, and swapped first names and surnames were considered:
  • Sheela Joshi
  • Dr. Sheela Joshi, PhD
  • Mrs. Joshi, Sheela
  • Sheela Joshi, M.B.B.S.
  • Joshi Sheela
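
One simple way to make the five variants above compare equal is to strip titles and sort the remaining name tokens. The sketch below assumes two small, illustrative word lists (`SALUTATIONS`, `QUALIFICATIONS`) and a hypothetical `name_key` helper; it is not the algorithm DataStitch uses.

```python
import re

# Assumed, illustrative lists; real data would need far larger ones.
SALUTATIONS = {"dr", "mr", "mrs", "ms", "prof"}
QUALIFICATIONS = {"phd", "mbbs"}


def name_key(raw):
    """Return an order-independent key with titles and degrees removed."""
    # Drop dots so "M.B.B.S." becomes "mbbs", then split on non-letters.
    tokens = re.split(r"[^a-z]+", raw.lower().replace(".", ""))
    kept = [
        t for t in tokens
        if t and t not in SALUTATIONS and t not in QUALIFICATIONS
    ]
    # Sorting makes "Joshi Sheela" and "Sheela Joshi" identical.
    return tuple(sorted(kept))


NAMES = [
    "Sheela Joshi",
    "Dr. Sheela Joshi, PhD",
    "Mrs. Joshi, Sheela",
    "Sheela Joshi, M.B.B.S.",
    "Joshi Sheela",
]
```

With these lists, every entry in `NAMES` maps to the key `("joshi", "sheela")`.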

Full Name - Second Example
An example of Name de-duplication taken from a warranty registration database in the Philippines:
  • Ivy Mathew Griffin
  • Ivy M. Griffin
  • Ivy Matt Griffin

Full Name - Third Example
A powerful example of the algorithm's capabilities: Name de-duplication taken from a database of a South Asian country:
  • Mohammed Qasim
  • Mohammad Kasim
  • Mohd. Kasim
  • Mhd. Kasim
  • Md. Kasim
  • Muhammad Kasim
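
Spelling and abbreviation variants like these can be approximated with a crude phonetic key. The sketch below is an assumption-laden illustration, not the product's method: it treats Q and K as one sound, drops vowels, collapses repeated letters, and uses a tiny hypothetical `ABBREVIATIONS` table to expand shortened forms.

```python
import re

# Assumed table: consonant skeleton of an abbreviation -> full skeleton.
ABBREVIATIONS = {"mhd": "mhmd", "md": "mhmd"}


def skeleton(word):
    """Reduce one word to a rough consonant skeleton."""
    w = re.sub(r"[^a-z]", "", word.lower())
    w = w.replace("q", "k")           # treat Q and K as the same sound
    w = re.sub(r"[aeiou]", "", w)     # drop all vowels
    w = re.sub(r"(.)\1+", r"\1", w)   # collapse repeated letters
    return ABBREVIATIONS.get(w, w)


def phonetic_key(name):
    """Key for a full name: one skeleton per whitespace-separated part."""
    return tuple(skeleton(part) for part in name.split())


NAMES = [
    "Mohammed Qasim",
    "Mohammad Kasim",
    "Mohd. Kasim",
    "Mhd. Kasim",
    "Md. Kasim",
    "Muhammad Kasim",
]
```

All six names reduce to `("mhmd", "ksm")`. A key this aggressive will also merge names that are genuinely different, so in practice it would only nominate candidates for a stricter comparison.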

Full Name - Fourth Example
Example of variations of a name considered in a suspected duplicate cluster:
  • Casey Pabilla
  • Cassey Pabilla
  • Caseyy Pabilla
  • Caesey Pabilla
  • Caseey Pabilla
  • Caseey Pabillaa
  • Caseey Pabella
  • Caseeey Pabilla
  • Caasaay Pabilla
  • Caasey Pabilla
  • Caseey Pabilla
  • Casey Pabillla
  • Casey Pabellla
  • Casey Pabella
  • Casiy Pabilla
  • Casii Pabilla
  • Casey Pabiilla
  • Casey Pabilla
  • Caseeyy Pabilla
  • Caassey Pabilla
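
Typo-level variants such as these are commonly caught with an edit-distance threshold. The sketch below implements plain Levenshtein distance; the `is_similar` helper and its threshold of 3 are assumptions for illustration, and nothing here is confirmed to be what DataStitch does internally.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute, or keep when equal
            ))
        prev = curr
    return prev[-1]


def is_similar(a, b, max_distance=3):
    """Hypothetical rule: names within max_distance edits are suspects."""
    return levenshtein(a.lower(), b.lower()) <= max_distance
```

For instance, "Casey Pabilla" and "Cassey Pabilla" are one edit apart, while "Casey Pabilla" and "Caasaay Pabilla" are three edits apart and still fall inside the assumed threshold.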

Address - First Example
This is one of the best examples of Address de-duplication, highlighted by the algorithm within a massive CRM database in India. Not only are there inconsistent abbreviations and spelling errors, but the old and new official names of the city have also been detected as duplicates:
  • 43/2, Industrial Road, Sector 65, Bangalore
  • sector 65, bangalore - 43/2 (industrial) rd
  • 43\2, sectorr 65 - industrial rd,, BENGALURU
  • 43 2 indl rd sec 65 bangalore 
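
A rough way to see why these four lines can land in one cluster is to compare them as normalized token sets. The sketch below assumes a small hypothetical `EXPANSIONS` table (abbreviations plus the old city name) and a simple repeated-letter fix for typos like "sectorr"; it is an illustration only, not the product's algorithm.

```python
import re

# Assumed lookup table: abbreviation or old name -> canonical token.
EXPANSIONS = {
    "rd": "road",
    "indl": "industrial",
    "sec": "sector",
    "bangalore": "bengaluru",
}


def address_key(raw):
    """Order-independent set of canonical tokens for an address."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    key = set()
    for tok in tokens:
        # Collapse repeated letters only, so digits like "65" stay intact.
        tok = re.sub(r"([a-z])\1+", r"\1", tok)
        key.add(EXPANSIONS.get(tok, tok))
    return frozenset(key)


ADDRESSES = [
    "43/2, Industrial Road, Sector 65, Bangalore",
    "sector 65, bangalore - 43/2 (industrial) rd",
    r"43\2, sectorr 65 - industrial rd,, BENGALURU",
    "43 2 indl rd sec 65 bangalore",
]
```

All four addresses produce the same token set. Using a set ignores token order and repetition, which suits reordered addresses but would also treat "43/2" and "2/43" as equal.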

Address - Second Example
An example of Address de-duplication, from Singapore:
  • #01-33, 92 Whampoa Annexe, Causeway Drive
  • 01 33, 92-whampoa annx, causeway dr
  • 01--33 whampoa anx #92, causeway drv
  • #92. Whampoa Anex. Causeway= Drive, 01,33

Address - Third Example
An example of a duplicate address cluster from Australia. Note the abbreviations and variations of the state name captured in the duplicate cluster:
  • 7th Floor, 43/2 Miller Plaza, Industrial Highway, Sydney, New South Wales
  • 7th fl miller plz, (industrial) hw – 43 2, sydney, nsw
  • Seventh Floor. Indl Hway. 43-2 Plaza. Miller. Sydney. N.S.W.
  • Flr 7th, #43—2, miller pz, sydney indl hwye, ns.w

Mobile Numbers
The algorithm finds mobile numbers in multiple formats and groups the duplicates together. Here's an example of such a group:
  • +63-906-222-1520
  • 0063 9 06 22 21 520
  • +(906).222.1520
  • 0-906-22-21-520
  • 9 0 6 2 2 2 1 5 2 0
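
A minimal sketch of how such formats can be reconciled: keep only the digits and compare the trailing part of the number. The helper name `mobile_key` and the assumption that the significant part is the last 10 digits are illustrative choices, not documented DataStitch behaviour.

```python
def mobile_key(raw, significant_digits=10):
    """Strip formatting and keep the last significant_digits digits."""
    digits = "".join(ch for ch in raw if ch.isdigit())
    # Dropping the leading part discards country codes and trunk zeros.
    return digits[-significant_digits:]


NUMBERS = [
    "+63-906-222-1520",
    "0063 9 06 22 21 520",
    "+(906).222.1520",
    "0-906-22-21-520",
    "9 0 6 2 2 2 1 5 2 0",
]
```

Every entry in `NUMBERS` reduces to `"9062221520"`. Numbers from countries with a different national number length would need a different `significant_digits` value.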

Company Name - First Example
An example of Company Name de-duplication from the Philippines:
  • HPE Philippines Incorporated
  • HPE Philippines Inc.
  • HPE Phils, Inc.
  • H.P.E. Incorporated
  • HPE Inc

Company Name - Second Example
Another similar example of Company Name de-duplication, from India:
  • HPE India Private Limited
  • HPE India Private Ltd.
  • HPE Pvt. Ltd. – India
  • H.P.E. Pvt. Ltd.
  • HPE Limited

Name + Mobile Number
An example of 5 duplicate Name and Mobile Number combinations as found by the system:

Name: Mohammad Kasim 
Mobile: +91-98336-90611

Name: Mohd. Kasim
Mobile: 0091 98 33 69 06 11

Name: Mhd. Kasim
Mobile: (9833) 690-611

Name: Md. Kasim
Mobile: 0-98336-90611

Name: Muhammad Kasim
Mobile: 9 8 3 3 6 9 0 6 1 1

Name + Date of Birth
An example of 4 duplicate combinations of Person's Name and Date of Birth found:

Person's Name: Narendra Bajpayee
Date of Birth: 15/11/1984

Person's Name: Narindir Bajpayee
Date of Birth: 11-15-1984

Person's Name: Narender Bejpeyee
Date of Birth: 15.11.84

Person's Name: Nariinder Baajpayii
Date of Birth: 1984, novembr 15
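
The four date spellings above can be reduced to one value with a tolerant parser. The sketch below is a set of assumptions rather than the product's logic: months are recognized by their first three letters, two-digit years are read as 19xx, and an all-numeric date is read day-first unless the second number cannot be a month.

```python
import re
from datetime import date

MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]


def parse_dob(raw):
    """Parse a loosely formatted date of birth into a datetime.date."""
    tokens = re.findall(r"[a-z]+|\d+", raw.lower())
    words = [t for t in tokens if t.isalpha()]
    numbers = [t for t in tokens if t.isdigit()]

    four_digit = [n for n in numbers if len(n) == 4]
    if four_digit:
        year_token = four_digit[0]
        year = int(year_token)
    else:
        # Assumption: a two-digit year comes last and means 19xx.
        year_token = numbers[-1]
        year = 1900 + int(year_token)
    numbers.remove(year_token)
    rest = [int(n) for n in numbers]

    if words:
        # The three-letter prefix tolerates typos such as "novembr".
        month = MONTHS.index(words[0][:3]) + 1
        day = rest[0]
    else:
        first, second = rest
        if second > 12 and first <= 12:
            month, day = first, second   # e.g. 11-15-1984
        else:
            day, month = first, second   # assumed day-first default
    return date(year, month, day)


DATES = ["15/11/1984", "11-15-1984", "15.11.84", "1984, novembr 15"]
```

Each string in `DATES` parses to 15 November 1984. Dates such as 05/06/1984 remain ambiguous under any rule of this kind, which is why the day-first default is flagged as an assumption.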

Name + Company
An example of 4 duplicate combinations of Person's Name and Company Name as found by DataStitch:

Person's Name: Sanjiv Kumar
Company's Name: HPE India Private Limited

Person's Name: Sanjeve Kumarr
Company's Name: HPE India Private Ltd.

Person's Name: Sanjeev Qumar
Company's Name: HPE Pvt. Ltd

Person's Name: Sanjive Koomar
Company's Name: HPE Limited

Website URL
DataStitch groups different Website URLs that refer to the same page into a single cluster. Here's an example of such a group:
  • contactous.com
  • http://www.contactous.com/index.htm
  • www4.contactous.com/?query=malaysia
  • https://contactous.com:8080/
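
One plausible way to group URLs like these is to compare only the host name, ignoring scheme, port, path, query, and any `www` or `wwwN` prefix. The helper name `site_key` and that choice of what to ignore are assumptions for illustration; the sketch uses only the standard `urllib.parse.urlsplit`.

```python
import re
from urllib.parse import urlsplit


def site_key(url):
    """Reduce a URL to its host name without a leading www/wwwN label."""
    if "://" not in url:
        # Without "//", urlsplit would treat the host as part of the path.
        url = "//" + url
    host = urlsplit(url).hostname or ""
    return re.sub(r"^www\d*\.", "", host)


URLS = [
    "contactous.com",
    "http://www.contactous.com/index.htm",
    "www4.contactous.com/?query=malaysia",
    "https://contactous.com:8080/",
]
```

All four entries in `URLS` reduce to `"contactous.com"`.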

Ask for an Evaluation Copy of DataStitch

© 2020 CONTACTOUS PTE LTD | ALL RIGHTS RESERVED

Address

24 Raffles Place, #25-02A
Singapore 048621.