Contactous

On-premise, DIY data preparation

DataStitch is a Do-It-Yourself data preparation application. It enables non-technical users to consolidate disconnected data in CSV files and create clean, usable datasets without external help.

You have successfully collected data. Millions of records in multiple formats reside in disconnected files, coming from multiple systems, regions, sources and timelines. DataStitch makes them usable by cleaning and connecting them.

On-Premise Execution
DataStitch executes on your Windows desktop or laptop. Your data does not have to be uploaded to any external server and does not have to leave the organization's systems (or even your desktop). There is no communication between DataStitch and any external system, so it can run standalone without network connectivity.

Data Volume
DataStitch has been tested and used on tens of millions of records. Even so, it does not necessarily require a server and performs well on a fast desktop machine. DataStitch is currently supported on Windows. Special builds for Linux can be provided.

No Implementation 
DataStitch finds patterns across dozens of dimensions automatically and requires no setup. It takes CSV files as input and outputs the records in the same format, grouped into clusters of similar records. The one important consideration is the name of each header field within the CSV file, which plays an important part in how that field is treated by the system.

DataStitch Evaluation 
A fully functional version of the DataStitch application is available. It can ingest any number of data files and perform operations on very large datasets. The evaluation version allows 10,000 records to be exported out of the application.

DeDuplication Quality
The algorithms used in DataStitch are exactly the same as the ones used in our other products, which are hosted on the cloud for CRM Data Quality and Enterprise Contact Management, or offered as plugins for standard CRMs like Zoho. These have been tested on hundreds of millions of records since 2016. The learnings are part of the executable software.

Here are some examples of results that can be expected. The majority of these are close to real cases of duplicate data found by the algorithms during their use. The real data has been changed for confidentiality reasons, but the discovered pattern is intact.

Full Name - First Example
Example of Name de-duplication taken from a medical institution in India. Combinations of salutations, qualifications and swapped first names and surnames were considered:
  • Sheela Joshi
  • Dr. Sheela Joshi, PhD
  • Mrs. Joshi, Sheela
  • Sheela Joshi, M.B.B.S.
  • Joshi Sheela
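The variants above can be reduced to a single comparison key with a few simple rules. The following is a minimal sketch, not DataStitch's actual algorithm: the function name `name_key` and the short salutation and qualification lists are assumptions made for illustration.

```python
import re

# Hypothetical token lists; a production system would need far larger ones.
SALUTATIONS = {"dr", "mr", "mrs", "ms", "prof"}
QUALIFICATIONS = {"phd", "mbbs"}

def name_key(raw):
    """Return an order-independent key for a person's name."""
    # Drop dots so "M.B.B.S." becomes "MBBS", then split on commas and spaces.
    cleaned = raw.replace(".", "").lower()
    tokens = re.split(r"[,\s]+", cleaned)
    kept = [t for t in tokens
            if t and t not in SALUTATIONS and t not in QUALIFICATIONS]
    # Sorting makes "Joshi Sheela" and "Sheela Joshi" compare equal.
    return tuple(sorted(kept))

names = [
    "Sheela Joshi",
    "Dr. Sheela Joshi, PhD",
    "Mrs. Joshi, Sheela",
    "Sheela Joshi, M.B.B.S.",
    "Joshi Sheela",
]
keys = {name_key(n) for n in names}
print(keys)  # a single key: {('joshi', 'sheela')}
```

Sorting the tokens is a deliberate simplification: it handles swapped first names and surnames, at the cost of treating any reordering of the same words as a match.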
Full Name - Second Example
Example of Name de-duplication taken from a warranty registration database in the Philippines:
  • Ivy Mathew Griffin
  • Ivy M. Griffin
  • Ivy Matt Griffin
Full Name - Third Example
A powerful example of the algorithm's capabilities. Example of Name de-duplication taken from a database of a South Asian country:
  • Mohammad Kasim
  • Mohd. Kasim
  • Mhd. Kasim
  • Md. Kasim
  • Muhammad Kasim
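One simple way to group abbreviations like these is an alias table applied before comparison. This is an illustrative sketch only; the `ALIASES` table and the `expand_aliases` function are hypothetical and contain just the variants shown in the cluster above.

```python
# Hypothetical alias table built from the cluster above.
ALIASES = {
    "mohd": "mohammad",
    "mhd": "mohammad",
    "md": "mohammad",
    "muhammad": "mohammad",
}

def expand_aliases(raw):
    """Lowercase the name, drop dots and replace known abbreviations."""
    tokens = raw.replace(".", "").lower().split()
    return " ".join(ALIASES.get(t, t) for t in tokens)

variants = ["Mohammad Kasim", "Mohd. Kasim", "Mhd. Kasim", "Md. Kasim", "Muhammad Kasim"]
expanded = {expand_aliases(v) for v in variants}
print(expanded)  # {'mohammad kasim'}
```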
Address - First Example
This is one of the best examples of Address de-duplication, highlighted by the algorithm within a massive CRM database in India. Not only are there inconsistent abbreviations and spelling errors, but the old and new official names of the city have also been detected as duplicates:
  • 43/2, Industrial Road, Sector 65, Bangalore
  • sector 65, bangalore - 43/2 (industrial) rd
  • 43\2, sectorr 65 - industrial rd,, BENGALURU
  • 43 2 indl rd sec 65 bangalore 
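The ideas at work in this cluster (abbreviation expansion, tolerance for spelling errors, city-name synonyms and order independence) can be sketched as follows. This is an assumption-laden illustration rather than the product's method: the `SYNONYMS` and `VOCABULARY` tables and the 0.85 similarity cutoff are invented for this example.

```python
import difflib
import re

# Hypothetical lookup tables built from the cluster above.
SYNONYMS = {"rd": "road", "indl": "industrial", "sec": "sector",
            "bengaluru": "bangalore"}
VOCABULARY = ["industrial", "road", "sector", "bangalore"]

def address_key(raw):
    """Return an order-independent key for an address string."""
    # Keep only runs of letters and digits; punctuation acts as a separator.
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    normalized = []
    for token in tokens:
        token = SYNONYMS.get(token, token)
        # Fuzzy-correct unknown words such as "sectorr".
        if token.isalpha() and token not in VOCABULARY:
            close = difflib.get_close_matches(token, VOCABULARY, n=1, cutoff=0.85)
            if close:
                token = close[0]
        normalized.append(token)
    return tuple(sorted(normalized))

addresses = [
    "43/2, Industrial Road, Sector 65, Bangalore",
    "sector 65, bangalore - 43/2 (industrial) rd",
    r"43\2, sectorr 65 - industrial rd,, BENGALURU",
    "43 2 indl rd sec 65 bangalore",
]
address_keys = {address_key(a) for a in addresses}
print(len(address_keys))  # 1
```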
Address - Second Example
An example of Address de-duplication, from Singapore:
  • #01-33, 92 Whampoa Annexe, Causeway Drive
  • 01 33, 92-whampoa annx, causeway dr
  • 01--33 whampoa anx #92, causeway drv
  • #92. Whampoa Anex. Causeway= Drive, 01,33
Address - Third Example
An example of a duplicate address cluster from Australia. Note the abbreviations and variations of the state name captured in the duplicate cluster:
  • 7th Floor, 43/2 Miller Plaza, Industrial Highway, Sydney, New South Wales
  • 7th fl miller plz, (industrial) hw - 43 2, sydney, nsw
  • Seventh Floor. Indl Hway. 43-2 Plaza. Miller. Sydney. N.S.W.
  • Flr 7th, #43-2, miller pz, sydney indl hwye, ns.w
Mobile Numbers
The algorithm finds mobile numbers in multiple formats and groups the duplicates together. Here's an example of such a group:
  • +63-906-222-1520
  • 0063 9 06 22 21 520
  • +(906).222.1520
  • 0-906-22-21-520
  • 9 0 6 2 2 2 1 5 2 0
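A rough way to bring such formats together is to strip everything except digits and compare only the trailing national number. The sketch below assumes that mobile numbers in the dataset have 10 significant digits, which holds for the group above but is not a general rule; `mobile_key` and `NATIONAL_LENGTH` are hypothetical names.

```python
import re

# Assumption: every mobile number in the dataset has 10 significant digits.
NATIONAL_LENGTH = 10

def mobile_key(raw):
    """Strip formatting, country code and trunk prefix from a mobile number."""
    digits = re.sub(r"\D", "", raw)
    # Keeping the last 10 digits discards "+63", "0063" and a leading "0".
    return digits[-NATIONAL_LENGTH:]

numbers = [
    "+63-906-222-1520",
    "0063 9 06 22 21 520",
    "+(906).222.1520",
    "0-906-22-21-520",
    "9 0 6 2 2 2 1 5 2 0",
]
mobile_keys = {mobile_key(n) for n in numbers}
print(mobile_keys)  # {'9062221520'}
```

A dataset spanning several countries would need the country code kept as part of the key, because identical national numbers can exist in different countries.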
Company Name - First Example
An example of Company Name de-duplication from the Philippines:
  • HPE Philippines Incorporated
  • HPE Philippines Inc.
  • HPE Phils, Inc.
Company Name - Second Example
Another similar example of Company Name de-duplication, from India:
  • HPE India Private Limited
  • HPE India Private Ltd.
  • HPE Pvt. Ltd. - India
Name + Mobile Number
Example of 4 duplicate Name and Mobile Number combinations as found by the system:

Name: Mohd. Kasim
Mobile: 0091 98 33 69 06 11

Name: Mhd. Kasim
Mobile: (9833) 690-611

Name: Md. Kasim
Mobile: 0-98336-90611

Name: Muhammad Kasim
Mobile: 9 8 3 3 6 9 0 6 1 1
Name + Date of Birth
Example of 4 duplicate Person's Name and Date of Birth combinations found:

Person's Name: Dr. Nicholas A. Beck
Date of Birth: 15/11/1984

Person's Name: Nicholas Albert Beck
Date of Birth: 11-15-1984

Person's Name: Nicholas A Beck, PHD
Date of Birth: 15.11.84

Person's Name: Mr. Nicholas Albert Beck
Date of Birth: 1984, novembr 15
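The four date strings above all denote 15 November 1984. A minimal sketch of one way to parse them into the same value is shown below. It is illustrative only: the two-digit-year pivot, the day-first default for ambiguous dates and the 0.7 fuzzy-match cutoff for month names are all assumptions, and `parse_date` is a hypothetical name.

```python
import difflib
import re
from datetime import date

MONTHS = ["january", "february", "march", "april", "may", "june", "july",
          "august", "september", "october", "november", "december"]

def parse_date(raw):
    """Parse a loosely formatted date string into a datetime.date."""
    tokens = re.findall(r"[a-z]+|\d+", raw.lower())
    numbers = [t for t in tokens if t.isdigit()]
    words = [t for t in tokens if t.isalpha()]

    # A misspelled month name such as "novembr" is matched fuzzily.
    month = None
    for word in words:
        close = difflib.get_close_matches(word, MONTHS, n=1, cutoff=0.7)
        if close:
            month = MONTHS.index(close[0]) + 1

    four_digit = [n for n in numbers if len(n) == 4]
    if four_digit:
        year = int(four_digit[0])
        numbers.remove(four_digit[0])
    else:
        # Assumption: a two-digit year is last, and values above 30 mean 19xx.
        yy = int(numbers.pop())
        year = 1900 + yy if yy > 30 else 2000 + yy

    rest = [int(n) for n in numbers]
    if month is not None:
        day = rest[0]
    else:
        first, second = rest
        if second > 12 and first <= 12:
            month, day = first, second   # only month-first is possible
        else:
            day, month = first, second   # day-first is the assumed default
    return date(year, month, day)

samples = ["15/11/1984", "11-15-1984", "15.11.84", "1984, novembr 15"]
parsed = {parse_date(s) for s in samples}
print(parsed)  # {datetime.date(1984, 11, 15)}
```

Dates in which both day and month are 12 or below remain ambiguous; this sketch resolves them as day-first, which may be wrong for a given source.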
Website URL
DataStitch groups different Website URLs which refer to the same page into a single cluster. Here's an example of such a group:
  • contactous.com
  • http://www.contactous.com/index.htm
  • www4.contactous.com/?query=malaysia
  • https://contactous.com:8080/
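A hedged sketch of how URLs like these could be reduced to one key, using Python's standard `urllib.parse`. Which parts to ignore (scheme, port, `www` prefix, index page, query string) is inferred from the group above and is an assumption, not a documented DataStitch rule; `url_key` is a hypothetical name.

```python
import re
from urllib.parse import urlsplit

def url_key(raw):
    """Reduce a website URL to its host and significant path."""
    # urlsplit only recognises the host when a scheme or "//" is present.
    if "://" not in raw:
        raw = "//" + raw
    parts = urlsplit(raw)
    # .hostname is already lowercased and excludes the port.
    host = parts.hostname or ""
    # Treat "www." and numbered variants such as "www4." as the same site.
    host = re.sub(r"^www\d*\.", "", host)
    # Ignore a trailing slash or a default index page; the query is dropped.
    path = re.sub(r"/(index\.html?)?$", "", parts.path)
    return host + path

urls = [
    "contactous.com",
    "http://www.contactous.com/index.htm",
    "www4.contactous.com/?query=malaysia",
    "https://contactous.com:8080/",
]
url_keys = {url_key(u) for u in urls}
print(url_keys)  # {'contactous.com'}
```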

Manage Duplicate Records
DataStitch automatically indexes names, addresses, dates and text fields according to their definition. Merging duplicate records based on one field or any selection of fields is a common choice. During the merge operation, you select one key or a combination of keys that should identify a unique record (e.g., Full Name + DateOfBirth, or Full Name + Mother's Maiden Name + IC-Number-Text). You can define what constitutes the master record, and DataStitch will process the records at a fast pace. The merge operation creates a new dataset; the original dataset is never modified. You also have the option of downloading the original dataset with unique cluster keys, without merging.
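The merge described here can be pictured as grouping records by a composite key and choosing one master record per group. The sketch below is a simplified illustration under stated assumptions: field values are assumed to be already normalized apart from case and whitespace, and the "most complete record wins" rule, `merge_duplicates` and `most_complete` are hypothetical, not the product's API.

```python
from collections import defaultdict

def most_complete(group):
    """Hypothetical master rule: the record with the most non-empty fields."""
    return max(group, key=lambda r: sum(1 for v in r.values() if v))

def merge_duplicates(records, key_fields, pick_master=most_complete):
    """Return a new list with one master record per cluster key."""
    clusters = defaultdict(list)
    for record in records:
        key = tuple(record[f].strip().lower() for f in key_fields)
        clusters[key].append(record)
    # Copies are returned, so the input records are never modified.
    return [dict(pick_master(group)) for group in clusters.values()]

records = [
    {"Full Name": "Nicholas Beck", "DateOfBirth": "1984-11-15", "Email": ""},
    {"Full Name": "nicholas beck", "DateOfBirth": "1984-11-15", "Email": "nb@example.com"},
    {"Full Name": "Sheela Joshi", "DateOfBirth": "1979-02-01", "Email": ""},
]
merged = merge_duplicates(records, ["Full Name", "DateOfBirth"])
print(len(records), len(merged))  # 3 2
```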

Dataset Operations
Powerful set manipulation operations can be performed between two datasets, based on the fields they have in common. This is explained in detail as a Use Case. The common indexed values are used to join two datasets, find the records common to both, subtract one from the other, or perform more complex operations. Because DataStitch uses its indexed values, an address value of "Twelve Newton Crossing" is matched to "12 newton xing" and a date value of "03-Aug-1968" is matched to "3 augst '68" during these operations, making them useful to data custodians. A demonstration of DataStitch's dataset operations is available.
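To make the idea concrete, here is a small sketch of intersect and subtract operations performed on indexed values rather than raw text. The tiny `REPLACEMENTS` table and the functions `index_value`, `intersect` and `subtract` are assumptions for illustration; only the address case is covered.

```python
import re

# Hypothetical normalization table covering only the address example.
REPLACEMENTS = {"twelve": "12", "xing": "crossing"}

def index_value(raw):
    """Return a normalized form of a field value used for matching."""
    tokens = re.findall(r"[a-z0-9]+", raw.lower())
    return " ".join(REPLACEMENTS.get(t, t) for t in tokens)

def intersect(left, right, field):
    """Records of `left` whose indexed value also occurs in `right`."""
    right_keys = {index_value(r[field]) for r in right}
    return [r for r in left if index_value(r[field]) in right_keys]

def subtract(left, right, field):
    """Records of `left` whose indexed value does not occur in `right`."""
    right_keys = {index_value(r[field]) for r in right}
    return [r for r in left if index_value(r[field]) not in right_keys]

dataset_a = [{"Address": "Twelve Newton Crossing"}, {"Address": "5 Orchard Road"}]
dataset_b = [{"Address": "12 newton xing"}]

common = intersect(dataset_a, dataset_b, "Address")
only_a = subtract(dataset_a, dataset_b, "Address")
print(common)  # [{'Address': 'Twelve Newton Crossing'}]
print(only_a)  # [{'Address': '5 Orchard Road'}]
```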

DataStitch Licensing 
Tens of millions of records can be imported into DataStitch without any cost. The application's licensing works on keys, which enable the processed data to be exported. Hence, a pack of 1 million keys enables 1 million records to be exported from the application as CSV.

Ask for an Evaluation Copy of DataStitch

© 2022 CONTACTOUS PTE LTD | ALL RIGHTS RESERVED

Address

24 Raffles Place, #25-02A
Singapore 048621.