Contactous

Extractous: On-premise, secure data parsing

Unlimited data extraction from documents, on-premise.

Extractous takes OCR results from a scanner, specialized software or an RPA (Robotic Process Automation) workflow as its input, identifies the patterns in the document and returns the extracted data to the calling program as key/value pairs.

It is designed for organizations that cannot use data parsing APIs hosted by third parties in the cloud, because those APIs require the data to be posted to someone else's servers. Many of the documents involved are invoices, bank statements, contracts, medical records and the like, so data security and compliance teams find it challenging to allow them to be processed by external programs on remote servers. Extractous solves this challenge.

Extractous' core function
Extractous takes the OCR results from an RPA workflow, a scanner or another program and returns the extracted data in the desired format to the calling program. It does not extract text from an image or document, and it does not call any external OCR program (for that, we have a cloud-based data extraction and parsing product). It does one job: it intelligently parses the incoming text and returns the data in the expected format.
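
To make the input and output concrete, here is a minimal sketch in Python of the idea described above: OCR text goes in, key/value pairs come out. The rule names, the regular expressions and the parse_text function are assumptions made for illustration; they are not the rules or the interface that Extractous actually uses.

```python
import re

# Hypothetical rules: field name -> regular expression with one capture group.
# These patterns are illustrative only and are not taken from Extractous.
RULES = {
    "invoice_number": re.compile(
        r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)", re.IGNORECASE
    ),
    "invoice_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total\s*(?:Due)?\s*:\s*\$?([\d,]+\.\d{2})", re.IGNORECASE),
}


def parse_text(ocr_text: str) -> dict[str, str]:
    """Return a key/value pair for every rule that matches the OCR text."""
    result = {}
    for field_name, pattern in RULES.items():
        match = pattern.search(ocr_text)
        if match:
            result[field_name] = match.group(1)
    return result


# Text as it might arrive from an OCR step (made-up sample).
sample = """ACME Supplies
Invoice No: INV-2041
Date: 2021-03-15
Total Due: $1,250.00
"""

print(parse_text(sample))
# {'invoice_number': 'INV-2041', 'invoice_date': '2021-03-15', 'total': '1,250.00'}
```

Fields whose pattern does not appear in the text are simply left out of the result, which mirrors the "one job" described above: parse what is there and return it.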

On-Premise Execution
Extractous resides within the organization's network and is designed for any volume of workload, from a few thousand documents to tens of millions. The program makes no external API calls and sends no data to any external program for processing. Even though it runs on-premise, it does not store any data in a database of its own.

How does it work? 
Extractous works by matching patterns in the incoming text against a rule library. By default, Extractous finds the common fields in a document. To extract all the details and process complex formats, a rule needs to be written for each document type. Writing a rule for a simple document usually takes a couple of hours.

As the OCR results are passed to Extractous, it matches the text against the entire rule library (called the RuleBook). The RuleBook can hold a set of rules for one document type comprising multiple fields, or a common rule for a single field that works across multiple document types.

The rules are created and maintained by the Contactous team for its users. They are uploaded to a common secure folder, which the Extractous program reads.
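
The two kinds of rules mentioned above (a set of rules for one document type, and a common rule for one field across document types) can be pictured with the following Python sketch. The DocumentRuleSet class, the COMMON_RULES and RULEBOOK names and every pattern are hypothetical; the real RuleBook format is not described here, so treat this only as one possible way to organize such rules.

```python
import re
from dataclasses import dataclass


@dataclass
class DocumentRuleSet:
    """Hypothetical rule set for one document type."""
    name: str
    identifier: re.Pattern          # text that marks the document type
    fields: dict[str, re.Pattern]   # field name -> pattern with one capture group


# A common rule: one field, usable on any document type.
COMMON_RULES = {
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}

# Document-specific rule sets, each comprising several fields.
RULEBOOK = [
    DocumentRuleSet(
        name="invoice",
        identifier=re.compile(r"\binvoice\b", re.IGNORECASE),
        fields={
            "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
            "total": re.compile(r"Total:\s*([\d,]+\.\d{2})"),
        },
    ),
    DocumentRuleSet(
        name="bank_statement",
        identifier=re.compile(r"\bstatement of account\b", re.IGNORECASE),
        fields={"account_number": re.compile(r"Account No:\s*(\d+)")},
    ),
]


def apply_rulebook(text: str) -> dict[str, str]:
    """Apply common rules first, then the first rule set whose identifier matches."""
    result = {}
    for name, pattern in COMMON_RULES.items():
        match = pattern.search(text)
        if match:
            result[name] = match.group(0)
    for rule_set in RULEBOOK:
        if rule_set.identifier.search(text):
            result["document_type"] = rule_set.name
            for name, pattern in rule_set.fields.items():
                match = pattern.search(text)
                if match:
                    result[name] = match.group(1)
            break
    return result


statement = "Statement of Account\nAccount No: 12345678\nQueries: help@bank.example\n"
print(apply_rulebook(statement))
```

In this sketch, supporting a new document type means appending one more rule set to the list; the code that calls apply_rulebook does not change.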

Implementation 
Once Extractous is installed, the RPA or digital transformation team has nothing to do beyond sending the OCR results to it and receiving the output. New document templates (rules) can be added or existing ones modified without any change to the RPA workflows or bots, because these changes are decoupled from the projects. Extractous can be integrated with an RPA workflow within minutes.

Extractous Evaluation 
A cloud-based version of Extractous can be evaluated immediately by making an API call to the program.

Frequently Asked Questions

What is the quality of output that I can expect from Extractous?
Extractous works on the extracted text of a document that is passed to it from other programs. It searches that text for pre-defined patterns, extracts the matching data and returns it as structured key/value pairs. Hence, the quality of the output of Extractous is strongly tied to the quality of the input it receives. Common inputs are text extracted from image files and PDF documents converted to text.
Why am I not getting the output from Extractous for some documents?
Extractous returns data whenever it finds a pattern in the incoming text. If no data is returned, either 1) no pattern matched the text that was passed, or 2) the parsing rule for that data is absent and needs to be created.
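
The two causes can be told apart with a small check like the Python sketch below. The RULES dictionary, the field names and the explain_missing helper are hypothetical and only illustrate the distinction; Extractous itself is not known to expose such a function. The sample text also shows how a single OCR misread (a lowercase "l" in place of a capital "I") is enough to stop a pattern from matching.

```python
import re

# Hypothetical rules; "due_date" deliberately has no rule.
RULES = {
    "invoice_number": re.compile(r"Invoice No:\s*(\S+)"),
    "total": re.compile(r"Total:\s*([\d.]+)"),
}


def explain_missing(text: str, expected_fields: list[str]) -> dict[str, str]:
    """For each expected field that yields no data, say which cause applies."""
    reasons = {}
    for name in expected_fields:
        pattern = RULES.get(name)
        if pattern is None:
            reasons[name] = "no rule defined"
        elif pattern.search(text) is None:
            reasons[name] = "rule defined, no match in text"
    return reasons


# OCR misread the capital "I" of "Invoice" as a lowercase "l".
noisy = "lnvoice No: INV-7\nTotal: 99.50\n"
print(explain_missing(noisy, ["invoice_number", "total", "due_date"]))
# {'invoice_number': 'rule defined, no match in text', 'due_date': 'no rule defined'}
```
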
Can I make the parsing rules for documents myself?
In the future, yes. We intend to make this functionality available to our users by the end of 2021. We are changing the product so that it complies with commonly used practices and upcoming standards for writing such parsing rules. Currently, the rules are written and maintained by our consulting team. When a new set of rules is to be created, a sample document needs to be sent to us, and we make sure the system recognizes that document and returns the expected output.
How do you update parsing rules if my implementation is on-premise?
Your implementation of Extractous is on your premises and cannot be reached by us. However, we need access to a single secured shared folder into which we write the rule definitions. Extractous picks them up from there during its execution.
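
As a rough picture of how rule definitions could be picked up from a shared folder at run time, consider the Python sketch below. The JSON file layout (field name mapped to a regular expression), the load_rules function and the file name are assumptions for illustration; the actual rule definition format used by Extractous is not documented here. A temporary directory stands in for the shared folder.

```python
import json
import re
import tempfile
from pathlib import Path


def load_rules(folder: Path) -> dict[str, re.Pattern]:
    """Read every *.json file in the folder and compile its patterns."""
    rules = {}
    for path in sorted(folder.glob("*.json")):
        definitions = json.loads(path.read_text(encoding="utf-8"))
        for field_name, pattern in definitions.items():
            rules[field_name] = re.compile(pattern)
    return rules


# A temporary directory plays the role of the secured shared folder.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    # Hypothetical rule file written by whoever maintains the rules.
    (folder / "invoice.json").write_text(
        json.dumps({"po_number": r"PO\s*#\s*(\d+)"}), encoding="utf-8"
    )
    # The parser reads the folder each time it runs, so new files are seen
    # without reinstalling or changing the calling program.
    rules = load_rules(folder)

match = rules["po_number"].search("PO # 4471")
print(match.group(1))  # 4471
```
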
My documents are very complex. Can Extractous handle them?
If there is a consistent pattern in the data, a parsing rule can be written for Extractous. An example would be a standard contract or summons in which the contact names appear within the text, preceded or followed by a pattern. Another would be an invoice with multiple line items. Extractous can easily handle such complexity.
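
Both examples can be sketched with regular expressions in Python. The LINE_ITEM and PARTY patterns, the function names and the sample texts below are made up for illustration and assume a very regular layout; real documents would need rules written against real samples.

```python
import re

# One invoice line: description, quantity, unit price, amount (hypothetical layout).
LINE_ITEM = re.compile(
    r"^(?P<description>.+?)[ \t]+(?P<qty>\d+)[ \t]+"
    r"(?P<unit_price>\d+\.\d{2})[ \t]+(?P<amount>\d+\.\d{2})$",
    re.MULTILINE,
)

# A capitalized name followed by a role marker such as (the "Buyer").
PARTY = re.compile(r'([A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*) \(the "(\w+)"\)')


def extract_line_items(text: str) -> list[dict[str, str]]:
    """Return one dictionary per line that matches the line-item layout."""
    return [m.groupdict() for m in LINE_ITEM.finditer(text)]


def extract_parties(text: str) -> dict[str, str]:
    """Map each role to the name that precedes its marker."""
    return {role: name for name, role in PARTY.findall(text)}


invoice = "Widget A 2 10.00 20.00\nGadget Pro 10 3.50 35.00\nTotal 55.00\n"
contract = 'This agreement is made between Alice Tan (the "Buyer") and Bob Lee (the "Seller").'

print(extract_line_items(invoice))
print(extract_parties(contract))
# {'Buyer': 'Alice Tan', 'Seller': 'Bob Lee'}
```

The "Total" line is not returned as a line item because it does not fit the four-column layout, which is the kind of consistency such a rule relies on.
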
What about Machine Learning (ML)? Is Extractous learning from my documents?
Extractous is on-premise and is designed for information security. It does not learn from the parsed documents and has no 'memory' of them. This is by design. We do have the knowledge and expertise to implement ML, but it is not used in the on-premise version of Extractous. Each updated version of the software adds new and advanced features, based on our own learnings and feedback from users.
What are the hardware requirements for Extractous?
There are instances where Extractous is used to parse millions of records. For those, a small dedicated server is ideal. But if you only need to process a few hundred documents a day, Extractous can be installed on a shared server alongside the RPA or digital transformation projects. Once we know your volume, we will send a recommendation for the hardware needed.
Can Extractous work on documents which are not in English?
Yes, it can. There is one limitation, which lies with us rather than with Extractous: our own knowledge of languages is currently limited to English. Because we create the parsing rules for our customers (until the end of 2021) and doing so requires understanding the language of the document, documents in other languages are a challenge. We have, however, worked closely with customers to create parsing rules jointly, using their knowledge of the language. This will not be an issue in the future.
Does Extractous store any data? What programming language is it written in?
Extractous does not store any data. This is by design, for information security. Even the log file can be switched off. There is no communication between the Extractous program and any third-party servers during its execution. Extractous uses PHP and Python as its primary programming languages. The program is delivered as an executable file that resides on your network.
Which RPA and automation programs can it be integrated with?
As Extractous resides on your own network, it is called internally by automation programs or packages of digital transformation projects. A common case is calling Extractous from a Robotic Process Automation (RPA) program such as UiPath, Servicetrace or Blue Prism. Integration is easy: the text extracted by an OCR program or a scanner is passed to the locally installed Extractous program, and key/value pairs are returned.
How is Extractous priced?
There are two components to the pricing:
1) A flat yearly subscription for the program, which includes updates and upgrades. Unlimited documents can be processed.
2) Parsing rules for every document type, with unlimited modifications (within a year).
Please contact us with your requirements to get a quotation.

Ask for an Evaluation Copy of Extractous

© 2025 CONTACTOUS PTE LTD | ALL RIGHTS RESERVED

Address

24 Raffles Place, #25-02A
Singapore 048621.