Unlimited data extraction from documents, on-premise.
Extractous takes OCR results from a scanner, specialized software, or an RPA (Robotic Process Automation) tool as its input, identifies the patterns in the document, and returns the extracted data as key/value pairs to the calling program.
Extractous' core function
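The flow described above (text in, pattern matching, key/value pairs out) can be pictured with a minimal Python sketch. It is an illustration only: the rule names, the regular expressions, the `extract` function, and the sample text are all hypothetical and do not represent Extractous' actual rule format or interface.

```python
import re

# Hypothetical parsing rules, for illustration only.
# This is NOT Extractous' real rule format.
RULES = {
    "invoice_number": re.compile(r"Invoice\s+No\.?\s*:?\s*(\S+)", re.IGNORECASE),
    "invoice_date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total\s*:\s*([\d.,]+)"),
}


def extract(text, rules=RULES):
    """Return key/value pairs for every rule whose pattern matches the text."""
    result = {}
    for key, pattern in rules.items():
        match = pattern.search(text)
        if match:
            # Keep only the captured value, not the surrounding label.
            result[key] = match.group(1)
    return result


# Sample text as it might arrive from an OCR step (made up for this sketch).
ocr_text = """ACME Ltd
Invoice No: INV-1042
Date: 2021-03-15
Total: 1,250.00
"""

print(extract(ocr_text))
```

In this sketch, a key is simply missing from the result when its pattern does not match, so the quality of the output depends directly on the quality of the text that is passed in.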
What is the quality of output that I can expect from Extractous?
Extractous works on the extracted text of a document, which is passed to it by other programs. It searches that text for pre-defined patterns, extracts the matching data, and returns it as structured key/value pairs. Hence, the quality of Extractous' output is strongly tied to the quality of the input it receives. Common inputs are text extracted from image files and PDF documents converted to text.

Why am I not getting the output from Extractous for some documents?
When Extractous finds a pattern in the incoming text, it returns the matching data. If no data is returned, it could mean that either 1) there was no pattern match in the passed text, or 2) the parsing rule for matching the data is absent and needs to be created.

Can I make the parsing rules for documents myself?
In the future, yes. We intend to make this functionality available to our users by the end of 2021. We are changing the product so that it complies with commonly used conventions and upcoming standards for writing such parsing rules. Currently, the rules are written and maintained by our consulting team. When a new set of rules is to be created, a sample document needs to be sent to us, and we then ensure that the system produces the expected output for that document.

How do you update parsing rules if my implementation is on-premise?
Your implementation of Extractous is on your premises and cannot be reached by us. However, we need access to a single secured shared folder to which we write the rule definitions. Extractous picks them up during its execution.

My documents are very complex. Can Extractous handle them?
If there is a consistent pattern in the data, a parsing rule can be written for Extractous. An example would be a standard contract document or a summons in which the contact names appear within the text, preceded or followed by a pattern. Another would be an invoice with multiple line items. Extractous can easily address such complexity.

What about Machine Learning (ML)? Is Extractous learning from my documents?
Extractous is on-premise and is designed for information security. It does not learn from the parsed documents and has no 'memory' of them. This is by design. We do have the knowledge and expertise to implement ML, but it is not used in the on-premise version of Extractous. Each updated version of the software adds new and advanced features, based on our own learnings and feedback from users.

What are the hardware requirements for Extractous?
In some installations, Extractous is used to parse millions of records. For those, a small dedicated server is ideal. But if you only need to process a few hundred documents a day, Extractous can be installed on a shared server alongside your RPA or digital transformation projects. Once we know your volume, we will send a recommendation for the hardware needed.

Can Extractous work on documents which are not in English?
Yes, it can. However, there is one limitation, which lies with us rather than with Extractous: our own knowledge of languages is currently limited to English. Because we create the parsing rules for our customers (until the end of 2021), and doing so requires understanding the language the document is written in, non-English documents are a challenge. We have, however, worked closely with customers to create parsing rules jointly, using their knowledge of the language. This will not be an issue in the future, once users can create parsing rules themselves.

Does Extractous store any data? What programming language is it written in?
Extractous does not store any data. This is by design, for information security. Even the log file can be switched off. There is no communication between the Extractous program and any third-party servers during its execution. Extractous uses PHP and Python as its primary programming languages. The program is delivered as an executable file that resides on your network.

Which RPA and automation programs can it be integrated with?
As Extractous resides on your own network, it is called internally by automation programs or by packages of digital transformation projects. A common case is calling Extractous from a Robotic Process Automation (RPA) program such as UiPath, Servicetrace, BluePrism, etc. Integration is easy: the text extracted by an OCR engine or a scanner is passed to the internally installed Extractous program, and key/value pairs are returned.

How is Extractous priced?
There are 2 components to the pricing:
1) A flat yearly subscription to the program, which includes updates and upgrades. Unlimited documents can be processed.
2) Parsing rules for every document type, with unlimited modifications (within a year).
Please contact us with your requirements to get a quotation.