How We Extract Your Data

Too many documents, not enough time! Does that sound familiar? For goods manufacturers in the supply chain, the sheer diversity of document types prevents them from resolving customer issues as quickly as they’d like. Inputting and checking data takes time, even once it’s already in an ERP system or an EDI format. This is where OmPrompt comes in.

OmPrompt’s data extraction overcomes this diversity by using recognition technologies to pull the data accurately out of your business documents, standardise it and store it in our system, ready for validation.


EDI Might Not Solve Your Problem

Even if you and your customer both invest in EDI, documents may still need manual rework. This is because your EDI system doesn’t apply business rules or any other form of validation to ensure that meaningful data is pulled through correctly. If the data is incorrect, the document will fail.

EDI also isn’t always commercially viable or technically possible for you or your trading partners.

How Should Data Be Extracted?

Use the right technology to get the best results. With OmPrompt you can choose from three levels of product offering: Premium, Standard or Basic. We’ll use our expertise and experience to work with you to find the right balance of speed, accuracy and reliability for your data requirements and the level of human intervention you prefer.

We Use Three Data Extraction Techniques

Automated Data Extraction (intelligent, learned extraction)

  • Lower volume
  • Little to no structural consistency between documents

Assisted Automation (partial layout extraction)

  • Average volume
  • Lower quality
  • Inconsistent formats

Full Automation (exact layout extraction)

  • High volume
  • High quality
  • Structured formats

Structured vs Unstructured Formats

Structured Formats
Most computer-generated business documents are created with a fixed layout. These structured, templated documents are often sent by customers who share high volumes of documents with you. We use focused mapping techniques to extract this data with a high level of accuracy.
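To show why a fixed layout makes extraction dependable, here is a minimal sketch of template-based mapping. It assumes the words on the page have already been read with their positions (for example, from a PDF's text layer). The `Word` type, the `TEMPLATE` boxes and `extract_fields` are hypothetical illustrations, not OmPrompt's actual mapping technique.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # top edge, in points from the top of the page


# Hypothetical template for one customer's purchase order layout:
# field name -> (x0, y0, x1, y1) box where that field always appears.
TEMPLATE = {
    "po_number": (400, 50, 560, 70),
    "order_date": (400, 75, 560, 95),
    "ship_to": (40, 120, 300, 180),
}


def extract_fields(words, template):
    """Return each field's text by joining the words whose top-left corner
    falls inside that field's box, in reading order (top to bottom, then
    left to right)."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        inside = [w for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1]
        inside.sort(key=lambda w: (w.y, w.x))
        result[field] = " ".join(w.text for w in inside)
    return result


words = [
    Word("PO-10482", 410, 55),
    Word("2024-03-01", 410, 80),
    Word("Unit", 45, 125),
    Word("4", 80, 125),
    Word("Leeds", 45, 140),
    Word("Page", 500, 800),  # footer text outside every box is ignored
]
print(extract_fields(words, TEMPLATE))
```

Because every document from this hypothetical customer puts each field in the same place, a single template covers the whole document stream. The same approach breaks down as soon as layouts vary, which is the problem with unstructured formats.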

Unstructured Formats
If you have a large number of customers who order less frequently, they tend to use a variety of inconsistent documents and send them on an ad-hoc basis. Documents can also carry unformatted information, such as handwriting, stamps, stickers or barcodes, or, often, be missing information altogether. This makes documents harder for systems to analyse and more time-consuming for people to process.

Hybrid Extraction Techniques

Not all PDFs are created equal. If you copy and paste text from a PDF, you’ll often see the same challenges that computer extraction techniques face: although the layout looks right when it’s displayed on screen, the extracted text can come out in a very different order or structure.

At the same time, we don’t want to risk OCR misreading the content. So we use a hybrid approach, combining and comparing OCR with text extracted directly from the document to get the best results.
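As an illustration of how combining and comparing two readings can work, here is a minimal sketch. It assumes each field has been read twice, once from the PDF's text layer and once by OCR, and it uses simple format rules to decide between them when they disagree. The field names, the `FIELD_PATTERNS` rules and `reconcile` are hypothetical, not OmPrompt's implementation.

```python
import re

# Hypothetical format rules, used only to arbitrate when the two readings differ.
FIELD_PATTERNS = {
    "po_number": re.compile(r"PO-\d{5}"),
    "quantity": re.compile(r"\d+"),
}


def reconcile(field, text_layer_value, ocr_value):
    """Combine two independent readings of one field.

    Returns (value, needs_review)."""
    a = (text_layer_value or "").strip()
    b = (ocr_value or "").strip()
    if a and a == b:
        return a, False  # both readings agree: accept automatically
    pattern = FIELD_PATTERNS.get(field)
    if pattern:
        a_ok = bool(pattern.fullmatch(a))
        b_ok = bool(pattern.fullmatch(b))
        if a_ok != b_ok:  # exactly one reading is well-formed: use it
            return (a if a_ok else b), False
    # A disagreement the rules cannot settle: prefer the text layer if it has
    # a value, but flag the field for a person to check.
    return (a or b), True


print(reconcile("po_number", "PO-10482", "PO-10482"))
print(reconcile("po_number", "PO-1048 2", "PO-10482"))  # text layer split the value
print(reconcile("quantity", "12", "17"))                 # both plausible: review
```

The design choice here is that agreement between two independent methods is treated as strong evidence, while an unresolved disagreement is routed to a person instead of being silently guessed.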
