OCR Engine · Extracting fields
REPUBLIC OF INDIA · AADHAAR CARD
RISHI KUMAR SHARMA
XXXX XXXX 3847 · DOB: 14/03/1990
MALE · S/O RAJESH SHARMA · 42 MG ROAD, BANGALORE, KA 560001
"full_name": "RISHI KUMAR SHARMA" · 99.8%
"dob": "1990-03-14" · 99.9%
"gender": "MALE" · 100%
"uid_number": "XXXX-XXXX-3847" · 99.7%
"address": "42 MG Road, Bangalore" · 98.2%
"father_name": "RAJESH SHARMA" · 99.1%
"state": "Karnataka" · 99.5%
Fields extracted: 12 ✓ · All fields · 624ms · Aadhaar format

99.4% field accuracy
190+ document types
40+ field types
<800ms response time

What We Extract

Every meaningful field. From any document format.

DigiVerify's OCR engine goes beyond simple text reading — it understands document structure, field semantics, and country-specific formats to return typed, labelled, validated data.

Personal Identity Fields

Core identity data extracted and typed precisely. Names returned in both full and split form. Dates normalised to ISO 8601 regardless of the source format on the document.

full_name · first_name · last_name · dob · age · gender · nationality · place_of_birth
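
As an illustration of the date normalisation described above, here is a minimal sketch that converts a DD/MM/YYYY string, such as the date printed on the sample Aadhaar card, into ISO 8601. The function name `normaliseDob` and the choice to return `null` for unparseable input are assumptions made for this example, not DigiVerify's actual implementation.

```javascript
// Hypothetical helper: convert a DD/MM/YYYY date string to ISO 8601 (YYYY-MM-DD).
// Accepts "/", "." or "-" as the separator; returns null when the input is not a real date.
function normaliseDob(raw) {
  const match = /^(\d{2})[\/.-](\d{2})[\/.-](\d{4})$/.exec(raw.trim());
  if (!match) return null;

  const [, dd, mm, yyyy] = match;
  const day = Number(dd);
  const month = Number(mm);
  const year = Number(yyyy);

  // Round-trip through Date.UTC to reject impossible dates such as 31/02/1990.
  const date = new Date(Date.UTC(year, month - 1, day));
  const isRealDate =
    date.getUTCFullYear() === year &&
    date.getUTCMonth() === month - 1 &&
    date.getUTCDate() === day;

  return isRealDate ? `${yyyy}-${mm}-${dd}` : null;
}

console.log(normaliseDob('14/03/1990')); // "1990-03-14"
```

A production normaliser would also need to handle month names, two-digit years, and country-specific orderings such as MM/DD/YYYY.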

Document Metadata Fields

All document-level data extracted and validated. Issue and expiry dates cross-checked against MRZ where available. Document numbers cleaned and normalised.

document_number · document_type · issuing_country · issuing_authority · issue_date · expiry_date · is_expired
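
A flag like is_expired can be derived from a normalised expiry_date. The sketch below assumes expiry_date is an ISO 8601 string and that a document remains valid through the end of its expiry date; the function name `isExpired` and that validity rule are assumptions for illustration.

```javascript
// Hypothetical helper: derive an is_expired flag from an ISO 8601 expiry date.
// Assumption: the document is still valid on the expiry date itself.
function isExpired(expiryDate, today = new Date()) {
  // ISO 8601 calendar dates (YYYY-MM-DD) sort lexicographically,
  // so a plain string comparison is enough once both sides are normalised.
  const todayIso = today.toISOString().slice(0, 10);
  return expiryDate < todayIso;
}

const referenceDay = new Date('2024-06-01T00:00:00Z');
console.log(isExpired('2020-01-01', referenceDay)); // true
console.log(isExpired('2030-01-01', referenceDay)); // false
```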

MRZ Zone Data

Full Machine Readable Zone parsing per ICAO 9303. Both MRZ lines extracted, checksums computed and validated, and all constituent fields decoded and returned individually.

mrz_line_1 · mrz_line_2 · mrz_valid · checksum_passed · mrz_document_number · mrz_dob
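
For readers unfamiliar with MRZ checksums, the sketch below computes the check digit defined in ICAO 9303: each character is mapped to a number (digits as themselves, A to Z as 10 to 35, the filler `<` as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The function names are chosen for this example and say nothing about how DigiVerify implements the check internally.

```javascript
// Map one MRZ character to its numeric value per ICAO 9303.
function mrzCharValue(ch) {
  if (ch === '<') return 0;                                   // filler character
  if (ch >= '0' && ch <= '9') return ch.charCodeAt(0) - 48;   // '0' -> 0 ... '9' -> 9
  if (ch >= 'A' && ch <= 'Z') return ch.charCodeAt(0) - 55;   // 'A' -> 10 ... 'Z' -> 35
  throw new Error(`Invalid MRZ character: ${ch}`);
}

// Compute the check digit for an MRZ field using the repeating weights 7, 3, 1.
function mrzCheckDigit(field) {
  const weights = [7, 3, 1];
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    sum += mrzCharValue(field[i]) * weights[i % 3];
  }
  return sum % 10;
}

// Compare the computed digit with the check character printed in the MRZ.
function mrzFieldValid(field, checkChar) {
  return String(mrzCheckDigit(field)) === checkChar;
}

console.log(mrzCheckDigit('740812'));      // 2
console.log(mrzFieldValid('120415', '9')); // true
```

A flag such as checksum_passed would typically require every checked field in the MRZ, plus the composite check digit where the format has one, to validate.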

Address & Location Fields

Full address extraction where present on the document. Parsed into structured sub-fields — street, city, state, postal code — using country-specific address format knowledge.

address_full · street · city · state · postal_code · country

Country-Specific ID Numbers

Dedicated extraction models for each country's unique identifier formats — validated against their official check-digit algorithms and format specifications.

aadhaar_id · pan_number · ssn_number · nhs_number · epic_number · ghana_card_id
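
As one example of a check-digit algorithm, Aadhaar numbers are widely documented as carrying a Verhoeff check digit in the final position. The sketch below is a generic Verhoeff validator; the function name `verhoeffValid` is chosen for this example, other identifiers use different schemes, and nothing here describes DigiVerify's internal validation.

```javascript
// Verhoeff multiplication table (dihedral group D5).
const D = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 2, 3, 4, 0, 6, 7, 8, 9, 5],
  [2, 3, 4, 0, 1, 7, 8, 9, 5, 6],
  [3, 4, 0, 1, 2, 8, 9, 5, 6, 7],
  [4, 0, 1, 2, 3, 9, 5, 6, 7, 8],
  [5, 9, 8, 7, 6, 0, 4, 3, 2, 1],
  [6, 5, 9, 8, 7, 1, 0, 4, 3, 2],
  [7, 6, 5, 9, 8, 2, 1, 0, 4, 3],
  [8, 7, 6, 5, 9, 3, 2, 1, 0, 4],
  [9, 8, 7, 6, 5, 4, 3, 2, 1, 0],
];

// Verhoeff position-dependent permutation table.
const P = [
  [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
  [1, 5, 7, 6, 2, 8, 3, 0, 9, 4],
  [5, 8, 0, 3, 7, 9, 6, 1, 4, 2],
  [8, 9, 1, 6, 0, 4, 3, 5, 2, 7],
  [9, 4, 5, 3, 1, 2, 6, 8, 7, 0],
  [4, 2, 8, 6, 5, 7, 3, 9, 0, 1],
  [2, 7, 9, 3, 8, 0, 6, 4, 1, 5],
  [7, 0, 4, 6, 9, 1, 3, 2, 5, 8],
];

// Returns true when the digit string (check digit last) passes the Verhoeff check.
// Spaces and hyphens are ignored; any other non-digit makes the input invalid.
function verhoeffValid(number) {
  const digits = number.replace(/[\s-]/g, '');
  if (!/^\d+$/.test(digits)) return false;

  let c = 0;
  for (let i = 0; i < digits.length; i++) {
    // Process digits right to left, starting with the check digit.
    const digit = Number(digits[digits.length - 1 - i]);
    c = D[c][P[i % 8][digit]];
  }
  return c === 0;
}

console.log(verhoeffValid('2363')); // true  (236 with check digit 3)
console.log(verhoeffValid('2364')); // false
```

A masked value such as XXXX-XXXX-3847 cannot be validated this way, since the check needs every digit.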

Biometric & Machine Data

The face photo is cropped, aligned, and returned as a URL ready for downstream face matching. QR codes and barcodes on documents are decoded and their payloads extracted.

face_image_url · qr_code_data · barcode_data · fingerprint_ref · signature_present

How It Works

From raw image to structured data — in under 800ms.

The OCR pipeline runs four specialised stages in sequence, each optimised for accuracy over speed — and still returns results in under a second.

Image Pre-processing
Perspective Fix · De-skew · Glare Removal · Contrast Normalisation · Resolution Scaling · Binarisation
Input quality: HIGH · DPI: 286 · Skew: 1.2° corrected
Document Type & Region Detection

Region Detection

Aadhaar Card, India (IN) · AUTO-DETECTED
📋 Personal data zone: Found ✓
🖼️ Photo zone: Found ✓
📱 QR code zone: Found ✓
📍 Address zone: Found ✓
Field Extraction & Parsing

Extracting & Parsing Fields

Name fields: 99.8%
Date of birth: 99.9%
UID number: 99.7%
Address: 98.2%
QR decode: 100%
Structured JSON Output
EXTRACTED · Response: 624ms
{
  "document_type": "AADHAAR",
  "issuing_country": "IND",
  "full_name": "RISHI KUMAR SHARMA",
  "dob": "1990-03-14",
  "gender": "MALE",
  "uid_number": "XXXX-XXXX-3847",
  "father_name": "RAJESH SHARMA",
  "address_full": "42 MG Road...",
  "state": "Karnataka",
  "qr_decoded": true,
  "response_ms": 624
}

COUNTRY-SPECIFIC EXTRACTION

Trained on the exact fields each country's documents carry.

Generic OCR reads text. DigiVerify understands documents: it knows how the fields on an Aadhaar card differ from those on a Ghana Card or a Nigerian PVC, and extracts accordingly.

India

UIDAI · IT Dept · ECI · MHA · ARTO

LIVE
Aadhaar Card
uid_number · full_name · dob · gender · address · father_name · state · qr_decoded
PAN Card
pan_number · full_name · father_name · dob · signature_present
Voter ID (EPIC)
epic_number · full_name · father_name · dob · address · constituency · part_number
Passport
passport_number · full_name · dob · expiry · place_of_birth · mrz_line_1 · mrz_line_2
Nigeria

NIMC · INEC · FRSC · CBN

LIVE
NIN Slip / NIN Card
nin_number · first_name · dob · gender · phone · fingerprint_ref
PVC (Voter Card)
vin_number · full_name · dob · lga · state · polling_unit
BVN Slip
bvn_number · full_name · dob · phone
International Passport
passport_number · full_name · dob · expiry · place_of_issue · mrz_valid
Ghana

NIA · EC Ghana · DVLA · SSNIT

LIVE
Ghana Card (NIA)
ghana_card_id · full_name · dob · gender · place_of_birth
Voter ID (EC Ghana)
voter_id · full_name · dob · constituency · region
SSNIT Card
ssnit_number · full_name · dob
Ghanaian Passport
passport_number · full_name · dob · expiry · mrz_line_1 · mrz_line_2
Kenya

NIIMS · NTSA · Immigration Dept

LIVE
National ID (Huduma)
id_number · full_name · dob · gender · district · division
Kenyan Passport
passport_number · full_name · dob · expiry · nationality · mrz_valid
Driver's Licence (NTSA)
licence_number · full_name · dob · expiry · vehicle_classes
Alien ID Card
alien_id · full_name · dob · nationality · expiry

Output Formats

Data in the shape your system needs.

DigiVerify returns extracted data in structured JSON with per-field confidence scores — typed, labelled, and cross-validated. Every response includes metadata on extraction quality and any low-confidence fields flagged for review.

Typed JSON

All fields returned with their correct data types: strings, ISO 8601 date strings, booleans, enums. No locale-specific date parsing in your codebase.

Confidence Scores

Every field carries a per-field confidence score (0–100). Low-confidence fields are flagged separately so your system can decide whether to auto-approve or queue for manual review.
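
One way a client might act on these scores is sketched below. It assumes `confidence_scores` is a flat object mapping field names to numbers on the 0 to 100 scale; the 95-point default threshold, the function name, and the decision labels are illustrative assumptions, not DigiVerify defaults.

```javascript
// Hypothetical client-side routing based on per-field confidence scores.
// Assumption: scores are numbers from 0 to 100, keyed by field name.
function routeByConfidence(confidenceScores, threshold = 95) {
  // Collect every field whose score falls below the chosen threshold.
  const lowConfidence = Object.entries(confidenceScores)
    .filter(([, score]) => score < threshold)
    .map(([field]) => field);

  return {
    decision: lowConfidence.length === 0 ? 'auto_approve' : 'manual_review',
    lowConfidence,
  };
}

// Scores taken from the sample Aadhaar extraction, checked against a strict threshold of 99.
const outcome = routeByConfidence({ full_name: 99.8, dob: 99.9, address: 98.2 }, 99);
console.log(outcome); // { decision: 'manual_review', lowConfidence: [ 'address' ] }
```

The right threshold depends on your risk appetite and on which fields matter for your decision; a strict threshold on uid_number and a looser one on address is a common pattern.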

Cross-validation Flags

Where MRZ data is present, visually-extracted fields are cross-validated against it. Mismatches are flagged with a discrepancy note — a strong signal of document tampering.
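
The comparison itself can be sketched as follows. The input shapes (two flat objects holding already-normalised values), the default field list, and the result format are assumptions for illustration; the actual structure of DigiVerify's discrepancy notes is not specified here.

```javascript
// Hypothetical cross-check of visually extracted fields against MRZ-decoded fields.
// Assumption: both inputs use the same field names and normalised formats (e.g. ISO dates).
function crossValidate(visual, mrz, fields = ['document_number', 'dob', 'expiry_date']) {
  // Ignore case, spacing and punctuation so "A 1234567" matches "a1234567".
  const normalise = (value) => String(value).toUpperCase().replace(/[^A-Z0-9]/g, '');

  const discrepancies = [];
  for (const field of fields) {
    // Skip fields that are missing on either side; only compare what both sources report.
    if (visual[field] === undefined || mrz[field] === undefined) continue;
    if (normalise(visual[field]) !== normalise(mrz[field])) {
      discrepancies.push({ field, visual: visual[field], mrz: mrz[field] });
    }
  }
  return { cross_validated: discrepancies.length === 0, discrepancies };
}

const check = crossValidate(
  { document_number: 'A1234567', dob: '1990-03-14' },
  { document_number: 'A1234567', dob: '1990-03-15' }
);
console.log(check.cross_validated); // false
console.log(check.discrepancies);   // [ { field: 'dob', visual: '1990-03-14', mrz: '1990-03-15' } ]
```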

For Developers

Raw image in. Clean structured data out.

No template configuration. No regex rules. Pass an image and a country code — DigiVerify handles the rest and returns typed, labelled, confidence-scored fields.

Auto document type detection

Omit document_type and DigiVerify detects it automatically. Works reliably across all 190+ supported document types.

Bundled with Document Verification

OCR runs automatically as part of Document Verification — no separate call needed unless you want extraction-only without the full authenticity check.

QR & barcode decode included

Set decode_barcodes: true to automatically decode QR codes and barcodes present on documents like Aadhaar, PAN, and Ghana Card.

Front + back side support

Submit both sides of two-sided documents in a single call. DigiVerify merges the extracted fields from both sides into one unified response.
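
To make the idea of merging concrete, here is a sketch of one possible merge rule: when both sides yield the same field, keep the reading with the higher confidence. The per-side shape (`{ field: { value, confidence } }`) and the rule itself are assumptions for illustration; DigiVerify performs its merge server-side and its logic is not documented here.

```javascript
// Hypothetical merge of per-side extraction results.
// Assumption: each side is an object of { fieldName: { value, confidence } }.
function mergeSides(front, back) {
  const merged = {};
  for (const side of [front, back]) {
    for (const [field, entry] of Object.entries(side)) {
      const current = merged[field];
      // Keep the first reading seen, unless a later side is more confident.
      if (!current || entry.confidence > current.confidence) {
        merged[field] = entry;
      }
    }
  }
  return merged;
}

const merged = mergeSides(
  { full_name: { value: 'RISHI KUMAR SHARMA', confidence: 99.8 } },
  {
    full_name: { value: 'RISHI KUMAR SHARMA', confidence: 97.1 },
    address_full: { value: '42 MG Road, Bangalore', confidence: 98.2 },
  }
);
console.log(Object.keys(merged)); // [ 'full_name', 'address_full' ]
```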

ocr-extract.js
// DigiVerify — OCR Extraction API
const result = await fetch('https://api.DigiVerify.com/v1/ocr/extract', {
  method: 'POST',
  headers: {
    'Authorization': `Bearer ${apiKey}`,
    'Content-Type': 'application/json'
  },
  body: JSON.stringify({

    // Required
    document_image: documentBase64,
    country: 'IN', // IN | NG | GH | KE

    // Optional — auto-detected if omitted
    document_type: 'AADHAAR',

    // Back side for two-sided documents
    document_back: backBase64,

    // Return per-field confidence scores
    return_confidence: true,

    // Decode QR/barcodes present on doc
    decode_barcodes: true
  })
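});

// Fail loudly on non-2xx responses before parsing the body
if (!result.ok) {
  throw new Error(`OCR extraction failed: HTTP ${result.status}`);
}
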
const {
  extracted_data, // typed fields object
  confidence_scores, // per-field scores
  low_confidence, // fields needing review
  cross_validated, // MRZ vs visual checks
  face_image_url // cropped photo URL
} = await result.json();
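
If you prefer to assemble the request body in one place, a small helper along these lines can validate the required fields and leave out optional ones so that auto-detection applies. The helper name, its camelCase parameters, and the assumption that the four country codes in the comment above are the complete supported set are all illustrative.

```javascript
// Country codes listed in the request example; treated here as the full set (assumption).
const SUPPORTED_COUNTRIES = ['IN', 'NG', 'GH', 'KE'];

// Hypothetical helper that builds the JSON body for the OCR extraction request.
function buildExtractBody({ documentImage, country, documentType, documentBack }) {
  if (!documentImage) {
    throw new Error('document_image is required');
  }
  if (!SUPPORTED_COUNTRIES.includes(country)) {
    throw new Error(`Unsupported country: ${country}`);
  }

  const body = {
    document_image: documentImage,
    country,
    return_confidence: true,
    decode_barcodes: true,
  };

  // Only send optional fields when provided; omitting document_type requests auto-detection.
  if (documentType) body.document_type = documentType;
  if (documentBack) body.document_back = documentBack;
  return body;
}

const body = buildExtractBody({ documentImage: 'base64-image-data', country: 'IN' });
console.log('document_type' in body); // false
```

The result can be passed to `JSON.stringify` and used as the `body` of the fetch call.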

OCR is how you read the document. These are how you trust it.

Extracted data is only as good as the document it comes from. Pair OCR with authenticity checks to know the data is real.

Document Verification

OCR runs inside every Document Verification call — but Document Verification also authenticates the document, checks for tampering, and validates security features. Trust the source before you trust the data.


Face Verification

Once the document is read and authenticated, the face photo extracted by OCR becomes the reference image for biometric matching — confirming the person matches the document data.


Liveness Detection

Combine OCR data extraction with liveness detection to confirm the user submitting the document is physically present — not a replay or a fraudster using someone else's documents.


Get Started

See OCR extraction in action.

Book a demo and we'll run live extractions on documents from your target market — India, Nigeria, Ghana, or Kenya.

Schedule Your Live Demo · Download Guide