AI-enabled Document Classification

Automated document classification process, reducing input errors and workflow issues.

The Story

The client came to us with problems in their construction project planning and operations system. They needed a way to quickly organize and classify the tens of thousands of documents they held, a task that was nearly impossible to do manually. The client's cloud-based platform is widely used throughout the construction lifecycle and serves users in the United Kingdom, Ireland, Australia, Qatar, and the UAE. The goal was to streamline document workflows for their customers by automating document classification with AI.

Technology: OCR, NLP, AI, Python, scikit-learn, Tesseract OCR, AWS services

Challenges

This was a large project. The system needed to handle a range of document types, including Microsoft Office files, PDFs, images, and AutoCAD drawings, across a dataset of close to 45,000 pieces of content. Accurate classification was also essential, which meant thoroughly exploring and testing machine learning models and techniques to address imbalances in the dataset.
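The case study does not say which technique was used to handle the dataset imbalance. One common option is class weighting, where rare classes count for more during training. The sketch below assumes that approach and uses made-up label names and counts purely for illustration.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return a weight per class: n_samples / (n_classes * class_count).

    Rare classes get larger weights, so errors on them cost more in training.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


# Hypothetical label distribution, not taken from the client's dataset.
labels = ["drawing"] * 6 + ["invoice"] * 3 + ["permit"] * 1
weights = balanced_class_weights(labels)

for cls, weight in sorted(weights.items()):
    print(f"{cls}: {weight:.3f}")
```

Weights like these can be passed to classifiers that accept per-class weights. Resampling the training data is an alternative with similar intent.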

Key Features

  • Conducted dataset analysis on various document types and assigned three labels to each document.
  • Used Tesseract OCR to extract text from documents, then vectorized the extracted text for model input.
  • Tested multiple machine learning models, adopting an ensemble learning approach for enhanced accuracy.
  • Developed an AI-based Document Classification API achieving 96% document-level and 98% label-level accuracy.
  • Securely deployed the solution on Amazon Web Services (AWS) for GDPR compliance and client data control.
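The two accuracy figures above measure different things. The sketch below shows one plausible way to compute them. It assumes that label-level accuracy counts each of a document's three labels separately, and that document-level accuracy counts a document as correct only when all three labels match. The source does not define either metric, and the label values here are hypothetical.

```python
def label_level_accuracy(y_true, y_pred):
    """Fraction of individual labels predicted correctly across all documents."""
    total = sum(len(row) for row in y_true)
    correct = sum(
        t == p
        for true_row, pred_row in zip(y_true, y_pred)
        for t, p in zip(true_row, pred_row)
    )
    return correct / total


def document_level_accuracy(y_true, y_pred):
    """Fraction of documents whose labels are all predicted correctly."""
    exact = sum(tuple(t) == tuple(p) for t, p in zip(y_true, y_pred))
    return exact / len(y_true)


# Hypothetical data: four documents, three labels each.
y_true = [
    ("structural", "drawing", "approved"),
    ("electrical", "report", "draft"),
    ("structural", "invoice", "approved"),
    ("mechanical", "drawing", "draft"),
]
y_pred = [
    ("structural", "drawing", "approved"),
    ("electrical", "report", "approved"),  # one label wrong in this document
    ("structural", "invoice", "approved"),
    ("mechanical", "drawing", "draft"),
]

label_acc = label_level_accuracy(y_true, y_pred)
doc_acc = document_level_accuracy(y_true, y_pred)
print(f"label-level: {label_acc:.3f}, document-level: {doc_acc:.3f}")
```

Under these assumptions, document-level accuracy can never exceed label-level accuracy, because a single wrong label fails the whole document. That is consistent with the 96% and 98% figures reported above.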

Hitting The Target

The AI-based document classification was built to automate the manual entry of document details into the document management system (DMS), giving users a smoother journey and reducing malfunctions in other modules of the system. The solution improves accessibility and scalability, supports a wide range of document types, and categorizes documents with high accuracy. Compared to manual organization, integrating AI into the DMS delivers significant gains in efficiency and accuracy, along with cost savings.

45,000

Documents to Organize

98%

Classification Accuracy
Ready to get started?

We’d love to talk about how we can work together.

Let's Talk
Daredevil Diaries

Slingshot news, company information, and resources.

What's New