How can you reduce storage costs by automatically categorising documents?

Discover how Coexya’s experts helped a major energy player optimise its document management and storage costs.

Automatic document categorisation — a concrete lever for reducing costs

Data volumes in organisations increased roughly fivefold between 2020 and 2025, with an average annual growth rate of 35%. This exponential growth creates three major problems: data obsolescence from unpurged files, growing complexity around GDPR and regulatory compliance, and accumulating storage costs (hardware, backups, licences and duplicate management). Automatic categorisation of documents, whether structured or unstructured, based on their type and content, directly addresses all three challenges.

The client case: a major player in the energy sector

An energy sector organisation approached Coexya to design a solution for optimising its document storage costs. The principle adopted was straightforward: document type determines retention period. The project therefore automated the classification of documents against a ten-category classification plan, for example identity documents (10 years), contracts (15 years) and training materials (5 years), so that the applicable retention period could be derived automatically for each file.
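The principle that document type determines retention period can be sketched as a simple lookup. This is a minimal illustration, not the client's implementation: only the three categories and durations named above come from the text, while the category keys, the `retention_end` function and the choice to count the period from a creation date are assumptions made for the example.

```python
from datetime import date

# Retention periods in years, keyed by document category.
# Only these three categories are named in the text; the real plan has ten.
# The key names are hypothetical.
RETENTION_YEARS = {
    "identity_document": 10,
    "contract": 15,
    "training_material": 5,
}


def retention_end(category: str, created: date) -> date:
    """Return the date on which the retention period ends.

    Assumption: the period is counted from the document's creation date.
    """
    years = RETENTION_YEARS[category]
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # 29 February mapped onto a non-leap year: fall back to 28 February.
        return created.replace(year=created.year + years, day=28)


if __name__ == "__main__":
    print(retention_end("contract", date(2020, 3, 1)))
```

Once a classifier has assigned a category, a lookup of this kind is all that is needed to decide when a file becomes eligible for purging.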

The Coexya approach in 6 steps

Coexya’s Search & Semantics experts deployed a supervised machine learning model, using the Sinequa platform for OCR, model training and model application. The initial corpus comprised approximately 1,000 manually annotated documents, split into a training corpus (70%) and an evaluation corpus (30%), plus an application corpus. The project was completed in under two months (one month for implementation and three weeks for the evaluation phase), for a total effort of approximately 35 person-days.
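A 70/30 split of an annotated corpus can be sketched with the Python standard library. The text does not describe how the split was actually performed, so the `split_corpus` function, the random shuffle and the fixed seed are assumptions. The figure of 910 documents is purely illustrative: it was chosen because 30% of 910 equals the 273 evaluation documents reported in the results, whereas the text only states "approximately 1,000".

```python
import random


def split_corpus(doc_ids, train_ratio=0.7, seed=42):
    """Shuffle document ids reproducibly and split them into two lists.

    Returns (training_ids, evaluation_ids).
    """
    ids = list(doc_ids)
    random.Random(seed).shuffle(ids)  # local RNG, so global state is untouched
    cut = round(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]


if __name__ == "__main__":
    # 910 is an illustrative corpus size, not a figure from the project.
    train, evaluation = split_corpus(range(910))
    print(len(train), len(evaluation))
```

In practice a stratified split, which keeps the share of each category the same in both subsets, is often preferable when some categories are rare.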

Measurable results: 80% of categories achieving an F1-score above 80%

Evaluation of the model on a corpus of 273 documents showed that 80% of categories achieve an F1-score above 80%. Precision reaches 91% for documents classified with a confidence level above 30%, which represents 77% of the total volume. The model improves continuously: documents classified with insufficient confidence are redirected for manual annotation and then reintegrated into the training corpus.
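The two measures used above, a per-category F1-score and a confidence threshold that sends uncertain documents to manual annotation, can be sketched as follows. This is an illustration under stated assumptions, not the Sinequa implementation: the function names, the dictionary layout of a prediction and the sample labels are hypothetical. Only the 30% threshold comes from the text.

```python
from collections import Counter


def f1_per_category(y_true, y_pred):
    """Compute the F1-score of each category from true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted category receives a false positive
            fn[truth] += 1  # true category receives a false negative
    scores = {}
    for cat in set(y_true) | set(y_pred):
        denom = 2 * tp[cat] + fp[cat] + fn[cat]
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall
        scores[cat] = 2 * tp[cat] / denom if denom else 0.0
    return scores


def route(predictions, threshold=0.30):
    """Separate predictions kept automatically from those sent to annotators.

    Each prediction is a dict with a "confidence" value between 0 and 1.
    """
    accepted = [p for p in predictions if p["confidence"] > threshold]
    to_review = [p for p in predictions if p["confidence"] <= threshold]
    return accepted, to_review


if __name__ == "__main__":
    truth = ["contract", "contract", "identity", "identity"]
    predicted = ["contract", "identity", "identity", "identity"]
    print(f1_per_category(truth, predicted))
```

Documents returned in `to_review` correspond to those redirected for manual annotation; once labelled, they can be appended to the training corpus before the next training run.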

A publication by the Search & Semantics experts at Coexya:

Jean-Louis Vila, CTO — Gaël Yvrard, Project Director — Pierre Martin, Sales Engineer

