Why structuring data remains one of the hardest problems in insurance transformation

Ian Smith | Business Insights | 24-Mar-2026 | 10 minutes to read

Insurance companies collect vast amounts of information about policyholders, assets, exposures, and claims across decades of operations. However, the ability to consistently structure and use that data remains one of the most persistent obstacles in the industry’s digital transformation efforts.

The issue is not simply technological maturity. Over the past decade, insurers have invested heavily in analytics platforms, data lakes, and artificial intelligence capabilities.

Many organisations have also introduced data extraction technologies designed to capture information from both structured and unstructured sources, including policy documents and other physical records such as paper slips.

In parallel, insurers have invested in master data and metadata management tools in an effort to standardise definitions and improve data consistency across systems.

Yet the underlying constraint often remains the same. Operational data continues to be fragmented, inconsistent, and frequently embedded in documents rather than structured systems.

Industry research points to the same conclusion: data analysts in large companies spend as much as 80% of their time searching for, cleaning, and preparing data, leaving only 20% for actual analysis.

These structural conditions explain why many AI initiatives in insurance struggle to move beyond experimentation. Analytical models depend on consistent inputs, yet the data environments of many insurers were never designed with structured analytics in mind. To understand why this challenge persists, it is necessary to look at how insurance data is created and managed across the industry.

The data challenges insurers face

Insurance operations still rely heavily on document-based information

Insurance products are fundamentally legal agreements that describe risk coverage through detailed contractual language. Policies, endorsements, exclusions, and coverage conditions are written in narrative form to ensure legal precision and regulatory compliance. As a result, the policy document itself has historically served as the primary source of truth.

This document-centric structure continues to shape operational processes today. Underwriting submissions frequently arrive as broker emails accompanied by engineering reports, financial statements, and risk surveys. Claims files accumulate adjuster notes, medical documentation, invoices, photographs, and correspondence over time. While core systems may capture summary attributes such as policy limits or claim values, a significant portion of the contextual information required for decision-making remains embedded in supporting documentation.

From a data architecture perspective, this creates an inherent challenge. Key attributes are often described in text rather than captured as structured fields. The same type of information may appear in different formats across documents, systems, and markets. Transforming these variations into consistent data elements requires interpretation and validation, making large-scale standardisation difficult.
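To illustrate, consider how a single attribute such as a policy limit might appear across submissions. The Python sketch below is a minimal illustration; the input formats and the normalise_limit helper are hypothetical, and a production pipeline would handle far more variation:

```python
import re

# Illustrative inputs: the same policy limit expressed differently across
# broker emails, schedules, and policy wordings.
RAW_LIMITS = ["USD 5,000,000", "$5m", "5M USD", "five million dollars"]

WORDED_AMOUNTS = {"five million": 5_000_000}  # tiny lookup for worded figures

def normalise_limit(raw: str) -> int | None:
    """Best-effort conversion of a limit string to whole currency units."""
    text = raw.lower().replace(",", "").strip()
    for phrase, value in WORDED_AMOUNTS.items():
        if phrase in text:
            return value
    match = re.search(r"(\d+(?:\.\d+)?)\s*(m)?", text)
    if not match:
        return None  # unrecognised format: route to manual review, don't guess
    value = float(match.group(1))
    if match.group(2):  # an 'm' suffix means millions
        value *= 1_000_000
    return int(value)

for raw in RAW_LIMITS:
    print(f"{raw!r} -> {normalise_limit(raw)}")
```

Even this toy example returns None rather than guessing when a format is unrecognised, which is exactly the interpretation-and-validation burden that makes standardisation at scale so difficult.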

Unstructured data dominates underwriting and claims processes

The reliance on documents directly influences the structure of operational datasets across the insurance value chain. Many underwriting and claims decisions still depend on information extracted from reports, PDFs, and external submissions.

Industry studies illustrate the scale of this dependency. KPMG estimates that insurers process millions of pages of policy and claims documentation every year, much of which contains information essential to underwriting decisions or claims validation. At the same time, Deloitte research highlights the significant operational effort insurers devote to manually extracting, verifying, and reconciling information from documents.

Manual interpretation inevitably introduces variation in how data is captured and recorded. Different teams may extract the same information using slightly different conventions, terminology, or levels of detail. Metadata may be incomplete or inconsistent, and the origin of specific attributes may not always be clearly documented.

For analytical models, these inconsistencies create serious limitations. Machine learning systems rely on stable variables and consistent definitions to produce reliable outputs. When datasets contain ambiguous fields or inconsistent structures, model performance becomes difficult to evaluate and explain. In regulated industries such as insurance, where decision transparency is essential, this uncertainty can quickly become a governance concern.
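A lightweight consistency check can surface such problems before data reaches a model. The sketch below, using a hypothetical claims extract with pandas, flags attributes whose values mix recording conventions:

```python
import pandas as pd

# Hypothetical claims extract: different teams recorded the same attribute
# using different conventions (casing, codes vs. free text).
claims = pd.DataFrame({
    "claim_id": [101, 102, 103, 104],
    "loss_cause": ["FIRE", "fire", "Water", "WATER"],
})

# Normalise casing and whitespace, then count the distinct raw spellings
# behind each normalised value.
normalised = claims["loss_cause"].str.strip().str.lower()
variants = claims["loss_cause"].groupby(normalised).nunique()

# Any value with more than one raw spelling signals an inconsistent
# convention that would otherwise leak into model features.
print(variants[variants > 1])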

Legacy core systems fragment data across the organisation

Many insurers still run policy administration, billing, and claims on core systems introduced decades ago. As digital distribution channels and analytics platforms were added, organisations often introduced integration layers to connect new capabilities with existing systems rather than replacing them entirely. Over time, this approach produced complex architectures in which similar data elements exist across multiple platforms with subtle differences in meaning or format.

Even when insurers implement enterprise data platforms or data lakes, the quality of analytical outputs continues to depend on the consistency of the underlying source data. Reconciling definitions and formats across legacy systems therefore becomes one of the most demanding aspects of modernisation efforts.

Organisational silos make consistent data definitions difficult

Underwriting, claims, actuarial, finance, and risk management teams often develop their own data practices over time, tailored to their operational needs. Each department may maintain independent datasets, reporting models, or spreadsheets that reflect the specific metrics used in daily decision-making.

While these practices are often effective within individual departments, they can create inconsistencies when organisations attempt to establish enterprise-wide data structures. The same concept may be defined differently across functions, or key attributes may be captured at different levels of detail.

Aligning these definitions requires collaboration across business units and agreement on common terminology. In practice, this organisational alignment can be more challenging than the technical implementation of new data systems.

What insurers must change to build structured data foundations

Establish shared data models across underwriting and claims

The starting point is to define consistent models for core elements such as risk objects, coverage structures, policy attributes, and claim events.
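As a minimal illustration, a shared model can be expressed as a small set of typed entities that both underwriting and claims populate. The Python sketch below uses hypothetical field names; a real model would be far richer and agreed across functions:

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class RiskObject:
    """A single insured object, such as a building or a vehicle."""
    object_id: str
    object_type: str      # e.g. "commercial_property"
    location: str
    sum_insured: int      # whole currency units

@dataclass
class Coverage:
    """One coverage granted under a policy."""
    peril: str            # must come from a shared peril vocabulary
    limit: int
    deductible: int

@dataclass
class Policy:
    policy_id: str
    inception: date
    expiry: date
    risk_objects: list[RiskObject] = field(default_factory=list)
    coverages: list[Coverage] = field(default_factory=list)

@dataclass
class ClaimEvent:
    """A claim recorded against a policy, referencing the same entities."""
    claim_id: str
    policy_id: str
    risk_object_id: str   # links back to the shared RiskObject
    loss_date: date
    peril: str            # same vocabulary as Coverage.peril
```

The value lies less in the classes themselves than in the agreement they encode: underwriting and claims reference the same risk objects and draw perils from the same vocabulary, so a claim can always be traced back to the coverage it draws on.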

Developing these models requires collaboration between underwriting experts, actuaries, compliance specialists, and data architects. Without this alignment, technical systems will continue to capture information in inconsistent ways that limit analytical capabilities.

Experience across the industry shows how difficult this can be at scale. Initiatives such as Lloyd’s Blueprint Two attempted to establish a standardised data ecosystem across the market, but faced prolonged delays and were ultimately scaled back. The outcome reflects the complexity of aligning multiple organisations, processes, and regulatory requirements around shared data structures.

For individual insurers, this highlights the importance of building on established standards rather than defining data models in isolation. Frameworks such as ACORD provide widely adopted data models and messaging standards that support interoperability between insurers, brokers, and partners. Aligning internal data structures with these standards can reduce fragmentation, improve consistency, and enable more efficient data exchange across the value chain.

At the same time, insurers must define how these standards apply to their own products, processes, and systems. Industry frameworks provide a foundation, but internal alignment remains essential for creating data models that are both consistent and operationally usable.
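In practice, alignment often begins with an explicit mapping from internal field names to the standard's vocabulary. The sketch below is purely illustrative: the target names mimic ACORD-style naming but are placeholders, not elements of an actual ACORD schema.

```python
# Illustrative mapping from internal field names to standard-aligned names.
# Right-hand names are ACORD-style placeholders, not real schema elements.
FIELD_MAP = {
    "pol_no": "PolicyNumber",
    "insured_nm": "InsuredName",
    "inc_dt": "EffectiveDate",
    "exp_dt": "ExpirationDate",
    "limit_amt": "LimitAmount",
}

def to_standard(record: dict) -> dict:
    """Rename mapped fields; park unmapped ones for data stewards to review."""
    standard = {FIELD_MAP[k]: v for k, v in record.items() if k in FIELD_MAP}
    unmapped = {k: v for k, v in record.items() if k not in FIELD_MAP}
    if unmapped:
        standard["_unmapped"] = unmapped
    return standard

print(to_standard({"pol_no": "P-123", "limit_amt": 5_000_000, "broker_ref": "B-9"}))
```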

Introduce document intelligence to convert documents into structured data

Advances in document intelligence technologies now allow insurers to extract key entities from policy wording, underwriting submissions, and claims documentation with increasing accuracy. When combined with validation workflows and audit trails, these systems enable organisations to gradually transform large volumes of document-based information into structured datasets.
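A common pattern is to auto-accept machine-extracted values only above a confidence threshold and route everything else to human review, recording an audit entry either way. A simplified sketch, assuming a hypothetical extractor output format:

```python
from datetime import datetime, timezone

CONFIDENCE_THRESHOLD = 0.90

# Hypothetical output of a document-intelligence service: each extracted
# entity carries a confidence score and its location in the source document.
extracted = [
    {"field": "policy_number", "value": "P-2024-001", "confidence": 0.98, "page": 1},
    {"field": "limit_amount", "value": "5000000", "confidence": 0.71, "page": 3},
]

accepted, review_queue, audit_trail = {}, [], []

for entity in extracted:
    decision = ("auto_accepted" if entity["confidence"] >= CONFIDENCE_THRESHOLD
                else "human_review")
    audit_trail.append({
        "field": entity["field"],
        "source_page": entity["page"],
        "confidence": entity["confidence"],
        "decision": decision,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })
    if decision == "auto_accepted":
        accepted[entity["field"]] = entity["value"]
    else:
        review_queue.append(entity)

print(accepted)      # structured values ready for downstream systems
print(review_queue)  # low-confidence items routed to a human validator
```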

Rather than attempting large-scale manual data migration, document intelligence creates a practical pathway for building structured data foundations while maintaining operational continuity.

Implement governance and clear ownership of operational data

Structured data environments require strong governance frameworks. Data ownership, quality monitoring, and change management processes must be clearly defined to ensure that datasets remain reliable over time.
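Ownership and quality expectations can themselves be captured as data. A minimal sketch, with hypothetical dataset names and rules, of a registry that an automated quality monitor could evaluate:

```python
# Minimal governance registry: each dataset has a named owner and
# explicit quality rules that can be checked automatically.
REGISTRY = {
    "underwriting.submissions": {
        "owner": "Underwriting Data Steward",
        "rules": [("policy_number", "not_null"), ("limit_amount", "positive")],
    },
    "claims.events": {
        "owner": "Claims Data Steward",
        "rules": [("loss_date", "not_null")],
    },
}

def check_row(dataset: str, row: dict) -> list[str]:
    """Return the quality rules a row violates for a registered dataset."""
    failures = []
    for field_name, rule in REGISTRY[dataset]["rules"]:
        value = row.get(field_name)
        if rule == "not_null" and value is None:
            failures.append(f"{field_name}: missing")
        elif rule == "positive" and not (isinstance(value, (int, float)) and value > 0):
            failures.append(f"{field_name}: must be a positive number")
    return failures

print(check_row("underwriting.submissions", {"policy_number": "P-1", "limit_amount": -5}))
```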

Regulatory expectations are also evolving in this area. As insurers begin to use advanced analytics and AI in underwriting and claims decisions, regulators increasingly expect organisations to demonstrate transparency and explainability in automated processes. Embedding data quality, traceability, and compliance requirements directly in the organisation’s data architecture makes these expectations far easier to meet.

Modernise data architecture incrementally rather than through large replacements

Large-scale system replacement programmes often involve significant operational risk and long implementation timelines.

A more practical approach is incremental modernisation. By introducing integration layers, shared data models, and document intelligence capabilities, organisations can gradually reduce data fragmentation while continuing to operate existing systems. Over time, this layered approach allows insurers to build more coherent data architectures without disrupting critical operational processes.
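An integration layer often amounts to a set of adapters that translate each legacy system's records into the shared model without modifying the systems themselves. A minimal sketch, assuming a hypothetical legacy record layout and the illustrative field names used earlier:

```python
from datetime import datetime

# Hypothetical record as returned by a legacy policy administration system.
legacy_record = {
    "POLNUM": "P-123",
    "INCDTE": "01/04/2024",  # legacy DD/MM/YYYY convention
    "EXPDTE": "31/03/2025",
}

def adapt_legacy_policy(record: dict) -> dict:
    """Translate legacy field names and formats into the shared model's shape."""
    def parse(value: str):
        return datetime.strptime(value, "%d/%m/%Y").date()
    return {
        "policy_id": record["POLNUM"],
        "inception": parse(record["INCDTE"]),
        "expiry": parse(record["EXPDTE"]),
    }

# New capabilities consume the shared shape; the legacy system stays untouched.
print(adapt_legacy_policy(legacy_record))
```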

Structured data improves collaboration between insurers, brokers, and underwriters

In many markets, underwriting submissions still arrive as large collections of documents that require significant manual interpretation. Brokers may provide risk reports, financial statements, engineering surveys, and supporting documentation in different formats, leaving underwriting teams to extract and reconcile key information before a decision can be made. This process is time-consuming and can slow down response times in competitive markets where brokers expect rapid quotes.

Structured data environments can significantly streamline these interactions. When key risk attributes are consistently captured and validated, underwriters can evaluate submissions more quickly and with greater confidence. Brokers benefit from clearer submission requirements, faster responses, and more predictable underwriting processes.
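Clear submission requirements can also be enforced at the point of intake, so gaps surface immediately rather than mid-review. A minimal sketch, assuming a hypothetical list of required risk attributes for a digital submission channel:

```python
# Hypothetical minimum attributes a digital submission must carry
# before it reaches an underwriter.
REQUIRED_ATTRIBUTES = ["insured_name", "occupancy", "location", "sum_insured"]

def triage_submission(submission: dict) -> tuple[bool, list[str]]:
    """Return (complete, missing) so the broker gets immediate feedback."""
    missing = [a for a in REQUIRED_ATTRIBUTES if not submission.get(a)]
    return (not missing, missing)

complete, missing = triage_submission({
    "insured_name": "Acme Ltd",
    "location": "Rotterdam",
    "sum_insured": 12_000_000,
})
print(complete, missing)  # False ['occupancy'] -> instant feedback to the broker
```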

Over time, data that is properly structured also enables more advanced capabilities in broker-insurer collaboration. Insurers can support digital submission channels, provide more transparent feedback on underwriting decisions, and share insights that help brokers structure risks more effectively. For underwriters, this reduces time spent on manual data interpretation and allows greater focus on risk assessment and pricing decisions.

AI adoption in insurance will depend on data architecture maturity

Artificial intelligence is rapidly becoming a central focus of innovation across the insurance sector. However, the effectiveness of AI systems ultimately depends on the quality and structure of the data that supports them.

Insurers that invest in coherent data models, document intelligence capabilities, and governance frameworks will be better positioned to translate AI experimentation into operational improvements. Those that focus primarily on analytical tools without addressing underlying data structures may continue to face limitations in scaling their initiatives.

In an industry built on risk evaluation and long-term commitments, the ability to structure and interpret data consistently is becoming a strategic capability. The organisations that succeed in building these foundations will define the next phase of digital transformation in insurance.

Ian Smith, Client Director and Domain Expert

Golf enthusiast, devoted father of two, dedicated SciFi reader, and a passionate fan of classic and sports cars. Meet Ian, our client director and domain expert.
