Data 4.0: Preparing Enterprise Data For AI Transformation

Explore EY’s insights on how businesses can transition to Data 4.0 and make their enterprise data AI-ready. This report highlights strategies for overcoming challenges, optimizing data management, and leveraging AI to drive innovation and decision-making.


Data 4.0: making your data AI-ready
September 2024
Table of contents
Chapter 1: Why do businesses need AI-ready data?
Chapter 2: The evolution towards Data 4.0
Chapter 3: Achieving AI-ready data
Chapter 4: Data readiness and governance for Indian enterprises
Chapter 5: Preparing data for an agentic future



Foreword
Artificial Intelligence (AI) and Generative AI (GenAI) are transforming how businesses
operate, innovate and compete. However, one crucial factor underpins their success: data
quality. Just as advanced algorithms and powerful computers are essential, so too is the
quality of the data they rely on. Today, high-quality data forms the backbone of modern
business operations.

Effectively leveraging this data, however, presents its own set of challenges. The speed
and accuracy of AI outputs depend on accessing current, relevant datasets, while poor-
quality data can lead to flawed outcomes. This growing recognition of data’s importance
has driven a significant evolution in data management, moving from simple digital records
to sophisticated, cloud-native data stacks. These systems provide flexibility, scalability
and integration capabilities, which are key to unlocking AI’s full potential.

Data 4.0 marks a major leap forward by treating data as a central, strategic asset
essential for digital transformation. It adopts cloud-native, metadata-driven approaches
and harnesses intelligent automation to deliver operational insights at scale.

This report provides timely, practical guidelines grounded in real-world experiences and
a forward-looking view of how organizations should assess their data landscape. It traces
the evolution of data management and introduces a seven-pillar framework for AI-ready
data, addressing both technical and strategic aspects, while aligning data initiatives with
business goals.

As we progress towards an ‘agentic future’ where AI systems grow increasingly
autonomous, the data foundations we build today will shape our success. The era of GenAI
is not a distant prospect — it is unfolding now. Organizations that prioritize data readiness
today will lead tomorrow, harnessing AI’s potential to drive innovation, efficiency and
growth.

Alexy Thomas
Partner, Technology Consulting
EY India



Foreword
“Your research is only as good as your data”
“Your analytics are only as good as your data”
“Your conclusions are only as good as your data”

These and similar expressions have been bandied about for many years. We hear
expressions like these used, but are firms actually adopting what they are saying? Well, in
this new world of AI/GenAI, they had better be!

Data and the processes and disciplines in managing data are more important today than
ever before. Truly managing data as a strategic asset is no longer a ‘nice to have’, but
mandatory to the success of any organization. And especially today, as organizations
strive to embrace AI and GenAI to enhance their businesses, understand their markets,
improve customer service and increase efficiency, recognizing how critical data is to these
new technologies is paramount. “Your AI is only as good as your data” is the term we
must all now embrace!

In the following paper, Alexy Thomas, Partner at EY, does an outstanding job of laying
out the argument for data management adoption. What is so well done in this paper is
not that you’ll find anything particularly new, but that you’ll find how Alexy leverages
the various known capabilities of data management and aligns them in a clear and
constructive way, making the case for data management as fundamental to your AI
program and success.

Alexy presents the three critical areas of data readiness (data integrity, data management
and database performance) and ties them to the key considerations of AI-ready data:
incorporating data governance, enhancing data discoverability and modernizing the
application stack. It is this construction of concepts which he refers to as “Data 4.0”. He
expands on these concepts by emphasizing the importance of leveraging today’s cloud
technologies, building a strong foundation of metadata management and data cataloging,
and providing transparency and improved access to data while protecting it and ensuring
it is being used effectively and ethically.

And why are these so important? Because firms face significant challenges to the success
of their AI programs without embracing disciplined data management. The EDM Council,
approaching our 20th anniversary as the global data management association,
has been a champion of raising awareness of foundational data management best
practices through the promotion of DCAM (the Data Management Capability Assessment
Model). As you read about the 7 pillars of an AI-ready data framework so well articulated
in this paper, you’ll see how data management best practices will be the stepping
stone to a successful AI program.

If you believe that ‘your AI is only as good as your data’, then review and embrace the
concepts provided by Alexy Thomas in this paper. As a data management practitioner for
decades, I can say with confidence that Alexy has successfully made his argument for the
importance of data management for AI!

John Bottega
President
EDM Council

Chapter 1: Why do businesses need AI-ready data?

In the past two years, the world has witnessed an unprecedented surge in AI capabilities, ushering in a new era of technology. The rise of Generative AI (GenAI) has accelerated AI’s evolution into a truly general-purpose technology, democratizing access and spurring creative experimentation across all sectors. Enterprises worldwide are swiftly moving beyond point solutions, integrating horizontal AI applications across internal functions to achieve sustained competitive advantage. As AI continues to reshape the global technological landscape, India is emerging as a key player. Poised to become one of the largest markets for AI, the country is transforming into a massive playground for AI applications that drive enterprise growth and productivity.

Investments in consumer-facing technology, next-generation supply chains, and intelligent automation platforms have the potential to lead with an AI-first approach, leapfrogging legacy paradigms. The EY CEO Outlook Pulse Survey 2024 reports that 99% of CEOs are planning to invest in GenAI. GenAI implementation could give a significant boost to India’s GDP. According to an EY study, this could be US$359 billion to US$438 billion in fiscal year 2029-30 over and above the baseline estimates, representing a growth rate of 5.9% to 7.2% that year.

AI advancement relies on sophisticated algorithms, computing power, and vast amounts of high-quality enterprise data. This data is crucial for decision-making, customer interactions and operations across industries. The AI value chain benefits from enhanced data and analytics capabilities, improving everything from personalized recommendations to automated screening.

Data quality and quantity are critical for AI success, including GenAI, machine learning and analytics. Poor-quality data can lead to inaccurate results, while implementation speed depends on current, relevant datasets. Organizations that build an open and trusted data foundation will best leverage their data assets.

Integrating structured and unstructured data

Structured datasets have historically provided a foundation for traditional AI development, as they are typically concerned with numerical or categorical prediction, pattern recognition or automated decision-making. However, future AI applications, especially GenAI, require the diversity and depth of information that comes from semi-structured and unstructured data, including video monitoring and company records, as the neural networks are designed to model complex, non-linear relationships using a graph data structure. This data, typically stored in file systems, digital asset management (DAM) systems, content management systems (CMSs) and version control systems, can provide highly valuable AI insights grounded in customer information.

In sectors like financial services, banks possess a treasure trove of customer data due to long relationships and frequent interactions across various channels. This presents a golden opportunity to gain deeper insights into their customers that organizations are yet to tap into fully.

Insufficient access to unstructured data can result in an incomplete and inadequate view, potentially impeding AI development and adoption.

There are three critical areas for data readiness. They are:

• Data integrity
  • Ensure high-quality, consistent data across all sources
  • Implement rigorous data cleansing and validation processes
  • Regularly audit data for accuracy and completeness
• Data management
  • Develop a comprehensive data governance strategy
  • Key components to consider include
    a) Data lifecycle management
    b) Tagging and classification systems
    c) Company data dictionaries
    d) Reference and master data management
• Database performance
  • Optimize database structures for AI workloads
  • Ensure scalability to handle increased data processing demands
  • Implement efficient data retrieval mechanisms

GenAI relies on a strong foundation of data maturity, which involves an organization excelling both in integrating data (through processes like moving and transforming it) and in managing its governance. Without data maturity, the prototyping, deployment and effective testing of GenAI, or of any type of analytics, become very challenging.
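
The data integrity practices listed above (validation, consistency checks and regular audits for completeness) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and does not come from the report: the field names in `REQUIRED_FIELDS`, the 300 to 900 credit score range and the `audit_records` helper are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical rule set: which fields a customer record must carry.
REQUIRED_FIELDS = ("customer_id", "email", "credit_score")


@dataclass
class AuditReport:
    total: int
    complete: int
    duplicates: int
    invalid: int

    @property
    def completeness(self) -> float:
        # Share of records in which every required field is filled in.
        return self.complete / self.total if self.total else 0.0


def audit_records(records: list[dict]) -> AuditReport:
    """Count complete, duplicated and invalid records in one pass."""
    seen_ids = set()
    complete = duplicates = invalid = 0
    for record in records:
        # Completeness: every required field is present and non-empty.
        if all(record.get(f) not in (None, "") for f in REQUIRED_FIELDS):
            complete += 1
        # Consistency: the same customer_id should not appear twice.
        cid = record.get("customer_id")
        if cid is not None:
            if cid in seen_ids:
                duplicates += 1
            seen_ids.add(cid)
        # Validity: numeric scores outside an assumed 300-900 range are flagged.
        score = record.get("credit_score")
        if score is not None and not 300 <= score <= 900:
            invalid += 1
    return AuditReport(len(records), complete, duplicates, invalid)
```

Run on a schedule against each source system, a report like this gives a simple, comparable measure of completeness and consistency over time; a production setup would typically rely on a dedicated data quality tool rather than hand-written checks.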



Leveraging unstructured data for AI success

Indian organizations are still on the journey of maturing their data management. To harness the full potential of GenAI applications, businesses must adopt a strategic approach to balancing technology, people and processes. Organizations can tap into their unstructured data effectively by aligning data strategy with AI objectives, which includes reassessing strategic objectives for data in light of GenAI advancements.

Imagine owners of a small business applying for a loan through a bank’s online platform, where they interact with an AI-powered financial advisor for personalized guidance. To be effective, the AI needs access to a variety of data sources: structured data such as the applicants’ financial history, tax returns, credit scores, existing loans and account balances; unstructured data including business performance, market trends and economic forecasts; and internal bank data with up-to-date lending regulations and compliance requirements.

The AI system must analyze both structured and unstructured data to offer accurate advice and navigate the loan application process effectively, all while adhering to financial regulations and data protection laws. If the AI system cannot properly access or interpret these data sources, there is the risk of providing incorrect loan eligibility information, making unsuitable product recommendations, or overlooking critical financial factors, which could lead to poor decision-making and potential financial or regulatory issues for both the bank and the business owners. The consequences of poor data integration in this scenario could be severe, potentially resulting in financial losses for both the bank and the business owners, damaged customer relationships or even regulatory compliance issues.

By implementing a comprehensive strategy to leverage unstructured data, businesses can significantly enhance their AI capabilities, leading to more accurate, personalized and valuable AI-driven services across various industries and applications. For instance, a global bank is leveraging information from its social media handles to identify potential customers and assess credit risk. By analyzing this unstructured data, the bank has expanded its ability to offer credit to a wider range of customers while simultaneously minimizing default risk.

Data requirement and sources

While businesses and governments think about data access, it is important to keep in view that data requirements and issues pertaining to data access will vary by industry. For instance, in the financial services sector, AI applications focus on customer experience, product design and risk management, requiring diverse datasets like customer interactions, market data and financial records, while healthcare relies on patient records, medical images and regulatory data for clinical services, outreach and compliance. Retail and agriculture emphasize product design, procurement and production planning, utilizing data on market research, soil conditions and logistics.



The following table summarizes a few examples of AI use cases and their corresponding data requirements.

Industry | AI use case | Data requirements
Financial Services | Product and service design and innovation | Includes customer, market and competitor data
Financial Services | Improving customer experience through Virtual Assistant enabled conversations | Includes summaries of past customer interactions, key concerns and FAQs
Financial Services | Document creation for underwriting | Includes past documents, SOPs and product characteristics
Financial Services | Marketing and sales | Includes e-mail data, product documents, onboarding guides with text, audio and video, sales data, customer data, customer profiles and credit history
Financial Services | Collections, recovery and attrition control | Includes user data, cashflow/bank statements, KYC data, payment schedules, regulatory requirements and risk management data
Healthcare | Clinical services and operations | Includes patient healthcare data and records, disease report databases, medical images, patient test reports and research data
Healthcare | Community outreach | Includes data pertaining to key health concerns for AI-based content creation and personalized engagement, and virus and disease data tracking for early detection of breakouts
Healthcare | Audit and compliance | Includes regulatory data, digital forensics and compliance requirements
Retail | Product, design and research | Includes product and packaging data, structures, material and ingredients data and design blueprints
Retail | Procurement, manufacturing and quality assurance | Includes contracts data and documents, logistics and inventory data and quality control information
Retail | Sales and marketing | Includes market research and product category data, product descriptions and insights; includes user-level data to create buying recommendations, warranty, refunds and repairs data
Retail | Store operations and staff management | Includes store level customer and inventory insights, product descriptions
Agriculture | Production planning | Includes soil mapping data, food supplies and prices data, subsidies data, provenance of crops, organic certification, and other data points collected at Mandis
Agriculture | Irrigation management | Includes groundwater availability, data about pipeline infrastructure and efficiency, data about power subsidies, historic rainfall information, soil moisture data, river flow data and flood data
Agriculture | Crop protection and management | Includes data pertaining to crop insurance schemes, climatic forecasts, weather forecasts/pattern, crop storage infrastructure data, and data pertaining to costs and availability of pesticides and insecticides, historic disaster damage data, drone data and other data relevant for underwriting
Tech Services | Application development and support | Includes data of past coding models, automated response management, data engineering models and UI/UX designs
Tech Services | Business process management | Includes processes data, customer experience data
Tech Services | Infrastructure and operations | Includes contract data, past incident and response data
Tech Services | Marketing | Includes marketing data, content data and customer data
Government Services | Efficient policy drafting and data-driven decision making | Includes datasets on public consumption, expenditure, data on key governance indicators like education and healthcare etc., government statistics and data reports through M&E systems
Government Services | Enhancing citizen engagement | Includes personalized AI content, engagement on draft policies, government policies and schemes and grievance redressal
Government Services | Automated report generation | Includes KPI tracking data, public expenditure and impact data
Media | Schedule and distribute content | Includes audio data for subtitles and captioning, rights management and fact checking for fake news, platform use data
Media | User engagement and monetization | Includes user analytics/demographics, viewership tracking and user preference profiling/behaviors
Media | Prevent the spread of misinformation | Includes integrated databases to flag inaccurate news, social media streams and news streams

Accessing the benefits of data


Optimizing data for AI unlocks an organization’s growth potential. A consolidated, secure and compliant data ecosystem enables insight generation, the discovery of hidden trends and accurate predictions. AI models can also deliver faster, more reliable and more accurate outputs.

This data-driven approach empowers seamless team collaboration and strategic focus. Analysts can dive into higher-level
analytics, while data scientists freely explore frontiers, catalyzing transformative breakthroughs.

Figure: GenAI viewed through a data lens and an AI lens. At the center are the GenAI capabilities: data/content creation, code generation and agent orchestration/execution. The four surrounding quadrants are:

Data lens
• Data Strategy with AI Enablers: Core data capabilities are not enabled yet (e.g., MDM, Metadata Management and Data Quality). Evaluate use of Generative AI to support these offerings.
• Data for Complex Technology Programs: Large transformational initiatives are underway that should evaluate leveraging components of AI to streamline processes (e.g., EPM, ERP Upgrade/Merge, CRM capabilities).

AI lens
• What can AI do for me?: Evaluation of where Generative AI and AI can support multiple facets of the organization (e.g., Managed Services, Supply Chain Optimization).
• Pinpoint Use Case Enablement: Interest in a specific Generative AI capability and how it can be enabled and rolled out within the organization (e.g., AI Copilot, Informatica CLAIRE).



How organizations can leverage AI and its interconnectivity to core data capabilities

Critically, this foundation boosts revenue through enhanced customer engagement, cross-selling, upselling and digital marketing optimization. For instance, if a customer is buying a camera through an ecommerce platform, the platform can analyze photography-related hashtags and discussions to recommend trending accessories or techniques for cross-selling. If many camera customers ask about low-light photography, it can prioritize recommendations for low-light optimized lenses or accessories. Similarly, by mining product reviews, analyzing customer support interactions, leveraging user behavior data for personalization and applying image and video analysis for context-aware suggestions, the platform can also present an upsell opportunity.

By harnessing data’s full potential, the organization positions itself for sustained growth and market leadership, sharpening its competitive edge in today’s data-driven landscape.

Newer revenue models: Data-driven AI allows its users to make faster decisions and swiftly adapt to changing market dynamics. By leveraging the right data in AI models, organizations can identify potential revenue streams and extract value from the marketplace more effectively. While AI-driven business models serve as powerful growth drivers, it is crucial to understand both what AI can do and how to leverage it for maximum benefit. The financial advantages of data-driven AI extend beyond direct revenue generation and cost savings. Organizations can create new business models in adjacent industries, access previously untapped markets and achieve significant cost optimizations. These optimizations come through productivity enhancements, infrastructure savings and reductions in operating expenses.

Data monetization: This represents a powerful opportunity for organizations to create tangible economic value from their data assets and AI capabilities. It can involve developing new data products, enhancing internal business performance, gaining competitive advantages and addressing industry-wide challenges. A company can also potentially sell or license its datasets or AI models (trained on its proprietary data) to other businesses or third parties. By leveraging unique data insights, companies can differentiate themselves in the market and even expand into adjacent industries or new markets.

Servicing customers better: By leveraging AI algorithms to analyze customer behavior and preferences, businesses can tailor their offerings, create personalized communications, and identify high-potential leads, ultimately driving sales and customer loyalty.

Operational efficiency: This is another key advantage of investing in data and AI. These technologies enable scalable operations, allowing businesses to handle increased volume without proportional cost increases. AI-driven process optimization and resource allocation can significantly reduce inefficiencies, leading to improved productivity and reduced operational costs. For instance, manufacturing plants can optimize production schedules, while retail chains can better manage inventory distribution based on AI-predicted demand.

By strategically implementing data-driven AI initiatives, organizations can realize these multifaceted benefits, leading to improved financial performance, enhanced competitiveness, and long-term sustainability in an increasingly data-centric business environment. The key lies in aligning AI strategies with business objectives and effectively leveraging both structured and unstructured data to drive informed decision-making and innovation across all aspects of the organization.

Key considerations for AI-ready data are:

1. Incorporating governance into data architecture
2. Enhancing data discoverability
3. Modernizing application stacks to support innovative AI applications and enable agile, data-driven processes

Chapter 2: The evolution towards Data 4.0
Data - a brief history

The origin of data-driven problem-solving can be traced back to the 17th century when John Graunt, a London haberdasher, pioneered mortality data collection. This nascent field evolved significantly through the centuries, with people like Florence Nightingale leveraging data visualization to revolutionize medical practices, and Edgar Codd’s groundbreaking conceptualization of relational database management systems.

Later, digitization and the availability of enterprise data in digital forms led to data being used for decision-making, from descriptive to prescriptive to predictive and then to generative. The evolution of data usage and technology can be traced to four revolutionary leaps analogous to the industrial revolution.

Data 1.0: The dawn of digital

The 1980s heralded the era of personal computing, democratizing access to data analysis tools. This period saw data primarily confined to specific business applications, largely in non-digital formats. Organizations started creating databases and data marts to generate reports.

Data 2.0: Rise of enterprise data

As internal processes became digitalized and organizations built enterprise-wide systems, the need for an enterprise-wide view of information arose. Enterprise data warehouses emerged, with analytical applications and reporting tools like Teradata, Vertica, and Greenplum powering advanced reports and visualizations primarily for regulatory and finance reporting.

Data 3.0: The big data revolution

This marked an era of unprecedented growth in data volume, variety and velocity. This surge was driven by the proliferation of smartphones, sensors, connected vehicles and other digital devices, which now auto-generate vast amounts of data. However, the true catalyst for this explosion in data was the increasing utility of data analytics and the automation of decisions based on these insights.

The ‘big data’ phenomenon required the development of novel tools and methodologies to effectively process and analyze these vast data sets. Organizations made significant investments in sophisticated programs to harness and explore these extensive data lakes. This era brought both tremendous opportunities and challenges, from advanced analytics and artificial intelligence to data quality, monetization, privacy and security.

Despite the advancements, much of the data work within organizations remained focused on foundational tasks. These included adding new fields to databases, aligning disparate systems, defining metadata, implementing basic governance, deploying business intelligence systems and preparing data for machine learning algorithms. Companies collected more data than ever before in their quest to transform their operations and make data-driven decisions, yet a considerable portion of the work was still centered on maintaining essential data infrastructure and processes.

Data 4.0: Data-first architectures: powering intelligent applications

Data 4.0 represents a significant evolution in how organizations approach data management and utilization. It is a cloud-native, metadata-driven paradigm that leverages intelligent automation and trusted insights at an operational scale. Unlike previous data strategies, Data 4.0 treats data as a central, strategic asset, essential for driving digital transformation and enabling organizations to remain competitive in a rapidly changing technological landscape.

Moving beyond traditional storage, processing and analysis to offer a more integrated and intelligent approach, Data 4.0 leverages cloud infrastructure for scalable and flexible data solutions, addressing the challenges of increasing data volume and complexity. Modern data stacks are characterized by cloud-native data lakehouses built on open architectures, utilizing open formats, standards and open-source technologies. Data catalogs have become as crucial as the data stores themselves, and open formats enable various engines (such as SQL engines, search engines, analytical workloads and AI-powered conversational storytellers) to efficiently access and operate on the data. This approach allows for greater flexibility and specialization in data processing and analysis, as different tools can be used for specific purposes while all accessing a common, well-organized data resource.



This version also introduces concepts such as data as a product, responsible data science and explainable AI, which are essential for building trust and ensuring ethical use of data. Data 4.0 is characterized by the following key pillars:

1. Explainable AI: Ensuring transparency in AI operations to build trust and facilitate widespread adoption.
2. Responsible data science: Prioritizing ethical considerations in data handling and algorithmic deployment.
3. Edge computing: Decentralizing data analytics to optimize speed and efficiency.
4. Data democratization: Broadening data accessibility and comprehension to empower decision-makers at all levels.

The impact of Data 4.0

Data 4.0 is transforming how organizations operate on their data. This shift is enabling organizations to power every aspect of their operations, enhance customer and employee experiences and drive innovation through advanced analytics and AI. The following are key ways in which Data 4.0 is changing organizations:

• Data at the center:
  • Data is becoming the backbone of all technological processes, driving efficiency and innovation.
  • Improved data management leads to better customer and employee experiences by enabling more personalized and responsive interactions.
• Data as a product:
  • Valuable asset: Like software, data is now considered a valuable asset that must be developed, tested and delivered with care.
  • Dedicated teams: Organizations are creating dedicated teams for the development and delivery of data products, which are designed to be easily discoverable and consumable through self-serve platforms.
  • Quality and consistency: Standardized data models and rigorous testing ensure that data products are reliable and interoperable across various systems.
• Self-serve data architecture:
  • Data catalogs: Centralized repositories provide metadata about data assets, making it easier for teams to discover, understand and utilize the data they need.
  • Open and headless architecture: Organizations are adopting open and headless architectures that support a wide range of applications, including analytics, internal tools and APIs.
• Interacting with partners through data sharing:
  • Real-time insights: Organizations are leveraging real-time data to gain insights into leading indicators and key performance indicators (KPIs).
  • Collaborative data sharing: Data sharing with strategic partners is becoming more prevalent, enabling more informed and agile decision-making. Data-sharing architectures such as data clean rooms enable this collaboration by providing controlled access and secure data sharing, supporting data governance, and preserving privacy through techniques like differential privacy, k-anonymity and data perturbation.
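
Of the privacy techniques named above, k-anonymity is the simplest to illustrate: a dataset is k-anonymous when every combination of quasi-identifier values is shared by at least k rows. The sketch below is a minimal check only, not how any particular data clean room implements it; the function names and the example columns (`age_band`, `pin_prefix`) are hypothetical.

```python
from collections import Counter


def k_anonymity(rows: list[dict], quasi_identifiers: tuple[str, ...]) -> int:
    """Return the size of the smallest group of rows that share the same
    quasi-identifier values (0 for an empty dataset)."""
    if not rows:
        return 0
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())


def is_k_anonymous(rows: list[dict], quasi_identifiers: tuple[str, ...], k: int) -> bool:
    """True when every quasi-identifier combination covers at least k rows."""
    return k_anonymity(rows, quasi_identifiers) >= k


# Hypothetical shared dataset: generalized age and postal prefix are the
# quasi-identifiers, spend is the attribute the partner wants to analyze.
shared_rows = [
    {"age_band": "30-39", "pin_prefix": "560", "spend": 1200},
    {"age_band": "30-39", "pin_prefix": "560", "spend": 800},
    {"age_band": "40-49", "pin_prefix": "110", "spend": 950},
    {"age_band": "40-49", "pin_prefix": "110", "spend": 400},
    {"age_band": "40-49", "pin_prefix": "110", "spend": 610},
]
```

A check like this would typically run before a dataset leaves the organization; if the smallest group is below the agreed k, the quasi-identifiers are generalized further (wider age bands, shorter postal prefixes) or the offending rows are suppressed.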



Data technology evolution

Generation | Era | Key features | Technologies
Data 1.0: The Foundation | Early database systems | Creation of data marts | Oracle databases, SQL databases
Data 2.0: Enterprise Scale | Enterprise data warehouses | Analytical appliances and MPP databases | Teradata, Vertica, Greenplum, Netezza
Data 3.0: Big Data Revolution | Hadoop ecosystem | Data lakes | Hadoop Distributed File System (HDFS), MapReduce, Hive, Pig, etc.
Data 4.0: Modern Data Stack | Cloud-native, intelligent systems | Data catalogs; open format storage; open and headless architectures; data products; automated governance; AI powered | Cloud data platforms; Snowflake, Databricks; Starburst, Trino; Iceberg, Delta, Hudi; Spark, Airflow, DBT



ensure ethical, legal and responsible use of data in addition to articulating the organization’s approach to risk and response in policy and practice.

Along with this, organizations must approach data readiness initiatives by adhering to privacy regulations like the Digital Personal Data Protection Act (DPDP Act), 2023.

Compliance should be woven into the fabric of data management, ensuring data is collected lawfully and transparently, processed with clear purpose, stored securely and accessed only by authorized parties. By prioritizing data minimization, accuracy and storage limitation, companies can build trust with customers while maximizing the value of their data assets in a responsible and compliant manner.

AI-ready data means:
• Data is a product
• Seamless data integration
• Shared asset across the organization and subsidiaries
• Users require adequate access to data
• Security is essential
• Data should be curated
• Data is compliant
• Data flow should be optimized for agility

What goes into making AI-ready data

Broadly speaking, AI-ready data can be defined as something that is readily available and accessible on a unified platform in user-based formats. However, the technology itself is evolving and so are the standards that define AI readiness, which will have a direct impact on value-cost equations for enterprises. Notwithstanding the changes and interpretations, any AI-ready data has certain core features:

Comprehensive metadata provides the correct context so that users get the necessary information that is of high quality at the right time. Active metadata ensures that all sections of the data ecosystem are always available, accurate, intelligent and oriented towards execution. Comprehensiveness and recency of metadata allow AI systems to deliver business value.

Lineage information and provenance, which shows the data flow path - from origin to all levels of transformation - gives visibility into how the metadata was built and allows tracking of all changes to ensure data quality. As organizations have data on various systems and in different formats, data lineage is necessary to be able to define data strategies, improve quality and ensure effective use of information. Comprehensive, accurate and traceable data is essential to derive value from AI solutions. Data provenance will provide a documented trail that accounts for the origin of a piece of data and where it has moved from to where it is now.

The fit-for-purpose element of AI-ready data makes high-


quality data relevant. Organizations house large quantities of
information. Structuring it in a way that sharpens applicability
Current challenges and gaps
by qualifying the use makes access for users easier, faster In making their data AI-ready – which is available, accessible
and more pertinent. There are clear distinctions between and trustworthy - organizations usually face many challenges.
data requirements of different teams and data readiness must According to one metric1 about 87% of data science projects
reflect that for the AI solutions to produce meaningful output. do not reach production because of siloed and ungoverned
Organizations are responsible for securing their data data as well as underdeveloped data infrastructure. There are
to prevent misuse. Secure data makes AI applications both technological and organizational elements to it.
trustworthy, allowing organizations to use as well as offer
secure access to authorized stakeholders. Data exchanges Technological challenge:
are complex and can involve internal and external providers,
Data silos: Many organizations struggle with data silos, which
which further underlines the importance of security protocols
makes their enterprise data inaccessible and incompatible
to prevent lapses that can lead to significant risks.
with AI solutions. The Nasscom-EY AI Adoption Index 2.0
A robust data governance framework and relevant data report highlights this issue in Indian enterprises, revealing
management policies can enhance an organization’s that 32% of organizations do not have data back-end ready for
data maturity. A high level of data governance improves AI. Of the other 68%, majority focuses on data accessibility,
transparency and auditability. More importantly, it is a way to reflective of the still existing BU-enterprise data silos. Data
silos often result from decentralized teams, acquisitions, rapid
business growth and separate IT deployments.

1. https://venturebeat.com/ai/why-do-87-of-data-science-projects-never-make-it-into-production/



These silos create several issues, such as:
• Limited cross-organizational data access
• Dataset incompatibility
• Insufficient AI-ready data
• Inefficiency from data duplication
• Heightened security and compliance risks

To effectively implement AI, it is crucial to break down these silos, integrate data architecture, and facilitate information sharing across the enterprise. This integration is key to creating comprehensive datasets that drive successful AI solutions.

Bad data quality: Poor data quality, characterized by a lack of accuracy and integrity, significantly affects AI outputs and business operations. It can cause transaction processing errors and faulty analytics results. High-quality data, possessing validity, consistency, timeliness, completeness and accuracy, is crucial for effective decision-making and AI implementation. Maintaining data quality reduces costs associated with fixing data issues, prevents operational errors, and avoids business process breakdowns that could increase expenses and decrease revenues.

• 49% of decision-makers don’t trust the quality of their companies’ AI data.2
• 39% of decision-makers cite consumer concerns about privacy and the use of data.
• 42% of decision-makers cite a lack of quality, unbiased data as the greatest barrier to AI adoption.4
• 37% cite a lack of data interoperability as a digital barrier to AI adoption.5

Sources: 1 - EY CEO Outlook Pulse - October 2023, 2,3,4,5 - EY MIT Technology Review Insights

Legacy data architecture: Many organizations have traditional data architectures that follow centralized control. While that has its advantages, it can also lead to bottlenecks, slowing accessibility for users. Delays could make it difficult to scale data usage and innovate in data products and services as well as inhibit reliability.

Organizations possess a myriad of data types, ranging from customer interactions and transactional records to operational metrics and employee information. Building AI-ready data (defined broadly in terms of comprehensive metadata, lineage information, fit-for-purpose, security and governance) presents challenges. Addressing these is crucial for leveraging AI’s full potential.

What happens if your organization’s data is not AI-ready

• Ineffective AI implementation: AI initiatives often fail to deliver expected ROI owing to model inaccuracies and an inability to scale solutions across the organization, resulting in significant resource waste
• Data quality and governance pitfalls: Poor data integrity, inefficient governance and inadequate security measures impair AI performance and expose organizations to regulatory penalties and operational cost increases
• Data quality and governance pitfalls: Inadequate or outdated infrastructure obstructs AI deployment and scalability, leading to performance bottlenecks, heightened risks and elevated costs stemming from non-compliance and security vulnerabilities
• Change management and skill gaps: Organizational resistance and skill deficits hinder AI adoption and integration, resulting in extended deployment periods, suboptimal utilization, and a failure to achieve strategic transformation
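
The quality dimensions named under "Bad data quality" (completeness, validity, timeliness) can be measured with very little code. The sketch below is illustrative only: the field names, the seven-day freshness window and the sample rows are assumptions, and a production setup would more likely rely on a dedicated data quality tool.

```python
from datetime import datetime, timedelta


def quality_report(rows, required, validators, timestamp_field, max_age, now):
    """Return the share of rows that pass each quality dimension (0.0 to 1.0)."""
    total = len(rows)
    if total == 0:
        return {"completeness": 0.0, "validity": 0.0, "timeliness": 0.0}
    # Completeness: every required field is present and non-empty.
    complete = sum(
        all(r.get(f) not in (None, "") for f in required) for r in rows
    )
    # Validity: every field with a rule passes that rule.
    valid = sum(
        all(check(r.get(f)) for f, check in validators.items()) for r in rows
    )
    # Timeliness: the record was updated within the allowed window.
    fresh = sum((now - r[timestamp_field]) <= max_age for r in rows)
    return {
        "completeness": complete / total,
        "validity": valid / total,
        "timeliness": fresh / total,
    }


# Hypothetical transaction records (illustrative only).
NOW = datetime(2024, 9, 1)
rows = [
    {"customer_id": "C1", "amount": 120.0, "updated_at": datetime(2024, 8, 31)},
    {"customer_id": "", "amount": None, "updated_at": datetime(2024, 8, 30)},
    {"customer_id": "C3", "amount": -10.0, "updated_at": datetime(2024, 6, 1)},
    {"customer_id": "C4", "amount": 42.0, "updated_at": datetime(2024, 8, 29)},
]
RULES = {"amount": lambda v: isinstance(v, (int, float)) and v >= 0}

report = quality_report(
    rows,
    required=["customer_id", "amount"],
    validators=RULES,
    timestamp_field="updated_at",
    max_age=timedelta(days=7),
    now=NOW,
)
```

Tracking these ratios per dataset over time gives a simple baseline for the trust and cost concerns described above.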



Organizational challenges
Lack of strategy: Lack of strategy affects every stage of
the data value chain, from collection and management to
distribution and usage. Many organizations struggle with
data literacy, hindering their ability to become data driven.
It is crucial to integrate structured and unstructured data
for better decision-making. Without a comprehensive
data strategy, preparing data for AI becomes a significant
challenge.

Talent shortage: According to Nasscom, India’s data annotation and labeling services market is expected to be worth US$7 billion by 2030 and engage a potential workforce of one million. Over the past years, demand for professionals with data science expertise has risen sharply, resulting in a demand-supply gap of approximately 60% to 73% for roles such as ML engineers, data scientists, DevOps engineers and data architects.

Insufficient budget: Getting data AI-ready is an expensive and time-consuming process.

That said, the success of AI adoption lies in embracing the evolving data trends while mitigating challenges.

03
Achieving
AI-ready data



Achieving AI-ready data

Establishing a robust data foundation can lead to AI models with reliable results. Data preparation, storage, management and accessibility across hybrid cloud environments are crucial for driving innovation, creating new revenue models and enhancing productivity. A domain-driven approach helps in effectively managing data and AI initiatives across the organization. This involves categorizing data and AI capabilities into logical domains that align with business functions.

The most important thing is not just collecting the data, but cleaning and categorizing it to ensure that it is in a usable format. Otherwise, the organization is just paying to store meaningless data.

Begin your data journey by asking the right questions

01 Is your data infrastructure truly ready for AI?
02 How robust is your data platform?
03 How easy is it to access your data to enable GenAI use cases?
04 Does the company's governance framework sufficiently address key concerns and sustain trust?
05 What are the potential data risks associated with AI, and how is the company managing these risks?

AI-ready data solutions build a robust pathway between an organization’s data potential and its AI aspirations, setting a clear foundation for AI-enabled business transformation. A clear framework for data sharing incentives is necessary to facilitate the flow of data from private repositories.

Data framework for lasting value

As a first step, implementing a frictionless data framework is essential for long-term value. AI-ready data must be accessible, self-defining, and convey its constraints. Enhancing metadata cataloging, data product handling, and automating data access monitoring and provisioning are vital.

An effective AI-ready data strategy aligns with business goals, integrating adaptable knowledge management platforms with LLMs. It prioritizes accessible, self-defining data with enhanced metadata and automated controls. Establishing reliable master data sources, prioritizing critical data elements and maintaining flexible architecture are crucial. This approach ensures efficient AI adoption, balancing data value, risk management and architectural adaptability across the organization.

Organizations must ensure compliance with regulations and data privacy laws while improving trust and securing data. This involves creating a robust framework that aligns with business goals and bridges data capabilities with AI objectives.

The seven-pillar AI-ready data framework

1 AI-ready data strategy: An adaptive and effective AI data strategy harnesses data value, aligns with business goals, and bridges data capabilities with AI objectives. Regular technology updates help in ensuring maximization of potential.

2 Knowledge management: Efficient knowledge management platforms are adaptable and capable of integration with LLMs. This might necessitate data restructuring to help ensure compatibility and effective usage within the LLM knowledge base.

3 Data governance: AI-ready data must be accessible, self-defining and convey its constraints. Enhancing metadata cataloguing, data product handling and automating data access monitoring and provisioning are vital.

4 Master data management: AI-ready environments offer a superior context for transaction data. With master data serving as the GenAI context, it is vital to establish a single, reliable source for entities tied to all transaction data.

5 Data risk and compliance: Data products with automated risk and compliance controls are vital. For quick AI adoption, robust, automated data controls are needed for data sovereignty, data privacy and compliance with regulatory requirements.

6 Data quality: Not all data in an organization is equally significant. Critical data elements demand quality resource allocation and observability in appropriate data products to ensure accuracy and user trust in AI-ready data.

7 AI-ready data architecture: For organization-wide AI adoption, a flexible, quickly adaptable data architecture is essential. Utilizing sandboxes for PoC testing, tools like vector databases, and their ongoing management are vital.
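
The framework describes AI-ready data as "accessible, self-defining" and able to "convey its constraints". One way to picture that is a data product that carries its own schema and permitted uses. The sketch below is a hypothetical illustration, not a description of any specific catalog tool: the class names, fields and purposes are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class ColumnSpec:
    """Describes one column so that consumers need no external documentation."""
    name: str
    dtype: type
    nullable: bool = False
    description: str = ""


@dataclass
class DataProduct:
    """A dataset bundled with its owner, schema and permitted purposes."""
    name: str
    owner: str
    columns: list
    allowed_purposes: set = field(default_factory=set)

    def violations(self, row):
        """Return a list of schema problems found in a single row."""
        problems = []
        for col in self.columns:
            value = row.get(col.name)
            if value is None:
                if not col.nullable:
                    problems.append(f"{col.name}: missing")
            elif not isinstance(value, col.dtype):
                problems.append(f"{col.name}: expected {col.dtype.__name__}")
        return problems

    def can_use_for(self, purpose):
        """Check a requested use against the purposes the product declares."""
        return purpose in self.allowed_purposes


# Hypothetical data product definition (illustrative only).
customer_value = DataProduct(
    name="customer_value",
    owner="sales-domain-team",
    columns=[
        ColumnSpec("customer_id", str, description="Master data key"),
        ColumnSpec("lifetime_value", float, description="Value in INR"),
        ColumnSpec("segment", str, nullable=True),
    ],
    allowed_purposes={"churn_model", "reporting"},
)
```

Because the schema and the permitted purposes travel with the product, an access-provisioning job or an AI pipeline can check both before reading any rows.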



1. AI-ready unified data strategy: Organizations need a unified data and AI strategy to effectively prototype, deploy and test AI solutions. A unified data strategy offers a clear roadmap for managing and governing data across all necessary capability components. It aligns with business value objectives and is supported by a defined funding model, ensuring its implementation and ongoing sustainability.

This strategy must align with business goals to enhance:
• Data quality
• Data governance
• Analytics capabilities

When data is trapped in silos or scattered across different systems and departments, AI’s ability to deliver valuable insights is severely restricted.

A unified data approach:
• Enables AI to deliver more accurate insights
• Improves decision-making processes
• Facilitates connecting the dots between disparate data sources
• Allows more effective AI deployments: a unified approach leads to more successful AI implementations
• Provides deeper insights: customers can harness data to gain more profound insights into their operations and market
• Allows faster growth: streamlined data processes contribute to accelerated business growth
• Increases efficiency: unified data strategies optimize resource utilization and operational efficiency

2. Knowledge management: A robust data infrastructure should center around a cloud-based repository, be it a data warehouse, lake or lakehouse, serving as a single source of truth.

Traditional documentation often buries critical insights, making it difficult for businesses to leverage their full knowledge potential. To improve self-service and overall efficiency, organizations must prioritize maintaining consistent and coherent knowledge management. It offers:
• Data storage
• Streamlined communication
• Information retrieval
• Knowledge sharing

Efficient knowledge management platforms are now evolving to integrate with LLMs. Organizations should therefore assess their current knowledge management practices to identify areas where LLM integration can provide the most value, invest in training and tools to maximize the benefits of LLM-enhanced knowledge management, and develop strategies for continuous improvement and adaptation of their knowledge management systems.

This integration may require:
• Data restructuring to ensure compatibility
• Optimizing data formats for effective usage within the LLM knowledge base
• Developing new protocols for data management and retrieval

By embracing these advancements, businesses can position themselves at the forefront of knowledge management innovation, driving efficiency and competitive advantage in their respective industries.

3. Data governance

To establish a robust data governance model, organizations need to implement several key capabilities:
• Data protection: ability to block and hash sensitive data before it reaches the central repository
• Access control: automated user provisioning to manage data access rights efficiently
• Domain ownership: applying domain-driven design principles in architecture and empowerment of cross-functional teams to own and manage their respective data domains
• Policy implementation: tools that facilitate the implementation of data management policies and standards by domain teams
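
The "block and hash" capability listed under data protection can be sketched as a small transformation applied to each record before it is loaded centrally. This is a minimal illustration under stated assumptions: the field names and the choice of keyed HMAC-SHA256 hashing are hypothetical, and the key would normally come from a secrets manager rather than source code.

```python
import hashlib
import hmac

# Hypothetical policy: which fields are dropped and which are hashed.
BLOCKED_FIELDS = {"card_number"}
HASHED_FIELDS = {"email", "phone"}


def protect_record(record, secret_key):
    """Return a copy of the record that is safe to load into a central store.

    Blocked fields are removed entirely. Hashed fields are replaced by a keyed
    SHA-256 digest, so equal inputs still match for joins but the raw value
    is not stored.
    """
    protected = {}
    for name, value in record.items():
        if name in BLOCKED_FIELDS:
            continue  # never forwarded downstream
        if name in HASHED_FIELDS and value is not None:
            digest = hmac.new(
                secret_key, str(value).encode("utf-8"), hashlib.sha256
            )
            protected[name] = digest.hexdigest()
        else:
            protected[name] = value
    return protected


# Illustrative input; the key below is a placeholder, not a real secret.
KEY = b"replace-with-a-managed-secret"
raw = {
    "customer_id": "C1",
    "email": "user@example.com",
    "card_number": "4111111111111111",
    "city": "Pune",
}
safe = protect_record(raw, KEY)
```

Using a keyed hash rather than a plain one is a design choice: without the key, the digests cannot be reproduced by someone who simply guesses likely email addresses.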



[Figure: Data governance at the center, surrounded by: Data architecture management; Data development; Data quality management; Database operations management; Metadata management; Data security management; Document and content management; Reference and master data management; DWH and BI management]

Effective data governance yields numerous benefits:
• Enhanced data security
• Ensured compliance with regulations and data privacy laws
• Improved data quality
• Prevention of inconsistent data silos
• Increased trust in data
• Better decision-making processes
• Improved operational efficiency

A comprehensive data catalog is essential, acting as a centralized repository for metadata about organizational data assets across different domains. This catalog enables domain teams to easily discover, comprehend and utilize the data relevant to their business functions. To facilitate seamless data exchange between various domain teams, a service mesh should be implemented, ensuring efficient communication and data flow throughout the organization.

Implementing metadata management, with a proper metadata strategy and the right metadata management tool, forms part of this approach.

4. Master data management: Master data management (MDM) plays a critical role in providing a single, reliable source for entities tied to all transaction data. This is vital for creating AI-ready environments that offer superior context for transaction data.

While implementing MDM, an organization should analyze the current data environment, the sources of data, whether they are in silos, and so on. It should also define its business goals, which could include:
• Up-sell and cross-sell opportunities
• Complete view of customers
• Improved data quality and business decisions
• Reduced costs of data maintenance and support
• Improved customer experience

This system should incorporate a tool capable of reliably and automatically ingesting data from various sources at scale, featuring rapid, timely updates and the ability to swiftly recover from failures. Additionally, it should support collaborative, version-controlled modeling and data transformation.

5. Data risk and compliance:

According to Gartner, 30% of GenAI projects are expected to be abandoned by 2025 due to poor data quality, inadequate risk controls, escalating costs or unclear business value. Creating a trusted data foundation is essential for enabling high-quality, reliable, secure, and governed data and metadata management to meet the needs of analytics and AI applications while ensuring data privacy and regulatory compliance. Failing to embed controls into data could leave organizations vulnerable to risk and attacks that could be expensive or, in the worst cases, even existential.

Using data without risk and compliance controls can lead to:
• Regulatory noncompliance
• Financial and reputational damage
• Erosion of customer trust

6. Data quality

Effective data quality management is crucial to mitigating risks. A well-designed data architecture strategy, such as a data fabric, provides a robust framework for data leaders to profile data, design and apply data quality rules, discover data quality violations, and cleanse and augment data. Robust data quality means ensuring data integrity, consistency and reliability, leading to better decision-making processes. This can be achieved through:
• Data observability: Continuous monitoring of data quality levels through data observability capabilities allows organizations to identify data issues before they escalate into larger problems.



• Data transparency: Transparency into data flows enables data and AI leaders to identify potential issues, ensuring that the right data is used for decision-making.

By prioritizing data quality and governance, organizations can build trust in their AI systems, minimize risks and maximize the value of their data. It is crucial to recognize that data quality is not just a technical issue, but a critical business imperative that requires attention and investment.

7. AI-ready data architecture that is open and headless

An open and headless data architecture will make data access easy and pluggable to where you need it. With this architecture, a company can manage its data from a single logical location, including permissions, schema evolution, and table optimizations. And, to top it off, it makes regulatory compliance a lot simpler.

Adopting open table formats enhances interoperability and flexibility in data storage and processing. A decoupled architecture allows for greater flexibility and extensibility in managing data and AI components.

Polyglot storage: A polyglot storage approach allows for efficient handling of diverse data types, including structured, semi-structured, and unstructured data. Polyglot also supports multiple languages and readily adapts to the use of SQL, NoSQL or hybrid database systems. The most important factor to understand is the data flow within the organization.

Storage options

Object storage: Ideal for large volumes of unstructured data, as it is highly scalable and cost-effective.
SQL databases: For structured data with well-defined schemas.
NoSQL databases: For flexible schemas and semi-structured data.
Graph databases: Store and query relationships between entities.
Vector embeddings: Enable storage and retrieval of “embeddings” (high-dimensional representations of various media). Make data interpretable by foundation models.

In the coming decades, what goes around with databases will continue to come around2. Current databases will therefore prove insufficient, but new query languages and data models will emerge to overcome these problems. Organizations should stay ready for the next generation of database management systems.
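
To make the vector embeddings storage option more concrete, the sketch below stores a few vectors in memory and retrieves the closest ones by cosine similarity. It is a toy illustration only: the three-dimensional vectors are hand-made rather than produced by an embedding model, the document keys are hypothetical, and real vector databases use approximate indexes instead of a full scan.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """In-memory store that ranks items by similarity to a query vector."""

    def __init__(self):
        self._items = []  # list of (key, vector) pairs

    def add(self, key, vector):
        self._items.append((key, list(vector)))

    def search(self, query, top_k=1):
        """Return the keys of the top_k most similar stored vectors."""
        scored = [(cosine_similarity(query, vec), key) for key, vec in self._items]
        scored.sort(reverse=True)
        return [key for _, key in scored[:top_k]]


# Hand-made toy vectors standing in for real embeddings (illustrative only).
store = TinyVectorStore()
store.add("invoice_policy", [0.9, 0.1, 0.0])
store.add("leave_policy", [0.1, 0.9, 0.0])
store.add("esg_report", [0.0, 0.2, 0.9])

closest = store.search([0.8, 0.2, 0.1], top_k=2)
```

In a GenAI setup, the query vector would be the embedding of a user question, and the returned keys would point to the documents passed to the model as context.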

2. https://db.cs.cmu.edu/papers/2024/whatgoesaround-sigmodrec2024.pdf



Organizing data into consumable products

Data is only valuable when understood in context. Context allows us to see how different data sets relate to each other, leading to deeper insights. Therefore, integrating data context is essential in any data management strategy. Across the industry, various strategies, such as data mesh and data fabrics, are emerging to improve data access. These strategies require advanced metadata management frameworks that facilitate the seamless movement of data across silos. By enhancing metadata management, data context can be elevated to a business level through data enrichment.

AI powering data; data powering AI

AI and data have a symbiotic relationship: AI techniques enhance data quality and preparation, while high-quality data fuels more effective AI models. Leveraging AI, including GenAI, across the entire data value chain can be highly beneficial. AI can be used to analyze vast datasets, enabling organizations to make data-driven decisions that optimize operational efficiency and reduce costs.

GenAI can be applied throughout the data value chain, including:
• Sourcing data
• Modeling data into products
• Creating data pipelines for consumption

A more powerful use of GenAI is to leverage an organization’s unique data and context from its operations. Organizations typically generate vast amounts of text through contracts, blogs, call transcripts, chat applications, project management tools, emails and internal documentation. A large language model trained on this data can answer domain-specific questions, summarize text, translate languages, adjust tone, and extract issues, themes and sentiments. Essentially, a large language model with access to an organization’s accumulated data can become the most knowledgeable “member” of the organization.

GenAI uses natural language processing to provide contextual search results. For example, an appropriate GenAI tool can make ESG data more accessible and assist employees in understanding sector-specific nuances, compliance insights and operational efficiency.

Other key elements to consider

• Central role of data platform in tech architecture: Positioning the data platform at the center of the organization’s technology architecture is key to driving data-powered innovation across all aspects of the business.
• Robust and scalable data pipelines: Developing scalable and flexible data pipelines ensures reliable data flow and optimizes data sourcing for best performance.
• Seamless data integration: Efficient integration of data from various sources is crucial for creating a comprehensive and accurate data foundation for AI.
• Real-time data enablement: Enabling real-time data processing and analysis is essential for powering AI applications that require up-to-date information.



04
Data readiness
and governance for
Indian enterprises



Data readiness and governance for Indian enterprises

As companies accelerate their data strategies to harness the power of GenAI, India is on the verge of becoming a data-driven economy. The EY-Nasscom AI Adoption Index 2.0 report highlights how data standardization among Indian firms has improved significantly since 2022. Over 60% of organizations in sectors such as BFSI, manufacturing, CPG, transport, logistics, technology, media and entertainment now have enterprise or business unit-level standardized data. However, gaps remain, as 32% of enterprises still lack an AI-ready data backend.

The report indicates that while technology modernization was once a bottleneck, most Indian enterprises today use modern cloud applications and standardized data systems for AI applications. As the country moves toward a more AI-driven future, Indian organizations are working to overcome persistent challenges related to data accessibility, quality, and security.

What Indian companies are doing

Indian organizations are increasingly adopting sophisticated data strategies to address evolving business needs. Many firms are transitioning from traditional data management systems to more advanced cloud-based solutions.

Indian enterprises are also embracing hybrid approaches to data lakehouse architectures. Companies can use hybrid models as an alternative to hyperscalers. Data lakes are replacing, or sometimes absorbing, older style data warehouses. Data lakes can reside on premises, in the cloud, in hybrid (cloud plus premises) or even across multiple hyperscalers (AWS, Google Cloud, Azure) simultaneously.

Indian startups and cost-effective data solutions

India’s startup ecosystem is playing a role in the data landscape. Many Indian startups are focused on providing low-cost solutions. Raga AI (testing and safety), Neysa AI (AI cloud and PaaS) and Floworks (developed an AI sales representative using agentic AI) are some examples. Others include KissanAI (an agriculture copilot), Sarvam AI (GenAI building blocks) and Xylem AI (LLMOps platform).

Open source solutions are also becoming attractive, with Indian organizations including government-run bodies taking this route. Industry body Nasscom, Meta and the Ministry of Electronics and IT (MeitY) are working to create an open source GenAI platform to promote solutions with socio-economic impact. The BharatGPT platform is available across channels in 14-plus Indian languages, in video, voice and text. The Nandan Nilekani-backed EkStep Foundation is constructing datasets for Indian languages, which will be open source.

Partner ecosystem

Indian companies are increasingly turning to data aggregators and fintech firms to scale their data infrastructure for AI applications, based on their success in using AI to provide solutions. The RBI has given in-principle approval to some account aggregators, including CAMS Finserv, CookieJar Technologies and National E Governance Services Asset Data, to build a data-sharing solution that will improve access to financial data.

Data governance

One of the key regulatory frameworks introduced is the Data Empowerment and Protection Architecture (DEPA), which provides a consent-based model for data sharing and protection. Introduced in 2020, this framework laid the foundation for the Digital Personal Data Protection Act (DPDPA) 2023, which formalized India’s approach to safeguarding digital personal data.

The DPDPA 2023 recognizes the right to individual privacy and mandates clear guidelines for data fiduciaries, the entities responsible for processing data. The Act outlines obligations for both data processors and data principals (data generators), ensuring that personal data is handled with care. In addition, the Act introduces financial penalties for non-compliance and establishes the Data Protection Board of India to oversee data governance.



Snapshot of various government initiatives related to access to datasets in India

Pillar I: Institutional capacity and governance
• Creation of a specialized agency for overseeing management of the vast amounts of data (public and non-personal) and providing data governance rules to manage said data sets
• National Data Sharing and Accessibility Policy (NDSAP) (2012)
• Establishment of the India Data Management Office (2023)
• Draft National Data Governance Framework Policy (2022)

Pillar II: Privacy Framework
• Establishing laws and regulations that ensure protection of privacy and personal data
• Bringing in consent procurement mechanisms for processing of personal data
• Digital Personal Data Protection Act (2023)

Pillar III: Facilitate access to non-personal proprietary data
Creation of data marketplaces or exchanges
• Setting up of interoperability standards
• Incentivizing private sector to participate in data exchanges
Actions at a sectoral level e.g., Data Empowerment and Protection Architecture (DEPA) (2020)
India Dataset Platform (IDP) (2023)

Pillar IV: Making government data available
• Making public the vast amount of government/public data
• Open Government Data Platform for India (since 2012)
• Increasing the amount of public/open data
• Draft data accessibility and use policy (2022)



Major policies

The Ministry of Electronics and Information Technology (MeitY) launched the Draft National Data Governance Framework Policy in 2022, which aims to provide a unified approach to data governance across sectors. This policy outlines the creation of an India Data Management Office (IDMO), responsible for managing non-personal data governance and improving access to datasets.

Several government initiatives are helping businesses and startups access critical data for AI applications. The IndiaAI Mission, launched in 2024, aims to foster AI development and deployment across the country. With a budget of INR10,300 crore, the mission seeks to establish an India Dataset Platform, which will serve as a centralized hub for non-personal datasets available to Indian startups and researchers.

In addition, sector-specific initiatives such as the Account Aggregator Framework and the Bhashini Project are improving data access and sharing within industries. The Account Aggregator Framework allows for the secure sharing of financial data between organizations, while Bhashini focuses on making language data accessible for AI development in Indian languages.

Challenges and the road ahead

While India’s data governance frameworks are evolving, concerns remain around data privacy and sovereignty. Many small and medium-sized businesses (SMBs) in India grapple with understanding and implementing data governance practices. Ensuring that businesses of all sizes can comply with regulations like the DPDPA 2023 will be essential for the long-term success of India’s AI ecosystem.

Data governance initiatives, including the establishment of IDMO and the rollout of the India Dataset Platform, are expected to address these challenges by providing clear guidance and resources for businesses. These efforts will enable Indian companies to securely manage their data, improve AI readiness and capitalize on the potential of data-driven technologies.

For Indian organizations to become AI-ready, there is a need to prioritize modernizing their data infrastructure, enhancing data accessibility and ensuring robust governance frameworks. Further, to cater to the needs of Indian enterprises, data platform providers can innovate and provide cost effective solutions.

With government support and a growing partner ecosystem, the country is well-positioned to become a leader in AI-driven innovation. Challenges related to data security, privacy and standardization require concerted efforts from both businesses and policymakers. Addressing these gaps can help Indian companies leverage the power of AI and usher in a new era of data-driven growth.



Chapter 5: Preparing data for an agentic future

As we stand at the brink of an agentic revolution, where AI continues to evolve, the very fabric of organizational structures and operations is being redefined. The next level of AI innovation will automate tasks that, until recently, required human involvement. GenAI would extend this transformation by replacing human involvement with autonomous agents and bots, capable of decision-making, data analysis and complex task execution. In fact, we are moving towards a future where traditional roles like Chief Data Officer (CDO) could be filled by a silicon counterpart: a fully autonomous AI that manages data, drives insights and executes on strategic goals with little or no human intervention.

To handle this change, organizations must prepare their data ecosystems and architectures for a reality where AI takes centre stage. In this context, it becomes crucial to rethink how data is stored, processed and leveraged for intelligent decision-making.

A data-ready future

To prepare organizations for an agentic future, where AI agents can access and utilize data as seamlessly as humans, several key measures need to be undertaken. These include:
• Implementing a modern data stack
• Organizing data into well-defined domains
• Creating a unified consumption layer with a catalog of trusted data
• Establishing an automated control plane for data governance
• Leveraging AI itself for effective data management

Modern data stack

Data must not only be available but also scalable, secure and AI-ready. This preparation is multifaceted, beginning with the right infrastructure choices. Many organizations have embraced hybrid cloud strategies, combining the flexibility of on-premises systems with the scalability of cloud platforms. Early adopters have gravitated towards hyperscalers like Google Cloud Platform (GCP), Amazon Web Services (AWS) and Microsoft Fabric. However, organizations are increasingly adopting other cloud options like JioCloud and Ola Krutrim.

A multi-platform approach has emerged as the new norm. No ‘one-size-fits-all’ solution can meet the varied and growing needs of modern enterprises. Companies are opting for a combinatorial strategy: leveraging Databricks for process-intensive workloads that require Spark, while using platforms like Snowflake to host SQL data warehouses. This approach ensures flexibility, scalability and efficiency across diverse data needs.

Domain-driven approach to data

To ensure that data is not only AI-ready but also aligned with business objectives, organizations should adopt a domain-driven approach. This strategy involves creating domain-specific data products, which are enriched with business semantics and ontologies that make the data both valuable and contextually relevant.

Data mesh further accelerates innovation. This decentralized approach empowers teams to manage data as a product, avoiding redundancies and allowing for quicker time-to-insight. Marketing tech tools, such as Relational AI and Celonis, enable the creation of functional and process views of data, bringing operational efficiency to various business functions.

Unified consumption layer and trusted data catalog

At the heart of this transformation is a unified consumption layer, a crucial architectural element that drives the unified view of data across hybrid architectures. The data fabric, supported by a comprehensive trusted data catalog, allows organizations to harness the power of both structured and unstructured data, regardless of the platform on which it resides. This unified view is further augmented by AI-driven tools, which streamline the process of accessing, processing and analyzing data through various engines, be it SQL, APIs or GenAI models.
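
To make the idea of domain-specific data products and a trusted data catalog more concrete, the following is a minimal sketch in Python. It is not taken from any vendor's product: the class names (DataProduct, TrustedCatalog), the fields and the platform labels are all hypothetical, and a real catalog would persist its entries and enforce access controls. The sketch only illustrates how a consumer, human or AI agent, could look up certified data by business domain without needing to know which platform hosts it.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """A domain-owned dataset with business context (all fields are hypothetical)."""
    name: str
    domain: str               # owning business domain, e.g. "finance"
    platform: str             # where the data physically lives
    description: str          # business meaning in plain language
    certified: bool = False   # True once the owning team vouches for quality
    tags: frozenset = field(default_factory=frozenset)


class TrustedCatalog:
    """In-memory stand-in for a catalog that spans several data platforms."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        # Later registrations with the same name overwrite earlier ones.
        self._products[product.name] = product

    def find(self, domain=None, tag=None, certified_only=True):
        """Return matching products sorted by name, whatever platform hosts them."""
        hits = []
        for product in self._products.values():
            if certified_only and not product.certified:
                continue
            if domain is not None and product.domain != domain:
                continue
            if tag is not None and tag not in product.tags:
                continue
            hits.append(product)
        return sorted(hits, key=lambda p: p.name)


if __name__ == "__main__":
    catalog = TrustedCatalog()
    catalog.register(DataProduct(
        name="customer_360", domain="marketing", platform="warehouse",
        description="One row per customer", certified=True,
        tags=frozenset({"pii"})))
    catalog.register(DataProduct(
        name="raw_clicks", domain="marketing", platform="lakehouse",
        description="Unvalidated clickstream events"))
    for product in catalog.find(domain="marketing"):
        print(product.name, "on", product.platform)
```

By default the lookup returns only certified products, which reflects the "trusted" part of the catalog: uncertified data stays discoverable only when a caller explicitly asks for it.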



Enterprise data governance
The widening regulatory landscape, with stringent
requirements on data lineage, privacy and risk observability,
means that organizations must adopt a robust data
governance framework. This includes implementing consistent
data policies, centrally defined and executed at the federal
level, ensuring both global and regional compliance.

Automated data cataloguing, driven by AI, will play a pivotal role in managing and maintaining the quality of data across the enterprise. AI-powered tools will not only ensure that data is of high quality but will also streamline the entire data lifecycle, reducing the time to insights and enabling faster decision-making.
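
As a rough illustration of what "consistent data policies, centrally defined" could look like when automated, the sketch below expresses a few governance rules once and applies them to every dataset record. Everything here is an assumption made for illustration: the DatasetRecord fields, the three rules and the evaluate function are hypothetical and do not represent the requirements of any specific regulation or tool.

```python
from dataclasses import dataclass


@dataclass
class DatasetRecord:
    """Governance metadata for one dataset (hypothetical fields)."""
    name: str
    region: str
    owner: str = ""
    lineage: tuple = ()                  # names of upstream datasets
    contains_personal_data: bool = False
    consent_recorded: bool = False


# Central policy: each rule is (description, predicate that must hold).
# These rules are illustrative only, not a statement of legal requirements.
CENTRAL_RULES = [
    ("has a named owner", lambda d: bool(d.owner)),
    ("has recorded lineage", lambda d: len(d.lineage) > 0),
    ("personal data has recorded consent",
     lambda d: (not d.contains_personal_data) or d.consent_recorded),
]


def evaluate(records, rules=CENTRAL_RULES):
    """Return {dataset name: [failed rule descriptions]} for non-compliant datasets."""
    report = {}
    for record in records:
        failed = [description for description, check in rules if not check(record)]
        if failed:
            report[record.name] = failed
    return report


if __name__ == "__main__":
    records = [
        DatasetRecord("payments", "IN", owner="finance", lineage=("ledger",),
                      contains_personal_data=True, consent_recorded=True),
        DatasetRecord("leads", "IN", contains_personal_data=True),
    ]
    for name, failures in evaluate(records).items():
        print(name, "fails:", "; ".join(failures))
```

Keeping the rules in one list means a new requirement is added in a single place and is then checked for every region's datasets on the next run, which is the property an automated control plane is meant to provide.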

AI at the core of the data-driven enterprise
AI’s role will go beyond being a tool for analyzing data; it is expected to quickly become a critical enabler for the entire data ecosystem. Thus, organizations must ensure their data is defined, accessible, certified and secured for seamless AI integration. Organizations that successfully embrace this change will not only lead in the digital economy but will also pioneer the next generation of AI-driven business transformation.

The world’s leading data and analytics trade association
Developing, innovating, & elevating data & analytics management best practices.

EDM Council is the world’s leading data management and analytics trade association
EDM Council is the member-driven body dedicated to elevating data management and analytics as a
strategic priority for every organization, in any industry. Founded in 2005, we support data, analytics
and business professionals with best practices, standards and education to help them get the most value
from their data assets.

A Collaborative approach
We are a neutral, global, non-profit industry forum where businesses, organizations, regulators and other
public entities collaborate to develop and implement data best practices.

We invite companies and organizations of all sizes and revenue to join our vibrant community of 400+
global organizations, 25,000+ business leaders, CDOs and data professionals who are leveraging data
and analytics to achieve better outcomes.



Acknowledgement

Core Team
Mahesh Makhija, Partner and Technology Consulting Leader, EY India
Alexy Thomas, Partner, Technology Consulting, EY India
Tejas Bakre, Partner, Technology Consulting, EY India

Edit Team
Prosenjit Datta
Vikram Choudhury
Kaveri Nandan
KTP Radhika

Design Team
Jayanta Kumar Ghosh

Our offices

Ahmedabad
22nd Floor, B Wing, Privilon, Ambli BRT Road, Behind Iskcon Temple, Off SG Highway, Ahmedabad - 380 059
Tel: + 91 79 6608 3800

Bengaluru
12th & 13th Floor, “UB City”, Canberra Block, No.24 Vittal Mallya Road, Bengaluru - 560 001
Tel: + 91 80 6727 5000

Ground & 1st Floor, # 11, ‘A’ wing, Divyasree Chambers, Langford Town, Bengaluru - 560 025
Tel: + 91 80 6727 5000

Bhubaneswar
8th Floor, O-Hub, Tower A, Chandaka SEZ, Bhubaneswar, Odisha - 751024
Tel: + 91 674 274 4490

Chandigarh
Elante offices, Unit No. B-613 & 614, 6th Floor, Plot No- 178-178A, Industrial & Business Park, Phase-I, Chandigarh - 160 002
Tel: + 91 172 6717800

Chennai
6th & 7th Floor, A Block, Tidel Park, No.4, Rajiv Gandhi Salai, Taramani, Chennai - 600 113
Tel: + 91 44 6654 8100

Delhi NCR
Ground Floor, 67, Institutional Area, Sector 44, Gurugram - 122 003, Haryana
Tel: +91 124 443 4000

3rd & 6th Floor, Worldmark-1, IGI Airport Hospitality District, Aerocity, New Delhi - 110 037
Tel: + 91 11 4731 8000

4th & 5th Floor, Plot No 2B, Tower 2, Sector 126, Gautam Budh Nagar, U.P., Noida - 201 304
Tel: + 91 120 671 7000

Hyderabad
THE SKYVIEW 10, 18th Floor, “SOUTH LOBBY”, Survey No 83/1, Raidurgam, Hyderabad - 500 032
Tel: + 91 40 6736 2000

Jaipur
9th floor, Jewel of India, Horizon Tower, JLN Marg, Opp Jaipur Stock Exchange, Jaipur, Rajasthan - 302018

Kochi
9th Floor, ABAD Nucleus, NH-49, Maradu PO, Kochi - 682 304
Tel: + 91 484 433 4000

Kolkata
22 Camac Street, 3rd Floor, Block ‘C’, Kolkata - 700 016
Tel: + 91 33 6615 3400

Mumbai
14th Floor, The Ruby, 29 Senapati Bapat Marg, Dadar (W), Mumbai - 400 028
Tel: + 91 22 6192 0000

5th Floor, Block B-2, Nirlon Knowledge Park, Off. Western Express Highway, Goregaon (E), Mumbai - 400 063
Tel: + 91 22 6192 0000

3rd Floor, Unit No 301, Building No. 1, Mindspace Airoli West (Gigaplex), Located at Plot No. IT-5, MIDC Knowledge Corridor, Airoli (West), Navi Mumbai - 400708
Tel: + 91 22 6192 0003

Pune
C-401, 4th Floor, Panchshil Tech Park, Yerwada (Near Don Bosco School), Pune - 411 006
Tel: + 91 20 4912 6000

10th Floor, Smartworks, M-Agile, Pan Card Club Road, Baner, Taluka Haveli, Pune - 411 045
Tel: + 91 20 4912 6800



Ernst & Young LLP
EY | Building a better working world
EY exists to build a better working world, helping to create long-term value for clients, people and society and build trust in the capital markets.

Enabled by data and technology, diverse EY teams in over 150 countries provide trust through assurance and help clients grow, transform and operate.

Working across assurance, consulting, law, strategy, tax and transactions, EY teams ask better questions to find new answers for the complex issues facing our world today.
EY refers to the global organization, and may refer to one or more, of
the member firms of Ernst & Young Global Limited, each of which is
a separate legal entity. Ernst & Young Global Limited, a UK company
limited by guarantee, does not provide services to clients. Information
about how EY collects and uses personal data and a description of the
rights individuals have under data protection legislation are available via
ey.com/privacy. EYG member firms do not practice law where prohibited
by local laws. For more information about our organization, please visit
ey.com.

Ernst & Young LLP is one of the Indian client serving member
firms of EYGM Limited. For more information about our
organization, please visit www.ey.com/en_in.

Ernst & Young LLP is a Limited Liability Partnership, registered under the Limited Liability Partnership Act, 2008 in India, having its registered office at Ground Floor, Plot No. 67, Institutional Area, Sector - 44, Gurugram, Haryana - 122 003, India.

©2024 Ernst & Young LLP. Published in India.
All Rights Reserved.

EYIN2409-016
ED None

This publication contains information in summary form and is therefore intended for general guidance only. It is not intended to be a substitute for detailed research or the exercise of professional judgment. Neither EYGM Limited nor any other member of the global Ernst & Young organization can accept any responsibility for loss occasioned to any person acting or refraining from action as a result of any material in this publication. On any specific matter, reference should be made to the appropriate advisor.

JG

ey.com/en_in
