Technology

Breaking through data-architecture gridlock to scale AI

Large-scale data modernization and rapidly evolving data technologies can tie up AI transformations. Five steps give organizations a way to break through the gridlock.
by Sven Blumberg, Jorge Machado, Henning Soller, and Asin Tavakoli
January 2021
For today’s data and technology leaders, the pressure is mounting to create a modern data architecture that fully fuels their company’s digital and artificial intelligence (AI) transformations. In just two months, digital adoption vaulted five years forward amid the COVID-19 crisis. Leading AI adopters (those that attribute 20 percent or more of their organizations’ earnings before interest and taxes to AI) are investing even more in AI in response to the pandemic and the ensuing acceleration of digital.

Despite the urgent call for modernization, we have seen few companies successfully making the foundational shifts necessary to drive innovation. For example, in banking, while 70 percent of financial institutions we surveyed have had a modern data-architecture road map for 18 to 24 months, almost half still have disparate data models. The majority have integrated less than 25 percent of their critical data in the target architecture. All of this can create data-quality issues, which add complexity and cost to AI development processes, and suppress the delivery of new capabilities.

Certainly, technology changes are not easy. But often, we find the culprit is not technical complexity; it’s process complexity. Traditional architecture design and evaluation approaches may paralyze progress as organizations overplan and overinvest in developing road-map designs and spend months on technology assessments and vendor comparisons that often go off the rails as stakeholders debate the right path in this rapidly evolving landscape. Once organizations have a plan and are ready to implement, their efforts are often stymied as teams struggle to bring these behemoth blueprints to life and put changes into production. Amid it all, business leaders wonder what value they’re getting from these efforts.

The good news is that data and technology leaders can break this gridlock by rethinking how they approach modernization efforts. This article shares five practices that leading organizations use to accelerate their modernization efforts and deliver value faster. Their work offers a proven formula for those still struggling to get their efforts on track and give their company a competitive edge.

1. Take advantage of a road-tested blueprint
Data and technology leaders no longer need to start from scratch when designing a data architecture. The past few years have seen the emergence of a reference data architecture that provides the agility to meet today’s need for speed, flexibility, and innovation (Exhibit 1). It has been road-tested in hundreds of IT and data transformations across industries, and we have observed its ability to reduce costs for traditional AI use cases and enable faster time to market and better reusability of new AI initiatives.

With the reference data architecture, data and technology leaders are freed from spending cycles on architecture design. Instead, leveraging this blueprint, they can iteratively build their data architecture.

Take the case of a large German bank. By using this reference data architecture as its base, the organization reduced the time required to define its data-architecture blueprint and align it with each stakeholder’s needs from more than three months to only four weeks. Before adoption of the reference data architecture, business executives would become disillusioned as the CIO, CFO, risk leaders, and business executives debated architectural choices and conducted lengthy technology evaluations, even when product differences had no material impact on the bank’s goals. To shift tactics, the company’s CIO identified the minimal deviations required from the reference architecture and presented to all the stakeholders examples of companies across industries that had succeeded with the same approach. Executives agreed they had the setup, market positioning, and talent pool to achieve similar results, and the CIO’s team quickly began building the new architecture and ingesting data.

Importantly, this isn’t a one-and-done exercise. Each quarter, technology leaders should review progress, impact, funding, and alignment with strategic business plans to ensure long-term alignment and a sustainable technology build-out.
Exhibit 1: A reference data architecture for AI innovation streamlines the design process.

[Diagram: systems of record (core processing systems, external data, unstructured data) feed an ingestion layer (batch, real-time streaming, exotic) into a unified data and analytics core (data lake, raw data vault, curated data vault by domain, analytics). APIs and a management platform link the core and application databases to systems of engagement (mobile, web, CRM, physical channels) and an AI toolbox with MLOps. The stack runs on cloud-enabled data platforms and services (serverless, NoOps), supported by DataOps.]
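For teams that treat architecture as code, the layering in Exhibit 1 can also be captured declaratively. The sketch below is illustrative only; the layer and component names are shorthand for the exhibit’s labels, not a prescribed schema.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Layer:
    """One layer of the reference data architecture."""
    name: str
    components: tuple[str, ...] = field(default_factory=tuple)

# Layers from Exhibit 1, bottom to top (names are illustrative).
REFERENCE_ARCHITECTURE = (
    Layer("systems-of-record", ("core-processing", "external-data", "unstructured-data")),
    Layer("ingestion", ("batch", "real-time-streaming", "exotic")),
    Layer("unified-data-analytics-core",
          ("data-lake", "raw-data-vault", "curated-data-vault-by-domain", "analytics")),
    Layer("apis-and-management-platform", ("standard-apis", "application-databases")),
    Layer("systems-of-engagement", ("mobile", "web", "crm", "physical-channels")),
    Layer("ai-toolbox", ("ml-ops",)),
)

for layer in REFERENCE_ARCHITECTURE:
    print(layer.name, "->", ", ".join(layer.components))
```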
One global bank implemented a new supply-based funding process that required business units to reprioritize their budgets quarterly against immediate business priorities and the company’s target technology road map before applying for additional funds. This new process helped the bank overcome underfunding of $250 million in the first year while gaining immediate business impact from refocused efforts.

2. Build a minimum viable product, and then scale
Organizations commonly view data-architecture transformations as “waterfall” projects. They map out every distinct phase—from building a data lake and data pipelines up to implementing data-consumption tools—and then tackle each only after completing the previous ones. In fact, in our latest global survey on data transformation, we found that nearly three-quarters of global banks are knee-deep in such an approach.¹

However, organizations can realize results faster by taking a use-case approach. Here, leaders build and deploy a minimum viable product that delivers the specific data components required for each desired use case (Exhibit 2). They then make adjustments as needed based on user feedback.
¹The McKinsey Global Data Survey garnered responses from more than 50 banks, representing various regions and company sizes. To ensure
comparability of results and identification of key trends, several questions on key industry trends and demographics were extracted.
Exhibit 2: Each common business use case is associated with components of the data architecture.

Most common use cases, by component (illustrative):

AI tools
• Chatbots
• Marketing technology (eg, customer data platform or campaign management)
• Relationship-based pricing
• Intelligent dashboards showing spending patterns

Application programming interfaces (APIs)
• Data monetization
• Data ecosystems
• Virtual assistants
• ID proofing
• Master-data management

Data warehouse
• Financial reporting (profit and loss, balance sheet)
• Credit-risk reporting
• Loan-application scoring
• Compliance

Data lake
• Campaign and performance reporting
• Predictive marketing
• 360-degree customer view (drawing on historical stores of multiple data types)
• New use-case and model testing

Data streaming
• Personalization
• Anti-money-laundering (AML) fraud and transaction monitoring
• Real-time data ingestion

Shared ingestion layer
• Fast access and test-and-learn research and development via AI sandboxes
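To make the use-case approach concrete, the scope of a minimum viable product can be derived directly from a mapping like Exhibit 2. The sketch below is hypothetical; the use-case and component names are simplified from the exhibit.

```python
# Simplified mapping from Exhibit 2: use case -> architectural components.
USE_CASE_COMPONENTS: dict[str, set[str]] = {
    "personalization": {"shared-ingestion-layer", "data-streaming"},
    "credit-risk-reporting": {"shared-ingestion-layer", "data-warehouse"},
    "predictive-marketing": {"shared-ingestion-layer", "data-lake"},
    "chatbots": {"shared-ingestion-layer", "ai-tools"},
}

def mvp_scope(prioritized_use_cases: list[str]) -> set[str]:
    """Minimal set of components to build for the first wave of use cases."""
    scope: set[str] = set()
    for use_case in prioritized_use_cases:
        scope |= USE_CASE_COMPONENTS[use_case]
    return scope

# Example: a first wave focused on marketing use cases needs only the
# shared ingestion layer, the streaming components, and the data lake.
print(mvp_scope(["personalization", "predictive-marketing"]))
```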
One leading European fashion retailer, for instance, decreased time to market of new models and reduced development costs when it focused first on the architectural components necessary for its priority use cases. At the outset, leaders recognized that for data-science teams to personalize offerings effectively across multiple online and mobile channels, including social channels, they would need fast access to data. Previously, data scientists had to request data extracts from IT, and data were often outdated when received.

The retailer’s focus on the architecture its use cases required enabled development of a highly automated, cloud-based sandbox environment that provides fast access to data extracted from a shared, company-wide ingestion layer; an efficient manner to spin up analytics and AI sandboxes as needed; and a process to shut them down when they aren’t needed. Whereas physical and virtual environments could once run up IT bills for months and years, such environments can now be accessed on the cloud for less than 30 minutes—the average amount of time that they’re actually needed—generating substantial cost savings.
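The spin-up-and-shut-down pattern the retailer used maps naturally onto a context-manager style of provisioning, which guarantees teardown when the work ends. This is a minimal sketch; `provision` and `teardown` are hypothetical stand-ins for whatever cloud or platform API an organization actually uses.

```python
import contextlib
import uuid

def provision(sandbox_id: str, source: str) -> None:
    # Stand-in for a platform API call that clones data from the
    # shared ingestion layer into an isolated sandbox.
    print(f"provisioning {sandbox_id} from {source}")

def teardown(sandbox_id: str) -> None:
    # Stand-in for the call that deletes the environment,
    # so billing stops as soon as the work is done.
    print(f"tearing down {sandbox_id}")

@contextlib.contextmanager
def analytics_sandbox(source: str):
    """Short-lived analytics sandbox with guaranteed teardown."""
    sandbox_id = f"sandbox-{uuid.uuid4().hex[:8]}"
    provision(sandbox_id, source)
    try:
        yield sandbox_id
    finally:
        teardown(sandbox_id)  # runs even if the experiment fails

# The environment exists only for the minutes it is actually used.
with analytics_sandbox("customer-events") as sandbox:
    print(f"running experiment in {sandbox}")
```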
Once organizations finish building the components for each use case, they can then scale and expand capabilities horizontally to support other use cases across the entire domain. In the case of the retailer, as new personalized offerings become ready for deployment, the organization moves the selected data features into curated, high-quality data environments for production access.

3. Prepare your business for change
Legitimate business concerns over the impact any changes might have on traditional workloads can slow modernization efforts to a crawl. Companies often spend significant time comparing the risks, trade-offs, and business outputs of new and legacy technologies to prove out the new technology.

However, we find that legacy solutions cannot match the business performance, cost savings, or reduced risks of modern technology, such as data lakes. Additionally, legacy solutions won’t enable businesses to achieve their full potential, such as the 70 percent cost reduction and greater flexibility in data use that numerous banks have achieved from adopting a data-lake infrastructure for their ingestion layer.

As a result, rather than engaging in detailed evaluations against legacy solutions, data and technology leaders better serve their organization by educating business leaders on the need to let go of legacy technologies. One telecom provider, for example, set up mandatory technology courses for its top 300 business managers to increase their data and technology literacy and facilitate decision making. As part of the training, the data leadership team (including engineers, scientists, and practitioners) shared the organization’s new data operating model, recent technology advances, and target data architecture to help provide context for the work.

In addition to educating business leaders, organizations should refocus efforts from their legacy stack to building new capabilities, particularly in the infrastructure-as-a-service space. A chemical company in Eastern Europe, for instance, created a data-as-a-service environment, offloading large parts of its existing enterprise resource planning and data-warehouse setup to a new cloud-based data lake and provisioning the underlying data through standardized application programming interfaces (APIs). This approach reduced time to market and made it easier to use fast-paced analytical modeling, enabling new customer-360 and master-data-management use cases, while reducing the complexity of the overall environment.
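Provisioning data through standardized APIs, as the chemical company did, typically amounts to a small, versioned read interface in front of the curated data. The sketch below uses Flask for illustration; the route, dataset, and field names are invented, and a real service would query the cloud data lake rather than an in-memory dictionary.

```python
from flask import Flask, abort, jsonify

app = Flask(__name__)

# Stand-in for the curated data vault.
CURATED_CUSTOMERS = {
    "c-1001": {"name": "Example GmbH", "segment": "enterprise", "country": "DE"},
}

@app.get("/api/v1/customers/<customer_id>")
def get_customer(customer_id: str):
    """Serve one curated customer record through a standardized API."""
    record = CURATED_CUSTOMERS.get(customer_id)
    if record is None:
        abort(404)  # unknown key: consumers get a clean, predictable error
    return jsonify(record)

if __name__ == "__main__":
    app.run(port=8080)
```

Versioning the route (`/api/v1/...`) lets the data team evolve the underlying model without breaking downstream consumers.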
4. Build an agile data-engineering organization
In our experience, successful modernization efforts have an integrated team and an engineering culture centered around data to accelerate implementation of new architectural components. Achieving this requires the right structural and cultural elements.

From an organizational perspective, we see a push toward reorienting the data organization toward a product and platform model, with two types of teams:

— Data platform teams, consisting of data engineers, data architects, data stewards, and data modelers, build and operate the architecture. They focus on ingesting and modeling data, automating pipelines, and building standard APIs for consumption, while ensuring high availability of data, such as customer data.

— Data product teams, consisting mostly of data scientists, translators, and business analysts, focus on the use of data in business-driven AI use cases such as campaign management. (To see how this structure enables efficiency even in larger, more complex organizations, see sidebar, “Sharing data across subsidiaries.”)
Sidebar: Sharing data across subsidiaries
Across industries, regulators and companies’ risk, compliance, supply chain, and finance departments are increasingly asking
for granular data access covering the headquarters and subsidiaries. On the regulatory side, for example, companies exporting
products that can be used for both civilian and military applications must provide regulators full transparency across the value chain.
On the operational side, such transparency can help provide more advanced insight into global supply chains and operations and
improve productivity, reducing the resources needed to build and manage an end-to-end data architecture in every country.
In response, organizations are moving toward defining data-architecture strategies that can transfer learnings from headquarters
to subsidiaries or vice versa. Companies that do this well, such as Amazon, Google, and Microsoft, harmonize their business and
technology delivery models. This entails setting up a global team with a clear product owner, who owns the global data model, and
dedicated data architects and engineers, who create a shared data vault containing the granular transaction data of the subsidiaries.
Local engineers within the subsidiaries then make any customizations they need while remaining aligned with global teams.
By taking this approach, a French bank drastically improved the quality of its anti-money-laundering and know-your-customer
reporting while lowering the cost of the data architecture for subsidiaries by 30 percent. These positive results have laid the
foundation for groupwide scaling of another data lake to support other use cases, such as calculating risk.
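In code, staying “aligned with global teams” often means subsidiaries extend a schema owned by the global product team rather than redefining it. A minimal sketch of that pattern, with invented field names:

```python
from dataclasses import dataclass

@dataclass
class GlobalTransaction:
    # Owned by the global product team: every subsidiary loads
    # transactions into the shared data vault in this shape.
    transaction_id: str
    amount: float
    currency: str
    booking_date: str  # ISO 8601

@dataclass
class FrenchSubsidiaryTransaction(GlobalTransaction):
    # Local extension: subsidiary-specific fields are added without
    # changing the global model, keeping local engineers aligned.
    iban: str = ""
    aml_flag: bool = False
```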
The cultural elements are aimed at improving talent recruiting and management to ensure engineers are learning and growing. A Western European bank is cultivating a learning culture through a wide range of efforts:

— Providing engineers with clearly documented career paths. This includes establishing formal job levels for engineers based on their productivity, with promotion rounds based on qualitative feedback, their contributions to open-source communities, their management skills, and their knowledge, all assessed against a structured maturity grid. The bank also revised its compensation structure to ensure that engineers at the highest job levels receive compensation comparable to that of senior managers in IT, data, and the business.

— Adopting a pragmatic approach to assessing expertise levels. Research indicates that expert engineers are eight times more productive than novices, so the success of modernization efforts depends on effective recruitment, management, and organization of talent. To provide a consistent measurement for recruiting, upskilling, and advancement, the bank used the well-known Dreyfus model for skill acquisition to identify five aptitude levels from novice to master, rate observable behavior through key indicators, and develop individual training plans based on the feedback.

— Establishing a culture of continuous technology learning. Continuous learning requires the sharing of expertise through formal and informal forums, peer reviews, and freedom to pursue online training courses, certifications, and virtual conferences. To support this, bank leaders have instituted an agile performance-management model that emphasizes both knowledge and expertise. At other organizations, the performance measurement of top executives and team members includes their industry contributions; their success metrics might include, for example, the number of keynote presentations they deliver throughout the year.

— Emphasizing engineering skills and achievements. To emphasize technical skills, the bank encourages everyone in IT, including managers, to write code. This creates a spirit of craftsmanship around data and engineering and generates excitement about innovation.

5. Automate deployment using DataOps
Changing the data architecture and associated data models and pipelines is a cumbersome activity.
A big chunk of engineering time is spent on reconstructing extract, transform, and load (ETL) processes after architectural changes have been made or reconfiguring AI models to meet new data structures. A method that aims to change this is DataOps, which applies a DevOps approach to data, just as MLOps applies a DevOps approach to AI. Like DevOps, DataOps is structured into continuous integration and deployment phases with a focus on eliminating “low-value” and automatable activities from engineers’ to-do lists and spanning the delivery life cycle across development, testing, deployment, and monitoring stages. Instead of assessing code quality or managing test data or data quality, engineers should focus their time on code building. A structured and automated pipeline, leveraging synthetic data and machine learning for data quality, can bring code and accompanying ETL and data-model changes into production much faster.
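In a DataOps pipeline, data-quality gates of this kind typically run automatically in continuous integration before any ETL or data-model change is promoted. A minimal pytest-style sketch, with an invented table and thresholds:

```python
import pandas as pd

def null_rate(df: pd.DataFrame, column: str) -> float:
    """Share of missing values in one column."""
    return float(df[column].isna().mean())

def test_curated_customers_quality():
    # Stand-in for the curated table a CI job would load before
    # promoting ETL or data-model changes to production.
    df = pd.DataFrame({
        "customer_id": ["c-1", "c-2", "c-3"],
        "balance": [120.0, None, 87.5],
    })
    assert null_rate(df, "customer_id") == 0.0  # hard gate: no missing keys
    assert null_rate(df, "balance") < 0.5       # soft gate: tolerable gaps
    assert df["customer_id"].is_unique          # no duplicate keys
```

Checks like these replace manual test-data review, freeing engineers to spend their time on code building, as described above.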
One large pharmaceutical company is working to bring biometric insights to its front line more quickly using DataOps. It has defined automated ways to test new biometric analytics models against standards and developed a code library to optimize code reuse. It is currently defining an easier way to deploy models in production to reduce time lags between model development and use. Once completed, this will reduce the typical time required to deploy models and apply results, such as identifying the right mixtures, from weeks to hours.

Today, most data technologies are readily available in the cloud, making adoption a commodity. As a result, the difference between leaders and laggards in the data space will depend on their ability to evolve their data architecture at a brisk pace to harness the wealth of data collected over decades and new data streaming in. Organizations that can’t move as quickly risk derailing their digital and AI transformations. The five practices we have outlined, along with a positive vision and a compelling story for change, can enable organizations to move at the necessary speed, building momentum and value along the way.
Sven Blumberg is a senior partner in McKinsey’s Istanbul office, Jorge Machado is a partner in the New York office, Henning
Soller is a partner in the Frankfurt office, and Asin Tavakoli is a partner in the Düsseldorf office.
Copyright © 2021 McKinsey & Company. All rights reserved.