ISPE GAMP® RDI Good Practice Guide: Data Integrity by Design
Records and Data Integrity
ISPE, the Developers of GAMP®
Disclaimer:
The ISPE GAMP® RDI Good Practice Guide: Data Integrity by Design provides practical guidance to help
pharmaceutical organizations achieve good data governance together with the efficient and effective implementation
and operation of compliant GxP computerized systems. This Guide is solely created and owned by ISPE. It is not
a regulation, standard or regulatory guideline document. ISPE cannot ensure and does not warrant that a system
managed in accordance with this Guide will be acceptable to regulatory authorities. Further, this Guide does not
replace the need for hiring professional engineers or technicians.
Limitation of Liability
In no event shall ISPE or any of its affiliates, or the officers, directors, employees, members, or agents of each
of them, or the authors, be liable for any damages of any kind, including without limitation any special, incidental,
indirect, or consequential damages, whether or not advised of the possibility of such damages, and on any theory of
liability whatsoever, arising out of or in connection with the use of this information.
All rights reserved. No part of this document may be reproduced or copied in any form or by any means – graphic,
electronic, or mechanical, including photocopying, taping, or information storage and retrieval systems – without
written permission of ISPE.
ISBN 978-1-946964-34-2
Preface
Data Integrity by Design is the concept that data integrity must be incorporated from the initial planning of a business
process through to the implementation, operation, and retirement of computerized systems supporting that business
process. It promotes the application of critical thinking to identify how data flows through the business process, and
to proactively assess and mitigate risks across both the system and data lifecycles. It emphasizes data integrity as
foundational to protecting patient safety and product quality.
This ISPE GAMP® RDI Good Practice Guide: Data Integrity by Design supports organizations as they embrace
and implement a holistic approach by leveraging data governance and knowledge management activities to drive
continual improvement in data integrity. The Guide fosters a patient-centric mindset, focusing resources and
management attention on quality best practices that inherently facilitate meeting regulatory compliance requirements.
This Guide provides a bridge between the system lifecycle approach defined in ISPE GAMP® 5: A Risk-Based
Approach to Compliant GxP Computerized Systems, and the data lifecycle approach in the ISPE GAMP® Guide:
Records and Data Integrity. Data integrity can only be achieved when both lifecycle approaches are adopted,
understood, and actively managed.
Part of the ISPE GAMP® Guide: Records and Data Integrity series, this Guide replaces and significantly expands
upon the ISPE GAMP® Good Practice Guide: Electronic Data Archiving from 2007.
Acknowledgements
The Guide was produced by a Task Team led by James Henderson (Eli Lilly and Company, USA), Lorrie Vuolo-
Schuessler (Syneos Health, USA) and Charlie Wakeham (Waters Corporation, Australia). The work was supported by
the ISPE GAMP Community of Practice (CoP) and sponsored by Michael Rutherford (Syneos Health, USA).
Core Team
The following individuals took lead roles in the preparation of this Guide:
The Leads wish to thank the following individuals for their valuable contribution during the preparation of this Guide.
The Team Leads wish to thank the following members of the CSA Industry Pilot Team for their exceptional efforts in
producing the CSA Appendix.
The Team Leads wish to thank the following members for their hard work on the Special Interest Appendix on Artificial
Intelligence: Machine Learning.
The Team Leads wish to thank the following individuals for their expert contribution to the document.
Particular thanks go to the following for their review and comments on this Guide:
Special Thanks
The Leads would like to give particular thanks to Chris Clark (TenTenTen Consulting Limited, United Kingdom), and
ISPE Technical Advisor, Sion Wyn (Conformity Ltd., United Kingdom) for their efforts during the creation process
of this Guide. The Team would also like to thank ISPE for technical writing and editing support by Jeanne Perez
(ISPE Guidance Documents Technical Writer/Editor) and production support by Lynda Goldbach (ISPE Guidance
Documents Manager).
The Team Leads would like to express their grateful thanks to the many individuals and companies from around the
world who reviewed and provided comments during the preparation of this Guide; although they are too numerous to
list here, their input is greatly appreciated.
Table of Contents
1 Introduction ..................................................................................................................... 9
1.1 Background .................................................................................................................................................. 9
1.2 Case for Quality Program ............................................................................................................................ 9
1.3 Purpose ...................................................................................................................................................... 10
1.4 Scope ......................................................................................................................................................... 10
1.5 Structure of the Guide ................................................................................................................................ 11
1.6 Key Terms .................................................................................................................................................. 11
1.7 Key Roles and Responsibilities .................................................................................................................. 14
2.1 Governance to Achieve Data Integrity by Design ...................................................................................... 15
2.2 Data Ownership ......................................................................................................................................... 19
3 Retention Strategy ........................................................................................................ 23
3.1 Retention Periods ...................................................................................................................................... 24
3.2 Readability ................................................................................................................................................. 25
3.3 Availability .................................................................................................................................................. 28
3.4 Access ....................................................................................................................................................... 30
3.5 Protecting Records and Data ..................................................................................................................... 31
3.6 Managing System Retirement .................................................................................................................... 36
3.7 Records Management and Retention through Mergers, Acquisitions, and Divestments ........................... 41
4 Implementing Data Integrity by Design .................................................................... 43
4.1 A Process to Achieve Data Integrity by Design .......................................................................................... 43
4.2 Business Process ...................................................................................................................................... 47
4.3 Data Flow Diagrams .................................................................................................................................. 48
4.4 Data Classification and the Intended Use of the Data ............................................................................... 49
4.5 Business Process Risk Assessment .......................................................................................................... 50
4.6 Data Lifecycle ............................................................................................................................................ 51
4.7 Data Nomenclature .................................................................................................................................... 55
5 System Planning ............................................................................................................ 59
5.1 Planning Computerized Systems to Efficiently Support the Optimized Business Process ........................ 59
5.2 Addressing Individual Systems .................................................................................................................. 61
5.3 System Risk Assessment ........................................................................................................................... 63
6 Active Records ............................................................................................................... 67
6.1 Creation ..................................................................................................................................................... 67
6.2 Processing ................................................................................................................................................. 69
6.3 Review, Reporting, and Use ...................................................................................................................... 75
7 Semi-active and Inactive Records ............................................................................... 81
7.1 Semi-active Records .................................................................................................................................. 81
7.2 Retention of Inactive Records .................................................................................................................... 81
7.3 Return to Active State (Retrieval) ............................................................................................................... 86
7.4 Destruction ................................................................................................................................................. 87
Management Appendices
8.1 Introduction ................................................................................................................................................ 89
8.2 Key Concepts ............................................................................................................................................. 89
8.3 Managing Knowledge ................................................................................................................................ 92
8.4 Mindsets and Behaviors ............................................................................................................................. 94
8.5 Conclusion ................................................................................................................................................. 95
9 Appendix M2 – Understanding Data Integrity Compared to Data Quality .......... 97
10 Appendix M3 – Third-Party Data ................................................................................ 99
10.1 Introduction ................................................................................................................................................ 99
10.2 Assessments and Responsibilities ............................................................................................................. 99
10.3 Data Governance for CxO ......................................................................................................................... 99
10.4 Data Storage Off-Premise ........................................................................................................................ 100
10.5 Conclusion ............................................................................................................................................... 101
Development Appendices
12 Appendix D2 – Instrument Devices with Electronic Record Storage ...................105
12.1 Introduction .............................................................................................................................................. 105
12.2 Background .............................................................................................................................................. 105
12.3 Instrument Use ......................................................................................................................................... 106
12.4 Accounting for Original Data .................................................................................................................... 106
12.5 Challenges ............................................................................................................................................... 107
12.6 Remediation Strategies ............................................................................................................................ 107
12.7 Risk Register ............................................................................................................................................ 109
12.8 Example Data Integrity Risks, Interim Controls, and Actions to Consider ............................................... 109
12.9 Conclusion ............................................................................................................................................... 110
Operation Appendices
16 Appendix O4 – Example Retention Periods and Requirements ...........................125
17 Appendix O5 – Maintaining Legacy Software .........................................................139
17.1 Introduction .............................................................................................................................................. 139
17.2 Non-Disposal of Retired Systems ............................................................................................................ 139
17.3 Compatibility to Modern Operating Systems ........................................................................................... 139
17.4 Virtual Machine Solution .......................................................................................................................... 139
17.5 The Hardware Museum ........................................................................................................................... 140
17.6 Conclusion ............................................................................................................................................... 140
Special Interest Topics Appendices
18.1 Introduction .............................................................................................................................................. 141
18.2 Background .............................................................................................................................................. 141
18.3 Scope ....................................................................................................................................................... 141
18.4 Data Lifecycle (Iterative, Autonomous, and Adaptive) ............................................................................. 142
18.5 Concept Phase (Understanding the Business Case) ............................................................................... 142
18.6 Project Phase (Data Modeling and Evaluation) ....................................................................................... 144
18.7 Operation Phase (Deployment and Monitoring) ....................................................................................... 145
18.8 Further Reading ....................................................................................................................................... 146
19 Appendix S2 – Computer Software Assurance ........................................................147
19.1 Introduction .............................................................................................................................................. 147
19.2 Establishing a Lifecycle-based Approach and Basic Assurance .............................................................. 152
19.3 Risk-based Assurance ............................................................................................................................. 153
19.4 Example: Applying Risk-based Approach from ISPE GAMP® 5 and CSA ............................................... 157
19.5 Example: Applying ISPE GAMP® 5 and CSA Using Direct Leveraging of Testing Throughout the System Lifecycle ............ 160
19.6 Conclusion ............................................................................................................................................... 162
General Appendices
1 Introduction
1.1 Background
To date, many regulated life science organizations have been addressing data integrity on a system-by-system
basis. The initial focus has often been on achieving good data integrity outcomes by the remediation of technical
gaps in existing systems, and by developing good practice cultures and behaviors to ensure that compliance is
being achieved “in real time.” This focus may have resulted in missed opportunities to work with suppliers on the
development of enhanced functionalities supporting data integrity, for both existing and new systems.
The requirement to achieve data integrity is not new; data integrity expectations and holistic approaches have been
implied within regulations globally for some time, for example:
“(i) Product realisation is achieved by designing, planning, implementing, maintaining and continuously improving
a system that allows the consistent delivery of products with appropriate quality attributes”
“(xi) Continual improvement is facilitated through the implementation of quality improvements appropriate to the
current level of process and product knowledge.”
“Laboratory controls shall include the establishment of scientifically sound and appropriate specifications,
standards, sampling plans, and test procedures designed to assure that components, drug product containers,
closures, in-process materials, labeling, and drug products conform to appropriate standards of identity, strength,
quality, and purity.”
“Laboratory records shall include complete data derived from all tests necessary to assure compliance with
established specifications and standards, including examinations and assays”
Achieving data integrity requires a holistic approach across the entire organization, leveraging critical thinking, and
implementing data governance across all regulated business processes.
1.2 Case for Quality Program
The US FDA Center for Devices and Radiological Health (CDRH) Case for Quality program [4] promotes a risk-
based, product quality-focused, and patient-centric approach. This initiative has been endorsed by the ISPE GAMP®
CoP (Community of Practice) Leadership [5] who believe that:
“such an approach is appropriate throughout the regulated life science industries, including pharmaceutical,
biological, and medical devices, and throughout the complete product life cycle, regardless of the specific
applicable predicate regulation. GAMP® strongly supports the adoption of new and innovative computerized
technologies and approaches throughout the product life cycle to support product quality, patient safety, and
public health.”
Computer Software Assurance was born from CDRH’s Case for Quality [4], which enhances and incentivizes the
adoption of practices and behaviors to improve medical safety, responsiveness, and how patients experience
products. As part of that effort, Computerized System Validation (CSV) was identified as a barrier to new technologies
due to the lack of clarity on risk-based effort, compliance-focused approaches, and perceived regulatory burden.
In actuality, the FDA supports and encourages the use of automation, information technology, and data solutions
throughout the product lifecycle in the design, manufacturing, service, and support of life sciences [6]. An industry team
was formed and work began on the development of an FDA draft guidance on Computer Software Assurance (CSA).
Appendix S2 contains a detailed discussion of CSA, and a case study based around a data integrity technical control
demonstrating the application of CSA in a real-life situation.
1.3 Purpose
This ISPE GAMP® RDI Good Practice Guide: Data Integrity by Design builds on the guidance contained in the various
ISPE GAMP® Good Practice Guide Series [7], including the ISPE GAMP® Guide: Records and Data Integrity [8], and
ISPE GAMP® 5: A Risk-Based Approach to Compliant GxP Computerized Systems [9].
The focus of this new Good Practice Guide is on unifying system and data lifecycles, from data creation to
destruction. The purpose of this new Guide is to provide a practical “bridge” between achieving good data
governance for regulated business processes and the efficient and effective implementation and operation of
compliant GxP computerized systems.
This Guide emphasizes the interrelationships between these operational imperatives and presents a common
approach, providing a harmonized framework that will help organizations achieve/enhance product quality and patient
safety.
1.4 Scope
This Guide is intended to encourage organizations to adopt a risk-based approach to ensuring data integrity in
support of patient safety and product quality.
The scope of this Guide is as broad as the scope of ISPE GAMP® 5 [9] and includes all computerized systems used
in support of GxP regulated business processes within the life science industries. While computerized systems
feature strongly within this Guide, as these systems provide technical controls for data integrity, data governance
also requires procedural and behavioral controls supported by a quality culture that leverages critical thinking and a
continual improvement mindset.
1.5 Structure of the Guide
This Guide is divided into seven chapters and related sub-sections, plus a set of appendices, as shown in Figure 1.1.
1.6 Key Terms
1.6.1 ALCOA+
ALCOA+ is the acronym for the key concepts that can help to support record and data integrity [8]. See Table 1.1.
Original • Original data is the first recording of data, or a “true copy” which preserves content or meaning
Complete • All data, and relevant metadata, including any repeat or re-analysis performed
Available • Available and accessible for review, audit, or inspection throughout the retention period
In this document the term audit trail refers to a data audit trail of operator entries and actions that create, modify,
or delete regulated records, as required by 21 CFR Part 11 [10] and EU Annex 11 [11], as distinguished from other
system and technical logs.
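To illustrate the distinction, a data audit trail entry captures who did what to which regulated record, when, and (where required) why. The sketch below is a minimal illustration only; the field names and structure are assumptions chosen for this example and are not prescribed by 21 CFR Part 11 [10] or EU Annex 11 [11].

```python
# Minimal illustrative sketch of a data audit trail entry.
# Field names are assumptions for illustration only; they are not mandated
# by 21 CFR Part 11 or EU Annex 11.
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class AuditTrailEntry:
    timestamp_utc: datetime    # when the action occurred (secure, system-generated)
    user_id: str               # who performed the action (attributable)
    action: str                # "create", "modify", or "delete"
    record_id: str             # which regulated record was affected
    field: Optional[str]       # which field changed, if applicable
    old_value: Optional[str]   # previous value (None for creation)
    new_value: Optional[str]   # new value (None for deletion)
    reason: Optional[str]      # reason for change, where required by procedure

# Example: capturing a modification to a result entry (values are hypothetical)
entry = AuditTrailEntry(
    timestamp_utc=datetime.now(timezone.utc),
    user_id="jsmith",
    action="modify",
    record_id="BATCH-2020-0042/assay-01",
    field="result",
    old_value="98.2",
    new_value="99.1",
    reason="Transcription error corrected per SOP",
)
print(entry)
```

In practice such entries are generated automatically by the computerized system, with a secure, system-generated timestamp that cannot be altered by the user performing the action.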
1.6.3 GxP
The term “GxP” is used within this Guide to represent the encompassing regulations (Good Practices) to which
different aspects of regulated companies must adhere. It is not intended to imply that all regulatory requirements
are the same across Good Manufacturing Practice (GMP), Good Clinical Practice (GCP), Good Laboratory Practice
(GLP), Good Distribution Practice (GDP), and Good Pharmacovigilance Practice (GVP, also known as GPvP), etc.
Data integrity and data quality are sometimes used interchangeably, but the two terms have distinct meanings.
Awareness of the similarities and differences between data integrity and data quality is less widespread, and therefore
a number of examples are provided in Appendix M2 Table 9.1 to aid in this clarification.
The need for data integrity is well established through predicate rules, through data integrity guidance, and through
industry acceptance of data integrity as an essential component to protecting patient safety and ensuring product quality.
Data integrity is the assurance that the data is original and trustworthy, and that this assurance has been maintained
throughout the data lifecycle. The requirements for data integrity are discussed comprehensively in the ISPE GAMP®
Guide: Records and Data Integrity [8].
The ISPE GAMP® Guide: Records and Data Integrity [8] states that:
“Data quality relates to the data’s fitness to serve its intended purpose in a given context within a specified
business or regulatory process. Data quality management activities address aspects including accuracy,
completeness, relevance, consistency, reliability, and accessibility.”
“The assurance that data produced is exactly what was intended to be produced and fit for its intended purpose.
This incorporates ALCOA.”
The Organisation for Economic Co-operation and Development (OECD) [13] defines data quality as:
“Data quality is the assurance that the data produced are generated according to applicable standards and fit for
intended purpose in regard to the meaning of the data and the context that supports it. Data quality affects the
value and overall acceptability of the data in regard to decision-making or onward use.”
Data quality requires that the data is organized and able to be accessed, sorted, and searched to enable the business
to effectively use the data, and is reflected in the list below:
• Data Accuracy: The extent to which the data is free of identifiable errors
• Data Accessibility: The level of ease and efficiency at which data is legally obtainable, within a well-protected
and controlled environment
• Data Comprehensiveness: The extent to which all required data within the entire scope are collected,
documenting intended exclusions
• Data Consistency: The extent to which the data is reliable, identical, and reproducible by different users across
applications
• Data Currency: The extent to which data is up-to-date; a datum value is up-to-date if it is current for a specific
point in time, and it is outdated if it was current at a preceding time but incorrect at a later time
• Data Nomenclature: A consistent approach to metadata entry that facilitates identification of data relating to the
same product or process for use in trending and/or data analytics, for example, a data lake. Discussed more fully
in Section 4.7
• Data Granularity: The level of detail at which the attributes and characteristics of data quality are defined
• Data Precision: The degree to which measures support their purpose, and/or the closeness of two or more
measures to each other
• Data Relevancy: The extent to which data is useful for the purposes for which it was collected
• Data Timeliness: The availability of up-to-date data within the useful, operative, or indicated time
Note: Items in bold underline are reasonably addressed in ALCOA+ as part of data integrity attributes.
Data quality is defined by the Data Management Body of Knowledge (DMBOK) [14] as:
“…the planning, implementation, and control of activities that apply quality management techniques to data, in
order to assure it is fit for consumption and meet the needs of data consumers.”
See Figure 8.2 in Appendix M1 for the concept of data producers and consumers within knowledge and data
management.
In this Guide, data is classified as regulated, operational, or unnecessary, as originally defined in ISPE GAMP® RDI
Good Practice Guide: Data Integrity – Manufacturing Records [15]:
• “Regulated: used for a regulated decision or to support a regulated process, that is, data as required by, or in
support of, the predicate rules – what we have to keep
• Operational: non-regulated data used for business process decisions such as performance analysis and
management of maintenance schedules – what we want to keep
• Unnecessary: data not needed due to either the circumstances of its creation (e.g., during a non-regulated
activity or process) or because that data does not provide additional context, metadata, or meaning for the
activity or process – what we do not need”
Note that some data may need to be retained for financial, health and safety, or other non-life science regulations.
1.7 Key Roles and Responsibilities
The key roles recommended for data integrity by design are those identified in the following ISPE GAMP® publications:
• ISPE GAMP® 5: A Risk-Based Approach to Compliant GxP Computerized Systems (GAMP 5) [9]
• ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts (DI-KC) [16]
The definitions of these roles and their assigned responsibilities are not repeated here; however, the roles, and where
they are described, are listed below:
• Supplier (GAMP 5)
See Figure 2.3 for a schematic representation of the relationship between various owner roles, data lifecycle, and
record phase.
2.1 Governance to Achieve Data Integrity by Design
Data integrity by design can be achieved through a governance framework combined with Quality Risk Management
(QRM) and Knowledge Management (KM).
The data governance framework provides the controls for data integrity and quality assurance. QRM is collectively
applied to product and process understanding to achieve patient safety, product quality, and data integrity, resulting in
high-quality foundational data. This is an input to KM to drive continual improvement of organizational and business
processes. Throughout all activities, critical thinking must be leveraged within a quality culture to strive for operational
excellence. This all-encompassing approach is shown in Figure 2.1. KM is explained in detail in Appendix M1.
Governance to achieve data integrity by design requires the following elements to be established:
• A long-term vision
Figure 2.2 expands on the data governance framework with tools to aid in the understanding of the process and to
optimize system planning to achieve data integrity by design. These tools are discussed in detail in Chapter 4.
Some components of this governance approach may already be in place within the regulated organization; however,
these should be reviewed and expanded as necessary to achieve a fully comprehensive governance framework.
Effective data governance is the ultimate aim of any data integrity assurance program.
In order to achieve an effective data integrity by design strategy, a high-level vision for record and data management
(“what this might look like”) needs to be established at an organizational level.
The high-level vision needs to leverage critical thinking across all data governance activities, including developing the
data flow through the business process and identifying critical-to-quality data and its owner(s). It also needs to apply a
risk-based approach to identifying and mitigating the risks and vulnerabilities of manual intervention, which otherwise
could result in inadvertent or unauthorized data manipulation.
Without an organization-wide vision for data integrity by design, there is the risk that important decisions relating to
the introduction or modification of processes and systems may be taken without proper consideration of the data
integrity implications. This may result in the organization adopting suboptimal solutions that prevent or inhibit the
ability to achieve and demonstrate good data integrity practices.
For example, a strong company vision with respect to data integrity could be that:

As part of our commitment to support public health, patient safety, and product quality, our five-year vision
is that all regulated records will be created and maintained in electronic format according to good practice
ALCOA+ principles throughout their lifecycle (creation to destruction).
This vision should form a strategic approach to defining an appropriate data architecture and corresponding data
lifecycle that are considered when acquiring new systems and remediating existing record and data lifecycle
processes. This strategic approach sets the expectation of eventually eliminating all paper and hybrid records within
the organization, which are difficult to maintain in a compliant state throughout their retention period.
The vision can be translated into annual goals and continual improvement objectives that funnel down to individual
contributors (who are the data generators) to provide assurance that integrity and quality are built into the process.
Governance can take place at many levels within an organization, for example, global, regional, group, national, site
wide, departmental, process, project.
From a governance point of view, “data integrity by design” can be restated as “data integrity by intent.” There needs
to be a clear mandate at every level of governance that achieving data integrity through intentional design is a
business-critical requirement, one that will be supported, endorsed, and resourced. This should be a commitment not
just for the quality unit, but for all functions involved in the design, implementation, validation, and operation of the
business process.
Governance to achieve data integrity and governance for the validation of computerized systems within a regulated
organization have been discussed in the ISPE GAMP® Guide: Records and Data Integrity [8] and ISPE GAMP® 5 [9]
respectively.
Validation of computerized systems for intended use is necessary to achieve data integrity, while data integrity
requirements must be considered as part of each validation effort in order to ensure that ALCOA+ imperatives are
delivered.
When determining and establishing governance approaches to achieve data integrity by design, there is an integral
relationship between data integrity and computerized system validation that must be recognized in the corporate
Quality Management System (QMS). The data needs will in turn create the risk scenarios that ultimately drive the
configuration and validation activities for the computerized system. Data, and the regulated use of the data, are the
drivers for many project activities within the system lifecycle. The system lifecycle ensures that ongoing controls
are established and verified to make certain the data is appropriately and effectively managed throughout the data
lifecycle.
Table 2.1 shows a list of practices that are shared by good governance frameworks for both data integrity and
computerized system validation.
A company-wide data integrity by design strategy should be established as early as possible in the development of
any significant data integrity initiative. It may also be appropriate to determine a data integrity by design strategy at a
lower level, such as for a specific business process or project.
Effective data integrity by design should follow a risk-based approach that encompasses the full business process
lifecycle. This lifecycle addresses data (creation, processing, reporting and use, retention and destruction) and, by
implication, the lifecycle phases of all associated computerized systems (concept, project, operation, retirement).
For a regulated organization, the scope of governance activities to achieve data integrity by design should cover
GxP processes, data lifecycles, and computerized systems. The same governance elements are required whether
the component(s) of the computerized system are on-premise or leverage a service-based architecture (e.g.,
Infrastructure as a Service (IaaS), Platform as a Service (PaaS), Software as a Service (SaaS)).
This governance scope is shown schematically in Figure 2.3. This diagram is an abstract representation of the need
for, and relationship between, the data lifecycle and system lifecycle for the computerized systems involved in the
creation, processing, review, reporting and use, retention and destruction of the data within a business process. The
data lifecycle should be considered both within the context of the system lifecycle and separate to it. The lifecycles
run on independent timelines in that the data may outlive the system and need alternative arrangements for retention
after system retirement, or conversely, the system may continue in operation long after the retention period has
expired for a specific data set.
Within the operational phase of the system lifecycle, change control and configuration management are essential to
maintain the validated state of the system.
The roles and responsibilities depicted at the top and bottom of the diagram in Figure 2.3 need to be defined and
assigned to the appropriate individuals, and may only be meaningful through specific phases of the two lifecycles.
Of these roles, data ownership is a critical concept to ensure data integrity throughout the data lifecycle. Regardless
of where the data resides (on- or off-premise) or where it is generated (Contract Manufacturing Organization (CMO),
Contract Research Organization (CRO), etc.), data ownership remains with the regulated company.
Figure 2.3 shows that processes to manage records and data throughout the data lifecycle must be considered in
the initial data integrity by design phase. Data lifecycles can transcend multiple systems, with data flow between the
systems. Overall, a data flow diagram of the complete business process is an essential requirement for data integrity
by design.
To consistently achieve an effective design that delivers compliance throughout the data lifecycle, the retention
requirements must be understood when defining the business process specification.
In many circumstances it may be ineffective to consider only the lifecycle of the data within a single system because
data may be transferred across multiple systems (computerized or paper-based) as part of the data flows required
to construct the regulated record. Where copies of the data exist in more than one system (e.g., source and target
systems), it is important to understand:
• Which record will be used to make any decision under the predicate rules
It is also possible that a single system supports multiple business processes, for example, a document management
system. Consequently, the retention and other data integrity aspects from several processes may need to be
considered in the system design and implementation. A holistic approach to data integrity by design is therefore
required over the full data lifecycle(s) for all impacted regulated data.
Consider a business process that is not computerized. The (business) process owner is accountable for all aspects of
the process.
Once the business process is supported by a computerized system, accountabilities are divided into two separate
aspects: process ownership and system ownership (responsible for the computerized system, as discussed below).
Those activities associated with the performance of the business process remain with the process owner; the
separate role of system owner has accountability for those activities related to the technical support and maintenance
of the computerized system.
The process owner has accountability for the quality and integrity of the entire business process including the process
data throughout its data lifecycle, but may delegate data responsibility to data owner(s) at specified phases of the
data lifecycle.
As a further level of complexity, the business process itself may be divided into subprocesses. The process owner
may delegate some or all of their responsibilities to sub-process owners, which may also include delegation of system
and data ownership. A hierarchy of process, system, and data owners is thus established, as depicted in Figure 2.4.
The subprocesses can each contain one or more computerized systems.
Figure 2.4: Simple Example Showing Business Process Divided into Two Subprocesses
Data ownership and system ownership create a matrix of responsibilities, as delegated data ownership may be
aligned with, for example, specific clinical trials, sites, or products, while the data may be processed in multiple
systems.
Data should never be without a designated owner at any point in the data lifecycle; effective handover of data
ownership responsibilities between individuals transitioning to/from an organization role (with designated data
ownership responsibility) is critical to successful maintenance of data integrity and data quality.
When considering the computerization of a complex process, the overall process owner should understand the
intended scope of the computerization and ensure the involvement of all subsidiary process, system, and data
owners.
It is important for an organization to establish and maintain a culture that supports data integrity and to have clear
policies for data governance. Data governance should address data ownership throughout the data lifecycle to make
certain data integrity risks are addressed throughout the data lifecycle. Electronic systems have become integral to
most business processes within the life science industry, and with this evolution comes the need to understand what
data is captured and who owns it. The concept of data ownership is not new and applies equally to paper data and
electronic data. The ease of movement of electronic data makes data ownership more complex.
Data ownership is often defined at a company level, but is then refined to the system level (documented in the
roles and responsibilities section of the system Standard Operating Procedures (SOPs)) or even to a specific
project (documented in the quality and project plan) or clinical trial (documented in the trial master file or other trial
configuration documentation). The data owner must understand the business value of the data and often has a
position of responsibility within the business unit generating the data. Individual organizations should allocate roles
and responsibilities based on organizational structure and the specific system involved. The organization needs to
establish policies and procedures to support the role of data owner and ensure it is appropriately resourced for each
business process or system.
Data ownership can be a subset responsibility of the process owner (see ISPE GAMP® RDI Good Practice Guide:
Data Integrity – Key Concepts [16]). As defined within ISPE GAMP® 5 [9], the process owner is:
“the person ultimately responsible for the business process or processes being managed. This person is usually
the head of the functional unit or department using that system, although the role should be based on specific
knowledge of the process rather than position in the organization. The process owner is responsible for ensuring
that the computerized system and its operation is in compliance and fit for intended use in accordance with
applicable Standard Operating Procedures (SOPs) throughout its useful life….Ownership of the data held on a
system should be defined and typically belongs to the process owner.”
The data owner is responsible for the integrity of the data, and for defining the quality requirements for the data and
how the data is to be used. Because electronic data is easily shared, there is a need to make sure that the data
owner maintains awareness of the use of the data. In the case of clinical trial data and medical records, the data
owner must protect the interests of the patient and their confidential data. Non-GxP regulations such as GDPR [17]
and HIPAA [18] may also impact patient data.
The data owner must be knowledgeable about how the data is used throughout the data lifecycle (data that may span
the organization), and how and where it is retained in a secure archive for the required period. Data ownership is
typically associated with a particular job title or function in the organization, and each new person to that job inherits
the data ownership. Where an internal reorganization impacts job roles and responsibilities, special attention should
be paid to ensure that data ownership is not lost or overlooked.
Where data crosses organizational boundaries within the process, data ownership may be transferred to a new
data owner, although the original data owner may always remain accountable for the original data generated within
their scope. For example, analytical data owned by the Quality Control (QC) laboratory may be incorporated into a
manufacturing batch record, which has its own data owner. In the event of an investigation or audit, any queries about
the analytical data within the batch record require assistance from the QC laboratory data owner.
Uncertainty in data ownership can be introduced during mergers and acquisitions and should be resolved as part of
the integration process; see Section 3.7.
It is important to separate the data owner from the system owner, who is responsible for system maintenance roles
(for example, IT groups responsible for the operating system, application, and database), to prevent a conflict of
interest. With current working practices, interest in the data often spans different departments (e.g., IT security is
concerned about safeguarding the data, and the legal department may be concerned with data privacy). Data owners
have a direct interest in the data and therefore should not have system administration and/or privileged accounts.
Many organizations create a data steward role for the technical people with day-to-day responsibility for the data. Data
stewardship activities may be embedded in the responsibilities of other roles, rather than being a new and specific
individual role. The Data Steward role is defined in the ISPE GAMP® Guide: Records and Data Integrity [8] as:
“A person with specific tactical coordination and implementation responsibilities for data integrity, responsible for
carrying out data usage, management, and security policies as determined by wider data governance initiatives,
such as acting as a liaison between the IT department and the business. They are typically members of the
operational unit or department creating, maintaining, or using the data, for example personnel on the shop floor
or in the laboratories who actually generate, manage, and handle the data.”
In IT Service Management (ITSM), a platform change (e.g., database migration or cloud hosting provider migration)
may often be considered an infrastructure change, which involves the system owner but not necessarily the business
process owner. At a minimum, the data owner (as delegated by the business process owner) should be consulted
before any change occurs and be kept informed through its implementation.
At all times, the data owner should know where the data resides and the process followed to verify any data
migrations, considering the mechanism used, reliance on network bandwidth during migration, error reporting tools
used, and any data transformation, such as non-Unicode™ to Unicode™ conversion or database changes from
SQL to Oracle®. The level of verification should be commensurate with the risks of the infrastructure change and any
migration involved.
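As an illustration of such verification, the minimal sketch below compares record counts and row-level checksums between a source table and a target table after a migration. The table and column names (source_results, target_results, id, value) are assumptions for this example only; the actual verification approach and acceptance criteria would be defined and approved in the migration plan.

```python
# Minimal sketch of post-migration verification: compare record counts and
# row-level checksums between source and target tables. Table and column
# names are assumptions for illustration only.
import hashlib
import sqlite3

def row_checksum(row: tuple) -> str:
    """Deterministic checksum of a row's values."""
    joined = "|".join("" if v is None else str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

def verify_migration(conn: sqlite3.Connection, source: str, target: str) -> list[str]:
    """Return a list of discrepancies found between source and target."""
    issues = []
    cur = conn.cursor()

    # 1. Record counts must match.
    src_count = cur.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt_count = cur.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    if src_count != tgt_count:
        issues.append(f"Record count mismatch: {src_count} vs {tgt_count}")

    # 2. Row-level checksums must match, keyed on the record identifier.
    src_rows = {r[0]: row_checksum(r) for r in cur.execute(f"SELECT id, value FROM {source}")}
    tgt_rows = {r[0]: row_checksum(r) for r in cur.execute(f"SELECT id, value FROM {target}")}
    for rec_id, checksum in src_rows.items():
        if rec_id not in tgt_rows:
            issues.append(f"Record {rec_id} missing from target")
        elif tgt_rows[rec_id] != checksum:
            issues.append(f"Record {rec_id} altered during migration")
    return issues

if __name__ == "__main__":
    # Self-contained demonstration using an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE source_results (id TEXT PRIMARY KEY, value REAL);
        CREATE TABLE target_results (id TEXT PRIMARY KEY, value REAL);
        INSERT INTO source_results VALUES ('S-001', 98.2), ('S-002', 99.1);
        INSERT INTO target_results VALUES ('S-001', 98.2), ('S-002', 99.1);
    """)
    print(verify_migration(conn, "source_results", "target_results") or "No discrepancies found")
```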
Data ownership may be simple when a single organization owns and uses the data throughout the data lifecycle, but
when the creation, processing, and/or reporting of data is contracted to a second organization, the concept of data
ownership becomes more complex. The contract giver or sponsor company legally owns the data (even if they only
use data in summary form), while the contract acceptor may be the entity responsible for the creation, processing,
and reporting of the data.
The contract acceptor should have a data steward to address the day-to-day data responsibilities within the contract
organization, and report all concerns and risks to the data owner within the contract giver company who is ultimately
responsible for data ownership. The contract acceptor may be responsible for the data throughout the data lifecycle.
All of these responsibilities and expectations must be clearly defined within the quality contract and/or technical
agreement and verified through the vendor qualification process. In an environment involving multiple organizations,
a robust communication plan is necessary to ensure potential data integrity issues are reported to the “legal” data
owner. In this context, potential data integrity issues are not limited to instances of suspected data manipulation but
also include cyberattacks or security breaches, and the report should address the impact of these events.
3 Retention Strategy
A strategy should be developed to ensure the long-term retention of regulated or operational data output by a
business process. The retention strategy should focus on the management of the data throughout its lifecycle, not just
on the initial creation of data within the process.
Use of electronic records as the original/official records supporting business processes continues to increase.
Therefore, it is imperative to apply critical thinking to develop the retention strategy in the planning phase for a new
process or system. This will help ensure data is maintained in its dynamic format throughout the data lifecycle as long
as reasonably possible.
The risk is that as technology advances, old file types become obsolete, driving companies to:
• Maintain obsolete technology to continue to read and access the files, which can increase data integrity risks
• Migrate the data so that new technology can continue to read the record
There are risks associated with any strategy, and assessing those risks and planning a broad methodology for record
and data retention at the beginning of a system’s lifecycle allows the plan to be adjusted as technology changes or
as systems go through their lifecycle. As such, it is imperative for a company to have a strategy in place that will lead
their employees through the process; without this, the company increases its risk as the system moves toward its
end of life. The longer a system goes without a retention strategy, the greater the likelihood the company will lose the
ability to migrate or maintain records to the end of their retention period. Typically, as data ages, the need for it to be
processible decreases until at some point, a risk-based decision to change to a static format may be appropriate. This
is discussed in Section 7.2.
In order for data integrity to be ensured within the system and data lifecycles, a retention strategy should be crafted
during the design phase and ensure that data remains secure and protected from deliberate or accidental changes,
manipulations, or deletions throughout the retention period. The strategy for archiving GxP data also needs to be
planned, whether it is archived electronically or using a paper-based system.
An electronic archiving system, both hardware and software, must be designated as a computerized system and
validated to ensure the integrity of the data, and make certain that it is accessible and readable. See Appendices O2
and O3 for more details on archiving requirements. The data governance for the records in any new computerized
system and the archive solution should be aligned early. Where problems with long-term access to data are
envisaged, or when computerized systems must be retired, mitigation strategies for ensuring continued readability
of the data should be established. These may include virtualization of the original computerized systems, which is
discussed in Appendix O5.
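One supporting technical control for such an archive is a checksum manifest created at the time of archiving and re-verified periodically and at retrieval, so that any alteration or corruption of archived files is detectable. The sketch below is a minimal illustration under assumed file and manifest locations; it supplements, and does not replace, a validated archiving solution.

```python
# Minimal sketch of a checksum manifest for an electronic archive:
# hashes are recorded at archiving time and re-verified at retrieval.
# The folder layout and manifest format are assumptions for illustration.
import hashlib
import json
from pathlib import Path

def file_sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(archive_dir: Path, manifest_path: Path) -> None:
    """Record a checksum for every file in the archive at archiving time."""
    manifest = {
        str(p.relative_to(archive_dir)): file_sha256(p)
        for p in sorted(archive_dir.rglob("*")) if p.is_file()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))

def verify_manifest(archive_dir: Path, manifest_path: Path) -> list[str]:
    """Return discrepancies between the manifest and the archive contents."""
    recorded = json.loads(manifest_path.read_text())
    issues = []
    for rel_path, expected in recorded.items():
        target = archive_dir / rel_path
        if not target.exists():
            issues.append(f"Missing archived file: {rel_path}")
        elif file_sha256(target) != expected:
            issues.append(f"Checksum mismatch (possible alteration): {rel_path}")
    return issues

# Example usage (paths are hypothetical):
# build_manifest(Path("/archive/2020/batch_records"), Path("/archive/2020/manifest.json"))
# print(verify_manifest(Path("/archive/2020/batch_records"), Path("/archive/2020/manifest.json")))
```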
As regulated companies increasingly utilize cloud-based services, a SaaS provider may need to offer appropriate
strategies to ensure adequate controls are in place for data retention. See ISPE GAMP® Good Practice Guide:
IT Infrastructure Control and Compliance (Second Edition) [19] for additional considerations around cloud-based
services and data retention.
The high-level elements of the strategy should be defined in a corporate data governance document as discussed
in ISPE GAMP® Guide: Records and Data Integrity Section 3.3.6 Policies and Standards [8], listing the necessary
elements, plus ones to consider, for the record retention of a system or project. Refer to Appendix O1 – Retention,
Archiving, and Migration [8] of the same Guide for information on how to document the retention strategy for a
particular system.
The key points to a retention strategy are knowledge and consideration of data classification and intended use,
readability, availability, access, and data ownership. A good place to start when developing a strategy is to document
this information in the form of a risk assessment with the goal of documenting the business needs, associated risks,
and current mitigation plans. As new information becomes available over the system lifecycle, the assessment can
be updated easily to reflect such changes. See Figure 4.1 Data Integrity by Design Process Flow Diagram for an
overview of the assessment during the system’s lifecycle. This approach keeps the retention strategy up-to-date
and representative of the current needs of the system, balanced against the capability and feasibility of available technology. The following
sections provide a more thorough breakdown of the details to consider.
3.1 Retention Periods
A regulated company generates many records with varied retention requirements during the course of its business
processes, for example:
• Regulated records retained beyond the minimum for business reasons (see Section 7.4.1)
• Records of scientific innovation or invention (including drug development data) that may be needed in perpetuity
for legal reasons
• Financial records falling under the Sarbanes-Oxley Act 2002 (US) [20] (or similar legislation in other countries)
• Records containing personally identifiable information (for example, human resources records and patient
enrollment records for clinical trials)
• Operational records needed solely for process improvements and business decisions (see Section 1.6.5).
Each of these record types is required to be available for a different duration depending on what, if any, legislation applies to that data classification (see Sections 1.6.5 and 4.4 for a simple classification, and the more detailed example
in ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts, Appendix 8 [16]). Even within the regulated
records category, different retention periods are specified. Additionally, there may be legal and business reasons for
extending retention periods, including possible future use of the record.
It is therefore important to identify which records fall under which legislation, and the applicable retention periods. Where different records have drastically different retention periods (they can vary from 6 months to 30 years), it is important to segregate their storage. Otherwise, for example, a database containing a mixture of data (GLP, GCP, GMP, GDP, and business records) needs to be kept for the duration of the longest retention period of any of its records, giving rise to long-term storage of an excessively large amount of data, much of which is no longer required.
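As a simple, hypothetical illustration of this retention logic (the classifications and retention periods shown are placeholders, not regulatory values), the following sketch derives the earliest permissible destruction date for each record from its classification:

```python
from datetime import date

# Placeholder retention periods in years per data classification
# (illustrative only - actual periods come from applicable legislation and policy)
RETENTION_YEARS = {
    "GMP batch record": 5,
    "GLP study data": 10,
    "Clinical trial record": 25,
    "Business record": 2,
}

def earliest_destruction_date(created: date, classification: str) -> date:
    """Earliest date the record may be destroyed, absent legal hold or business need."""
    years = RETENTION_YEARS[classification]
    try:
        return created.replace(year=created.year + years)
    except ValueError:          # record created on 29 February
        return created.replace(year=created.year + years, day=28)

# Segregating storage by classification avoids keeping everything for the
# longest applicable period
records = [
    {"id": "R-001", "classification": "Business record", "created": date(2020, 3, 1)},
    {"id": "R-002", "classification": "Clinical trial record", "created": date(2020, 3, 1)},
]
for rec in records:
    rec["destroy_after"] = earliest_destruction_date(rec["created"], rec["classification"])
    print(rec["id"], rec["classification"], "retain until", rec["destroy_after"])
```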
Where an organization is creating data that is relevant to a number of other entities (e.g., training records within
a Contract “x” Organization (CxO)), it is important that the organization has a clear approach for identifying and
managing the data – both for operational use and when archiving/destroying data – for the individual entities.
Implications concerning third-party data are discussed in Appendix M3.
The retention period, if not specified, should be sufficient to support any challenges to data integrity. In the absence of
a required retention period, the final disposition should be documented.
Records should be protected to ensure they are accurate and readily retrievable throughout the retention period.
Readability is implied by the retention period and is discussed in Section 3.2. Readability of audit trails and other metadata
is required under predicate rules to preserve the GxP content and meaning of the record, and should be retained and
available for review and copying for at least as long as the original record. During system planning and selection, it is
essential to include a user requirement for the ability to archive the complete data including audit trails and metadata.
Failure to meet this requirement could result in either an inability to include the audit trails and metadata in the
archive, or a need to preserve the entire original database to allow recreation of the audit trails.
There may be business needs that require data to be retained beyond the GxP-mandated retention period; these
business needs are discussed in Section 7.4.1.
Appendix O4 Tables 16.1 and 16.2 contain examples of retention requirements and periods. Appendix M3 discusses
the important considerations around managing data generated by third parties, for example, CROs.
3.2 Readability
Data should remain readable during the complete data lifecycle, that is:
• For dynamic data, retained in such a way that it can be further interacted with if required by the intended use
and/or specific predicate rule requirements (see Section 4.4 on intended use)
With paper records, no special tools are required to read the data. If the paper record has endured (survived), it can
be read as a stand-alone complete record. It can be converted to a static electronic image, that is, a PDF or JPG
file by a scanning process, and easily and reliably read on any system having PDF or image viewing software. This
should be a verified process to ensure the electronic copy is a true copy, and is discussed in detail in Section 7.2.1.
In a computerized system, original records generated electronically often require additional resources such as
databases to supply the record content, along with associated metadata to provide the context and meaning of the
record. Typically, data created in an electronic system is readable by that same system, and often readable only by that system, due to vendor-proprietary formats and database structures.
If the record is static in nature (i.e., does not require user intervention or processing to be meaningful), then it can be
converted to a PDF or similar format by a verified process to provide ease of readability. Section 7.2.1 discusses the
differences between static and dynamic data and implication thereof.
Changes to the software application such as a software upgrade or database structure modifications can result in a
loss of readability for data created in earlier software versions. For readability considerations around inactive data
stored in a dedicated archiving system, see Section 7.2.2.
It is recommended to ensure software is backward compatible prior to the upgrade. If the software does not provide
backward compatibility, then confirm that the data can be migrated as and when required. Controls around data
migration are discussed in Section 3.5.3.
Changes to a GxP computerized system should be implemented under formal change control and configuration
management, which is detailed fully in ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Operation of
GxP Computerized Systems Chapter 10 [21]. When reviewing a change proposal for a computerized system, it is
important to review vendor release notes for:
• Changes to the data structure or file format in the new version, and vendor claims around record compatibility
between old and new versions
• Changes to the storage approach, e.g., change in the database version or even to the database type (Oracle®
versus mySQL™, etc.)
Where there will be an impact on the readability of the existing data in applying the software upgrade, remediation
activities are needed, such as:
• Using vendor tools (if provided) to migrate the existing data into a newer format readable by the upgraded
software, and verifying the migrated data remains complete (including metadata), accurate, and consistent
compared to the original records. Data migration testing, including advice on statistical sampling, is covered in
detail in the ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Testing of GxP Systems (Second
Edition) Appendix T9 [22].
• If a vendor tool is not available, determine if there is a commercially available tool that can provide the needed
data migration.
Even where there is no explicit vendor statement on the impact to the data readability, it is still recommended to verify
that a sample of existing data is readable (and, for dynamic data, can be interacted with) after the upgrade.
Data sampling should be risk-based and leverage statistical approaches to ensure sufficient data is sampled to
identify (and correct) any missing or corrupt data. This is in addition to, and separate from, validating the change to
the computerized system.
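A minimal sketch of such a risk-based sampling check is shown below (Python; the stored pre-upgrade checksums, sample size, and record-reading interface are assumptions):

```python
import hashlib
import random

def checksum(payload: bytes) -> str:
    """Stable fingerprint of a record's exported content."""
    return hashlib.sha256(payload).hexdigest()

def sample_readability_check(pre_upgrade_checksums, read_after, sample_size=50, seed=1):
    """Verify a random sample of records is still readable and unchanged after an upgrade.

    pre_upgrade_checksums - dict of record id -> checksum captured before the upgrade
    read_after            - callable returning the record content (bytes) after the upgrade
    """
    rng = random.Random(seed)                      # reproducible, documented sample
    ids = list(pre_upgrade_checksums)
    sample = rng.sample(ids, min(sample_size, len(ids)))
    failures = []
    for rid in sample:
        try:
            if checksum(read_after(rid)) != pre_upgrade_checksums[rid]:
                failures.append((rid, "content mismatch"))
        except Exception as exc:                   # record unreadable after upgrade
            failures.append((rid, f"read error: {exc}"))
    return failures
```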
Where there is an upgrade to a computerized system containing master data (e.g., manufacturing recipes, customer
contact details), formal verification may be required to confirm that the data is unaffected and remains aligned across
integrated systems.
In the event of a database change, an extensive software upgrade with no migration tool or backward compatibility,
or at system retirement, an evaluation should be made of the feasibility of maintaining a functional copy of the original
computerized system (either as a physical or virtual machine) that will be used to read and interact with existing data
throughout the remaining retention period for that data.
• Newer Operating Systems (OS) have better backward compatibility features that may be able to run legacy
software without much trouble if it was produced in the last several years and if the data storage remains in the
same structure (e.g., flat file throughout or the same relational database).
• For laboratory and process control software, it is important to determine if the software application can be run
without connection to instrumentation and process equipment.
• The practical implications and limitations of maintaining a functional copy of an obsolete system using virtual machine technology are covered in detail in Appendix O5.
This should be combined with an assessment of the risk of not keeping the records in a dynamic format (see Section
7.2.1). At some point, as discussed in Section 7.2, maintaining readability of the old dynamic records may become
unfeasible and conversion to static format is inevitable.
One risk associated with long-term readability of records is personnel turnover. The personnel who generated and interacted with the data may no longer be with the company, and operation of the legacy system may be challenging for inexperienced operators. It is advisable to archive training materials with the software so that inexperienced personnel accessing the old data have at least some information available. Any information that can be provided to a new user about the data also helps the data owner answer questions during a regulatory review.
When selecting a new system for use with GxP data, incorporate the retention and archiving requirements into the user requirements, for example:
• The available functions to export and report data out of the system for archiving purposes
• The system’s ability to mark data as “Archived” and how archived data will be removed from the originating
system
• How existing data is handled when the system is upgraded:
- Is the data automatically upgraded as needed to suit the new software version as part of the system upgrade?
- Does the data have to be revised or modernized by a separate step to keep it readable?
• The history of earlier versions of the system to understand the depth and frequency of changes to data structure,
file formats and databases, and also to assess the level of backward compatibility routinely maintained in the
system.
These could be addressed in a vendor evaluation; minimally, there should be a discussion with the vendor to
understand their vision and plans for the product going forward.
When a system is upgraded (upgrade of the application or operating system, or implementation of a later-model
system), the existing data may be automatically revised to be readable within the system, either during the upgrade
or when transferred into the upgraded system. This relies on the capability of the new system to read data created in
the older version, possibly by converting it into the newer format. This type of transfer is the simplest form of migration
but still requires a risk-based level of verification to confirm the integrity of the data. Special attention needs to be paid
to any changes in the processing algorithms that could produce a different result.
A system that internally manages its data revision during an upgrade reduces the effort required to keep data
readable in the system through future system upgrades during its operational life. A system with the inherent ability to
read data created in all earlier versions of the system, including data restored from the archives, greatly simplifies the
ongoing data retention burden.
The capability to export data for archiving and retention is critical when using SaaS solutions hosted off-premise.
There needs to be a mechanism to extract the data if and when the regulated company moves away from that
solution; this requirement should be defined in the Service Level Agreement with the SaaS provider. Many of the
considerations discussed in Appendix M3 – Third-Party Data may also be relevant for SaaS solutions.
3.3 Availability
The need for the availability of records is determined by the current state of the records: active, semi-active, or
inactive.
Active records start from the first creation of data. Generally the record remains active through the processing
and review, reporting and use phases of the data lifecycle, where access is needed frequently to progress
the data to complete its business purpose, that is, its use. Active records are introduced in Section 3.3.1 and
discussed in detail in Chapter 6.
Semi-active describes the situation where the record is no longer routinely accessed, but may be needed
periodically such as for use in trending and control charts, and during annual product review or complaints
investigations. Semi-active records are introduced in Section 3.3.2 and discussed in detail in Section 7.1.
Inactive records are records no longer expected to be accessed other than for exceptional circumstances, for
example during an inspection or investigation. At this point in the record lifecycle, the records will be infrequently
or minimally accessed for the remainder of their retention period. Inactive records are introduced in Section 3.3.3
and discussed in detail in Section 7.2.
At the end of the retention period, the inactive record can be destroyed as long as there is no legal hold or
business reason to keep the data. Destruction is covered in Section 7.4.
There are no hard-and-fast rules as to when a record should transition from active to semi-active, or from semi-active to inactive; it is a spectrum of decreasing need to access the data frequently, balanced against the desire to maintain performance in the live system(s). The transition points between these states can move based on the regulated company's choices.
It should be noted that a record entering the semi-active or inactive state should not impact its static or dynamic state
– the regulatory expectations to maintain dynamic data in dynamic format remain.
Data stored in the computerized system that originally generated the data (the “live system”) is immediately available
to authorized users of that computerized system (although there may be restrictions on which users can access what
data – see Section 3.4). This immediate availability is efficient where data is active (being generated, processed,
reported, reviewed, and/or used) and being used by a range of interested parties (production operators generating
data, production supervisors collating and reporting data, the quality unit reviewing data, and the authorized person releasing
the batch based on the data). Active records are discussed in detail in Chapter 6.
As it moves through the lifecycle, data may be needed less frequently; for example, batch release data from the
previous month may not be needed until the annual product review when it will be trended with other release data. In
this semi-active phase, where the data is needed infrequently, the options available to an organization are to:
• Leave the data in the live system but prevent further changes by segregating the semi-active (or inactive) data
from the active records, that is, the data is restricted as read-only or is rendered inaccessible to the majority of
users
• Look at off-line storage solutions, for example, electronic archives as discussed below and accept the
consequently slower access to the data inherent in moving it out of the live system. This should be balanced
against the need for data to be rapidly available in the case of a recall scenario.
Based on the business process and regulatory requirements, a read-only or reduced-access approach may be used to limit the potential for data changes, in which case ideally the system should offer incremental levels of control that can be implemented over time, such as:
1. Preventing new data from being written into the data folder, while still allowing users to work with existing data
2. Later, preventing further processing or editing of the data, while keeping it available for viewing and approval
3. Finally, after the business process cycle time is complete, moving the data to fully read-only
There may even be a “no access” level of control whereby the data can no longer be viewed, pending its transfer to
the archiving system.
It is important to limit which users can initiate the restrictions, with potentially a smaller subset of users able to
reverse the restrictions in case the data is needed for an investigation or audit. This is particularly important for
chromatography data where an inspector can request the data to be reprocessed to assess the impact of alternate
processing parameters.
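A minimal sketch of such incremental restriction levels, including restricting who may tighten or reverse them, is shown below (Python; the level names, role names, and one-way tightening rule are illustrative assumptions, not a prescribed model):

```python
from enum import IntEnum

class AccessLevel(IntEnum):
    """Illustrative incremental restriction levels for semi-active data."""
    ACTIVE = 0        # new data may be written, existing data may be edited
    NO_NEW_DATA = 1   # existing data may still be processed and edited
    NO_EDITS = 2      # viewing and approval only
    READ_ONLY = 3     # fully read-only
    NO_ACCESS = 4     # hidden pending transfer to the archive

class DataFolder:
    def __init__(self, name: str):
        self.name = name
        self.level = AccessLevel.ACTIVE

    def restrict(self, new_level: AccessLevel, actor_roles: set):
        """Tighten restrictions; only designated roles may do so."""
        if "data_steward" not in actor_roles:
            raise PermissionError("only a data steward may tighten restrictions")
        if new_level <= self.level:
            raise ValueError("restrictions may only be tightened here")
        self.level = new_level

    def relax(self, new_level: AccessLevel, actor_roles: set):
        """Reverse a restriction, e.g., for an investigation or audit."""
        if "archivist" not in actor_roles:      # smaller subset of users
            raise PermissionError("only the archivist role may relax restrictions")
        self.level = new_level

folder = DataFolder("2023-Q4-batch-data")
folder.restrict(AccessLevel.NO_NEW_DATA, {"data_steward"})
folder.restrict(AccessLevel.READ_ONLY, {"data_steward"})
```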
When an organization determines that the data is no longer needed to be readily available for instant access, or
the amount of data in the live system is degrading system performance, moving the data from the live system to an
electronic archive is recommended. This brings multiple advantages:
• The amount of data in the live system does not become sufficiently large to impact system performance (e.g.,
long search times, delays in reading data from a database)
• The time required to back up the live system remains manageable (e.g., less than 24 hours, thus allowing daily
backups)
• The storage space required for backups of the live system is minimized
• The archived data is only accessible to a limited number of independent personnel and no longer available to the
originating users, thus reducing the risk of alteration during the retention period [23]; this is a specific regulatory
requirement for some areas. (See Appendix O4 for examples of requirements for record retention.)
There are more general regulatory requirements (listed in Appendices O2 and O3) for the use of archives in support
of long-term data retention. In its simplest form, archiving electronic data requires the creation of a complete and
accurate copy of the data (including metadata to preserve GxP content and meaning) in an off-line storage location.
Once the off-line copy has been verified as complete and capable of being restored into the live system when
needed, the original record can be deleted from the live system. As with any other regulated data, provision should be
made for backing up the archived data to guard against data loss.
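The verification step described above might be sketched as follows (illustrative only; the file-based layout and checksum comparison are assumptions, and a trial restore would still be needed before deleting the originals):

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Checksum of a file, read in chunks to handle large records."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_archive_copy(live_dir: Path, archive_dir: Path) -> list:
    """Return a list of problems; an empty list means the copy is complete and identical."""
    problems = []
    for original in live_dir.rglob("*"):
        if not original.is_file():
            continue
        copy = archive_dir / original.relative_to(live_dir)
        if not copy.exists():
            problems.append(f"missing in archive: {original}")
        elif sha256_of(original) != sha256_of(copy):
            problems.append(f"checksum mismatch: {original}")
    return problems

# Only when verify_archive_copy(...) returns no problems (and a trial restore has
# succeeded) should the originals be removed from the live system.
```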
Where validated automated archiving processes cannot be provided, manual archiving may be required. Manual
archiving can only be controlled by procedure/protocol and requires verification that all the required records are
archived. Manually administering the electronic archiving process, especially when dealing with flat files, carries an
inherent risk that more data may be deleted than was copied to the archive location, for example by choosing to copy
only the passing results for long-term retention. For this reason, validated automated electronic archiving systems are
preferred.
See Section 7.2.2 for a discussion of the important features for an electronic archiving system, including advanced
indexing capabilities to facilitate record searching. Archive requirements are included in Appendices O2, O3, and O4.
If a manual archive process is all that is available for a system, an archive protocol must be written to ensure all data
is appropriately archived.
Some companies choose to maintain the data in a secured area of the live system in a “no access” state for the
full retention period. If that approach is used, the data may be secure, but the advantages listed above will not be
achieved.
3.4 Access
Inherent in considerations regarding the availability of the data is the consideration of who should be able to access
the data during its lifecycle. Access controls should ensure that only authorized personnel can access data. This may
be controlled by business role and organization.
A detailed discussion of security management for computerized systems is contained in ISPE GAMP® Good Practice
Guide: A Risk-Based Approach to Operation of GxP Computerized Systems Chapter 15 [21] and in ISPE GAMP®
Good Practice Guide: IT Infrastructure Control and Compliance (Second Edition) Appendix 5 [19] for infrastructure
security.
When active records are stored in the live system, access controls in that system are based on the principles of:
• Segregation of Duties: This is the general concept of having more than one person required to complete a
process or workflow as an internal control intended to prevent fraud and error. For example, segregation of
duties requires that authorization to generate data should be separate from authorization to verify data (e.g., a user
can review their peers’ data but not their own), and that users conducting normal work tasks in the system should
have access rights to do so while users with elevated access rights (e.g., administrator privileges, engineer roles)
should not conduct normal work tasks on the system [24].
• Least Privileges: A further aspect of assigning access to functionality is the principle of least privileges, whereby
each user is given sufficient access for their routine tasks and no more. Senior users may be given access
to higher risk functionality (e.g., creating new data storage folders or adding new users), with the highest risk
functionality (e.g., system administration, database administration or other enhanced access) reserved for
users outside of the department’s reporting structure (for example, members of the corporate IT department).
Careful consideration should be given to how and where passwords are managed, including resetting forgotten
passwords; there is potential for fraud if someone in the business process can reset a password and then
conceivably execute tasks under someone else’s account.
Assigning access controls, including database administrator access, is discussed in detail in ISPE GAMP® RDI Good
Practice Guide: Data Integrity – Key Concepts Section 4.5 [16].
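A minimal sketch of how segregation of duties and least privileges might be enforced in an application's permission checks is shown below (Python; the role names and permission sets are hypothetical):

```python
# Hypothetical role-to-permission mapping illustrating least privileges:
# each role receives only what its routine tasks require.
ROLE_PERMISSIONS = {
    "operator":   {"create_record"},
    "reviewer":   {"review_record"},
    "supervisor": {"create_folder", "review_record"},
    "it_admin":   {"manage_users", "manage_system"},   # no routine work tasks
}

def can(user_roles: set, permission: str) -> bool:
    """Least privileges: a user holds only the permissions of their assigned roles."""
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in user_roles)

def approve_review(record: dict, reviewer: str, reviewer_roles: set):
    """Segregation of duties: the creator of a record may not verify it."""
    if not can(reviewer_roles, "review_record"):
        raise PermissionError("role lacks review permission")
    if record["created_by"] == reviewer:
        raise PermissionError("a user may not review their own data")
    record["reviewed_by"] = reviewer
    return record

record = {"id": "BR-042", "created_by": "alice"}
approve_review(record, reviewer="bob", reviewer_roles={"reviewer"})
```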
In addition to controlling which functionality in the system is accessible to a particular user, there may also be a need
to control what data can be accessed by that user. This is especially relevant where a computerized system is shared
across multiple business processes and/or organizations, for example sharing a chromatography data system across
both the research and development and QC laboratories, or where a system contains confidential patient information.
In an electronic archiving system, access to the archived data can be restricted to a limited number of data management personnel so that it is no longer available to the originating users, thus reducing the risk of data alteration during the retention period [23].
If there is a need to restore the archived data to the original system or a functional copy thereof, special care must be
taken to manage any access to or editing of the restored data. This is discussed in Section 7.3.
System lifecycle considerations for archive systems are covered in Section 7.2.2; archiving requirements are
discussed in more detail in Appendices O2 and O3.
Logical security (system access controls) alone is not sufficient [11, 25]. Physical access controls such as perimeter
security, building and facility access control should be in place in addition to logical security. The storage location
must be physically secured against unauthorized access, for example, by instigating key card door controls on the IT
server room such that only authorized IT personnel can gain access to the system server or archive storage media.
It is important to protect records supporting GxP decisions to ensure that the information is secure from unauthorized
modification or deletion. The procedures and processes implemented to address risks to active records should be
commensurate with the importance of the data, the process, and the complexity of the system. All data must be in a format that permits secure storage and remains available for review and reporting.
As discussed in the ISPE GAMP® Good Practice Guide: IT Infrastructure Control and Compliance (Second Edition) [19]:
“Lack of security may compromise availability of applications and services, record integrity and confidentiality,
reputation with stakeholders, and may lead to unauthorized use of systems that would ultimately impact product
quality.
• Availability: ensuring that authorized users have access to information and associated assets when required
• Integrity: safeguarding the accuracy and completeness of information and processing methods
• Confidentiality: ensuring that information is accessible only to those persons authorized to have access.”
There are processes, controls, and procedures necessary to protect the safety, confidentiality, integrity, and availability of the data in the operational environment. Such safeguards include (but are not limited to) the backup and restore, disaster recovery, and business continuity controls discussed below.
It is important to develop the backup approach and disaster recovery strategy based upon the risk of the system
and the criticality of the process and data. Backup and restore, and disaster recovery as part of business continuity
management are discussed in detail in Chapters 13 and 14 of the ISPE GAMP® Good Practice Guide: A Risk-Based
Approach to Operation of GxP Computerized Systems [21].
Backup and restore procedures ensure the accurate and reproducible copying of digital assets, including the data and software, so that if the original data is lost, for example due to a disaster, it can be restored.
Conventional backup processes typically involve creating a duplicate copy of system data on a fixed-time basis, such
as a nightly scheduled backup. Other controls that can be implemented utilize sophisticated network architectures
including database mirroring and redundancy across diverse and geographically dispersed datacenters, which can
result in close to 100% uptime. Synchronous replication – where data is written to the primary storage and a remote replica simultaneously, ensuring that both versions are always identical and that one replica is always available – is another approach to lessen the risk of data loss. The data transfer speed should be sufficient for replication to keep pace with data generation, preventing degradation or loss.
The Business Continuity Plan and the Disaster Recovery Plan collectively address how to continue or resume
operation after a disaster. A disaster is defined within the ISPE GAMP® Good Practice Guide: IT Infrastructure Control
and Compliance (Second Edition) [19], as:
“Any event (i.e., fire, earthquake, power failure, etc.) which could have a detrimental effect upon an automated
system or its associated information.”
The Business Continuity Plan (BCP) describes the steps the business must follow to restore the critical business
process following a disruption and addresses what the business must do to continue without the computerized
system(s).
A Disaster Recovery Plan (DRP) is defined by the ISPE GAMP® Good Practice Guide: A Risk-Based Approach to
Operation of GxP Computerized Systems [21] as:
“A sub-set of Business Continuity Management that focuses on regaining access to an IT system, including
software, hardware, and data following a disaster.”
Disaster recovery must address the provision for replacing or restoring the computerized system. The DRP
must provide details of how to restore the complete computerized system including the infrastructure, hardware,
application, and database. The recovery site is usually chosen at a geographic distance from the main site in case
the disruption is due to a natural occurrence such as an earthquake or flooding. If there is a need to restore specific
instruments, as in the case of laboratory systems, the DRP should contain procedures for obtaining replacement
instruments.
It is important to document how much downtime is acceptable for a specific computerized system/business process.
The process owner must define the allowed maximum system downtime, expressed as the Recovery Time Objective (RTO), and the maximum loss of data that can be tolerated by the business process, defined as the Recovery
Point Objective (RPO). This information is used by the IT specialist to design the necessary backup process and
architecture to support these requirements. Companies must ensure response times are described within their
internal or external Service Level Agreements (SLAs).
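As a simple illustration of how a proposed backup design might be checked against the RPO and RTO (a sketch only; the figures and the linear restore-time estimate are assumptions):

```python
from dataclasses import dataclass

@dataclass
class BackupDesign:
    backup_interval_hours: float      # time between scheduled backups
    restore_hours_per_tb: float       # measured from trial restores
    data_volume_tb: float

@dataclass
class BusinessRequirement:
    rpo_hours: float    # maximum tolerable data loss
    rto_hours: float    # maximum tolerable downtime

def meets_requirements(design: BackupDesign, req: BusinessRequirement) -> dict:
    """Worst-case data loss equals the backup interval; restore time scales with volume."""
    worst_case_loss = design.backup_interval_hours
    estimated_restore = design.restore_hours_per_tb * design.data_volume_tb
    return {
        "rpo_met": worst_case_loss <= req.rpo_hours,
        "rto_met": estimated_restore <= req.rto_hours,
        "worst_case_loss_hours": worst_case_loss,
        "estimated_restore_hours": estimated_restore,
    }

# Example: nightly backups of 2 TB at 3 h/TB restore, against a 24 h RPO and 8 h RTO
print(meets_requirements(BackupDesign(24, 3, 2), BusinessRequirement(24, 8)))
```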
All of these processes, controls, and procedures should be periodically verified and may be assessed during periodic
review to provide assurance that they still meet organizational and regulatory expectations, including doing trial
restores of the backup data. Consideration should be given to perform walk-throughs and mock-executions of the
DRP and BCP.
The same data integrity requirements (including confidentiality and security) apply to systems managed internally by
the regulated company or externally. The expectations from the regulators are the same whether organizations use
physical infrastructure on-premise, virtualized servers on-premise, hosting at datacenters, or leveraging off-premise
cloud-based services. The regulated company has the ultimate responsibility for patient safety, product quality, and
data integrity, wherever and however the data is stored.
In cases where the system is supplied using cloud-based infrastructure (SaaS, PaaS, or IaaS), the responsibility for the procedures related to disaster recovery is shifted in whole or in part to the cloud provider. One of the benefits of a cloud-based infrastructure is its resiliency, that is, the ability of the service to respond to issues, although the benefit gained in resilience is often offset by a decrease in security controls. Another advantage of using a cloud-based infrastructure is system uptime, which providers often report as 99% to 100% [26].
The ISPE GAMP® Good Practice Guide: IT Infrastructure Control and Compliance (Second Edition) [19] provides
some insight into the advantages and risks of using various infrastructure services.
Among the advantages highlighted is high availability. However, as that Guide notes:
“Cloud computing introduces a flexibility in resource capacity, but also introduces new risks to regulated companies.”
The use of infrastructure outsourcing has also presented compliance challenges including security, availability, integrity, and confidentiality, and, should the contract be terminated, data deletion or an inability to access the data.
The two critical activities/considerations for the users of cloud-based infrastructure services are the supplier
assessment and the contract or Service Level Agreement (SLA).
As with any supplier assessment, it is critical to assess the services delivered by the cloud provider and the controls
and procedures established to ensure the operations. It is important to define the expectations of the regulated
company and the corresponding responsibilities of the provider. One of the considerations is the notification of
changes to the hosted service, including upgrades to infrastructure, platform, or software application. At a minimum,
the notification should include details of the change to allow the regulated company to evaluate any potential impact
to their systems and data.
As part of this assessment, it is important to evaluate the disaster recovery process at the supplier including
review of the DRP and the frequency and conditions of the disaster recovery testing. The DRP should address the
communication strategy during a disaster. It is important to document how a disaster at the cloud provider will be
communicated to the system users.
It is also important to determine if the cloud-service provider subcontracts any part of their service. If this is the case,
such as a SaaS provider using a third-party PaaS or IaaS provider, it is important to understand how the cloud-
service provider evaluated their subcontractors and how problems with service at the subcontractors will be prioritized
and communicated.
The ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts [16] states:
“Even if the third party is not subject to healthcare regulations when they enter into a Quality Contract/Technical
Agreement with a regulated company, they must be aware of the requirements of the environment in which they
are working.”
3.5.3 Migration
The MHRA ‘GXP’ Data Integrity Guidance and Definitions [12] provides significant guidance on controls required
for data transfer and data migration. Fundamentally, any transfer or migration process must be validated and must
preserve the GxP content and meaning of the data.
The ISPE GAMP® Guide: Records and Data Integrity Appendix O1 [8] discusses the option of storing electronic records in formats other than the original record format, and lists appropriate points to consider when planning record transfer or migration.
The decision to convert electronic records into a different format (an alternative file type, or dynamic to static) should
be based upon a documented risk assessment considering the requirements for record retention, access, and use.
ISPE GAMP® Guide: Records and Data Integrity Appendix O1 [8] includes a comprehensive table, Table 18.1, that
describes multiple risk factors for the conversion of electronic records to an alternative format.
Transferring data between different applications will also likely require a conversion of format. When this is needed
for active data, discussions with the vendor of the target system will determine if the entire active data set can be
converted directly:
• Via an export from the current system and import into the new application
One challenge with retention of records is that the records may be in a format proprietary to an individual vendor and
therefore the data may only be read in the originating application.
The conversion of inactive records to a vendor-neutral format was introduced in the ISPE GAMP® RDI Good Practice
Guide: Data Integrity – Key Concepts Section 4.3.6.3 [16]. The vision of vendor-neutral formats has been a desire
for industry for many years, especially in the realm of analytical laboratory data. Leveraging a vendor-neutral format
could help to break the reliance between the data lifecycle and the system lifecycle. Data would no longer be
exported from one system and imported into another, but exist in file structures that live in a common data lake with
the ability of various applications to open, view, interrogate, and mine the data for new scientific insights, as well as
provide the ability to easily share data between collaborating companies. This is an ongoing industry initiative and not
yet fully implemented or available.
Any “temporary” storage locations of data need to be tightly secured and controlled. There should be a record of
transferring data between storage types, formats, or computerized systems. A Data Migration Plan should define
the scope of the migration, the personnel and data involved, and the strategy to be applied. The ISPE GAMP® Good
Practice Guide: A Risk-Based Approach to Operation of GxP Computerized Systems Chapter 17 [21] examines
migration in detail and offers guidance on creating a Data Migration Plan. Migrating data of questionable integrity
does not improve the integrity of the data but may protect the data from further risks going forward.
As with all migration/conversion activities, it is essential to consider if all data, metadata (including audit trails and
review and approval signatures), and relational traceability between records is preserved after migration to the new
application. Care must be taken when migrating IT systems with databases to ensure that the associations are not
disrupted and that the data viewing capabilities remain. The risks arising from any changes in data or metadata should be assessed, managed, and documented. It is important that migration or data transfer processes are
designed and validated to confirm that data integrity is preserved (the accuracy, completeness, content, and meaning
of the data) following a risk-based approach.
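A sketch of such a migration verification is shown below (Python; the record structure, metadata field names, and parent-child link are illustrative assumptions); it checks completeness, field-level accuracy including audit trail metadata, and preservation of relational traceability:

```python
def verify_migration(source: dict, target: dict, fields=("content", "audit_trail", "signatures")):
    """Compare source and target record sets (keyed by record id) after migration.

    Returns a list of findings; an empty list indicates the checks passed.
    """
    findings = []

    # Completeness: every source record must exist in the target
    missing = set(source) - set(target)
    findings += [f"missing record: {rid}" for rid in sorted(missing)]

    for rid, src in source.items():
        tgt = target.get(rid)
        if tgt is None:
            continue
        # Accuracy: data and metadata (audit trail, signatures) must match
        for field_name in fields:
            if src.get(field_name) != tgt.get(field_name):
                findings.append(f"{rid}: field '{field_name}' differs after migration")
        # Relational traceability: parent/child links must still resolve
        parent = src.get("parent_id")
        if parent and tgt.get("parent_id") != parent:
            findings.append(f"{rid}: link to parent record {parent} not preserved")

    return findings

# Example usage with two tiny in-memory record sets
source = {"S-1": {"content": "result A", "audit_trail": ["created"], "parent_id": None}}
target = {"S-1": {"content": "result A", "audit_trail": ["created"], "parent_id": None}}
print(verify_migration(source, target))   # -> []
```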
When the migration/transfer involves cloud-based solutions, there are additional considerations required as described
in the ISPE GAMP® Guide: Records and Data Integrity [8]:
• “Understand and accept which aspects of control are being delegated to a provider
• Agree on the need for supplier support during regulatory inspections, depending on the architecture and services provided”
With PaaS or SaaS solutions, system upgrades are often driven by the service provider rather than the regulated company. If an upgrade forces the need for data migration, a risk-based approach should be taken to managing the migration. Appendix O5 offers additional considerations around off-premise solutions.
As stated in the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts [16], system retirement:
“involves decisions about data retention, migration, or destruction, and the management of these processes.
Retiring a computerized system has major implications about how the data created in the system during its
operational life remains available, enduring, and readable in the remaining phases of the data lifecycle.”
Key questions to consider when retiring a system include:
• What data in the system needs to be retained after the system retirement (administrator access may be needed to view all data within the system)?
• What data is still within its retention period and needs the system to read and display it?
For dynamic data, a functional copy of the legacy software may be needed to read or interact with the data if the data
has not been migrated to a format readable in a current system. This is discussed more in Section 3.2.2, with details
around maintaining legacy software covered in Appendix O5.
Appendix M10 of ISPE GAMP® 5 on System Retirement [9] presents an overview of the system retirement process
including data integrity considerations.
Managing system retirement is comprehensively discussed in the ISPE GAMP® Good Practice Guide: A Risk-
Based Approach to Operation of GxP Computerized Systems, Chapter 18: System Retirement, Decommissioning,
and Disposal [21], which contains a detailed process flow diagram showing the critical activities and records to be
produced (see Figure 3.2), plus a RACI matrix (Responsible/Accountable/Consulted/Informed) of the roles and
responsibilities involved.
The primary consideration during system retirement is to ensure that data and metadata generated and stored in
a system are retained and readily available and readable as defined in the system retirement plan. The system
retirement process should have been considered during the creation of the organization’s retention strategy. The
strategy should be based on risk, taking into account regulatory expectations and business requirements, data
criticality, and the complexity of the computerized system. The strategy should require the identification of record type
(GMP, GLP, GCP, etc.), record retention period, and archival and retrieval processes for each system.
The 2010 ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Operation of GxP Computerized Systems
[21] remains valid in every respect, but consideration of subsequent regulations and guidance on data integrity,
specifically expanding roles and responsibilities to include data owner and data steward, generates a slightly modified
RACI matrix as presented in Table 3.1.
Data Owner (R)
• Responsible for preserving the quality and integrity of data and documentation related to the system being retired and ensuring compliance with relevant retention policies
• Consulted on content of the Retirement Plan

Quality Unit (C)
• Consulted on content of the Retirement Plan for regulatory and compliance aspects
• Responsible for approving the Retirement Plan and Report

Archivist (SME) (R)
• Consulted on content of the Retirement Plan for organization records retention policies
• Responsible for executing the archive aspects of the Retirement Plan
• Acts as Data Steward for archived data
• Responsible for preserving the quality and integrity of data and documentation related to the retired system and ensuring compliance with relevant retention policies

Data Steward (R)
• Responsible for ensuring that good data integrity practices are followed during the planning and execution of the retirement, decommissioning, and disposal phases
Prior to beginning the retirement process, there needs to be a documented risk assessment to understand the
concerns and difficulties to be addressed. The retirement process should be defined, and any transfer or migration
process to be used validated, to ensure that no data or metadata is lost or compromised during the process. The
records should be copied and verified as complete and accurate before the destruction of the original equipment. It is
essential to ensure that the metadata, such as the audit trail or other information has transferred completely, as loss
of such information permanently compromises the integrity of the copy of the record.
Transferring inactive records that have already been archived from the system into a secure off-line archive location
may require additional steps, such as restoration back to the original application prior to conversion and transfer. This
is common when the inactive records have been archived in the original existing format, which is likely to be vendor
dependent. Therefore, it is critical to plan for this migration before decommissioning the original computerized system
application.
If some of the data has reached the end of its retention period and can be discarded, there should be a clearly defined process requiring data owner, process owner, and quality (and possibly legal) approvals prior to disposal. This requires the ability to segregate such data from the rest of the data without creating issues, and that the date of data creation is stored with the record. Naming conventions that include the date of creation may be beneficial because, when data is migrated, the file creation dates may be reset to the date of migration. Data destruction is discussed further in Section 7.4.
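A small sketch of such a naming convention is shown below (Python; the name pattern is a hypothetical example, not a prescribed format); embedding the creation date in the name keeps it recoverable even if file-system dates are reset during migration:

```python
from datetime import date
import re

def archive_name(record_id: str, created: date, extension: str = "dat") -> str:
    """Build a file name that carries the record's creation date with it."""
    return f"{created:%Y%m%d}_{record_id}.{extension}"

def creation_date_from_name(name: str) -> date:
    """Recover the creation date from the file name, independent of file-system dates."""
    match = re.match(r"(\d{4})(\d{2})(\d{2})_", name)
    if not match:
        raise ValueError(f"no creation date encoded in '{name}'")
    year, month, day = (int(g) for g in match.groups())
    return date(year, month, day)

name = archive_name("BR-042", date(2015, 6, 30))   # '20150630_BR-042.dat'
print(creation_date_from_name(name))               # 2015-06-30
```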
When a system reaches the end of its life and is to be decommissioned, a retention strategy is needed for data that has not reached the end of its retention period. As discussed in Section 3.2.2, ideally dynamic data should be maintained in a dynamic format, but sometimes this is not possible. If data is required to be retained for decades (e.g., traceability records for tissue donors, as contained in Appendix O4), there may come a point where a risk-based decision to save it in a static format is justifiable (i.e., the need to sort or reprocess it is very unlikely).
For example, a risk assessment may find that there is little or no foreseeable need for reprocessing after 10 years,
while retaining a computer (or even a virtual environment) with an unsupported operating system and application becomes
increasingly problematic. It is important to consider and document the risk of creating static records from the dynamic
records [12] and to ensure that the records are still able to support the reconstruction of the activities performed and
decisions made. This approach should not be considered for active records. This is further discussed in Section 7.2.
Information stored in PDF files may be substituted or transformed when transferred to a new server if the fonts were not embedded when the PDF was generated. Such substitution or transformation can permanently alter the record and may alter its meaning, such as when symbols are transformed or changed to standard fonts, rendering the information useless if it becomes unreadable. It is important to consider this during the initial transfer planning.
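One way to screen PDF files for non-embedded fonts before a transfer is sketched below (assuming the third-party pypdf library is available; composite Type0 fonts, whose descriptors sit under /DescendantFonts, are simply flagged for manual review here):

```python
from pypdf import PdfReader  # assumption: the third-party pypdf package is installed

EMBED_KEYS = ("/FontFile", "/FontFile2", "/FontFile3")

def fonts_without_embedding(pdf_path: str) -> set:
    """Return base font names that appear to lack an embedded font program."""
    suspect = set()
    for page in PdfReader(pdf_path).pages:
        resources = page.get("/Resources")
        if resources is None:
            continue
        fonts = resources.get_object().get("/Font")
        if fonts is None:
            continue
        for font_ref in fonts.get_object().values():
            font = font_ref.get_object()
            name = str(font.get("/BaseFont", "unknown"))
            descriptor = font.get("/FontDescriptor")
            # Simplification: fonts without a top-level descriptor (standard base-14
            # fonts or composite Type0 fonts) are also flagged for manual review.
            if descriptor is None or not any(
                key in descriptor.get_object() for key in EMBED_KEYS
            ):
                suspect.add(name)
    return suspect

# print(fonts_without_embedding("batch_report.pdf"))
```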
Another significant problem with PDF files is that they are limited to a static representation of the data and may not
contain the complete and accurate record of what was originally recorded by an electronic system; this is extensively
discussed in Section 7.2.1. Finally, PDF files also have very limited searching and indexing capabilities, and indexing
within a document management system is needed to ensure the records can be found when needed.
Readability of the retained data depends on the format of the data. If the data has remained in a vendor-specific
format, a functional copy of the legacy software may need to be retained to allow for data readability (as discussed in
Section 3.2.2 and Appendix O5). Training and operating materials should be archived along with the legacy system to
assist access by inexperienced personnel.
The scope of the system retirement strategy should include everything from simple devices and non-networked
systems through to and including enterprise systems. Different scenarios relating to system retirement are described
below. Note that for Step 1 these are broken down by system types, with common activities listed first, and system-
specific activities listed under system type.
The system is removed from active operations, normal end user access is withdrawn, and interfaces with other
systems are deactivated. SMEs involved in decommissioning activities are allowed access and permissions sufficient
to perform their allotted tasks. During the development of the retirement plan, it is essential to consider all associated
records, qualification data, calibration records, etc., as well as the regulated data and metadata required to be retained.
System Types
System not connected to a computer: This system generates electronic data but does not permanently store the
data (titrator, osmometer, etc.). The retirement plan should document the archival and retrieval of the temporary data
stored on the system. When retiring this system, no additional data will be generated from the system. Data typically
generated from these systems is static in nature and therefore may be retained in printed or electronic format. These
files are often saved in a read-only format (e.g., PDF) for ease of archival. Appendix D2 discusses some of the
implications of managing the data during the use of these systems.
System connected to a non-networked computer: Systems are connected to the computer for system control
and data acquisition. Data from these systems is transferred via temporary media to a more permanent, secure location. When retiring such a computerized system, no additional data will be generated from the system. Data
typically generated from these systems is dynamic in nature. The retirement plan should document the archival
and retrieval of the data stored on the system and permanent storage location. This plan should allow for data to
be maintained in its original dynamic format. In most situations, it is necessary to retain a copy of the application
software to retrieve data in its dynamic format.
System connected to a networked computer: These are systems connected to a computer for system control and
data acquisition, where the computer is connected to a network allowing direct data transfer to a secure location.
When retiring such a system, no additional data will be generated from the system. Data typically generated from
these systems is dynamic in nature. The retirement plan should document the archival and retrieval of the data stored
on the system. This plan should allow for data to be maintained in its original dynamic format. In most situations, it is
necessary to retain a copy of the application software to retrieve data in its dynamic format should this be required. A
virtualized copy of the system can be retained for this purpose until the end of the retention period is reached – see
Section 3.2.2 and Appendix O5.
Enterprise Systems: Enterprise systems are typically used to meet the needs of multiple sites. When retiring an
enterprise system, no additional data is added to the system from the specified retirement date. An enterprise system
may be retired at a single site or across all sites:
• If only a specific site is retiring the system and it will remain in operational use at other sites, then data availability
is simplified since the system is still active.
• It is more complicated if the whole enterprise system is to be retired from all sites as alternative arrangements
will be needed to maintain data availability and readability.
• When one site pilots a new system, there may be an interim negative impact on data quality and
nomenclature across the organization, as the new system may store data differently and with an enhanced
nomenclature compared to the existing system. This incompatibility continues until all sites have migrated to the
new system. See Sections 1.6.4.2, 4.7, and Appendix M2 for further explanation of data quality and nomenclature.
The retirement plan should allow site data to be retrievable in its dynamic format with the permission of the
designated data owner. An enterprise system typically remains in use until data is migrated to a new enterprise
system or to a long-term archive. When retiring a complete enterprise system to move to a new one, no additional
data is added to the system from this point forward. The retired system typically remains in use until the data stored is
either migrated to the new system or has reached its record retention requirement.
Retirement of the Archive System: When an archive system is due for replacement (reaches the end of its lifecycle
or an improved archiving solution is found), special care must be taken to preserve the integrity of the data throughout
the process. If the archive system is connected to other systems, replacement interfaces need to be created to link
to the new archive and validated to ensure they are transferring complete data. Retirement of an archive system
requires all of the same considerations as retirement of any other type of system but is complicated by the diversity
of the records and the possibility that the new archive system does not support all of the file formats. This may be the
time to convert dynamic data to static data based on unsupported file formats. Refer to Section 7.2.1 for details about
this conversion.
Decommissioning is the controlled shutdown of a retired system. A system may be stored if required to be reactivated
at a later date, such as to retrieve regulatory data or results.
Data, documentation, software, or hardware can be permanently destroyed. Each may reach this stage at a different
time. Data and documentation should not be disposed of until they have reached the end of the record retention
period as specified in the record retention policy.
During mergers, acquisitions, and divestments, it is essential to plan and define the approach to managing the
impacted systems and data. A fixed timeframe may be imposed by legal contracts to review associated and impacted
data as part of the transition period.
Key questions to consider include:
• What systems are involved, which will be kept, and which will be retired?
• Is a new system purchase required as part of consolidation or divestment? If so, this will need validation for
intended use before data can be imported.
• Can data generated by the same business process across the two companies be transferred into a single
system, e.g., all batch record data can be merged within the chosen Manufacturing Execution System (MES)?
• How will both the active data from the live system and the inactive data from an archive system be managed?
• What legacy systems are needed to read the archived data and how will they be managed?
• Who will be the data owners at the end of the transition phase?
• There may be changes in the underlying business processes during the transition period; how will these impact
the data flow ongoing?
For divestments, the transition period should be used to identify how to segregate data between the companies. An independent party may perform the data segregation process and provide a data set to the recipient. In that situation, it is vital to identify the data requirements early and to ensure that data segregation is performed at least twice, so there is an opportunity to rehearse the segregation, verify the resulting data set, and address any issues before the final cut of data is taken.
This section presents an approach to gaining a more systematic and enhanced understanding of the data set/data
system under development with respect to effectively managing the data lifecycle for regulated records.
A process to achieve data integrity by design throughout the data lifecycle is presented schematically in Figure 4.1.
This process supports the development, maintenance, and retirement of compliant data sets and data systems.
The data integrity by design process facilitates delivery of a design that ensures the computerized system(s) is/are fit
for intended use with respect to the data lifecycle and the business process.
The key development related activities for a process to be computerized are described or referenced in more detail
below.
• Define Existing Business Process: In addition to the process workflow it is critical to capture the business
process and end to end data flows (incorporating data transfer across interfaces) in order to provide assurance
that the full data lifecycle is understood, including high-risk areas and compliance gaps.
• Scope “To Be” Business Process: Use critical thinking to identify opportunities for end to end business
process improvement, with a focus on prevention and detection opportunities to minimize existing risks to data
integrity and data quality for regulated and other business-essential records.
Business process mapping is explained in detail in Section 4.2, while Section 4.3 introduces data flow diagrams.
Once the business process map and data flow diagrams have been generated, a business process risk
assessment (see Section 4.5) can be used to identify potential data integrity risks inherent in the process that
must be addressed as far as possible by the computerized systems selected.
• Develop Requirements: Critical thinking should be applied to analyze the regulatory requirements for data
integrity, and determine the necessary and most effective controls to meet the intended use. The intended use
inherently impacts the controls needed; for example, a system intended for use in early stage drug discovery
may have a somewhat different set of data integrity requirements than one intended for use in product release
testing with potential direct impact on patient safety.
It is also important to realize that the computerized system is comprised of people, process, and technology,
and therefore the requirements should inherently address not just the system technical controls but also the
applicable data governance aspects (e.g., user access controls and segregation of duties) and the ability of the
system to be configured to meet the intended use.
Sections 4.4, 4.6, and 4.7 provide guidance around defining the intended use of the data, the data lifecycle, and
the importance of creating data nomenclature to be used throughout the process or organization.
Establish high-level functionality and data requirements for the “to be” business process and data lifecycle (see
the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts Appendix 6 [16]). In purchased
systems, the user requirements may be a combination of the business process requirements and the vendor
specification.
The historical focus of a typical User Requirements Specification (URS) has been on the required functionality
of the system to support the business process, including the interfaces to other systems and the ability to collate
data from multiple systems into a cohesive data set. Data and data lifecycle aspects should be considered carefully and specified with equal rigor in order to ensure that good practice expectations for data quality and data integrity can be incorporated into the design of a new system at the earliest opportunity.
To properly develop the design, a more detailed process map and data flow diagrams may need to be developed
at this stage in order to facilitate the identification of data integrity requirements. As part of data integrity by
design, how the system is designed to be available to a user in their environment should be considered. For
example, identifying the physical location of terminals and determining if they are sufficiently robust (e.g., to
cleaning processes in the operating environment) and maintained so that the terminal is available to an operator
for recording entries contemporaneously.
The preparation of the URS for a computerized system should indicate the priority of each requirement, for
example, “must have,” “should have,” and “could have.” As part of this process, data integrity-specific requirements
should be prioritized as a part of an iterative process of QRM (according to ICH Q9 (pharmaceuticals) [27], ISO
14971 (medical devices) [28], and ISPE GAMP® 5 [9]) in order to assess each requirement’s potential to impact the
quality and integrity of data within the system. Known data integrity trouble spots based on industry and company
experience (e.g., interfaces) should be explicitly examined for potential risks.
Data-related requirements should be clearly cross-referenced to process step activities and data flow
considerations, and to risks to public health, patient safety, and product quality. Requirements must be clear,
correct, and unambiguous. Cross-references to specific regulatory requirements (such as the predicate rules of
US GMP [29] as well as US FDA Guidance for Industry: Part 11 [30] and EU EudraLex Chapter 4 [31] and Annex
11 [11]) may also be helpful. An example of regulated record retention periods is contained in Appendix O4.
For computerized systems, data integrity requirements address availability, performance, resilience, security,
deployment model/architecture, and stability. These requirements are technical or procedural and include specific
controls such as second person verification, secure time stamp, compliant electronic signature, electronic audit
trail, backup and restore, etc. There may be other data integrity related considerations for a particular system,
for example, that static1 records are searchable or that dynamic records are able to be reprocessed by duly
authorized users.
At a minimum, systems that generate static data in a flat-file format may need to be networked with centralized data
storage to protect the data. An automated and validated mechanism to capture the flat files into a database
system for increased security and traceability is desirable.
• Develop Design: Identify and evaluate any commercially available systems with the potential to meet the
requirements. Where there is no commercially available solution, establish a design project to develop a system.
The initial design of a purchased, commercially available system may involve configuring a single system with a workstation, or may involve combining multiple systems with a workstation to automate a process. The automated equipment could come from different vendors but be configured to work with the same software, for example, connecting a dissolution bath, fraction collector, and a UV/Vis spectrophotometer. (Chapter 5 offers guidance on planning computerized systems.)
This design activity may be performed with help from the supplier or using internal support, but should be presented to the system owner’s Subject Matter Experts (SMEs) and quality representatives for review.
• Perform Risk Assessment: Data integrity focused risk assessment should be performed early in the project
phase and repeated as more information becomes available and greater knowledge is obtained (see ISPE
GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts, Chapter 4 – Risk Management Approaches
[16] for more information). System level risk assessments are covered in Section 5.3.
• Identify High-Risk Functionality: (see Section 5.3) – A “must have” requirement for data integrity is typically a
property or characteristic of the developed system that assures data set/data system quality and data integrity
within the business process. “Must have” requirements for data integrity are not automatically high risk. While
“must have” status may infer a high severity of harm if the requirement is not met, the additional factors of
likelihood of occurrence and probability of detection in the ISPE GAMP® 5 [9] risk assessment methodology may
result in an overall medium or low risk priority for the requirement.
Having identified the high-risk functionality, QRM can be used to prioritize subsequent activities.
• Develop Strategy to Control Risk: A strategy to control end to end process and data flow ensures that a data
set of the required quality will be produced consistently. (This is discussed throughout the system lifecycle
considerations in Chapters 6 and 7.)
1 Static and dynamic records are defined and discussed in detail in Section 7.2.1.
The elements of the control strategy should describe and justify how the chosen configuration and technical
and procedural controls contribute to the final quality and integrity of the data output. These controls should
be based on product and process understanding as defined in the key concepts of ISPE GAMP® 5 [9]. A
detailed understanding of the system should be supported by quality system processes, such as incident management. The quality system requires adequate tools to support critical thinking when assessing the impact of such events, and it evolves as more knowledge of the system is gained.
Sources of process variability, such as manual intervention, that can impact patient safety, product quality, and
data integrity should be identified as risks, appropriately understood, and subsequently controlled. Understanding
sources of variability and their impact on the data within the system can provide an opportunity to implement
additional or alternative controls that reduce the risks. Science-based product and process understanding, in
combination with QRM, supports the control of data sets and data systems such that any variability can be
compensated for in an adaptable manner to deliver consistent data quality.
The adoption and implementation of innovative technologies and control paradigms, for example, process
analytical technology, may allow the design of an adaptive process step (a step that is responsive to input) with
appropriate process controls to ensure consistent product quality and data integrity.
Enhanced understanding of the data system and data set performance can justify the use of alternative
approaches to determine that data is fit for purpose.
• Build Business Process Solution: It is now uncommon for systems to be developed as custom solutions. Building the business process solution more typically means applying a selected system configuration to meet the intended use and providing supporting procedures for that use. For some SaaS solutions, configuration may not be available, in which case building the solution is limited to the implementation of use procedures.
- Supplier and System Selection (see ISPE GAMP® 5, Appendix M2 – Supplier Assessment [9])
With the move to more purchased and built solutions, including SaaS, benefits can be gained by collaboratively
working with the supplier to improve the technical controls built into the system to prevent and detect data
integrity issues.
- Managing Continual Risk Assessment and Process Improvement throughout the Data Lifecycle: (see
Figure 4.1) Throughout the data lifecycle, companies have opportunities to evaluate innovative approaches
to improve data quality and integrity (see ICH Q10 [32]). This includes re-evaluating controls to determine
if additional controls or adjustments to those controls are needed. Even after risk mitigation there may be a
level of residual risk that should be periodically reviewed and reassessed.
In the operational phase of the system, process performance can be monitored to ensure that it is working as
anticipated to maintain data integrity as expected. This monitoring includes:
• Critical aspects of the process impacting patient safety, product quality, and data integrity
• Process step parameters, including changes and the cause of those changes
Upon gaining additional process knowledge and performance monitoring data, the data integrity-specific configuration
of the computerized system and/or control elements can be revised accordingly.
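The risk-based prioritization of data integrity requirements described above (severity of harm, likelihood of occurrence, and probability of detection) can be illustrated with a small sketch. The numeric scales, the multiplication into a single score, and the example requirements are simplifying assumptions for illustration only; actual assessments should follow the organization’s documented QRM procedure and the ISPE GAMP® 5 methodology.

```python
from dataclasses import dataclass

# Simplified qualitative scales; real assessments follow the documented QRM
# procedure rather than this illustrative mapping.
SEVERITY = {"low": 1, "medium": 2, "high": 3}
LIKELIHOOD = {"low": 1, "medium": 2, "high": 3}
DETECTABILITY = {"high": 1, "medium": 2, "low": 3}  # low detectability raises the priority

@dataclass
class Requirement:
    text: str
    priority: str          # "must have", "should have", "could have"
    severity: str
    likelihood: str
    detectability: str

    def risk_score(self) -> int:
        """Illustrative score: higher means earlier attention in design and verification."""
        return (SEVERITY[self.severity]
                * LIKELIHOOD[self.likelihood]
                * DETECTABILITY[self.detectability])

requirements = [
    Requirement("Audit trail on result changes", "must have", "high", "medium", "low"),
    Requirement("Interface transfers verified by checksum", "must have", "high", "high", "medium"),
    Requirement("Configurable report layout", "could have", "low", "medium", "high"),
]

# Rank requirements so that high-risk functionality drives subsequent lifecycle activities.
for req in sorted(requirements, key=Requirement.risk_score, reverse=True):
    print(f"{req.risk_score():>2}  {req.priority:<11}  {req.text}")
```

Note that in such a scheme a “must have” requirement does not automatically receive the highest score; likelihood and detectability can reduce its overall priority, consistent with the discussion above.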
It should be noted that each of the data lifecycle processes may require a different focus of design such that
appropriate controls are in place. These controls may vary for different records based on data classification and
intended use. For example, in-process checks as part of a batch record need to be retained until the batch expiration
date plus 1 year, compared with biological tissue traceability records [33, 34], which need to be retained for at least
30 years. The design of the retention solution for these two examples is likely to be very different. A more detailed
listing of retention periods is contained in Appendix O4.
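To make the contrast concrete, the following minimal sketch computes retention end dates for the two record types mentioned above. The helper function and dates are illustrative only; actual retention rules must come from the applicable regulations and Appendix O4.

```python
from datetime import date

def add_years(d: date, years: int) -> date:
    """Add whole years to a date, falling back to 28 February for 29 February origins."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

# Retention rules from the examples above: in-process checks kept until batch
# expiry plus one year; tissue traceability records kept for at least 30 years.
def ipc_retention_end(batch_expiry: date) -> date:
    return add_years(batch_expiry, 1)

def tissue_record_retention_end(record_created: date) -> date:
    return add_years(record_created, 30)

print(ipc_retention_end(date(2026, 6, 30)))            # 2027-06-30
print(tissue_record_retention_end(date(2020, 1, 15)))  # 2050-01-15
```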
The ability to map the business process is important to enable understanding of the business activities and decision points in a business process. The business process mapping or modeling may be depicted as flowcharts, tables, or a combination of the two. If a table is used to describe the business process, the location (where), the responsible person/
role (who), the proper time to perform the action (when), and the output(s) of the action (what) should be included,
depending upon the needs [8]. Business process mapping illustrates the steps performed to fulfill the business
purpose, and should include both manual and computerized system aspects of the process.
As shown in the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts, Section 3.2 [16], a business
process can be described in very simple terms using a block diagram. A single block can involve one or many manual
operation(s) or computerized system(s). The level of granularity of the process map is dependent upon the level
required for the business to identify the associated risks to data integrity, product quality, and patient safety.
Business process maps are useful to help the business identify the risks associated with the use of the system
including data integrity risks. Depending upon the needs, the process map may include details such as:
• Critical decision points, including actions (e.g., review/approval/disposition) mandated under the predicate rules
• System functions
• Responsible person/roles
• System interfaces
If any parts of the process are outsourced, it is important to include this information in the process map. Data created
by an outsourced facility remains the responsibility of the marketing authorization company, and, as described in the
ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts [16], needs to be available for review, such
as during a clinical trial. A process map used to help IT support the system should contain information about the
hardware and servers supporting the system.
The process map can help with the creation of SOPs, work instructions, and training materials before the operational
phase. It is also an essential basis for the end to end testing of the business process before the system is released
for use.
It is essential for a data flow diagram to capture the level of detail needed to identify all data activities and, subsequently, their potential risks. For example, a data flow diagram incorporating ancillary systems could identify the use of an unvalidated and uncontrolled spreadsheet to evaluate specification limits. However, a separate data flow diagram may not be needed for a very simple business process.
Depending on the complexity of the computerized system involved and/or the business model (e.g., outsourcing
services or systems), the data flow could be simple or very complex. It is crucial not to overcomplicate the data flow
as this can inhibit the clarity and understanding of the diagram. The more controls and interfaces in place, the more
complex the data flow diagram becomes.
A data flow diagram can be created by identifying, for each step in the business process map (see also the sketch following this list):
• From where will information come into the process step (e.g., sample or batch ID, limits and specifications) and
how (e.g., electronic interface/manual transcription)
• What data will be generated during that step of the process and how is it captured? What calculations or
processing must be performed on the data and how and where will those be done? What metadata is needed to
provide the GxP content and meaning to the data?
• For manufacturing processes, what are the Critical Process Parameters (CPPs) and Critical Quality Attributes
(CQAs), and how are they managed and fed into the process control?
• Where does the data need to go next (e.g., to the next system used in the process, or to a laboratory information
system or Enterprise Resource Planning (ERP) system, or both)? How will it be transferred? How is the transfer
verified? What data cannot be transferred? How and where is it stored?
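The questions above can also be captured in a simple, machine-readable form so that manual transcription steps and unverified transfers stand out during review. The structure and field names below are illustrative assumptions rather than a prescribed format.

```python
# Each entry describes one step in the business process map: where its inputs
# come from, what data it generates, and how data moves to the next step.
data_flow = [
    {
        "step": "Sample weighing",
        "inputs": [{"data": "sample ID", "source": "LIMS", "method": "barcode scan"}],
        "generates": ["sample weight", "balance ID", "timestamp"],
        "transfer_out": {"to": "Chromatography Data System", "method": "manual transcription",
                         "verified": False},
    },
    {
        "step": "Chromatographic analysis",
        "inputs": [{"data": "sample weight", "source": "balance", "method": "manual transcription"}],
        "generates": ["raw chromatogram", "processing method", "audit trail"],
        "transfer_out": {"to": "LIMS", "method": "validated interface", "verified": True},
    },
]

# Flag steps where data is keyed in by hand or moved without verification --
# typical starting points for the business process risk assessment.
for step in data_flow:
    manual_inputs = [i["data"] for i in step["inputs"] if i["method"] == "manual transcription"]
    if manual_inputs:
        print(f"{step['step']}: manual entry of {', '.join(manual_inputs)}")
    if not step["transfer_out"]["verified"]:
        print(f"{step['step']}: unverified transfer to {step['transfer_out']['to']}")
```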
Figure 4.2 shows a simplified outline from which to build a data flow diagram. Split out all the data sources and
downstream systems/processes, and add detail on the data being transferred.
Figure 4.2: A Simplified Outline from which to Build a Data Flow Diagram
The data flow diagram should graphically illustrate the creation, use, and movement of data elements throughout a
business process: the “data” view of activities. The data flow diagrams can aid understanding and identification of:
• General and specific data integrity risks, e.g., time pressures, equipment limitations, resource constraints
The above process can help ensure that the system remains in a compliant state.
Fundamental to managing data is to determine what data is needed and for what use. It should be noted that there
are non-GxP requirements for data classification and retention, e.g., GDPR [17], HIPAA [18], and legal hold; however,
this Good Practice Guide focuses exclusively on the GxP requirements.
Taxonomy is mentioned in Appendix M1 Section 8.3 as an essential component of KM; it is the organization of data into related groups. In this Good Practice Guide, the taxonomy is based on classifying data as regulated, operational, or unnecessary, as defined in Section 1.6.5.
Within the regulated data (i.e., data that must be kept), it is important to understand the intended use of the data now
(e.g., for making quality-critical decisions on a batch) and in the future (e.g., for trending and data analytics). Section
4.7 discusses the need to set business rules to define the data nomenclature for data used in ongoing collation and
data analytics, such as trending CQAs and compiling quality metrics.
Once the data classification is identified, the following considerations should be made to complete the understanding of the data lifecycle (these are brought together in the sketch after this list):
• Identify the intended use and criticality of the data: data with high criticality has the greatest potential impact on product quality and patient safety. Such data needs to be retained and archived.
• Record type: original records and true copies (see Section 7.2.1 for data lifecycle considerations)
• Essential controls: access control and data management (see Section 3.4)
• Retention state, including static versus dynamic state to support the intended use (see Section 7.2)
Note: As experience of a process is gained, data classification may change, for example, something that was initially
considered to be unnecessary becomes useful operational data.
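The considerations listed above can be brought together for each data element in a simple structure, for example as sketched below. The field names and example values are illustrative assumptions only.

```python
from dataclasses import dataclass

@dataclass
class DataElement:
    name: str
    classification: str      # "regulated", "operational", or "unnecessary"
    intended_use: str
    criticality: str         # e.g., "high" where product quality / patient safety impact exists
    record_type: str         # "original" or "true copy"
    retention_state: str     # "dynamic" or "static", chosen to support the intended use
    retention_years: int     # illustrative value; actual periods come from applicable regulations
    access_controlled: bool

potency_result = DataElement(
    name="Assay potency result",
    classification="regulated",
    intended_use="batch disposition and CQA trending",
    criticality="high",
    record_type="original",
    retention_state="dynamic",   # keep reprocessable for duly authorized users
    retention_years=11,
    access_controlled=True,
)

print(potency_result)
```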
A business process risk assessment is a non-system-specific high-level assessment of the business process and
data flow, and the potential data integrity risks inherent in the process. It is aimed at identifying key process-level risks
to patient safety, product quality, and data integrity, and identifying the essential controls to manage these risks. As
with any assessment, it requires the application of critical thinking by knowledgeable and experienced SMEs.
The business process map should be created early so that it can be used to drive the identification and assessment
of risks to the process and subsequently act as a feeder to the system planning and lifecycle. (See Chapter 5.)
The ISPE GAMP® Guide: Records and Data Integrity [8] represents the data lifecycle in five phases, and is shown in
Figure 4.3.
• Creation: Data capture or recording should ensure that data of appropriate accuracy, completeness, content,
and meaning is collected and retained for its intended use. This could include manual data entry as well as
automated capture.
• Processing: Data is processed to obtain and present information in the required format. Processing should
occur in accordance with defined and verified processes (e.g., specified and tested calculations and algorithms),
and approved procedures.
• Review, Reporting, and Use: Data is used for informed decision-making. Data review, reporting, and use
should be performed in accordance with defined and verified processes and approved procedures. Data review
and reporting is typically concerned with record/report type documents. Second person reviews2 should focus on
the overall process from data creation to the calculation of reportable results, including the metadata required to
support the GxP content and meaning of the data. Such reviews may cross system boundaries and include the
associated external records and may include verification of any calculations used. The data reporting procedures should define the complete data set and the data handling steps, and ensure the consistency and integrity of the results. If automated controls are available, for example, a validated exception-reporting process, the extent of second person review may be reduced.
2 Second person reviews for laboratory data are explicitly required and discussed in the FDA Guidance for Industry: Data Integrity and Compliance with Drug CGMP Questions and Answers [35] and PIC/S PI 041-1 (Draft 3) Good Practices for Data Management and Integrity in Regulated GMP/GDP Environments [24].
• Retention and Retrieval: Data should be retained securely. Data should be readily available through the defined
retention period in accordance with defined and verified processes and approved procedures. Retention periods
vary by data classification, intended use and applicable regulation, and some records, e.g., validation and
qualification records, need to be retained for the life of the system or process.
• Destruction: The data destruction phase involves ensuring that the correct original data is disposed of after the
required retention period in accordance with a defined process and approved procedures.
The middle phases (those between Creation and Destruction) may fall in any order and may repeat.
A detailed discussion of the data lifecycle is available in the ISPE GAMP® Guide: Records and Data Integrity, Chapter
4 – Data Life Cycle [8].
Where validated technical controls and/or automated tools ensure the integrity of the results, it may be possible to
justify a reduced rigor of routine data review. Where the technical controls alone cannot ensure the integrity of the
results, procedural controls including enhanced data review may be required to address the deficiencies based on risk.
This section describes the data lifecycle with a focus on regulated records. Data integrity of a regulated record
should be ensured from its creation to the end of its retention period and resultant destruction. A regulated record
is a collection of regulated data (and any metadata necessary to provide meaning and context) with a specific GxP
purpose, content, and meaning, and required by GxP regulations. Records include instructions as well as data and
reports, as defined in Section 21.2 of the ISPE GAMP® Guide: Records and Data Integrity [8]. As a regulated record
progresses through its data lifecycle, critical thinking can be used to identify areas that may impact data integrity, such as:
• Selecting metadata that accompanies a record (e.g., thermocouple identification and location, associated audit
trail entries)
• Any alarm information or annotations relating to possible issues with the data from earlier in the data lifecycle
• Evaluating the use of human interaction, such as manual tasks and manual processing of data
• Converting data into a different electronic format (e.g., native formats to PDF)
• Migration3 of records between media (paper, electronic) or to other computer systems (e.g., application program
interface, data migration)
• Transfer of records to cloud servers, or to servers hosted by different organizations within the regulated company or by third parties
3 See Section 6.8 of the MHRA GxP Data Integrity Guidance and Definitions [12] for the definition of data transfer and data migration.
The data flows discussed in Section 4.3 can be broken down into component parts that form the steps of the data lifecycle. A data lifecycle approach to ensuring data integrity is recommended, as it has been found to improve the effectiveness of data integrity controls and to result in a more easily managed and understood program. The data
flow diagram facilitates the abstraction of risks from systems to data lifecycle phases, allowing standardized/modular
controls to be developed for each lifecycle stage. These modular controls can then be used any time a system
performs the associated lifecycle stage.
In order to mitigate data integrity risks, and based on critical thinking, different controls may be implemented for different areas of the lifecycle.
The data lifecycle is not necessarily a system-centric view. Data may cross systems and organizations throughout the data lifecycle, which must be considered, especially when addressing data integrity controls including security and audit
trail. Where a record is needed to support multiple processes or operations, it is better from a data integrity and data
quality point of view to retain a single copy of the record and link to it rather than having multiple copies in different
locations.
A record’s data flow derived from understanding the business process (as described in Section 4.2) provides a
structure to identify, assess, mitigate, and communicate potential data integrity issues/risks associated with each
data lifecycle stage. Mapping the record’s data flow helps in understanding the data lifecycle of
the regulated record. Section 3.2 of the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts [16]
presents detailed examples of this for manufacturing, QC laboratory, and electronic Case Report Form (eCRF) data.
Manufacturing Example
In the manufacturing area, In-Process Checks (IPC) are performed to confirm optimal process setup and for
monitoring during production to ensure that the in-process samples conform to specifications. Such IPC data should
be available in the manufacturing record, and available for quality review as part of final disposition.
Additional examples for the manufacturing data lifecycle are available in the ISPE GAMP® Good Practice Guide: Data
Integrity – Manufacturing Records [15].
QC Laboratory Example
In the QC laboratory, raw materials arriving on site are tested to verify that the quality of the raw materials meets
specifications. An ERP system may be used to manage and track raw materials from receipt through use. Challenges
to data integrity may occur during the capture of incoming material data, associating the data with the material lot,
and ensuring it is considered when dispensing inventory for manufacturing (within expiry).
Clinical Example
In a clinical study business process, data may be entered into an eCRF system from the patients’ Electronic Health Record (EHR) or a paper record at the investigator site. Consideration should be given to mitigating data integrity risks when verifying the source data entered in the eCRF and when reviewing and querying the data, to provide assurance that the data is complete, accurate, and consistent with protocol and clinical expectations before conducting statistical analysis and reporting the outcome. Data integrity controls, including data reviews, should be incorporated into the business processes and optimized via critical thinking at the individual study level.
Data integrity risks within the business process activities should be considered when mapping a record’s data
lifecycle; issues arising in the early phases of the data flow and lifecycle are carried forward across different
computer systems and processes, compromising the integrity of the regulated record. It is also important to document
and define the records in the lifecycle that will support GxP decision-making.
Security of clinical records is a particular concern in view of data privacy rules for medical records (HIPAA [18], GDPR
[17]) and must be addressed by the data integrity controls.
Additional examples for clinical data are available in the ISPE GAMP® Good Practice Guide: Validation and
Compliance of Computerized GCP Systems and Data (Good eClinical Practice) [36].
Alignment between manual tasks and multiple computerized systems is needed to support the data integrity of the records throughout the lifecycle. System lifecycle considerations are discussed in the next section.
Data nomenclature is a critical enabler of data quality and especially data analytics. In simplest terms, it is a
framework to ensure that data is captured and named consistently from the start to the end of the business process,
that is, “lot number” in one system is not labeled as “batch number” in another system. Inconsistent identification
results in using aliases to relate the data by its different names. Standardized nomenclature means the metadata
is coherent and relatable, ensuring that the data is usable for reporting, trending, data analytics, and even
machine learning/Artificial Intelligence (AI) applications. Defined after the business process has been mapped, the
nomenclature framework is applied throughout the data flow diagram so that all systems supporting the business
process have nomenclature uniformity.
Without standardized nomenclature, each site in an organization may develop its own data nomenclature for materials. Consider a firm that manufactures a single material, aspirin: Site A calls it “Aspirin,” Site B names it “ASA,” Site C uses “Acetylsalicylic Acid,” and at Site D the material name is “Acid, Acetylsalicylic.” All sites use the same electronic batch management system, which manages each batch using the material name. When conducting a review of 5-year manufacturing trends for this material across multiple sites, the comparison is neither easy nor fast because of the lack of a nomenclature framework across the sites. A data dictionary could be implemented to define all the aliases for the material name; however, it is simpler and more cost-effective to start with a defined nomenclature framework when designing the business process.
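Where legacy aliases such as those in the example above already exist, a data dictionary can relate them so that cross-site trending still covers every batch. The alias table, batch records, and yield figures below are illustrative assumptions.

```python
# Data dictionary relating legacy site-specific aliases to one canonical material name.
ALIASES = {
    "Aspirin": "Acetylsalicylic Acid",
    "ASA": "Acetylsalicylic Acid",
    "Acid, Acetylsalicylic": "Acetylsalicylic Acid",
    "Acetylsalicylic Acid": "Acetylsalicylic Acid",
}

batches = [
    {"site": "A", "material": "Aspirin", "yield_pct": 97.1},
    {"site": "B", "material": "ASA", "yield_pct": 95.8},
    {"site": "C", "material": "Acetylsalicylic Acid", "yield_pct": 96.4},
    {"site": "D", "material": "Acid, Acetylsalicylic", "yield_pct": 98.0},
]

# Resolve aliases so that a cross-site trend covers every batch of the same material.
canonical = "Acetylsalicylic Acid"
yields = [b["yield_pct"] for b in batches if ALIASES.get(b["material"]) == canonical]
print(f"{canonical}: {len(yields)} batches, mean yield {sum(yields) / len(yields):.1f}%")
```

As noted above, starting from a single agreed material name avoids having to maintain such a dictionary at all.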
Nomenclature is distinct and separate from normalization, as used in data analytics. Normalization is a way to eliminate redundancy and rescale disparate data sets such that a like-for-like comparison can be achieved. For example,
comparing sample throughput between two laboratories of unequal size is meaningless, but normalizing the data as
sample throughput per capita enables an evaluation of laboratory efficiency.
Nomenclature is not inherently part of the data entered, but it is an essential basis for ensuring coherent, consistent
entry and syntax of metadata to support the data.
A good nomenclature program starts from the end of the process and works backwards: “begin with the end in mind.”
During the process/system concept and delivery phases, business process owners must develop requirements to
describe how they need to access business information: the speed of access, depth of access, breadth of access,
and metadata to enable queries of the business data. For example, are most queries site-centric or are they global for
a particular activity or operation?
Nomenclature is an activity that develops and applies business rules to various data and metadata elements (objects)
to standardize data and metadata entry and syntax as it is recorded to ensure consistency; such consistency
enables queries that will retrieve all requested data. The framework must be applied consistently across the entire
organization or its value is greatly diminished.
To create consistency in metadata, it is necessary to start with business rules that guide the creation of new metadata
values when they are requested. Some business rules will be universally applied, such as there should not be two
values that describe the same real-world article, while other business rules will be unique to a specific data element.
Table 4.1 provides an example list of business rules to aid understanding. These are only examples: the technology limitations and required data elements will lead each organization to create its own set of business rules. Business rules must be shared not only with all internal stakeholders but also with suppliers or contract organizations that need to follow these rules in their system design, configuration, or operations.
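Business rules of the kind listed in Table 4.1 can be enforced at the point where a new metadata value is requested, as in the following sketch. The approved list, the character limit, and the rules themselves are illustrative assumptions; an organization’s actual rules will differ.

```python
APPROVED_MATERIALS = {"Acetylsalicylic Acid", "Microcrystalline Cellulose"}

def normalize(value: str) -> str:
    """Collapse case and surplus whitespace before comparing candidate values."""
    return " ".join(value.strip().lower().split())

def review_new_material_name(candidate: str) -> list[str]:
    """Apply example business rules to a requested metadata value.

    Returns the list of rule violations; an empty list means the request can
    proceed to the Data Governance Steward for approval.
    """
    violations = []
    if any(normalize(candidate) == normalize(existing) for existing in APPROVED_MATERIALS):
        violations.append("a value for this real-world article already exists")
    if len(candidate) > 60:
        violations.append("name exceeds the 60-character limit of the receiving system")
    if "," in candidate:
        violations.append("inverted names (e.g., 'Acid, Acetylsalicylic') are not permitted")
    return violations

print(review_new_material_name("acetylsalicylic  acid"))  # flagged as a duplicate of an approved value
print(review_new_material_name("Ibuprofen"))              # no violations
```

Applying such checks consistently across the organization, and sharing them with suppliers and contract organizations, is what preserves the value of the framework.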
Business rules must be developed by a team of people with varied backgrounds and skill sets, as the activity of rule
creation demands knowledge of the automated systems involved (and their technical limitations) and the data used
by a business area under discussion. The team also needs someone who has strong analytical capabilities to create
a systematic approach to building data rules. It is desirable to create rules for a number of metadata entities, and then
apply them in a test environment to test the robustness of the rules, as errors and gaps will frequently be found in
early tests.
Process Owners
Owners need to have the business connections and resources to identify and educate uniquely qualified personnel
to become Data Stewards, or leaders of a data steward unit. By reason of prior experience, the process owners
may serve as business experts in crafting business rules for data entities, but their primary role is accountability to
senior level personnel for Data Governance Steward activities, championing the value of data stewardship within the
organization, clearing political obstacles, providing resources and systems, and enabling Data Governance Stewards
to be successful in their work.
Data Governance Stewards
Once rules are developed, they must be maintained and disseminated to the individuals who will maintain and enforce the rules. This is a critical and difficult role, because there will be tremendous organizational pressure to bypass the rules, e.g., (1) “It takes too long”; (2) “But we always call it ‘XYZ’”; (3) “Those people are too controlling”; (4) “Our site head says we do not have to follow those rules,” etc.
The Data Governance Steward must guard the data and associated metadata so data mining and data analytics will
be possible in the future. If an organization elects to go with a dictionary approach, the Data Governance Steward
should own the dictionary and be responsible for its maintenance. The Data Governance Steward will often be forced
to refuse a request because the requested entity already exists under another name. They must research every request
to prevent duplicate entities. In addition, Data Governance Stewards working across multiple business sites must
constantly coordinate to keep business rules and approved lists aligned, ideally with tools that span all organizational
sites where their entries will be used. Centralization and control give Data Governance Stewards the capability they
require to create consistent metadata entries. In turn, these bring tremendous value to the organization as global data
searches are simplified to empower the “big data” views that global organizations desire.
Data Experts
Data experts are people with deep business and/or systems knowledge in a specific area of operations, such as
Manufacturing, Support Services, Calibration, Automation, Reference Standards, Stability Management, and so
forth. They assist Data Governance Stewards by providing examples of the various ways in which data is received,
managed, and reported in parts of the organization. Their deep knowledge of the various areas helps the Data
Governance Steward develop business rules robust enough to cope with infrequent and unusual scenarios.
5 System Planning
Organizations should approach data integrity using a holistic top-down view, starting from the perspective of the high-
level business process, then looking at the subprocesses before looking at the individual computerized systems and
activities. This approach is depicted in Figure 5.1 and provides assurance that data integrity truly is designed in at the
business process level.
Confidence in data integrity starts with the capability of the system to provide technical controls for data integrity.
The relationship the customer has with the supplier is essential when determining which supplier to use and system
to purchase. It is important to select a supplier that has an established quality system and hardware/software
development process, as determined through an appropriate vendor qualification process.
It is essential to understand the business process, to understand and determine how the data will flow through the
different business processes, and to determine what interactions (i.e., transfer of data) are utilized by the existing
business process (manual/automated) within the data lifecycle. Only after understanding the business processes can
improvements be made to the existing processes as needed and as new technology becomes available. An example
of this can be seen in the data archiving process: switching from an internally developed script-based backup process
to a backup process leveraging a commercial utility.
The diagram in Figure 5.2 is a high-level representation of a new process along with identification of existing
processes that may impact the new process. When developing a process flow, greater details are needed to clearly
identify each process step and the complexity of those processes. Once there is an understanding of this, it is
possible to examine the data directly involved with the critical processes to understand where it resides in each
process stage and what equipment/computerized system supports that data.
Within the pharmaceutical industry, there will be a range of business processes used to execute different activities.
Additionally, each process is supported by various computerized systems. The size and complexity of the business
process determines the initial scope of the user requirements for these computerized system(s).
Appendix D1 of the ISPE GAMP® Guide: Records and Data Integrity [8] gives examples of business processes and
how they can lead to the generation of user requirements. The requirements can be developed manually from the business
process or automatically derived using tools such as Business Process Model and Notation.
Figure 5.3 is a graphical representation of the relationship between the system and data lifecycles, showing how
a combination of manual tasks and/or multiple computerized systems may support a single data lifecycle. Only
the project stage of the system lifecycle is shown, purely to denote the need for validated technical controls. Data
destruction and system retirement occur independently of each other and will not necessarily happen at the same time.
A complete system may consist of several components/processes. There are multiple considerations when designing
a new system. For example, what phases within the process are well established and what phases need to be
improved or established?
A well-defined, consistent system with no manual intervention carries less risk than a system with manual input.
Considerations for a system should include system access control and data management as both capabilities vary
by system. The system should be chosen and/or defined to mitigate the risks to data integrity by providing robust
technical controls that can be validated as fit for purpose. Existing constraints may dictate selecting a particular
system solution over other commonly used options. An example of this is a relational database that provides strong
traceability and audit trailing for active data compared to flat-file storage; however, if the archiving system can only
manage flat files, then the database solution becomes less desirable.
When evaluating the purchase of a new system, it is important to assess the new system’s capability to meet the
business needs and requirements, including the regulatory requirements based on the environment in which the
system operates. Additionally, it is imperative that the appropriate personnel are involved in the planning process
(e.g., business technical resource, IT, engineering, etc.).
Prior to planning and purchasing a new system the high-level business process, GxP data elements, and high-level
data flows between the different process steps or activities should be established. This includes the need for the
system to operate and integrate with existing systems that support the business process.
Figure 5.4 is an example Ishikawa diagram (fishbone analysis). Although Ishikawa diagrams are typically used as a
root cause analysis tool, here one is used as a means to determine which data is directly involved in critical processes.
The use of Ishikawa diagrams is just one example of a simple tool to help understand where data may be found.
Organizations should select a tool or approach that best suits their needs.
As determined from the Ishikawa diagram, the systems that print the data, store e-data, and apply e-signatures need
focus.
There may be existing constraints that must be accommodated, for example, if the organization utilizes an archiving
system that works with flat files only, it is necessary to ensure that any system proposed supports this process. This
could involve forfeiting the security of storing the live data in a relational database before archiving or even converting
the data to flat-file format to facilitate the archival process. Converting to flat files for archival purposes must not
compromise the integrity of the data.
The GxP data lifecycle needs to be planned along with the identification of the business processes and data elements
directly involving critical processes. The GxP data needs to be mapped to the systems in which it resides, and the data lifecycle and controls required to maintain the integrity of the GxP data need to be documented.
Business process mapping brings an understanding of the use and governance of data. From this mapping and the
application of ISPE GAMP® 5 [9] principles, organizations can then plan what activities based on risk are required
for the computerized system(s) that will support the business process. Many vendors provide software systems that
enable full control and monitoring of a process. Therefore, the relationship between the regulated company and the
computer system supplier is the foundation towards achieving a successful, on time/within budget project that meets
all the business and applicable regulatory needs. See ISPE GAMP® 5, Appendix M2 – Supplier Assessment [9] for
more details.
When performing an analysis of a process and its data, there must be adequate representation from the business to provide the expertise of personnel involved in the process being analyzed and the associated equipment and computerized system(s). For example, when planning the batch release process, some or all of the following roles may need to be involved: process owners, data owners and stewards, technical and process SMEs, IT, Engineering, QC, Quality Assurance (QA), and computer system QA personnel. It may be valuable to define the RACI (Responsible, Accountable, Consulted, Informed) responsibilities for each role.
The system risk assessment is a foundation to QRM and requires critical thinking to identify the potential risks to
patient safety, product quality, and data integrity. Even after risk mitigation there may be a level of risk remaining that
should be periodically reviewed and reassessed.
For the regulated and operational data at each step in the data flow diagram, consider:
• Is the data and metadata at risk of unauthorized alteration by manual interventions or by systemic/technology
issues?
• Is the data and metadata stored in a secure location immediately after creation?
• How will the data be transferred from one system to the next system in the process?
• Can and will all required metadata (including audit trails) be transferred with the data?
Such an understanding of the types of data involved, the essential metadata, the transfer and use of the data, and the
risks involved, facilitates the design and specification of systems and processes in support of the business process
and the integrity of the data therein.
When identifying system lifecycle risks, it is recommended to map how system access is configured, and how data
is managed and stored. During the evaluation process, document the controls in place, risks present, and required
mitigation. Always strive to implement a system with the strongest (preferred) controls that reduce the risk as low as
reasonably possible.
Table 5.1 shows the key considerations to help identify risks and suggests corresponding mitigations of varying
effectiveness that a computerized system can provide. It does not address behavioral issues such as writing down
passwords, or sharing passwords due to a lack of user licenses.
Table 5.1: Key Considerations for Risks and System Level Mitigations

• How is the user authenticated before the capture can occur? Basic control: role-based ID/password. Preferred control: unique ID/password.
• How does the system reduce the risk of another user accessing the system with someone else’s credentials? Basic control: simple password. Preferred control: robust, sophisticated password enforced.
• How does the system deal with multiple concurrent users? Basic control: user must log off to allow another user to access the system. Preferred control: system is able to manage access to multiple application windows concurrently.
• How are new devices (e.g., balances, etc.) authorized on the system? Basic control: new devices can only be added by an application user. Preferred control: new devices can only be added by individuals with elevated privileges.
• How does the system prevent modification of parameters that could influence the result or process? Basic control: parameters modifiable by authorized users only. Preferred control: parameters locked, changeable only with elevated access via formal change control.
• How is the data transferred from the data creation system to the next system in the business process workflow? Basic control: manual data transfer to the next system by selecting files to transfer, possibly even using USB flash drives. Preferred control: automated data transfer pulled by the next system, leveraging checksums and handshakes around the connection and data transfer.
• Where is the data stored? Basic control: initial data storage on a local drive. Preferred control: initial data storage on a secure server.
• Is the data stored to a durable medium? Basic control: temporary data storage. Preferred control: permanent data storage.
• For duplicate readings, where does the system store the duplicates, and how is it documented which one is used? Basic control: the separate audit trail documents the multiple readings. Preferred control: all readings are contained and audit trailed within the electronic form/worksheet.
• Data is not secured at the initial location (local drive). Basic control: deletion privileges removed from all storage folders. Preferred control: data is automatically captured into a secure server at the time of creation (third-party software can be used to do this).
• Is every calculated result stored? Basic control: procedural controls require manual saving of all results. Preferred control: the system automatically stores a new result every time the calculation is run, and the result shows the version number (e.g., the fifth calculation shows result V5).
• How does the system reduce the risks of error in manual data entry? Basic control: field mandatory blockers (e.g., red asterisks that mandate the field must be populated), range calculators (low and high limits) that dictate the acceptable range for a certain field. Preferred control: manual data entry replaced with an automated, validated interface to capture the data (e.g., bar code scanning or a serial interface).
• How does the system avoid data or files being overwritten? Basic control: procedural controls require all edited files to be saved with an incremental version number. Preferred control: the system automatically saves all edits under a file name with an incremental version number and an audit trail of changes.
QRM principles can then be applied to implement and verify controls as part of the computerized system validation.
A periodic review strategy, including frequency of review, should be developed to identify the processes monitored
to confirm that the system is maintained in a validated state during its operational phase. This includes a review of
procedural controls needed to maintain data integrity to ensure unacceptable risks have not arisen. The amount of
residual risk should be minimized and reviewed within the periodic review. The rationale should be documented to
ensure undue risks are not accepted and that appropriate mitigations are in place. In addition, the strategy should
also contain a review of:
• Deviation records
• Incidents
• Problems
• Upgrade history
• Performance
• Reliability
• Associated training
The periodic review timing should be defined according to local procedures and based on the system’s impact on the
data.
6 Active Records
As shown in Figure 3.1, records become active at data creation and generally remain active through the Processing and the Review, Reporting, and Use phases of the data lifecycle. Active records are typically characterized by the need for frequent access by a range of personnel as part of the business process, and the systems supporting the process
must provide effective controls around such access and activity.
Successful data integrity assurance programs commonly evaluate data integrity risks from two different perspectives:
system lifecycle and data lifecycle. With respect to data integrity, system lifecycle considerations ensure data integrity
on an individual system level, while data lifecycle considerations provide cross-system assurance. This approach has
been recognized to provide three key benefits:
• It promotes more consistent use of risk mitigating controls across systems within a data lifecycle
Starting from the business process and data flow diagram discussed in Sections 4.2 and 4.3, it is recommended to
use critical thinking to assess the path the data will travel through the data lifecycle and whether it will pass through
one or multiple systems during its lifecycle:
• If the data will remain within a single system, such as a well-configured and validated document management
system that takes and maintains the data, then this reduces the opportunities for data issues.
• If the data will travel through multiple systems, then there needs to be appropriate measures taken to define what
data will transfer between which systems. For example, a clinical trial is an area where large volumes of records and data are recorded by the CRO and trial sites, and then aggregated before going to the sponsor. In this process
there are increased risks that should be documented, as should the mitigation strategies intended to protect the
records from corruption and/or loss.
The business process risk assessment (see Section 4.5) focuses attention on the underlying process, to facilitate
critical thinking through the more detailed data and system assessments. When assessing risks from the data
lifecycle and then from the system lifecycle, there may be common areas of risk identified. Instead of addressing
them separately, a holistic approach can be applied. Password control is an example of a risk mitigation strategy from
the data lifecycle applied across multiple systems.
6.1 Creation
Creation is the initial phase in the overall data lifecycle and therefore is the first place that data integrity may be
compromised. If data integrity is not ensured at this phase, there is no way to later regain that integrity. It is essential
to consider and document the risks and necessary controls to ensure data is accurate, complete, and fit for purpose.
It is important to understand the data lifecycle (such as intended use, review and reporting, and retention needs) and
how those lifecycle elements can impact data creation. These items are mainly pulled from a process flow or risk
assessment, as stated in Sections 4.3 and 4.5.
The data lifecycle should be considered both within the context of a system lifecycle and separate from it, as data is fluid
and moves from location to location. It is important to consider:
• What ability users have to adjust data once it has been recorded and how that adjustment is captured (such as
audit trail, change control, etc.)
The data lifecycle starts with the first creation of data. Creation dictates the initial and fundamental quality and
accuracy of the data as it moves through the lifecycle. Integrity of the data cannot be improved as the data moves
through the lifecycle, that is, the integrity is only as good as that of the original record at the time of creation. It can
degrade during the lifecycle if proper controls are lacking. It is important to have well-defined, documented business processes in order to assess the risks to data integrity during creation.
In the creation phase of the data lifecycle, data is captured or recorded from an instrument, device, another system,
or manually. It is essential to ensure the data (file) naming convention and storage location are documented and
secured from unauthorized modification. It is possible that data is stored in multiple different locations (e.g., local
drive, secure network server, or LIMS). Therefore, the data or system owner should ensure that data is secure at
each location with the identification of which record takes precedence for decision-making.
If data must be transferred to a different location, automated transfer is recommended. Automated data transfer can
be validated proactively to ensure consistent, repeatable, reliable transfer, whereas manual transfer processes can
only be verified by human review after the event (by confirming the number and size of files transferred, and possibly
only reviewing a subset of the files). If the transfer process includes the subsequent removal of the data from the
original location, then there must be confirmation of successful data arrival into the target location before it is deleted
from the original location. This can be achieved by validation of the automated data transfer process.
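A minimal sketch of the verify-before-delete pattern described above is shown below, using a checksum comparison to confirm arrival at the target before the original is removed. The paths and helper names are illustrative, and a validated production interface would additionally handle retries, logging, and audit trail entries.

```python
import hashlib
import shutil
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream the file so large raw data files do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_then_remove(source: Path, target_dir: Path) -> Path:
    """Copy a file to the target location and delete the original only after
    the copy is confirmed byte-for-byte identical via its checksum."""
    target = target_dir / source.name
    expected = sha256_of(source)
    shutil.copy2(source, target)            # copy data and timestamps
    if sha256_of(target) != expected:
        target.unlink(missing_ok=True)      # discard the bad copy; keep the original
        raise IOError(f"checksum mismatch transferring {source.name}")
    source.unlink()                         # safe to remove only after verification
    return target

# Example usage (placeholder paths):
# transfer_then_remove(Path("C:/instrument/run_0425.dat"), Path("//server/secure_raw_data"))
```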
When identifying data lifecycle risks, it is recommended to map how and where original data is created and
subsequently transferred to the next location or system. During the evaluation process, document the controls in
place, risks present, and required mitigation.
• Data is recorded manually, lacking an independent audit trail to capture repeat measurements or data, and is
vulnerable to deliberate or accidental transcription errors.
Suggested mitigation: replace manual data recording with electronic data capture
• When creating data by calculation, in Excel for example, the user can generate an average sample weight by entering an averaging formula and pressing Enter. This calculates the average of the values, but the average is not automatically stored. The user can then keep editing the individual sample weights until they get an average they like, and then hit Save.
Suggested mitigation: ensure that the system enforces automatic data saving at the time of creation. (Guidance
concerning the validation, control, and use of spreadsheets is discussed in ISPE GAMP® Guide: Records and
Data Integrity, Appendix D5 – Data Integrity for End-User Applications [8].)
A computerized system involved in the creation or first capture of data needs all of the essential data integrity
technical controls (e.g., access controls, privileges, data management, etc.) plus the following abilities specific to
creation, to:
• Support automated data collection/capture of the measured or calculated value to eliminate manual transcription
• Generate a complete record with all of the metadata integral to the record
• Retain sufficient information (who, how, using what equipment, etc.) to allow later reconstruction of the activity
performed when creating the data
• Store multiple or repeated readings with no overwriting and full audit trail on all values
• Transfer the record to the next system in the business process using a validated automated interface
Integrity lost at the time of data creation cannot be subsequently restored so it is essential that any system used in
data creation provides strong and effective technical controls during data creation activities.
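A minimal sketch of a creation record whose metadata is integral to the record, and where repeat readings are appended
rather than overwritten, is shown below; the field names and structure are assumptions for illustration, not a
prescribed record format.

```python
import json
from datetime import datetime, timezone

def new_reading_record(value, units, operator, instrument_id, method_version):
    """Assemble a creation record whose metadata travels with the value so the
    activity can be reconstructed later (who, how, with what equipment)."""
    return {
        "value": value,
        "units": units,
        "operator": operator,
        "instrument_id": instrument_id,
        "method_version": method_version,
        "created_at": datetime.now(timezone.utc).isoformat(),
        "audit_trail": [],   # subsequent readings/changes are appended, never overwritten
    }

def add_repeat_reading(record, value, operator, reason):
    """Store a repeat reading as an additional audit trail entry; the earlier
    value remains in the record."""
    record["audit_trail"].append({
        "action": "repeat_reading",
        "value": value,
        "operator": operator,
        "reason": reason,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

record = new_reading_record(7.2, "pH", "analyst1", "PH-METER-03", "M-001 v2")
add_repeat_reading(record, 7.3, "analyst1", "probe re-equilibrated")
print(json.dumps(record, indent=2))
```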
6.2 Processing
Often the data created and stored needs to be processed to convert it into a meaningful value. How the
processing is applied, how much processing is applied, and whether any repeated reprocessing occurred can all impact the
value generated; processing is therefore an inherent source of data integrity risk.
A validated computerized system typically reduces data integrity risk; however, where there is a possibility for manual
intervention, the risk increases [12]. This has been amply demonstrated by the many US FDA warning letters [40]
regarding reprocessing chromatography data, where:
• The decision that the initial automated integration is in some way deficient has been a subjective one, made by the
analyst based on personal experience, without any defined good/bad criteria for assessing the integration.
• The analyst has been able to manually intervene, forcing changes to the integration parameters and/or baseline
compared to the initial parameters used by the validated chromatography data system and thus generating a
different result.
The impact of this combination of subjective interpretation and manual intervention has resulted in multiple citations
for data manipulation. This risk factor is not restricted to chromatography data but rather is prevalent wherever the
acceptance criteria are not clearly defined with examples or numerical limits, and where a human decision can
override the validated controls.
As shown in Figure 6.1, the data integrity risk from processing is impacted by two factors: the consistency of the
processing and the degree of subjectivity involved. Table 6.1 gives examples spanning these factors.
Table 6.1: Examples of Data Integrity Risk as Impacted by the Consistency and Subjectivity of Processing
• Laboratory: Method development for a new drug candidate, with co-eluting peaks and entirely manual integration;
  Manufacturing: Experimental PAT system using Near Infrared (NIR) to predict when to manually stop the mixing process;
  Clinical: Blood pressure is taken with a manual cuff (aneroid sphygmomanometer) and typed into the patient history
• Laboratory: Sample weight manually transcribed from the manually recorded weight in the laboratory book into the
  Chromatography Data System (CDS);
  Manufacturing: Ingredient weighed manually and the weight transcribed into the control system for storage;
  Clinical: Blood pressure is taken with an automated monitor and typed into the patient history
Depending on the types of data and processing, these risks need to be remediated in either the data lifecycle or the
system lifecycle or a combination of both working together to reduce the residual risk.
Consideration of processing risks from a data lifecycle perspective is important because it helps identify risks not
seen by considering computerized systems individually. It is particularly important when processing activities involve
multiple systems or applications. While the system lifecycle may adequately identify processing risks inherent to
a single system, it is not well suited for identifying risks to processing that involve multiple systems or applications
working together.
A significant benefit of using the data lifecycle perspective is that it encourages the use of consistent risk mitigating
controls across the data lifecycle.
In the processing phase of the data lifecycle, data is processed to obtain and present information in the required
format. While all critical data processing considered in the system lifecycle and the data lifecycle should occur in
accordance with defined and verified processes (e.g., specified and tested calculations and algorithms) and approved
procedures, data lifecycle processing considers modes of data processing that fall outside individual system
boundaries. Additionally, it considers cross-system boundary effects that may impact data quality. Examples include:
• Record/report documents are electronically compiled with processed data from separate electronic sources
• Statistical data processing within an application that relies on an Application Program Interface (API) to a
separate application to perform all or part of the processing
It is recommended to start with a business process map and data flow diagram (see Chapter 4). Map the data
lifecycle for the subject data values (e.g., conductivity, weight). Include the boundaries for the systems, applications,
and procedures that perform the processing. Share this map with a cross-disciplinary team to identify risks. Risks
identified can then be addressed using a mitigation strategy based on risk priority. Identifying and planning mitigation
of processing risks from a data lifecycle perspective relies on many of the same methodologies and critical thinking
used to identify risks at a system level.
While it is possible to evaluate the lifecycle steps of individual data values without creating a data flow map, this is
not recommended. The data flow map visualization is one of the best tools for promoting a team’s critical thinking
activities. It helps teams start from a common point of understanding, understand and verify the flow of data, and
break down the lifecycle steps to identify and mitigate risk.
While this section focuses on the processing stage of the data lifecycle, it is not suggested that a map be made for
each stage in the data lifecycle; rather, one lifecycle map should be created to include all the phases a data value
encounters.
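A data flow map can also be captured in a simple machine-readable form to support the team discussion; the following
sketch, with hypothetical system names and transfer modes, illustrates how manual or unvalidated transfers might be
flagged for the cross-disciplinary risk review.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class DataTransfer:
    source: str
    target: str
    mode: str          # "automated_validated", "automated_unvalidated", or "manual"
    data_items: List[str]

# Hypothetical data flow for a single data value (conductivity) across systems
flow = [
    DataTransfer("Conductivity sensor", "PCS", "automated_validated", ["conductivity", "units"]),
    DataTransfer("PCS", "Historian", "automated_unvalidated", ["conductivity"]),
    DataTransfer("Historian", "Batch report (Excel)", "manual", ["conductivity"]),
]

def flag_transfer_risks(flow):
    """Simple screen of the mapped flow: manual or unvalidated transfers are
    candidates for the cross-disciplinary risk discussion."""
    return [t for t in flow if t.mode != "automated_validated"]

for transfer in flag_transfer_risks(flow):
    print(f"Review: {transfer.source} -> {transfer.target} ({transfer.mode})")
```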
Section 4.3 of the ISPE GAMP® Guide: Records and Data Integrity [8] offers a list of topics for consideration during
risk assessment in the processing phase.
Mitigating processing risks, as with any risks, should be based on the risk priority (Refer to ISPE GAMP® 5, Chapter
5 – Quality Risk Management [9]). Separate from the process of assessing and choosing commensurate controls to
manage risk, common controls can be used when mitigating risks identified using a data lifecycle perspective.
As discussed, a data lifecycle perspective helps identify risks that exist between and outside of systems (e.g., manual
processing, multisystem/application processing), as system risks are commonly managed using the system lifecycle.
It therefore makes sense that processing risks are commonly mitigated with controls that are not system specific.
However, this is not always the case as can be seen in the following examples:
• Validation/qualification testing
Where there is a business process need to allow manual processing, it must be recognized that this presents an
ongoing residual risk to data integrity within that process. Increased rigor of review for manually processed data
provides limited mitigation but cannot wholly eliminate that residual risk. Chromatography data is the classic example
where manual processing is needed if the automated method is unable to process an atypical chromatogram.
Continual improvements to the analytical method and ongoing user training should be employed to reduce the need
for manual processing.
Figure 6.2 details examples of ways to mitigate two common processing risks. As shown, mitigation of risk (assurance
of quality) is rarely achieved through only one control. Controls working together have a cumulative impact towards
mitigating risk, but some level of residual risk may always remain.
Note: While Figure 6.2 shows four mitigating controls for human and system data processing, it is common that
process activities executed by humans require more mitigating controls due to variability. Procedural controls alone
are typically not sufficient to mitigate data integrity risks as they rely on human actions.
Even after risk mitigation there may be a level of residual risk that should be periodically reviewed and reassessed.
Each regulated company should determine what level of residual risk is acceptable within their organization.
When using system-specific process controls, consider if these controls should be a standard requirement for
all systems. Standardizing controls in this way is significantly more efficient as it prevents the need to develop
new controls (for a given risk) for each piece of new equipment. For example, standardizing the use of network
permissions (e.g., Lightweight Directory Access Protocol) on all systems eliminates the need to develop a
permissions strategy for each new system.
The ISPE GAMP® Good Practice Guide: Data Integrity – Manufacturing Records [15] states that:
“Where a process is well defined (‘we know exactly how to do this’) and consistent (‘if we do it like this, we
always end up with the correct result’), has no manual intervention (‘it all happens automatically’) and an
objective output (‘we all agree on the result’), data integrity controls can be achieved by validating the system
and maintaining it in a validated state.”
For processed values such as those described in the Point examples in Table 6.1, well-applied and managed
computerized system validation and access controls restricting changes to the preset limits are sufficient to mitigate
the (low) data integrity risks associated with the processing step.
For all other processed values (e.g., Points to in Table 6.1), the risks need a combination of controls involving
people, process, and technology for mitigation. As discussed in Section 6.2.1.3, some of the mitigating actions need
to address risks at the level of the data lifecycle and business process rather than at the individual system level.
Before writing requirements for technical controls around processing, it is important to first assess what can
impact the validity of a processed value. Use critical thinking to identify likely sources of impact on the processed
values (and considerations for essential data integrity controls around these), such as the following (a sketch after
this list illustrates some of these checks):
• Initially acquired data: Where does the initial data come from? Could the initial data be impacted, e.g., by
electrical interference or signal loss? Is there a transfer of the data to a separate system for processing? If so,
are the units and other metadata transferred with the data or reapplied in the target system?
• Calibration data and scaling: Many sensors initially generate an analog electrical signal which is converted to
a digital signal and scaled using calibration factors. Where does the calibration data come from? Who is able
to change it? Are the previous calibration values available or are they overwritten? Can calibration points be
excluded?
• Averaging results: Where multiple readings are taken automatically or manually, the average may be within
limits while individual readings are outside the limits; this may be acceptable for a rolling average of a pressure
transducer reading in a Process Control System (PCS) to dampen out momentary pressure spikes, but would not
be acceptable for analytical results in a QC laboratory (an Out of Specification investigation would be required).
• Manually entered values: Are privileges available at a sufficiently granular level to control who can enter
and who can review? Would the system flag a value entered out of range? Can the system enforce a second-
person review of an entered value?
• Changes to calculations (formulae), integration parameters, or methods: Who can make changes? What
record is there of the change? Is it possible to require an approval step before the system allows the change?
• Changes to processed result: Are all versions of the result stored? Can a comparison be made between
the first and last processed results to see how the value changed? Where a change has been made, does the
system provide tools to highlight or flag human intervention (refer to Sections 6.3.1 and 6.3.2.1), for example as
part of exception reporting?
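The sketch below illustrates two of the checks from the list above, comparing the first and last processed results and
flagging manual interventions for reviewer attention; the data structure and tolerance are assumptions for illustration
only.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class ProcessedResult:
    version: int
    value: float
    processed_by: str
    manual_intervention: bool
    change_reason: Optional[str] = None

def flag_for_review(results: List[ProcessedResult], tolerance: float) -> List[str]:
    """Compare the first and last processed results and flag manual interventions,
    so a human reviewer can focus on values that changed or were reprocessed."""
    if not results:
        return ["no processed result stored"]
    flags = []
    first, last = results[0], results[-1]
    if abs(last.value - first.value) > tolerance:
        flags.append(f"result changed from {first.value} to {last.value} over {len(results)} versions")
    for r in results:
        if r.manual_intervention:
            flags.append(f"manual intervention in version {r.version} by {r.processed_by}: {r.change_reason}")
    return flags

history = [
    ProcessedResult(1, 98.7, "cds_auto", False),
    ProcessedResult(2, 99.6, "analyst1", True, "baseline adjusted for co-eluting peak"),
]
print(flag_for_review(history, tolerance=0.5))
```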
Fundamental to the consideration of processing is defining the controls for human intervention. The extent of
subjectivity in a manual interpretation can be reduced by using an SOP to define when it is appropriate to intervene,
how, and what additional activities and reviews are required after the intervention; this in turn may increase the
consistency of the processing. This was mentioned in Graph Point , laboratory example, in Table 6.1.
Physical security may be the only available way to reduce a risk; for example, PID controllers that lack logical access
controls can only be secured in a locked control cabinet. Where there is a residual risk that cannot be addressed
by system technical controls, consider what, if any, procedural controls could reduce this risk (while recognizing that
procedural controls rely on the operator following them and therefore cannot offer the same level of assurance as
technical controls).
At system periodic review, evaluate the residual risk to assess whether, for example, a newer software version can
provide the appropriate technical control.
Validating the system technical controls should give assurance that the controls will consistently perform as intended
to reduce the data integrity risk associated with automated processing.
In the case of human intervention in the processing activity, additional human review may be required to evaluate the
justification and validity of the intervention and consequent process result. Validation of the computerized system has
no impact on the requirement to review human intervention, as the user is able to impact the processed result.
Validation of computerized systems is well documented in ISPE GAMP® 5 [9], with detailed application of those
principles to particular system types discussed in the relevant ISPE Good Practice Guides [41, 42, 43]. For this
reason, extensive guidance on validation is not reproduced here.
As described in Appendix S2, the most recent iteration in risk-based approaches to validation, Computer Software
Assurance (CSA), relies on critical thinking and a thorough understanding of the risks associated with the business
process to define the approach that should be used to validate the system for its intended use.
The validated system must be placed under change control and configuration management to maintain the validated
state throughout the operational life of the system, especially during upgrades or changes of use. The processes and
procedures for the operational phase are covered in the ISPE GAMP® Good Practice Guide: A Risk-Based Approach
to Operation of GxP Computerized Systems [21].
Review, reporting, and use of data are the phases in the data lifecycle where the data generated fulfills its purpose
in the business process. Before the data can be used there must be a review step to assess the integrity of the data
and to determine whether specifications, targets, limits, or criteria have been met. Data review is essential to evaluate
if the data is fit for its intended use in a regulated environment and representative of the actual process or product
state. Data review and the associated audit trail review are detailed in ISPE GAMP® Guide: Records and Data
Integrity, Section 4.4 and Appendix M4 [8].
Creating strategies for reporting, review of those reports, and outlining the use of that information provides
increased assurance of data integrity and is an essential part of data management. See Figure 2.3 for a schematic
representation of the correlation between data and system lifecycles.
It is essential to apply a risk-based approach when establishing a data review process. The risk is based on the
complexity of a system, the controls that are in place within the system and data lifecycles, and the criticality of the
decision that the data will be used to make.
The scope and rigor of routine data review may be much narrower than a review conducted as part of an investigation
where, for example, a review of training and site attendance records may be included to verify that personnel were
onsite and trained to complete an activity (as discussed in ISPE GAMP® RDI Good Practice Guide: Data Integrity –
Key Concepts Appendix 18 [16]).
During routine data review, the scope of the supporting data records reviewed may be limited to records of
preceding activities, for example, laboratory notebooks reviewed as part of evaluating a chromatographic result. If a
chromatography method is locked to only allow the analyst to select the method, then the reviewer verifies the data
and correct method selection. In contrast, if the method is not locked down, the reviewer needs to ensure that all
method parameters and associated data are set as defined in the method.
Reviewing data is not limited to the result alone – the metadata, including a history of the data (any audit trail of
operator actions that create, modify or delete regulated data), must also be included in the review as it is inherent in
providing the context and meaning of the GxP data. Where data is transferred to a higher-level system, such as CDS
to LIMS or PCS to MES, it is not always possible to transmit all of the metadata, in which case some level of review
may need to occur in the source system.
Data cannot be reviewed effectively in isolation from its metadata, including an audit trail of changes to that data. As
defined in ISPE GAMP® Guide: Records and Data Integrity Appendix M4 [8], there are three main types of audit trail
review:
• Review of audit trails as part of normal operational data review and verification:
- For audit trails that capture changes to data, a review of each audit trail record must occur prior to data
release. Where a system lacks necessary audit trails, a risk assessment should be performed to establish
appropriate mitigation. These mitigations may include improved procedural controls, real-time verification, etc.
• Review of audit trails for a specific data set during an investigation (e.g., deviations or data discrepancies):
- During an investigation the review may not be limited to audit trails, but may include a review of the system
logs, such as system configuration and operational events.
• Review and verification of effective audit trail functionality (e.g., verification of audit trail configuration as part of
periodic review).
Reviews can be done 100% manually; however, this is time-consuming and may not detect all of the issues. A system
that offers functionality, such as automated reporting tools and alarm or exception reports, may reduce the checks
needed by a human reviewer while improving the overall detection rate of data integrity issues and errors [12].
The use of electronic signatures in reporting and approval allows a direct linkage between the signature and the data
to which it is applied, and can streamline the reporting process to reduce the time to release data and/or product.
Human reviewers, in contrast, excel at tasks such as:
• Spotting patterns, having instincts and "gut feelings" (the feeling of "oh, there's something not quite right here")
Human reviewer time is a finite resource and is best saved for investigating suspect data in careful detail to decide if
there is evidence of a data integrity issue.
The most effective data review approach is to use computers and software for general screening of all results to
identify suspect data, and then to use people to investigate in careful detail data flagged by the computer as “suspect
data” or high-risk samples to decide if it is evidence of a data integrity issue. This is the principle of review by
exception.
“An ‘exception report’ is a validated search tool that identifies and documents predetermined ‘abnormal’ data or
actions, which require further attention or investigation by the data reviewer.” [24]
Effective review by exception typically relies on the following (a minimal sketch of the automated screening step
follows this list):
• A computerized system that offers exception-reporting capabilities, with the ability to configure the limits and
specifications of the exception report
• Detailed product and process understanding feeding into a detailed data integrity risk assessment to identify
what should be automatically evaluated in an exception report
• Robust verification of any exception-reporting tool, including positive and negative case testing
• An audit trail of changes to the exception-reporting tool or limits, available for review as part of the data review
process
• A documented review process identifying the actions to be taken by the human reviewer in response to suspect
data flagged in the exception report
• Training for all review personnel on how to leverage review by exception, including training on the system
features and functionality used for a detailed review of the original data
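A minimal sketch of the automated screening step referenced above is shown below; the suspect-data criteria
(specification limits, reprocessing, audit trail volume) are illustrative assumptions, and any real exception-reporting
tool would be specified and validated as described in the list above.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Result:
    sample_id: str
    value: float
    reprocessed: bool
    audit_trail_entries: int

def exception_report(results: List[Result], low: float, high: float, max_audit_entries: int) -> List[dict]:
    """Screen all results against predetermined criteria and return only the
    exceptions that need detailed human review (review by exception)."""
    exceptions = []
    for r in results:
        reasons = []
        if not (low <= r.value <= high):
            reasons.append("value outside specification")
        if r.reprocessed:
            reasons.append("result was reprocessed")
        if r.audit_trail_entries > max_audit_entries:
            reasons.append("unusually high number of audit trail entries")
        if reasons:
            exceptions.append({"sample_id": r.sample_id, "reasons": reasons})
    return exceptions

batch = [
    Result("S-001", 99.1, False, 2),
    Result("S-002", 101.8, True, 9),
]
print(exception_report(batch, low=95.0, high=105.0, max_audit_entries=5))
```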
Dynamic data must be reviewed electronically because a paper printout cannot represent the original record [23].
The data reviewer must be skilled and trained in both the business process and the software application to
competently review the data in electronic format.
Significant gains in efficiency can be achieved through the use of reports or views for consistent review of data.
Such reports/views need to be validated and secured against unauthorized alteration to ensure they represent the
complete data set needed for review.
Many software applications have important metadata stored separately to the data, for example, in a separate
audit trail documenting the operator actions that have created, modified, or deleted data within the system. Some
systems may store such metadata across multiple locations, not all of which are named audit trails. There may also
be versioning information with details on changes to methods or recipes stored as part of the method or recipe. It is
essential that the metadata is reviewed in conjunction with the data itself to provide the GxP context and meaning of
the data.
Key questions to consider include:
• What information or metadata is relevant to assess the integrity of the data under review (irrespective of whether
or not it is stored in something called an audit trail)?
• How and when should the rigor of the review be escalated, e.g., what constitutes suspect data requiring
additional review effort?
• Where such processes are used, is the relevant metadata included in that review by exception? What are the
triggers for additional levels of human review?
One way to mitigate the risk of reviewing audit trails independently of the data is to have a record with its associated
metadata embedded or viewable within the data review window, rather than reviewing a separate audit trail log of
actions. With the increased focus on patient safety through data integrity, it is becoming increasingly important to
provide easy visibility and access to the metadata with the data for review purposes.
The data and metadata should be reviewed together by someone who understands the business process that the data
supports, e.g., sample analysis in a QC laboratory or a manufacturing batch operation. Where related or supporting
data is located in a separate system, a person knowledgeable in
that system should evaluate that data. For example, the validity of a chromatography result in the CDS is dependent
upon the sample preparation data recorded in a paper laboratory book or ELN. Excluded data should be included in
the review process, such that the reviewer makes an independent decision as to whether or not the exclusion was
scientifically justified, e.g., the system suitability failures before the samples were introduced. A detailed SOP should
outline the review process, and when and how to scale the scope and rigor of the review.
In the same way that not all relevant metadata is contained in something called an audit trail, not all metadata in the
audit trail needs to be part of routine data review. The term “audit trail” is frequently misused and misunderstood, and
a formal assessment should be made of what metadata needs to be reviewed, where it can be found, and how and
when that review will occur. Records of changes to system configuration, user accounts, etc., need to be considered
for periodic monitoring of the ongoing system controls and compliance.
Further guidance on data review and audit trails is found in the ISPE GAMP® Guide: Records and Data Integrity
Appendix M4 [8].
It is important to minimize the number of systems that a user needs to access to complete the data review. In an ideal
world all data, including metadata, can be transferred to the highest-level system in an organization (e.g., an ERP for
manufacturing) and reviewed and approved electronically in that single system. When transferring data between
systems, there should be an entry in the audit trail that data has been exported from the sending system and a
corresponding entry in the receiving system’s audit trail. Detail on the design and validation of interfaces is covered in
ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts Section 4.4 [16].
In reality, the review is often performed in the source system, with only the final results and electronic signatures
passing to higher-level systems. At the enterprise level, it may only be a summary report that gains final approval
for batch release, so it is critical to patient safety and product quality that the data flow and relevant metadata are
understood throughout the business process and reviewed at each level.
With manufacturing data, it is common to need to reapply elements of the metadata (e.g., units) to the data when it is
received in the target system. This requires an additional level of verification to ensure the metadata is consistently
and correctly reapplied. Metadata transmission in manufacturing systems is covered in detail in the ISPE GAMP®
Good Practice Guide: Data Integrity – Manufacturing Records Section 3.3.2 [15].
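The verification of reapplied metadata can be illustrated with a minimal sketch; the tag names and units are
hypothetical, and the example simply compares the units reapplied in the target system against the unit definitions
held for the source system.

```python
# Hypothetical mapping of tags to their defined units in the source system;
# tag names and units are illustrative assumptions.
SOURCE_UNITS = {"TT-101": "degC", "PT-204": "bar"}

def verify_reapplied_units(target_records):
    """Check that the units reapplied in the target system match the units
    defined for each tag in the source system; mismatches need investigation
    before the data is used."""
    mismatches = []
    for record in target_records:
        expected = SOURCE_UNITS.get(record["tag"])
        if expected is None:
            mismatches.append(f"{record['tag']}: no unit definition found in source")
        elif record["units"] != expected:
            mismatches.append(f"{record['tag']}: expected {expected}, found {record['units']}")
    return mismatches

received = [
    {"tag": "TT-101", "value": 72.4, "units": "degC"},
    {"tag": "PT-204", "value": 1.8, "units": "psi"},
]
print(verify_reapplied_units(received))   # flags the PT-204 unit mismatch
```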
It is recommended to analyze the metadata elements in the source system, whether they can be accepted by the
higher-level systems (and how the fields are represented), and the purpose of those metadata points (what value they
bring). For each business process, the business process mapping and data flow (see Chapter 4) help identify:
• Whether or not it is possible to transmit all required metadata with the regulated record to the next-level system,
e.g., from CDS to LIMS or from PCS to MES
If it is not possible to transfer all of the required metadata, the regulated record needs to be reviewed in the source
system.
The specific requirements for the use of electronic signatures with regulated records have been documented since
1997 [10] and include signature metadata such as the printed name of the signer, the date and time the signature was
executed, and the meaning associated with the signature (e.g., review, approval, responsibility, or authorship).
When establishing a process for the use of electronic signatures, it is important to consider the distinction between
a signature required under the predicate rules versus a need to authenticate on the computerized system to confirm
identity and access.
With the current focus on managing data electronically it is important that the electronic signature [12] is:
Where Single-Sign-On technology is used to access a GxP application, it does not obviate the need for the entry of at
least one electronic signature component (typically this is the user’s password) before an electronic signature can be
generated.
Where there is the possibility to later edit a signed record, it must be clear that the signature was associated with an
earlier version of the record and does not apply to the edited record. Signatures should be permanently linked to the
version of the record signed, with an obvious representation in the original signed record and absent in the edited,
unsigned version of the record. This can be achieved by a visible signature manifestation or by applying a state
change (e.g., from “open” to “approved” on signing, and then back to “open” on later editing) to an electronic record.
Users of the system and data must be trained and understand the implications of such manifestations and state
changes, to ensure that any decision-making only occurs on approved data.
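A minimal sketch of such a state change, assuming a simple versioned record where each signature remains linked to the
version it signed and editing returns the record to an unsigned "open" state, is shown below; the class and state
names are illustrative only.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class RecordVersion:
    version: int
    content: str
    signature: Optional[str] = None   # e.g., "J. Smith, Approved, 2020-06-15T09:30Z"

@dataclass
class SignedRecord:
    """Each signature stays linked to the exact version it was applied to;
    editing creates a new, unsigned version and returns the state to 'open'."""
    versions: List[RecordVersion] = field(default_factory=list)
    state: str = "open"

    def sign(self, signature: str) -> None:
        self.versions[-1].signature = signature
        self.state = "approved"

    def edit(self, new_content: str) -> None:
        new_version = RecordVersion(len(self.versions) + 1, new_content)  # no signature carried over
        self.versions.append(new_version)
        self.state = "open"   # decision-making should only occur on approved data

record = SignedRecord([RecordVersion(1, "assay result 99.2%")])
record.sign("J. Smith, Approved")
record.edit("assay result 99.4%")
print(record.state, record.versions[-1].signature)   # open None
```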
Some software applications apply signatures as proof of user authentication at steps in the process where no
predicate rule requirement for a signature exists, for example, requiring entry of a username and password to begin
an analysis and recording a signature against the start of the analysis. The predicate rule requirement is for attribution
of the analysis start to an individual operator, not for a signature; the signature is only required on the final result,
as accountability for having generated that result. Such excessive application of electronic signatures brings no
additional compliance or patient safety benefits and should be avoided.
Below, batch release and eConsent are presented as examples with very specific requirements for electronic
signatures. It is important to identify specific requirements that are applicable to an organization and the business
process.
Batch Release
Where an electronic signature is used for batch release, only a specifically authorized person (or in Europe, Qualified
Person) should have the access to apply such a signature [11, 25].
eConsent
The use of electronic consent (eConsent) in clinical trials is approached quite differently from electronic signatures
used with other regulated GxP records, and may include any of the following as electronic signatures, as stated in the
MHRA/HRA Joint Statement on seeking consent by electronic methods [44]:
• Typewritten
• Scanned
The MHRA/HRA Joint Statement [44] places the onus on the study personnel to justify the method of applying the
electronic consent signature based on:
• The risks, burdens and potential benefits (to the participants and/or society); and
Semi-active data is needed only periodically, having fulfilled its primary use in support of decision-making while it was
active. It is an internal company decision defined within a documented policy/procedure as to when records become
semi-active and inactive.
Some organizations keep semi-active records in the originating system (ideally in a restricted access or read-only
state) until all periodic use of the record has passed, for example, the annual product review has been completed and
the data will not be included in future trending or control charts. As the records transition to inactive, this could be the
trigger to transfer the data into a separate archive system to serve out the remainder of its retention time.
Other organizations choose to transfer records to the archive as soon as they transition from active.
The advantages and disadvantages of keeping data in the originating system versus a dedicated archive system are
discussed in Section 3.3.
Records need to be retained for the ongoing support of patient safety and product quality: pre-clinical toxicological
and pharmacological data, clinical trial data, adverse event reporting, batch release data, donor data for blood and
tissue, for example. The data should be available to reconstruct the activities and decisions used to support the
development, manufacture, and release of drugs. The regulatory requirements for retention are intended to ensure
the data is available for at least a minimum period commensurate with the use of the data.
Regulatory requirements define the need for electronic records retention over a protracted period of time for safety
and efficacy data, and vary by country. (See Appendix O4 for examples of requirements for record retention.) The
expectation is that data be maintained in a human-readable form so that it may be reviewed in the same fashion as it
was when first used to make critical decisions.
Implementation of business solutions to meet the regulatory requirements of preserving data integrity and maintaining
the records in human-readable form may prove challenging for organizations when extensive time periods are
involved. There often comes a time in the lifecycle of data where maintaining the record in its dynamic format is no
longer realistic and a company will need to convert the dynamic record into a static format that captures as much
content and meaning as possible given the static nature of the record. Any loss of content and meaning should be
captured in a documented risk assessment. The implications of static and dynamic data formats are discussed later in this
section. See ISPE GAMP® Guide: Records and Data Integrity [8] for further details including consideration of the
prospective use of the data.
As described in Chapter 3, it is important to plan the retention strategy for a system in the beginning of its lifecycle.
An organization must ensure that the accuracy, content, and meaning of records is maintained throughout the
retention period. Record retention and archiving are different concepts, but archiving can be used to meet electronic
records retention requirements.
Data must be readily available throughout the retention period, based on organizational and regulatory needs.
Wherever data is archived, procedures must be established to ensure that the archived data is protected against
disaster and loss. Data must remain accessible and readable.
One traditional view of archiving is to move data from the online production server to off-line or near-line storage
in a separate application or read-only database. Another approach is to retain the records and data in the original
application, provided that they are logically separated from live data and are set as read-only. These approaches are
discussed in Section 3.3. There may be a change in data stewardship when the data is transferred to an archive,
however, data ownership remains unchanged as discussed in Section 2.2.
The rest of this section goes into further detail on aspects of record retention within the data lifecycle and the system
lifecycle, without repeating information presented in the ISPE GAMP® Guide: Records and Data Integrity [8] and in the
ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts [16].
Based on the data flow diagrams (described in Section 4.3), the regulated data identified therein needs to be retained
and readable for the mandated retention period as defined in Appendix O4.
Data may not be limited to records in one application alone, and ensuring that the data is readable may depend on
multiple applications interfaced together, each with version dependencies. For example, a sample result may depend
on a chain of data that runs through a LIMS, including imported data, methods, analyzed data, and summarized data. It
must be ensured that none of these records has been adversely affected by changes to the LIMS or the source
system. Additionally, metadata such as the audit trail should be checked, because if the audit trail is negatively
affected, then data integrity could be permanently lost.
Best practices for general archiving are presented in Appendix O2, and specific considerations for GLP archiving are
contained in Appendix O3.
It was discussed in detail in the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Manufacturing Records
Section 3.2 [15] that the objectivity and consistency around how the regulated record is produced may impact
whether the initially acquired data points need to be kept. With validated controls and configuration management
regarding the scaling and calibration settings, the processed data values (e.g., temperature in °C) may be sufficient
and the initially acquired values (e.g., analog signal in mA) unnecessary. This is very different to the situation with
chromatography data where the initially acquired values (e.g., channel data from the detector) must be kept because
the value of the processed result can be significantly impacted by the integration parameters applied by the analyst.
Ideally, all dynamic data is retained in a dynamic format throughout the retention period; however, this may become
unfeasible over time (see Section 7.2.1). For the regulated data to be retained, it is therefore important to consider:
• The ongoing intended use of the data when it is semi-active and inactive, i.e., is it retained purely for regulatory
and audit requirements?
• The likelihood of a prospective need for the data to be reprocessed (and how much will this likelihood decrease
over time)
These considerations, addressed in a risk assessment, impact how and when the data is to be archived; the aim
must always be to preserve the GxP content and meaning of the regulated data throughout the retention period.
From a business perspective, the importance of the stored data is likely to decline over the retention period. Clinical
trial data may be used as an example. Drug recalls are more likely to occur early in the licensed use of the drug; thus
the clinical trial data represents the major drug safety and efficacy data available during this time. After a long period
of use of the drug, the clinical trial data maintained as dynamic records becomes less significant as data from patient
use of the drug provides more extensive information on a wider scale. It should be noted that if existing clinical data is
re-used for a new submission, the retention period is restarted, and it may be necessary to continue maintaining the
data in a dynamic format.
During corporate acquisitions or outsourcing activities, there needs to be a process that verifies that the data, the
metadata, and the ability to open records in their dynamic format for review are maintained throughout the transition
between companies. See Section 3.7 for further information on records management through mergers, acquisitions, and
divestments.
Inactive, archived data is unlikely to be routinely accessed by the generating departments. Therefore, it is highly
recommended that the data is protected from access, specifically access to edit, once it is deemed to be inactive and
archived. This might include a process to migrate the data away from the department or division that initially collected
and stored the information [23], unless that data is required for a specific review project, regulatory submission, or
investigation.
It is likely that a record may need to be migrated at least once during its retention period. Each migration carries a
risk to the content and structure of a record and potential loss of metadata. Taking the concept of reduced usefulness
over time into account enables the migration risk to be balanced against the relevance of the record, perhaps making
rendition to a static format (e.g., PDF) more acceptable. If data is to be migrated, it should be done following an
approved process, procedure, or protocol. Data migration is discussed in Section 3.5.3.
Many organizations use a data lake (sometimes loosely grouped with related concepts such as data fabric, data mesh,
or, when poorly governed, a data swamp) as a storage repository for all structured, semi-structured, and unstructured
enterprise data to enable them to visualize, report, and perform data analytics. A data lake allows an organization to
store all of its data without requiring the creation of data silos, and is heavily reliant on structured nomenclature
(see Section 4.7). The use of a data lake is further discussed in
Appendix S1.
Static data has been defined within guidances as listed in the ISPE GAMP® RDI Good Practice Guide: Data Integrity
– Key Concepts Appendix 5 [16].
“Static is used to indicate a fixed-data record such as a paper record or an electronic image.”
“A static record format, such as a paper or electronic record, is one that is fixed and allows little or no interaction
between the user and the record content. For example, once printed or converted to static electronic format
chromatography records lose the capability of being reprocessed or enabling more detailed viewing of baselines.”
From these definitions it can be seen that some original records may be created in a static record format: a completed
paper form, a photograph, a printout from an instrument that does not keep electronic records where the printout is
the only record created. A true copy of static paper records can be created as a static electronic record via a scanning
process with verification that all information has been captured with no loss of color or pages. The amount and types
of quality checks performed on the scanned records should be based upon risk.
There should be a documented policy with respect to the destruction of original paper records. For example, if there
is potential for litigation around the records, then it may be inappropriate for the company to destroy original records.
Other static records might be generated as copies of original records that were created in a dynamic electronic format
(note that a static record cannot be a true copy of dynamic data). Static records of this type need to be considered as
one of two distinct kinds of records:
• Copies containing as much meaning and context of the original electronic record as can be converted into a
static format but containing, at a minimum, the final or reported data
• Summary reports used to present data into a meaningful format to make further quality decisions. (The use of
summary reports is discussed later in this section.)
Static electronic records are commonly stored in PDF format. It is assumed that PDF records are static as they cannot
provide the kind of user examination of, or interaction with, the record that a dynamic record offers. Equally, there
is an expectation that static PDF records cannot be modified or altered by users, and that there is therefore no
requirement for levels of access control or audit trails for PDF records. However, as the capabilities of applications
that read or display PDF documents increase, it is essential to challenge the integrity of static PDF records. The
abilities to edit, delete pages, reorder pages, sign, un-sign, or re-sign PDF files, or to convert them to editable
formats such as Word and then back to PDF, without the technical controls expected for electronic GxP records, need to
be considered in a risk assessment when relying on PDF record formats.
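One possible technical control, not prescribed by this Guide, is to record a checksum when the static PDF is generated
and to re-verify it at review or periodic check time; a minimal sketch follows.

```python
import hashlib
from pathlib import Path

def file_checksum(path: Path) -> str:
    """SHA-256 of the stored file; recorded when the static record is created."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_static_record(path: Path, recorded_checksum: str) -> bool:
    """Re-compute the checksum and compare it with the value recorded when the
    PDF was first generated; any difference indicates the 'static' record has
    been altered since creation."""
    return file_checksum(path) == recorded_checksum
```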
Dynamic data has been defined in the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts,
Chapter 3.1.4 – Data Requirements [16], under the discussion of “true copies:”
“dynamic data requires additional processing to derive a meaningful value (reportable result) that can be used for
decision making.”
An extensive suite of regulatory definitions for dynamic data can be found in the ISPE GAMP® RDI Good Practice
Guide: Data Integrity – Key Concepts [16].
When archiving inactive electronic records, the possible future requirements for examination, review, and
reprocessing need to be considered before deciding on a format or transformation of the data.
In most cases, the simplest and most successful format to ensure future processability is the original vendor format,
with an intent to restore the original electronic dynamic data into the same or updated version of the application. This
should make certain that all metadata, audit trails, and traceability between metadata, as well as the dynamic nature
of the record, are preserved. It should most easily allow reconstruction of the data as it was reviewed at the time of
the quality decision it supported. The difficulty with this approach is that it requires that the original application is still
available and capable of reading older records.
One key situation where summary records may be used is where the original records are held and maintained by a
third-party contracting company, while the regulated company is only provided verified summary data.
Summary documents do not purport to be true copies of the original record, but they must be verified [23] against the
original complete record to confirm that they include only data or results that have been reviewed and approved, and
that the key data is recorded accurately in the summary. Typically, the summary is verified as an accurate (but not a
true and complete) representation of the original data. One example is a Certificate of Analysis, which summarizes
results from multiple analytical tests whose original records exist in several different GxP systems.
Key features in a dedicated archive system could include (but are not limited to) the ability to:
• Capture and manage data from multiple sources and in multiple formats, including both flat files and database
files
• Capture data based on user-defined criteria (time or event based) and/or manual tagging of the files as ready
for archiving
• Generate a secure log entry of the capture of the archive files from the originating system
• Restore archived files to designated locations for further viewing or reprocessing in the native applications, and
document such activities in a secure log
• Index and tag records and record content for ease of retrieval
• Set a minimum retention period for individual records or folders (including an option for legal holds to indefinitely
extend the retention period if required). (Retention periods and legal holds are discussed in Section 7.4.2.)
Retention is a challenge especially for electronic records with the constant changes in technology leading to system
upgrades or system replacement. When a computerized system undergoes an upgrade, it must be determined if
the new software can restore and read data archived from previous versions of the system, as well as all supporting
information to enable the reconstruction of the activities.
Verification should be sought from the vendor of the system that historical dynamic, previously archived data can be
restored in any future installed or upgraded version of the application, noting that:
• There may be a need for an intermediate transformation, or a bulk migration process of previously archived data
• There may be a dependency to restore records from the archive to the live system, and then re-archive from the
newer application version
• There may be a process to update the archived records, which requires validation to ensure the records can be
retrieved, but which is only carried out if archived records are specifically required to be restored
Another challenge to records retention is that there are no common data standards, as discussed in Section 3.5.3. If
the upgraded version of the computerized system cannot restore previous versions of data or if the decision is made
to retire the current system and obtain a different computerized system, a process needs to be determined to ensure
the continued retention and readability of previously acquired data. This can be extremely challenging for a company
when system upgrades are sometimes not optional. It is important to understand the impact of the system changes
on the data and, if possible, plan for them.
Throughout the system lifecycle, there are many system changes and upgrades requiring assessment as to the
readability of the inactive records as well as any archived records. As part of system periodic review, a representative
sample of archived inactive data may need to be retrieved and checked to determine that it is still in human-
readable form through the original application. This is especially important when upgrades to operating system and
application software are occurring and the review frequency should be aligned with those changes. Data may have
been archived from a much earlier version of the application.
However, the ability to restore the data to its original or updated application depends on that system still being in its
active lifecycle state, i.e., that the application is still available and validated for use, either still generating new records,
or for reading historically archived data. See Appendix O5 for a discussion of managing legacy software.
To avoid the possibility that archived data cannot be restored into the latest version of the application, it may be
necessary to keep all data live in the environment of the most recent version of the application. A second instance of
the application can be deployed, specifically to house inactive archived data. This second instance must be updated to
the same version as the production system, and the complete data it contains migrated at each upgrade, ensuring it can
be read in the most recent version.
At any point during the retention period, data may need to be retrieved for further viewing, and in the case of dynamic
data, reprocessing. In effect, the inactive data must be restored to an active state, usually within the originating
system or a copy thereof.
A documented restoration process should:
• Define the scope of the archived data to be restored and the justification for doing so
• Authorize the restoration of archived data into the originating system or compatible application
• Ensure the restored data is highlighted, and where possible segregated (at least by date) from the current data, if the
originating system into which it is restored is still in operational use; this is to prevent confusion with current data
• Reprocess the restored data to obtain new and alternative results if required
• Manage the retention of new results including maintaining data integrity and archiving of those results in
compliance with current active record policies, procedures, and controls
• Assess and document the subsequent impact those results may have on previously approved decisions (e.g.,
batch release), trending, and annual product reviews
• Ensure the restored data is deleted from the originating system after the investigation is completed, with the
deletion formally approved and documented
7.4 Destruction
7.4.1 Data
Data destruction is the final stage in the data lifecycle and often the most difficult to manage. Because this process
is, in many cases, irrevocable and involves the deletion of GxP critical data, it warrants special consideration.
There are several driving forces for deleting data at the end of the data lifecycle:
• A contractual requirement between a sponsor company and their CRO/CMO requires deletion of the sponsor
data after a specified time
• To remove the management burden of records that no longer need to be retained. The cost of discovering each
electronic record during litigation processes is significant, and increases in proportion to the volume of records
stored. Therefore, limiting the volume of records in a controlled manner saves considerable cost, and also
optimizes the time taken to retrieve records.
• To remove confidential or financial records that are no longer required for regulatory purposes and that present a
business risk to the organization if retained and accessed by unauthorized individuals
Irrespective of the drivers for deletion, the process used should take account of the following factors:
• A deletion method with a defined reversal step is recommended, to enable the deletion to be reversed in the case of
serious failure.
• Irrespective of whether the deletion process is automated or manual, it is essential to ensure that only the
appropriate records have been deleted. Where an organization operates a fully automated deletion process,
stringent validation of the process should be carried out. In addition, the automated process should contain in-
built verification of the correct operation of the process with error notification. The performance of the deletion
process should further be regularly reviewed for correct operation.
• Any litigation or legal holds must prevent the data from being deleted, and a check of any legal hold status must
be part of the process for the deletion of data.
• The data owner is responsible for ensuring that the correct data is deleted, i.e., that data still required by
regulation or the business is not deleted. This means that the data owner has a central role to play in the deletion
of data.
Due to the criticality of the process, it is recommended that in addition to approval by the data owner, there is process
owner and QA approval for the deletion of GxP data. In a contract situation (CRO/CMO/contract testing laboratories,
etc.) the data is owned by the contract giver (e.g., sponsor company), and explicit approval for the deletion should be
obtained in advance from that contract giver. Additional approval by the legal department may also be put in place.
Such approvals should be at a stage in the deletion process before the data is irrevocably lost.
There may be occasions when the business chooses to retain the data after the mandated retention period. For
example, based on ICH E6 (R2) [45] GCP requirements, clinical records could be destroyed 2 years after the last
approval of a marketing application, but in reality that same data may be required if the company adds new
formulations and uses the same data in a new submission.
Note that it is sometimes necessary to retain data beyond its retention period for a number of valid business reasons,
including where there is a possibility of it being required in connection with legal proceedings; consideration should be
given to a legal approval step prior to deletion. (Product liability and patent defense considerations are out of scope
for this Guide.) Time-based data destruction may not be supported in all systems, again leading to data surviving
beyond the retention period.
Data destruction is performed when all legal, regulatory, and business requirements have expired, for example, 15
years after the record is created. The destruction phase ensures the correct original record, any true copies, and any
uncontrolled copies are appropriately destroyed. Consideration should be given to managing the disposal of records
even from the backup copies of the data. The disposal of data should be performed following an approved disposal
protocol where verification of the appropriate records is performed prior to final destruction. Section 4.6 in the ISPE
GAMP® Guide: Records and Data Integrity [8] provides procedural requirements and additional information related to
data destruction.
7.4.2 System
As discussed in Section 7.4.1, there is a reluctance within the regulated industry to actively destroy data even at the
end of the mandated retention period. Once the decision has been made to proceed with data destruction as the
final step in the data lifecycle, there are systems available that provide a consistent, validated mechanism for the
controlled and automated deletion of records based on a data retention policy, making the destruction process simple
and well documented.
When choosing an archive system with the intent to use automated data deletion, it is essential that the system
includes the ability for an authorized user to:
• Set a project retention policy defining that the data is needed for a specified time after archiving, such that
the system automatically retains the data in that project throughout the specified retention period before
automatically deleting it
• Set different retention policies for different data folders, projects, or studies, including a “never delete” policy
(e.g., as shown in Table 16.1 in Appendix O4, different retention periods may apply to batch data versus
validation data versus clinical data)
• Configure the system to exclude specific data or records from the retention policy within a folder, project, or study
to prevent their automatic deletion (e.g., for records required to be kept as part of a legal hold)
• Generate a system log of data deletion detailing the records deleted, from which folder, project, or study and
when (note that this is not called an audit trail as the deletion is not an operator action but an automated system
action)
• Control, by granular access privileges, who is able to set or amend project retention policies and record legal
holds
• Capture in an audit trail the setting of, and changes to, retention policies and legal holds, including the option to
enter a reason for the change
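The retention-policy and legal-hold logic described in the capabilities above can be illustrated with a minimal sketch. The project names, retention periods, and function below are hypothetical and serve only to make the decision logic concrete; an archive system would implement equivalent, validated functionality together with the deletion logging and audit trail capture already listed.

from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

@dataclass
class ArchivedRecord:
    record_id: str
    project: str
    archived_on: date
    legal_hold: bool = False            # set immediately upon notification of a legal need
    excluded_from_policy: bool = False  # record individually exempted from automatic deletion

# Hypothetical retention policies per project or study (None means "never delete")
RETENTION_POLICIES: dict[str, Optional[timedelta]] = {
    "batch-data": timedelta(days=365 * 15),
    "validation-data": None,
    "clinical-study-001": timedelta(days=365 * 25),
}

def eligible_for_deletion(rec: ArchivedRecord, today: date) -> bool:
    """Return True only if every retention condition allows automatic deletion."""
    if rec.legal_hold or rec.excluded_from_policy:
        return False                    # legal holds and exclusions always block deletion
    policy = RETENTION_POLICIES.get(rec.project)
    if policy is None:
        return False                    # unknown project or a "never delete" policy
    return today >= rec.archived_on + policy

# Example: a batch record archived in 2006 is past its 15-year retention period,
# but applying a legal hold immediately blocks deletion.
record = ArchivedRecord("B-1234", "batch-data", date(2006, 5, 1))
assert eligible_for_deletion(record, date(2024, 1, 1)) is True
record.legal_hold = True
assert eligible_for_deletion(record, date(2024, 1, 1)) is False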
The data destruction features listed above should be risk assessed and validated based on their risk priority. It is
essential that legal holds are applied immediately upon notification of a possible need for the data for legal purposes.
Critical thinking should be applied when assessing and evaluating the use of automatic deletion as well as the
controls needed around such an activity, as it may not be appropriate for all data, situations, or organizations.
Data retention is not only governed by life sciences regulations, but also by business, financial, safety, and
legal requirements, etc. It is important to ensure that the organization’s destruction policy involves all impacted
stakeholders.
Appendix M1
8.1 Introduction
With the vast amount of data generated from the development, manufacturing, and marketing of products in the life
science industry, KM enables organizations to make decisions more efficiently and encourages an organizational
culture of learning. It drives and facilitates continual improvements to data integrity and product quality, resulting in
increased patient safety. This appendix presents the concepts and discusses tools to help organizations increase use
and advance their maturity level in KM.
KM and QRM are identified as the two enablers of the Pharmaceutical Quality System in ICH Q10 [32], which defines KM as a:
“Systematic approach to acquiring, analysing, storing, and disseminating information related to products,
manufacturing processes and components.”
It further notes:
“Sources of knowledge include, but are not limited to, prior knowledge (public domain or internally documented);
pharmaceutical development studies; technology transfer activities; process validation studies over the product
lifecycle; manufacturing experience; innovation; continual improvement; and change management activities.”
QRM has been extensively discussed in ICH Q9 [27] and ISPE GAMP® 5 [9].
The DIKW (Data, Information, Knowledge, Wisdom) pyramid is a commonly used model in data science to describe
the relationship and hierarchy between data, information, and knowledge. The DIKW hierarchy is considered
foundational in many information science curricula and is commonly represented as a pyramid with the foundational
base of data [46]. An example is shown in Figure 8.1.
In the context of the DIKW hierarchy, the following definitions apply:
• Data: Symbols that represent the properties of objects and events [47]
• Information: Processed data, where the processing is directed at increasing its usefulness [47], e.g., data
with context
• Knowledge: As defined by the Cambridge Dictionary [48], knowledge can be described as awareness,
understanding, or information that has been obtained by experience or study, and that is either in a person’s
mind or possessed by people. In the context of an organization, however, knowledge can be a combination of
content (explicit knowledge), information, and tacit knowledge.
• Wisdom: Wisdom is the ability to act quickly or practically in any given situation [49]
An alternative to the DIKW triangle, replacing wisdom with insights, has been proposed as a reflection of current
technology tools and approaches [49]. Insights may be the more fitting term today, as wisdom is widely agreed to
be a “uniquely human” characteristic. Insights may be derived by people with knowledge and experience;
however, new trends suggest that insights may also be derived by new computing or AI models that identify trends
and correlations previously not possible to see with experience alone [49]. Figure 8.2 uses this concept to view the
relationship between the producers and the consumers of data and information. This notion is driven by a strong
foundation of data that enables information, knowledge, and insights.
Data transforms into information by assigning a meaning or context to data. Furthermore, the accumulation of a
data bundle or the linking of various data can also represent information. The moment the data is processed, linked,
and stored, whether by a machine or a human being, it becomes information. The ability to apply that information
appropriately translates into knowledge. Prospective utilization of information and knowledge leads to insight. Context
is essential throughout the information, knowledge, and insight stages.
One process’s information may be another process’s data, e.g., bioanalytical results are data inputs for
pharmacokinetic analysis.
The foundations of ICH Q8 [50], Q9 [27], Q10 [32], Q11 [51], and Q12 [52] build upon science, application of risk-
based approaches, and utilization of prior knowledge. The ability to capture, store, and provide visibility of product
knowledge is critical to enable the development, application, and manufacture of medicinal products as well as to
support continual improvement and post-approval changes. Product knowledge is typically classified as explicit
(documented or codified) or tacit (intuitive) and may be developed or acquired from data generated during research,
commercialization, manufacturing, and continual improvement activities.
Figure 8.3 provides a representation of how the data governance framework provides controls for data integrity/data
quality and links with QRM, KM, and data management. Leveraging the data integrity governance framework
(inclusive of the quality-culture mindsets and behaviors discussed in Section 8.4) and actively utilizing QRM (middle
of diagram) provides high-quality foundational data that an organization can use to create information, knowledge,
and insights.
It is important to remember that achieving insights is not the ultimate destination but a waypoint on the journey.
Proactive KM provides feedback into the data governance framework, allowing for optimization of the organization
and driving continual improvements to the business processes.
This is reflected in EU GMP Part 1, Chapter 1, Section 1.4 [1], which states:
“A Pharmaceutical Quality System appropriate for the manufacture of medicinal products should ensure that:
(ii) Product and process knowledge is managed throughout all lifecycle stages; …
(xi) Continual improvement is facilitated through the implementation of quality improvements appropriate to the
current level of process and product knowledge.”
Reflecting on the DIKW/I model [46], this underscores the importance of data that is fit for purpose with both data
integrity and data quality, and can therefore contribute to product and process knowledge. As discussed in Section
1.6.4, data quality is defined by the OECD [13] as:
“Data quality is the assurance that the data produced are generated according to applicable standards and fit for
intended purpose in regard to the meaning of the data and the context that supports it. Data quality affects the
value and overall acceptability of the data in regard to decision-making or onward use.”
It is important to remember that data quality is not synonymous with data integrity and the controls associated with
data integrity (ALCOA+) do not ensure the quality of the data generated. This is further discussed in Section 1.6.4,
and Appendix M2 shows examples of a lack of integrity and/or quality.
Knowledge, like data, has a lifecycle as depicted in Figure 8.4. The knowledge lifecycle can be described as:
“A continual cycle that describes how knowledge moves through an organization.” [53]
The knowledge lifecycle is similar to the data lifecycle (Figure 8.5). It is interesting to note that the APQC
(American Productivity & Quality Center) [54] knowledge flow process deals with the creation, processing, review,
reporting, and use aspects of the data lifecycle. However, in the regulated industry it is important to ensure the
integrity of the data throughout the mandated retention period. It is also essential during the retention period that
the data is retrievable for review in support of regulated processes or for regulatory inspection.
The ICH Q10 [32] KM definition suggests a systematic approach. Case studies within and outside of the
biopharmaceutical industry show that multiple approaches, for example, content management guidance, the
application of data taxonomy (see Section 4.4 on data classification and Section 4.7 on data nomenclature for
further details), lessons learned, communities of practice, etc., are often warranted to help manage knowledge
across the pharmaceutical product lifecycle and the supporting business processes. Organizations with best-in-class
KM capabilities often measure KM maturity and take a programmatic approach to managing knowledge. APQC⁴
developed their industry-neutral Knowledge Management Capability Assessment Tool (KM-CAT™) [54] in 2007,
which measures KM capabilities and their respective levels of maturity. The KM-CAT™ measures five levels of
maturity over four categories, comprising 12 subcategories with 146 questions.
A key outcome of KM is facilitating knowledge flow. A strong foundation of data, information, and knowledge is
required to enable the objectives of ICH Q10 [32]: achieve product realization, establish and maintain a state of
control, and facilitate continual improvement.
4. APQC [54] is a not-for-profit research organization that is the recognized leader in the practice of knowledge management.
Organizations should proactively consider how knowledge associated with products and pharmaceutical processes
will be developed, curated, and used, and develop standard approaches to managing such knowledge. Standard KM
tools and processes may be utilized to enable knowledge flow. Examples include:
• Taxonomy
• Lessons learned
• Expertise location
• Forums and processes to connect people to expertise and tacit knowledge
With advances in technology and innovation, a strong foundation of quality data and information may also
facilitate the use of advanced computer systems leveraging AI/machine learning/deep learning to generate predictive
insights. Appendix S1 on AI and ML discusses such systems.
Quality data is required as a foundation, so that data can ascend the DIKW/I hierarchy (Figures 8.1 and 8.2) and
generate additional value to the organization and patients by feeding back to drive continual improvements to data
governance. Fundamentally, data, information, and knowledge are organizational assets and must be appropriately
managed to protect and ensure the availability of such assets. Data management has been defined as:
“The development, execution, and supervision of plans, policies, programs, and practices that deliver, control,
protect, and enhance the value of data and information assets throughout their lifecycles.”
One of the objectives of data management is for the organization to control its data resources, i.e., data governance.
An important element for both data governance and KM is a quality culture. The ISPE Cultural Excellence Report [55]
explores the term “cultural excellence,” proposing that within any given organization, there is not a separate quality
culture, safety culture, data integrity culture, etc. Rather, one primary corporate or organizational culture exists that
influences the behaviors and actions of personnel giving rise to quality, data integrity, and safety outcomes that matter
to the patient and the business. The six dimensions of the Cultural Excellence Framework [55] are:
• Gemba Walk
• Cultural Enablers
The ISPE Cultural Excellence Report [55] recommends moving from a culture of compliance towards a culture of
excellence, where:
“there is deep understanding throughout an organization of the elements critical to product quality.”
Quality culture is a foundational mindset required to create data of high quality, controlled throughout its lifecycle
to ensure integrity, which in turn enables the organization to use that data to create information, knowledge, and
insights. The quality-culture mindset, together with the mindset of treating data and knowledge as assets
comparable to physical assets in manufacturing or laboratories, is necessary to deliver value to the business and
ultimately the patient.
Knowledge and insights, whether from people or systems, may be lost or impeded during mergers, acquisitions, and
divestments. Particular care should be taken to evaluate the respective corporate systems and to ensure that not
only the governance and procedures are reviewed but also the tacit knowledge associated with such systems. An
acquired company may be forced to align to the corporate QMS (including data governance and KM) of the
purchasing company, even if the purchasing company has weaker processes.
8.5 Conclusion
In summary, ICH Q10 [32] recognizes QRM and KM as two enablers of a pharmaceutical quality system. The role of
quality foundational data is critically important for QRM and KM. Data is used in organizations to create information,
knowledge, and insights. The data, information, knowledge, and insights must be available to the organization so
it can use them to contribute to the objectives in ICH Q10 [32] (as reproduced in Section 8.1) and go beyond
compliance to deliver value to the business and the patient.
Appendix M2
Data Integrity Compared to Data Quality
This appendix contains examples to aid in understanding the differences between data integrity and data quality, and
the need for both.
Table 9.1: Examples of Data Integrity and Data Quality (or Lack Thereof)
Manufacturing Bill of Materials (BoM): The BoM is approved by the right persons, is secure and unalterable
with a formal change control process, which includes formal revision and review/re-approval, and an audit trail is in
place to capture before and after data values. The component parts and quantities listed are correct; there is only
one valid version available to be added to a process order at any one time.

Vendor Management in an ERP System: There are security and approval processes in place to ensure duplicate
vendors cannot be added to circumvent a vendor listed as blocked on the ERP system. There is only one valid entry
for a given approved vendor on the ERP system.

Analytical Laboratory [56]: The laboratory has new equipment, great training, and an excellent quality culture.
Independent audits give them glowing reports. They create data with a high amount of integrity that can absolutely
be trusted. In contrast, each manufacturing site has its own Electronic Batch Record System (EBRS). They each
have a different standard for describing materials, procedures, methods, qualifications, and the like. No two sites
describe their processes identically for the same product, even though all groups electronically submit data to the
same laboratory. Quality would like to assess the manufacturing capability of product XY across the nine global
sites where it is manufactured. Due to site-centric data descriptions, IT has to create a different query at each site
and then combine them to provide the quality unit with the data required for the assessment.

Contract Laboratory [56]: The manufacturer conducts a cursory review of SOPs and deviations at the contract
laboratory every two years. If any deviations occur in the contract laboratory, the contract laboratory is responsible
for investigating and closing them, but receives no payment for deviation activities. They are paid for the number of
test results they provide to the manufacturer. This business scenario provides ample motivation for the contract firm
to take shortcuts in practices, conduct superficial investigations, and release test results with inadequate review,
with few chances to detect the poor integrity of the underlying data. The manufacturer can create the necessary
reports quickly and efficiently, but the data in the reports could lead to incorrect conclusions about capability,
because the data in the reports cannot be trusted.
The manufacturer has a single, global EBRS installation with strict data management practices that ensure
database attributes are defined only once. Validated reports are available for routine operations. However, each of
the nine manufacturing sites uses a local contract laboratory to conduct in-process and release testing. These
contract laboratories keep data on their local systems, entering the final reportable value in the global EBRS system
using a secured network connection in the laboratory.
The analytical laboratory example above demonstrates why data quality is needed, how data nomenclature
(see Section 4.7) helps with quality, and how data analytics (see Appendix O1) then relies on data standardization,
definitions, and libraries. A laboratory can have excellent data integrity, but if the analytical information used to make
a decision is delivered late, or not at all, then the data is of little use, i.e., the data quality is poor.
“Data integrity’s focus is providing a value that can be trusted by users. Data quality’s focus is providing attributes
around data values (context, metadata) so values can be sorted, searched, and filtered in an efficient and timely
manner, confident that the complete data set is included.” [56]
It is important to remember that it is possible to have integrity without quality and quality without integrity, but that it is
essential to have both quality and integrity.
Appendix M3
10.1 Introduction
There are multiple scenarios where regulated data is under the control of a third party (the contract acceptor). It is
important for the marketing authorization holder or sponsor company (the contract giver) to understand and document
the risks and controls to ensure that the roles and responsibilities are clearly articulated in support of maintaining data
integrity throughout the data lifecycle. This appendix presents some of the risks and data concerns associated with
the use of contract organizations (such as CRO, CMO, or other variants; collectively referred to as CxO or contract
acceptors) or cloud providers. The use of any of these third-party services results in similar data integrity concerns.
Key to the use of third parties is the need to appropriately assess the provider and clearly define the responsibilities.
“The responsibility for the quality of IT software and services will always reside with the life sciences company
that uses them. Having a vendor or even an independent third party produce an independent attestation
regarding the control environment’s effectiveness does not affect that obligation. However, with the expanding
use of such services, the need to maximize the efficiency of quality assessments has become a more significant
challenge. In addition, suppliers are starting to offer services with significant GxP risk, such as laboratory
information management systems (LIMS) as an SaaS application. The use of such high-risk services is a driver
for a structured and controlled approach to supplier assessment.” [57]
Assessment can be done by direct vendor audit or postal questionnaire, and/or by performing due diligence on the
available documentation provided by the third party, such as SOC2+ reports [57] and GxP supporting information.
It is important to ensure that the requirements for long-term archival storage and viewing in human-readable
form throughout the retention period are included in the CxO or cloud provider assessment. It is also important
that contract givers include SMEs with detailed understanding of both IT and data integrity requirements in the
assessment and ongoing vendor management process.
The most important point to understand when regulated activities have been outsourced to a CxO is that the
marketing authorization holder is no longer in direct control of their data; it is therefore crucial that they appropriately
manage the contract acceptors. Data created by a contract acceptor remains the responsibility of the contract giver
and, as described in the ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts [16], needs to be
available for review throughout the record retention period.
The principles for data governance, and details of data ownership and access, should be outlined in the quality
contract and/or technical agreement with the CxO [24]. This should include not only access to and ownership of live
data but also inactive archived data. This is further discussed in Section 4.2 in the ISPE GAMP® RDI Good Practice
Guide: Data Integrity – Key Concepts [16]. Retention requirements for such data, and even deletion conditions for the
same, must be covered in the quality contract and/or technical agreement.
The CxO may or may not have the expertise to maintain the data in a long-term human-readable format and this
should be evaluated when selecting a third party. One outcome of the assessment discussed in Section 10.2 is an
understanding of the capabilities and weaknesses associated with the services offered. It is important that these are
addressed in the quality contract and/or technical agreement and monitored through a robust vendor management
program.
Often, summaries of outsourced activities are presented to the marketing authorization holder in a static format,
with the original electronic records secured and archived by the CxO. It is important that the summary data has
been verified against the original data to ensure it is complete and accurate.
Where a CxO stores the contract giver’s data in the cloud, there is a risk that any interruption in the cloud agreement
could cause a loss of that data. The contract giver should consider storing the data from the CxO on their own
servers or in their own cloud provider’s space rather than leaving it entirely with the CxO; this is especially important
if the contract giver decides to terminate the contract. There may be decisions to make about whether archived
records are returned to the contract giver, in what format, and how to make certain that the contract giver has the
software available to read them.
If transferring these electronic records is required, potentially with a change in format, the process should be validated
to ensure the integrity and security of the data during transfer, particularly if the data includes confidential or private
information. If the transfer is completed successfully, the data should only be deleted by the CxO, once the marketing
authorization holder has given explicit documented instructions to do so.
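Where such a transfer is performed, one common technical control is a checksum (hash) manifest generated at the source and verified at the destination. The following is a minimal sketch of that idea only; the paths are hypothetical, and a validated transfer process would also address completeness, secure transport, confidentiality, and documented approval before any deletion by the CxO.

import hashlib
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(source_dir: Path) -> dict[str, str]:
    """Map each relative file path to its checksum, captured before the transfer."""
    return {str(p.relative_to(source_dir)): sha256_of(p)
            for p in sorted(source_dir.rglob("*")) if p.is_file()}

def verify_transfer(manifest: dict[str, str], dest_dir: Path) -> list[str]:
    """Return a list of discrepancies (missing or altered files) after the transfer."""
    issues = []
    for rel_path, expected in manifest.items():
        target = dest_dir / rel_path
        if not target.is_file():
            issues.append(f"missing: {rel_path}")
        elif sha256_of(target) != expected:
            issues.append(f"checksum mismatch: {rel_path}")
    return issues

# Example usage (hypothetical locations):
# manifest = build_manifest(Path("/cxo/export/study_123"))
# problems = verify_transfer(manifest, Path("/sponsor/archive/study_123"))
# An empty list of problems supports, but does not by itself constitute, transfer verification.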
Data can be stored off-premise either in a commercial datacenter or by hosting in the cloud. For both of these
options, ISO/IEC 27001:2013 [58] applies. This standard provides the requirements for information security
management systems and addresses people, processes, and technology to ensure confidentiality, integrity, and
availability. It includes requirements for areas of governance, risk management, and compliance, such as:
• Physical security
• Network security
• Logical security
• Cyber security
It remains the responsibility of the marketing authorization holder to ensure controls are in place and adequate to
protect the integrity, security, and availability of their regulated data throughout the mandated retention period. Such
requirements should be defined in an SLA before the hosting service begins, and the ongoing performance of the
provider monitored.
If the hosting services are IaaS, it should be straightforward to retrieve any data when needed. Where SaaS is used,
it is important to consider:
• What data governance is needed to ensure data integrity, availability, security, confidentiality, and privacy?
• Can the data be read outside of the SaaS application, or will migration/conversion be required?
• Will the SaaS provider archive the data for the marketing authorization holder, and if so, how will any data
destruction be managed based on predicate rule requirements and any business reasons to extend the retention
period?
• If the SaaS provider will not be responsible for archiving the data, how does the marketing authorization holder
obtain the data from the SaaS provider?
• What will happen if the relationship with the SaaS provider breaks down?
• At the end of the contract, how will it be ensured that all data associated with the SaaS solution is completely
removed from all instances, with evidence of destruction from each instance provided (if applicable)?
10.5 Conclusion
There should be no resultant increase in data integrity risks arising from the use of third parties. Data governance
must be in place and effective at all levels, with data ownership and final responsibility for the data remaining with the
marketing authorization holder.
It is important to identify any risks associated with the use of third parties who are creating, processing, or storing
regulated data, and establish appropriate mitigation controls.
Appendix D1
Mapping: Laboratory System
This appendix contains an example of mapping a business process. This example is specific to a laboratory process;
however, it is presented to illustrate how a high-level business process provides the foundation for understanding the
detailed activities and user interactions.
In this example, a block diagram (Figure 11.1) shows the steps, a flowchart (Figure 11.2) is used to represent the
activities in the process, and from this a user-centric view is derived (Figure 11.3) identifying which users will interact
with the data.
Although this block diagram (Figure 11.1) depicts the various steps in the analysis of samples, it does not provide
enough detail to understand areas of risk. Additional detail is required to understand the process, for example:
• What system will be used for sample analysis – complex (NIR), simple (titration temporarily stores data), or
enterprise (HPLC within a CDS)?
Expanding the sample analysis process enables a more thorough understanding of the individual activities (Figure 11.2).
Mapping the user interaction within the process flowchart aids an understanding of the user roles needed for the
different computerized systems supporting the process. In this example, the laboratory users are primarily split into
basic users (run analyses and report results) and power users (able to create storage folders, analytical methods,
etc.). Outside of (and independent from) the laboratory structure, an administrative account is needed to manage the
systems.
From the detailed process flowchart in Figure 11.2, the detailed data flows can be generated, as discussed in
Section 4.3.
Appendix D2
Electronic Record Storage
12.1 Introduction
This appendix discusses instrument devices that have moved on from a simple display of the measured value to
become increasingly sophisticated, and that now temporarily store electronic records which must be managed and
reviewed. The appendix does not discuss devices limited to a display or printout capability, because these will be
managed as manually recorded data.
12.2 Background
Testing instrument devices such as filter integrity testers, particle counters, glucose concentration measurement
systems, balances, and pH meters have historically been considered simple systems. These systems would only
display data or display and temporarily store data. These systems have become more and more advanced over the
past few years. This trend is a good example of the technology enhancement continuum, as vendors add additional
features to make device use more convenient, but at the same time, the data integrity controls continuum must keep
up. These more functional devices should not be run in “simple” mode – where they are treated as their previous
models with only a display or a printout – as this does not address the electronic records generated.
The regulatory agencies are becoming increasingly concerned about the risks posed by such devices. In their 2018
guidance, the MHRA [12] stated:
“Where the basic electronic equipment does store electronic data permanently and only holds a certain volume
before overwriting; this data should be periodically reviewed and where necessary reconciled against paper
records and extracted as electronic data where this is supported by the equipment itself.”
Functions that enhance data integrity should be used where possible. Appropriate data integrity controls must be
considered and applied.
Many of these types of instruments lack the necessary data integrity capabilities; they therefore require additional
procedural controls and/or enhanced second person verification.
The use of an instrument may vary based on instrument type, business process, and data criticality. Below is an
example of a routine use approach. Critical thinking should be applied to determine which elements of this approach
are appropriate and what additional steps may be needed.
3. Log use in a paper logbook (attach, initial and date printout) or electronic logbook (include passed, failed, or
aborted tests).
Note: A unique identifier such as a filename (if unique) or batch number is needed to ensure traceability between
the data in the system, the logbook, and GxP record.
4. Transcribe result value(s) to the GxP record (e.g., laboratory worksheet or batch record).
• GxP record
• Logbook
• Original electronic data on the device – including the result and a review for data not accounted for in the
logbook
All passed, failed, and aborted testing must be accounted for and distinguishable from other valid data generated
during calibration, maintenance, and training activities, in order to detect any “orphan” test results that were not
reported. The typical way to account for the data at the point/time of use is with a logbook, creating a meaningful
chronological use record. These logbook entries can then be used to assist the second person review of the original
(electronic) data as needed.
This can be considered as a data review “triangle” to review specific results and detect orphan data. See Figure 12.1.
Below is an example approach to reviewing data. Critical thinking should be applied to determine which elements
of this approach are appropriate to a particular instrument and business process, and what additional controls may
be needed:
1. Identify original data, metadata, and orphan data (unaccounted for data). They may be included in an equipment
use log, test run log, or individual files temporarily stored on the device.
2. Review the logbook (or electronic equivalent) to ensure all test data/files are accounted for and reported, for
example, an equipment use log.
3. Compare the GxP record under review with the associated logbook(s) to ensure all relevant data is reported, i.e.,
passed, failed, and aborted tests.
4. If additional tests or files are identified that cannot be accounted for via the original data or logbook, initiate an
investigation.
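The reconciliation in steps 1 to 4 can be thought of as a set comparison between the three corners of the data review triangle: results stored on the device, logbook entries, and results reported in the GxP record. The sketch below is illustrative only; the identifiers and data structures are hypothetical, and in practice the comparison may be performed manually or by an electronic logbook system.

def reconcile(device_results: set[str],
              logbook_entries: set[str],
              gxp_record_results: set[str]) -> dict[str, set[str]]:
    """Compare the three corners of the data review triangle by unique identifier
    (e.g., filename or batch number) and return discrepancies for follow-up."""
    return {
        # Results on the device with no corresponding logbook entry ("orphan" data)
        "orphan_on_device": device_results - logbook_entries,
        # Logbook entries (passed, failed, or aborted) not addressed in the GxP record
        "not_in_gxp_record": logbook_entries - gxp_record_results,
        # Reported results with no supporting original data remaining on the device
        "unsupported_in_record": gxp_record_results - device_results,
    }

# Example: an aborted test is on the device and in the logbook but is not
# addressed in the GxP record, so it is flagged for review/investigation (step 4).
issues = reconcile(device_results={"FIT-001", "FIT-002", "FIT-003"},
                   logbook_entries={"FIT-001", "FIT-002", "FIT-003"},
                   gxp_record_results={"FIT-001", "FIT-002"})
assert issues["not_in_gxp_record"] == {"FIT-003"}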
12.5 Challenges
The process described above is labor intensive for one device. Where multiple devices are utilized within the
organization, it becomes very challenging for reasons such as:
• Locating mobile units (e.g., filter integrity testers) to perform reviews and manual backups
• Sheer volume of data to review and account for on a small screen in busy factories
• Storage of large amounts of data from manual backups having to be curated manually
This section discusses the benefits of implementing systems with stronger technical controls and better interface
options to improve and simplify the review process.
As with all large volumes of data, getting it into a database or secured flat-file format is the best starting point for
managing it compliantly. This is not always possible of course, but here are some possibilities to consider:
• Select secured Wi-Fi enabled devices to enable backups without having to physically locate the devices
• Select devices with browser-based management software if available. This can enable remote data review and
backup.
• Use electronic logbooks to create the chronological usage records for data reconciliation
The worst scenario is a device that is improved halfway, with enough features to require controls but not enough
electronic capability to manage the added complexity; it is therefore better to select devices that include connectivity,
configuration management, and the features found in complex devices.
When deliberating the replacement/upgrade of a device, consider the following capabilities for improved data integrity
controls:
• Temporary data storage: 20,000 data points, 250 analyses, system administrator access required for deletion
• Auto-read capable: limit the ability to select when to take the reading
The most costly but effective scenario is the use of middleware (i.e., software that enables communication and
data management distributed across various instruments) to connect an LES and an instrument, providing
instrument control and data management. When selecting vendors for middleware, assess them against the
following:
• Compliant QMS
• Number and types of systems it is able to control (e.g., balance, pH meter, conductivity meter, melting point,
UV/Vis spectrophotometers)
Depending on the age and severity of technical deficiencies in the device, it may not make sense to replace it
immediately with the latest, most connected version available. Connectivity will certainly enable more efficient
electronic data review and data backup processes, but subject to a documented and justified risk assessment, it may
still be possible to use such devices in a compliant manner with manual processes. Some of the necessary manual
processes could include:
• Backing up, either automated or manual, via direct USB cable or Ethernet connection to a networked location
Note 1: USB ports should be disabled by default on computers creating, processing, or storing GxP data. Under
controlled conditions, a single port may be opened for the duration of transfer of data from a secure and virus-
scanned USB flash drive (see also Note 2).
Note 2: USB flash drives should be used as temporary storage only, prior to transferring data to a networked location
that is backed up automatically to preserve the data. An SOP covering USB controls and data transfer and accounting
should be in place for this manual process. It is a best practice to document, at a minimum, each use of temporary
media to transfer data.
Inevitably there is some risk in remaining on existing but aging technology. Keeping track of these risks and
their mitigations in a risk register is a good way to revisit the device’s compliance during its lifecycle and make an
informed decision as to the optimal time to upgrade or replace. The risk register should be integrated with other QMS
processes to ensure regular review, either periodically or as a result of an incident.
12.8 Example Data Integrity Risks, Interim Controls, and Actions to Consider
Table 12.1 identifies typical risks and suggested mitigations for various instruments such as filter integrity testers,
particle counters, pH meters, balances, etc. The best mitigation may be the implementation of middleware, depending
on the type of testing (lot release versus buffer preparation), the number of systems, and the resulting increase in
data integrity. See Section 12.6 for additional information.
Table 12.1: Data Integrity Risks and Suggested Mitigations for Instrument Devices with Electronic Record
Storage
Risk: Lacks individual login (Attributable)
Interim control: Record identity of user in GxP record (e.g., ELN) and attach printout (initial and date) as available,
or document in a LIMS system.
Action: Replace with a more sophisticated instrument based on risk and availability.

Risk: Lacks test log/register (audit trail) or does not store result data (printout only) (Original/Accurate)
Interim control: Record results on GMP record or attach printout (initial and date). Add a printer if not currently
present where possible.
Action: Replace with a more sophisticated instrument based on risk and availability.

Risk: Backup is manual to USB (Original/Legible)
Interim control: Regular backups to USB per procedure; move from USB to a network location and manually log data
in the location.
Action: Replace with a more sophisticated instrument based on risk and availability (e.g., a Wi-Fi enabled device).

Risk: Large numbers of devices in an organization generating a significant amount of data on a regular basis,
requiring extensive effort for electronic data review and reconciliation
Interim control: Review and reconcile per Figure 12.1 Data Review Triangle, on each unit individually.
Action: Consider middleware or a networked solution over Wi-Fi with a database to enable metadata entry and
automated backups. LDAP integration to simplify password management for users. Perform data review remote
from the devices. This approach can also reduce the number of personnel entering graded areas.
12.9 Conclusion
Based on risk, it is important to evaluate the instrument devices to understand their capabilities and design the most
effective mitigation possible. This may involve procedural controls, integration via middleware, or even replacement
of instruments with new models offering more data integrity technical controls. Care should be taken when generating
user requirement or purchase specifications for instrument devices that generate electronic records to ensure that
new instruments have the best available controls for data integrity.
Appendix O1
Technical Solutions Supporting Data Integrity
13.1 Introduction
Increased use of technology has increased both the opportunity for and the visibility of data integrity incidents, often
caused by human factors, with issues being:
• Intentional: caused by lapses in personal integrity, such as sharing user names and passwords, or falsification of
documentation for more favorable results or to cover mistakes, etc.
This appendix focuses on utilizing technology to detect or ideally prevent data integrity issues, and offers thoughts
on ways to design and use systems to increase the visibility of data integrity incidents or concerns, and consequently
promote improved data integrity.
Historically, much of the design and review of systems for data integrity has been verified through the system’s
initial validation, followed by batch data review, manual review of audit trails as part of routine processes or
deviations, or by inspection (internal or regulatory). Human review of this vast amount of data is time-consuming
and far from exhaustive. An automated review, based on tacit knowledge of issues and the design of bots, is much
more comprehensive; however, it is reactive and may not discover willful data integrity breaches. A well-designed
system can provide a proactive look at critical steps in a system’s process to prevent issues.
There is one special area of concern for the prevention of data integrity issues: failing to permanently record the
initial value before reporting the issue and waiting for a correction. Failure to record each value permits unlimited
changes until a desired data value is obtained, which is a specific variation of testing into compliance. Enforced
saving of data at the time of entry provides mitigation for this.
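A minimal sketch of such an enforced-save control follows: every entered value is committed to an append-only history before any correction is accepted, so the initial value and all subsequent changes remain visible for review. The class and field names are hypothetical and illustrate the principle only; a real system would enforce this within its database and audit trail functionality.

from datetime import datetime, timezone

class EnforcedEntryField:
    """A data entry field that permanently records every value at the time of entry."""

    def __init__(self, field_name: str):
        self.field_name = field_name
        self._history = []  # append-only list of (timestamp, user, value, reason)

    def enter(self, user: str, value: str, reason: str = "initial entry") -> None:
        # The value is committed immediately; corrections never overwrite history.
        self._history.append((datetime.now(timezone.utc), user, value, reason))

    def current_value(self):
        return self._history[-1][2] if self._history else None

    def audit_trail(self):
        """Every value ever entered, including the initial one, with who/when/why."""
        return list(self._history)

# Example: the initial value remains on record after a correction.
field = EnforcedEntryField("assay_result")
field.enter("analyst1", "98.2")
field.enter("analyst1", "99.1", reason="transcription error corrected")
assert len(field.audit_trail()) == 2  # both the initial and corrected values are retained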
During system design or upgrades to legacy systems, a review of the process and workflows can identify key areas
where critical information is entered, so as to identify technical controls that can be embedded into the system’s
operation. Additionally, the means to control access to the software (either electronic or physical) should be
investigated to restrict the opportunity to manipulate data. Examples include:
• The time required for personnel to properly train on a procedure in a learning management system (Can “read
and understand” training on a 50-page procedure be completed in two minutes? Perhaps, by exception
(e.g., the procedure author documenting their own training), but a flag to the employee’s manager can ensure a
proper review.)
• Repeated consistent or identical results entered when variable data is being reported
These are simple examples, and this appendix investigates more detailed and practical applications and thought
processes to leverage technology to improve data integrity and minimize human factors on the data.
Before providing examples of business rules that can detect potential issues with data integrity, it is necessary to
understand the limitations of this approach, so it can be put into context with other data integrity efforts, such as
training and well-established quality-culture advancements. Some limitations include:
• Some business rule failures cannot be detected with current technology. For example, simple instruments (e.g.,
pH meters, balances) permit a person to remeasure a sample multiple times before forwarding a data value for
retention.
• Business rules must be automated to be practical. Hybrid processes are difficult, if not impossible, to implement
in an efficient manner.
• Every business rule violation must be investigated to assess its merit. Often, a suspect entry has a valid reason
behind it, but investigation is still necessary. This investment in time is critical to understand: it means that
business rule queries will reach a point of diminishing returns. This makes it imperative to monitor the use (and
effectiveness) of queries and stop using those that are not providing value in detecting “real” issues in the
organization or process. Not all rules are equal.
One additional factor to consider is the timing of application: business rules can be applied either for prevention or
for detection. As prevention, the business rule is applied at the time data is collected and stored in a permanent
medium. This provides feedback to the person performing the activity and permits immediate remedial action:
reprocess, re-collect, re-enter. Generally, prevention of an issue is preferable to detection after the fact.
As manufacturing and laboratories continue the drive to automate operations, operators/analysts and reviewers will
encounter a growing number of audit trails to enter or review as a routine part of work. If required to manually review
these audit records, firms will find that the review requires more time than performing the actual work under review.
Additionally, greater knowledge of processes and how they can be improperly manipulated leads to an increasing
need to look for data that is suspicious due to its content or the sequence of events that occurred during the conduct
of the activity. Again, it is easy to overwhelm reviewers with a huge number of files and records to be reviewed for
potential issues.
Increasing adoption of technology has created both these issues, and technology is also the solution. Developing
automated business rules for data integrity enables personnel to look for potential issues in a time-sensitive manner.
Automated searches through electronic data offer several advantages:
• Reduce the review to a small set of records (only records matching the criteria)
Business rules should be generated taking into account the business process mapping, data flow diagrams, and the
applicable regulations.
• Regulatory enforcement actions (Notice of Concern, Warning Letters, 483 Documents) can provide a wealth of
knowledge about the places and manners in which data can be improperly manipulated or deleted. For each
specific observation, add the question, “How would we prevent or detect that?” and a new business rule is
developed.
• CAPA and deviation reports can provide an opportunity to prevent recurrence of data integrity issues using
business rules to prevent or detect aberrant data. In addition, these reports are internal, so they can point to
behaviors within the organization that deserve special attention.
• Lifecycle status reviews: look at the statuses of materials as they move through the business process and from
system to system. By looking at combinations of these statuses, discrepancies can be identified. For instance,
an approved batch will have a status of “Approved” in the batch record application, and all the tests will be
“Released” in the LIMS. Any status other than these indicates something unusual. Status reviews provide
a simple, high-level technique for checking consistency in records (a minimal sketch of such a check follows
this list).
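The sketch below illustrates the lifecycle status review described in the last bullet above. The system names, expected terminal statuses, and data structures are hypothetical; in practice each system would be queried through its own validated reporting interface.

# Expected terminal statuses for an approved batch in each system (hypothetical values)
EXPECTED_TERMINAL_STATUS = {
    "batch_record_app": "Approved",
    "lims": "Released",
}

def status_discrepancies(batch_id: str, statuses: dict[str, str]) -> list[str]:
    """Compare reported statuses against the expected terminal state for each system."""
    findings = []
    for system, expected in EXPECTED_TERMINAL_STATUS.items():
        actual = statuses.get(system, "<not found>")
        if actual != expected:
            findings.append(
                f"Batch {batch_id}: {system} status is '{actual}', expected '{expected}'"
            )
    return findings

# Example: a test rolled back in the LIMS after batch approval is flagged for review.
print(status_discrepancies("B-1234", {"batch_record_app": "Approved", "lims": "In Progress"}))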
Business rules establish normal patterns for getting work done in an efficient manner. Ideally, they can be used
to point out unusual data values that merit further investigation by a qualified person who deeply understands the
business process and can determine the merit of the data. This assures that data used to make decisions has a high
probability of meeting the ALCOA+ attributes.
Because there are a wide variety of systems and equipment used in dedicated manufacturing and other GxP
processes, it is unreasonable to expect off-the-shelf business rule checks. The reality is that business rules must be
developed for specific processes, equipment, and systems. Something as simple as a different model of equipment
can change data integrity vulnerabilities. While there is opportunity for the reuse of rules, there is no substitute for
qualified people to create and manage business rules to ensure they meet the quality objectives of the firm.
- Security Roster: List of personnel, sorted by security role, permits review of the current access rights for
everyone. Special attention should be paid to roles with enhanced rights to administer accounts, review
data, modify calculations or data, or modify method/workflow parameters.
- Access Change History: While the access roster displays the current access state, this report provides
a list of changes in access rights over time and can identify changes made in the past that may have an
impact on reported data.
- Date of Last Access: A report of all users and their last date of access is valuable when removing users
from a system who no longer require system access. This report drives the right behavior: if someone no
longer requires access to the system, remove their access.
- Vendor Accounts: This report is a list of vendor (or non-employee) accounts. Ideally, it also includes the
dates of access into the system by any vendor. This report could be combined into the Security Roster and
Access Change History, but vendors represent a higher risk than users, and should be carefully reviewed.
- Elevated Accounts: This report lists elevated accounts and compares them to a list of approved elevated
accounts, along with inactivated accounts, to detect anyone who should not have elevated privileges or whose
access should be removed when they leave the area or the company. The approved elevated access should be
controlled procedurally to prevent anyone with a conflict of interest in the business area from getting that access.
- Important Changes to System: This report scans the system log and extracts changes to the system clock,
the recycle bin, plus any other system events that might be important to test.
- Recycle Bin: A list of files in the recycle bin permits a review of the files to determine if GxP data is being
discarded (Note: it is possible to delete data without it appearing in the recycle bin).
- Sample Statuses: Larger applications, such as ELN, LIMS, EBRS or Electronic Document Management
Systems (EDMS) often use statuses to manage the flow of data through a process. Two important reports
can look at statuses for potential issues: (1) List of items that are in statuses not supported by company
business practices. These are statuses present in a commercial system that are not needed for company
practices. These statuses are typically mentioned as unused in SOPs; (2) Statuses that are not “aligned”
across systems. For example, a released batch might have all tests “Released” in an ELN/LIMS, “Complete”
in the EBRS, and “Approved” in the EDMS. If QC laboratory personnel rolled back a test to make changes,
the test would not be “Released.” This discrepancy (statuses not all in their terminal state) can be detected
with this report. Similarly, flag any data that has not been reviewed appropriately prior to batch release (e.g.,
audit trail review). These must be resolved before releasing the batch.
- Audit Trail Change Reasons: There is tremendous value in a report that simply prints the date and time
and the change reason for all changes in an audit trail. Such a report provides the means to assess the
quality of entries in the audit trail.
- Output by Analyst: A report listing the amount of work performed per analyst permits reviewers to identify
performance that is “too good to be true” or a spike that would indicate shared account usage. However,
such reports should anonymize user names to avoid privacy issues.
- Control Chart by Analyst: Some systems permit the use of control charts by analyst to look for aberrant
trends, for example, data whose variability is significantly skewed, or significantly lower than that of other
analysts. Repeated use of a data set with the same time stamps or information could also be detected.
Again, user names should be anonymized in the report to avoid privacy issues.
• Chromatography: Many data integrity enforcement actions have resulted from improper manipulation or
exclusion of chromatographic data. Modern systems maintain audit trails that indicate changes to sample
injection sequences, integration of peaks, processing of injections, and calculation of reporting results. The
inherent flexibility and complexity of these systems requires a number of business rules to assure data integrity.
Some business rules of value include:
- Manual Integration of Peaks in any Injection: Auto-integration of peaks is the regulatory expectation,
but may not be a reality for legacy methods. Peaks manually integrated are expected to receive enhanced
review by qualified personnel, to assure that baselines are not created to bias the results.
- By Site and Method: A report of percentage of peaks auto-integrated versus manually integrated. This can
lead efforts to reduce manual integration, resulting in improvements in the consistency and efficiency of
results.
- Aborted Runs: Firms have used the Abort Run feature to prevent undesired data from being successfully
recorded so they are not required to justify it (or reject material). Consequently, any aborted runs must be
justified. In addition, it would be useful to report the number of aborted runs over a time period (e.g., month
or quarter) to ensure that the feature is not used to destroy unwanted data.
- Multiple Integrations of Data: In addition to the use of manual integration to bias results, some firms
manipulate auto-integration parameters to obtain the desired results. This can be monitored by tracking the
number of changes to the processing method over a time period. Another means to detect reprocessing is to
search data channels for multiple entries for the same channel ID.
- Unprocessed Injections: When analysts are permitted to view peaks in real time, they can often determine
if peaks will yield desired results before the injection is processed. Personnel may choose to ignore an
injection and reinject the solution to get a more favorable result. Failure to process and justify all injections is
a clear violation of GxP requirements for complete records of testing.
- Short Runs: Personnel who repeatedly inject samples to obtain favorable results can leave unprocessed
injections (above), or create additional sample sequences (with 1 to 2 samples), or reinject samples under
testing. Searching for sample sequences with 1 to 2 samples can detect this behavior.
- Missing Injections in Sample Sequence: Since each injection in a run has a fixed run time, look for gaps
between consecutive injections that exceed the length of a single injection time. Such a gap indicates an
injection that was deleted after the event.
• Benchtop Systems: Common data integrity gaps for stand-alone benchtop systems include copying of data files
to other sample identities, collecting data and not forward processing it (choosing “best” result) for batch review,
permitting everyone to adjust instrument parameters to “adjust” test outcome, and permitting users to delete test
results. Because many of these systems generate data in individual files, it may be difficult to determine if a test
result has been processed and forwarded for inclusion in the batch record.
- Runs Present in Batch Release: This is a comparison between files in LIMS/ELN, and the archive of the
benchtop system. This report would ordinarily be used to identify data files not forwarded for inclusion in the
batch record.
- Modified Files: Identifying files where the modified date is later than the creation date indicates data
updates after the original save. In some systems this will not be useful by design, but in systems where data
is settled before the data file is saved, this search can identify files that merit review. This is especially true in
an archive, where files should be protected against update.
- Deleted Files: If a PC is configured so files may be placed in a recycle bin but not deleted, it is possible to
identify deleted files and review the appropriateness of each deleted file.
• Enterprise Systems: These systems provide superior data integration and handling, enabling reviewers to
look for data issues that merit closer scrutiny.
- Time Sequence Anomalies: A report that identifies steps in workflow that are out of order based on date
and time stamps, permits a review for multiple uploads of data, changes after test completion, or clock
modifications.
- Time to Complete Process by Method and Analyst: Comparing average time to complete a method
across several analysts can yield valuable information about resourcing and can also identify individuals
who complete a process in significantly less time than other people in the work group.
- Multiple Uploads: External systems that upload multiple times to a method might indicate someone
attempting to pick a dataset giving a favorable result. A threshold of three uploads is recommended as a
starting point for this report.
The above items are examples only; critical thinking should be applied to determine the business rules most
applicable and appropriate for an organization or business process.
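To illustrate how such business rules might be automated, the following is a minimal sketch in Python (using pandas) that flags sample sequences containing only one or two injections, and gaps between consecutive injections that exceed the expected run time. The file name and column names are assumptions for illustration only and would need to be mapped to the export format of the actual chromatography data system.

```python
# Illustrative sketch only: flags two of the business rules above -- short sample
# sequences and gaps between injections -- from a hypothetical CSV export of the
# CDS injection log. The column names (sequence_id, injection_time, run_minutes)
# are assumptions, not the schema of any particular chromatography data system.
import pandas as pd

injections = pd.read_csv("injection_log.csv", parse_dates=["injection_time"])

# Rule: sample sequences containing only 1 or 2 injections may indicate trial injections
seq_sizes = injections.groupby("sequence_id").size()
short_sequences = seq_sizes[seq_sizes <= 2]

# Rule: a gap between consecutive injections that exceeds the expected run time
# (plus a tolerance) may indicate an injection deleted after the event
flagged_gaps = []
for seq_id, grp in injections.sort_values("injection_time").groupby("sequence_id"):
    gaps = grp["injection_time"].diff().dt.total_seconds() / 60.0
    expected = grp["run_minutes"].max() * 1.5  # tolerance factor is arbitrary, for illustration
    for ts, gap in zip(grp["injection_time"], gaps):
        if pd.notna(gap) and gap > expected:
            flagged_gaps.append((seq_id, ts, round(gap, 1)))

print("Sequences with 1-2 injections:", list(short_sequences.index))
print("Possible missing injections (sequence, time, gap in minutes):", flagged_gaps)
```

A similar approach can be taken for other rules listed above, for example comparing file modification and creation dates for stand-alone benchtop data files.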
A great potential exists in this area to adopt AI and ML algorithms to look for data patterns and trends that merit
further review. This area of exploration is in its infancy within GxP areas, but is expected to grow rapidly in the coming
years. Algorithms have the potential to identify sites with differing error rates, outputs, and variability. This will provide
sites with the ability to ask questions and gain insights into their processes. Experience has shown that an initial
“danger” of these technologies is their ability to create questions faster than they can be processed and answered by
the organization. AI and ML are further covered in Appendix S1.
13.2.6 Governance
Senior management want metrics to measure the organization’s progress in detecting data integrity issues. It should
be noted that these reports will cause a few individuals to modify their practices to avoid detection. There are metrics
that can provide insights into uptake and efficiency:
• By Site and by Report: Number of times a report is executed, and the number of records in the report
• By Site: Number of investigations conducted based on reports, time invested in investigations, and number of
confirmed issues detected
These metrics, along with training, provide basic governance and allow leaders to determine which reports are
effective in detecting issues in the organization and use the knowledge to drive continual improvement. The number
of report executions measures the uptake of the reports into each site, especially in the first months of program
implementation.
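As a simple illustration of how such metrics might be derived, the following sketch (Python with pandas) aggregates a hypothetical log of report executions by site and by report; the file name and column names are assumptions, not a prescribed log format.

```python
# Illustrative sketch only: derives the governance metrics described above from a
# hypothetical execution log of the data integrity reports. The column names
# (site, report_name, records_returned, investigation_opened, issue_confirmed)
# are assumptions for illustration.
import pandas as pd

log = pd.read_csv("report_execution_log.csv")

# By site and by report: number of executions and number of records returned
uptake = (log.groupby(["site", "report_name"])
             .agg(executions=("report_name", "size"),
                  records=("records_returned", "sum")))

# By site: investigations opened and confirmed issues
outcomes = (log.groupby("site")
               .agg(investigations=("investigation_opened", "sum"),
                    confirmed_issues=("issue_confirmed", "sum")))

print(uptake)
print(outcomes)
```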
Some vendor programs allow analysts read/write/delete access to the desktop and folders where data is stored
because the system does not allow certain critical functions to be separated from user access roles. This creates
a serious problem with ensuring data integrity because the user can potentially edit or delete data at will and test
into compliance. They can also potentially download and run software that can cause problems with the data or the
system. Having a system with these gaps necessitates locking down the computer to prevent any potential problems
with data integrity.
Alternate Operating System Shells (“shells”) are software applications that provide additional configuration
capabilities to ensure only authorized people can access the computer and to restrict what the user can do. They do
not fundamentally change the operating system in use but add missing capabilities to give greater control over files
and data. These can be used to restrict users to only the tasks related to their work and to keep them from deleting or
moving the original data until it can be swept into the archive.
Shells may require different settings for different configurations of software. The configuration is not one-size-fits-all
and may need adjustment to get different vendor software to run appropriately. When developing the procedure for
installation, some flexibility needs to be incorporated into the settings to allow adjustments without compromising the
lockdown that needs to be accomplished.
Lockdown Features
The shell software can log off users due to inactivity and block virtually any application or any part of it: a window,
popup message, or dialog box. Applications can be set to run as Administrator, should they need Administrator
access, without giving that level of access to the user.
• Disable taskbar, desktop, clipboard, control panel, safe mode, drag and drop, and many others
• Restrict the browser to allow access to trusted sites only and block all others
• Use a selective start menu to show only items to which the user has been given access
• Prevent users from using downloaders or installing software, and block unwanted programs
• Hide system and network drives and block access to USB drives, DVDs, and CD burners
The software can monitor changes to the system and write these changes to the log file.
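As an illustration only, the following minimal sketch (Python, assuming a Windows environment) shows how a periodic verification of lockdown settings might be scripted. The registry values checked are standard Windows policy settings chosen purely for illustration; they are not the configuration mechanism of any particular shell product, and the set of checks would need to reflect the lockdown actually applied.

```python
# Illustrative sketch only (assumes Windows): checks a few common lockdown-related
# policy settings so that a periodic review can confirm the workstation remains
# restricted. These registry values are standard Windows policies used here purely
# for illustration; they are not the configuration of any specific shell product.
import winreg

CHECKS = [
    # (hive, key path, value name, expected value, description)
    (winreg.HKEY_CURRENT_USER,
     r"Software\Microsoft\Windows\CurrentVersion\Policies\System",
     "DisableTaskMgr", 1, "Task Manager disabled"),
    (winreg.HKEY_CURRENT_USER,
     r"Software\Microsoft\Windows\CurrentVersion\Policies\Explorer",
     "NoControlPanel", 1, "Control Panel hidden"),
    (winreg.HKEY_LOCAL_MACHINE,
     r"SYSTEM\CurrentControlSet\Services\USBSTOR",
     "Start", 4, "USB mass storage disabled"),
]

for hive, path, name, expected, description in CHECKS:
    try:
        with winreg.OpenKey(hive, path) as key:
            value, _ = winreg.QueryValueEx(key, name)
        status = "OK" if value == expected else f"NON-COMPLIANT (value={value})"
    except OSError:
        status = "NOT SET"
    print(f"{description}: {status}")
```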
Appendix O2
General Archiving
This appendix discusses good archiving practice, which starts with data that is of high quality and fit for its
intended use.
The archive is intended as a long-term storage solution (years). It is important to remember that archiving fulfills
a different need from backup, which is a duplicate copy of digital data intended to assist with disaster recovery.
Maintaining backup copies of data is not a viable substitute for archiving.
Data must be recorded in a durable, maintainable form for the retention period. The archive can only maintain the
level of integrity of the incoming data; it cannot remediate existing integrity issues.
Data should only be retained where there is a reason to retain it.
Archive only those records that are required by predicate rules and legislation, by business practices (including legal
considerations), or by procedures. Determine whether there is a need to keep all records or whether those that do not
fall under these categories can be safely discarded. Decisions should be documented and justified.
The archive is intended for long-term storage of key information. Draft documents are usually either superseded
by issued versions of the same document or never issued and therefore should not be archived. Draft documents
should only be archived if the business process requires the retention of drafts. If it is important to understand why a
document was never formally issued, the decision and reasons for this can be captured in a separate document that
is archived.
Perform the archiving of data at established intervals based on the system, user requirements, and regulatory
requirements. Often, data may be archived once the likelihood of accessing that record has reduced to a given
frequency, say once a year. Alternatively, archive data at a well-defined point in the workflow, e.g., at the end of final
approval of a GLP study or manufacturing batch.
When designing an electronic archive, consider that the required speed of response for different system parts may
not be the same. For example, a higher speed of response generally will be required for the data management
functions, such as Search and Report, compared with the retrieval of a specific Archive Information Package.
Individual computerized systems need to provide a mechanism to archive complete and accurate records, including
relevant metadata. Controls need to be in place to retrieve and read the data (including metadata) during the
retention period.
It is not uncommon to find a record in several places. When archiving such a record, locate all copies of it, check they
are consistent and archive only one record while deleting the others (following the due process for deletion). Add a
link from the location of the deleted copies to the retained record.
When archiving a newer version of a document, it is considered essential to record the reason for the update and
what is being replaced, augmented, or deleted. This also applies where a new document replaces one or several
existing documents. Note that all previous versions are still retained in the archive for auditing purposes. It is also
critical that the superseded document record provides a link to the updated document to ensure that the complete
records and history can be obtained.
A directory structure should be defined where metadata is saved in the same directory as the parent data. This will
facilitate maintenance and linking of metadata with parent data. Directory and filenames should be given careful
consideration. Some archiving software truncates filenames. Other software changes file metadata, such as the
timestamps. The path name alone should not be relied upon for identifying the record. Path names may change, and
are usually not sufficiently robust for record identification and location.
Avoid archiving compressed data files. These introduce another layer of conversion and the potential for data
corruption, as well as bringing the issue of continued availability of the decompression tool and an operating system
on which to run it. In some instances, this may result in the inability to fully restore the original data, that is, there is
loss of resolution or metadata.
Use procedures to preserve the integrity and identity of data when archiving from one set of media, such as files
on a hard drive, to another. Consider any potential loss of resolution when migrating images to formats such as JPG;
consider using TIFF instead. Hyperlinks should be avoided where the URL could change without the knowledge of
the Archivist.
Procedures should confirm that archived data, including relevant metadata, is available and human readable. The
archive needs to protect records from deliberate or inadvertent loss, damage, and/or alteration for the retention
period. Security controls should be in place to ensure the data integrity of the record throughout the retention period,
and validated where appropriate.
Verify that record integrity is being maintained, for example, by conducting tests as part of scheduled maintenance or
by regularly reviewing audit trails. Make sure there is a test environment for testing software and hardware changes
without the risk of corrupting data which has already been archived.
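One way to implement such an integrity verification is a checksum manifest that is generated when records enter the archive and re-verified during scheduled maintenance. The following is a minimal sketch in Python; the archive location and manifest format are assumptions for illustration only.

```python
# Illustrative sketch only: generates and verifies a SHA-256 checksum manifest for
# an archive directory, as one way of confirming during scheduled maintenance that
# archived records (and their co-located metadata files) have not been altered.
# The manifest location and layout are assumptions for illustration.
import hashlib
import json
from pathlib import Path

ARCHIVE_ROOT = Path("archive")
MANIFEST = ARCHIVE_ROOT / "manifest.json"

def sha256(path: Path) -> str:
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest() -> None:
    # Record a checksum for every file in the archive except the manifest itself
    entries = {str(p.relative_to(ARCHIVE_ROOT)): sha256(p)
               for p in ARCHIVE_ROOT.rglob("*") if p.is_file() and p != MANIFEST}
    MANIFEST.write_text(json.dumps(entries, indent=2))

def verify_manifest() -> list:
    # Report any archived file that is missing or whose checksum no longer matches
    expected = json.loads(MANIFEST.read_text())
    problems = []
    for rel_path, checksum in expected.items():
        target = ARCHIVE_ROOT / rel_path
        if not target.is_file():
            problems.append((rel_path, "missing"))
        elif sha256(target) != checksum:
            problems.append((rel_path, "checksum mismatch"))
    return problems

if __name__ == "__main__":
    problems = verify_manifest()
    print(problems or "All archived files verified against the manifest.")
```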
Make sure that there is only one source of control over the archived data, using a small number of well-defined indices
and procedures. There may be several archivists, but they should all be using the same methods and procedures.
Keep the quality role separate from the archivist role. Quality should be seen to be independent, particularly during an
audit. Ensure that quality is empowered to report directly to management.
Collect relevant metrics from the operation of the archive so that its performance can be substantiated and monitored.
The metrics also will enable future requirements to be better predicted and planned. Sufficient funding should be
allocated for the ongoing maintenance of the archive.
If the archive is contracted to a third party, ensure contracts specify responsibilities to ensure the data is secure and
retrievable throughout the record retention period, even if the contract ends or the third party can no longer support
the contract.
Handle operational difficulties through a formal event and problem reporting system. Hold periodic meetings with the
key archive stakeholders to review archive procedures, performance, migration requirements, future requirements,
etc. Information Systems (IS)/IT are important stakeholders in the archival process and it is important to include IS/IT
in any relevant communications and meetings.
Ensure that IS/IT understands the archiving needs clearly so that the infrastructure is appropriate and can safely
support the archive. Ensure that backup and disaster recovery activities are appropriate to the system and media
used. Relying on a single copy of archive data is a significant data integrity risk.
The archive should be managed in alignment with the data lifecycle. This includes the destruction of all archive
copies, including backups, when the records reach the end of their retention period unless required to be retained for
legal reasons.
Consider to what level data must be deleted for it to be considered fully destroyed. Many deletion processes simply
remove the reference to the data without actually removing the data itself. The audit trail is part of the record and
should be destroyed with it. It is important to clearly specify procedures for the destruction of data. Eventually,
archive systems will need to be retired, and the migration of the archived data is critical. An archive that is allowed to
outlive its economic life will become expensive to maintain and eventually non-maintainable. Retirement planning
should be part of the design.
Appendix O3
Considerations
This appendix builds on Appendix O2 and adds the more detailed requirements needed for GLP.
Most GLP regulations (e.g., 21 CFR 58 [59], OECD Good Laboratory Practice and Compliance Monitoring No. 15
[60], etc.) have a specific requirement for an archive with an associated position of Archivist. The Archivist
controls and oversees the archive and the input and removal of data from the archive. An Archivist can have deputies
if required.
The principles described here can be the basis of records management for other GxP disciplines.
The basic requirements for the GLP archive for both physical (paper, pathology slides, specimens, etc.) and
electronic records are the same:
• The location of the archive must be stated in the GLP study report (Note: For an electronic record, the location may
be a UNC path or URL hyperlink to electronic storage).
• Access restricted to authorized individuals. Usually there is a paper log recording who enters and leaves the
archive and when. Visitors must be escorted and this recorded in the log.
• To ensure that physical records are protected and stored in optimum conditions, the environment must be
monitored and the records protected from hazards (e.g., weather, water pipes, fire, pests, etc.)
• There must be an index of archived studies and supporting records (e.g., training records, organization charts,
etc.)
• The index is updated as studies enter the archive or as studies are destroyed at the end of the applicable
retention period.
• There must be a record of when study data was taken out and by whom, and when the package was returned
and by whom. The records should be reconciled to determine whether any records are missing or have been added.
• Archived records must be readable over the record retention period. This remains a sponsor responsibility, as data
ownership should make clear, even if the archive is outsourced.
• An archive can be contracted to a third party provided there is an agreement detailing the roles and
responsibilities of both parties and the right for the sponsor to audit and allow GLP inspections.
• Location of the electronic archive must be included in each GLP study report.
An electronic archive is allowable under OECD GLP No. 15 [60] as one of three options:
• Off-line media storage (in which case, the media must be stored in the GLP archive)
For electronic records there are additional requirements for the archive:
• The archive can be outsourced, but the test facility management is responsible for archiving and a facility outside
of the GLP monitoring program must have a QMS. Test facility management must evaluate the QMS and risks for
SaaS and any e-archiving service. The need for an audit should be risk based.
• IT staff involved with operating or supporting a GLP electronic archive must be trained in GLP regulations as they
impact their work.
• An electronic archive should be separate and secure; in practice, study records are explicitly marked as archived
and locked so that they cannot be changed. Studies must be under the control of the archivist.
• A single global instance is acceptable as an archive as long as relevant SOPs mention this.
• The archiving process must not change electronic data, i.e., the content and meaning is maintained for all data.
Dynamic data should remain dynamic as long as is feasible, to preserve the GxP content and meaning.
• Keeping the data in the original system is the better option, as the application database will ensure the data
structures, data, and metadata of each study are maintained. This is acceptable if the study records can be locked
and cannot be changed by normal users.
Locked study records can be viewed by any user in an application and this does not count as accessing the archive.
Appendix O4
Periods and Requirements
This appendix gives examples of retention periods from various regulations around the world.
Note that this Guide mainly focuses on relevant GxP regulations. It is important to keep in mind that non-GxP
regulations can also impact the retention of records.
The ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts Appendix 5 [16] explains and defines the
specific data terminology used in the different regulations, e.g., raw data, source data, etc.
Table 16.1 summarizes extracts from the key regulations and directives listed that specify requirements for how data
is archived and for how long. It is not intended to be a complete reference but provides some additional information to
support the statements made within the body of the document. A regulated company should determine for itself which
regulations and corresponding retention periods apply.
Table 16.1: Extracts from Regulations and Directives Specifying Retention Requirements and Periods
1. 21 CFR Part 11—Electronic Records; Electronic Signatures (US) [10]
2. 21 CFR Part 58—Good Laboratory Practice for Nonclinical Laboratory Studies (US) [59]
3. 21 CFR Part 211—Current Good Manufacturing Practice for Finished Pharmaceuticals (US) [29]
4. 21 CFR Part 606—Current Good Manufacturing Practice for Blood and Blood Components (US) [61]
5. 21 CFR Part 820—Quality System Regulation (US) [62]
6. Commission Directive 2003/94/EC of 8 October 2003 laying down the principles and guidelines of good
manufacturing practice in respect of medicinal products for human use and investigational medicinal products for
human use (EU) [63]
7. Directive 2001/83/EC of the European Parliament and of the Council of 6 November 2001 on the Community
Code Relating to Medicinal Products for Human Use (EU) [64]
8. EudraLex Volume 4 EU Guidelines for Good Manufacturing Practice [65] and PIC/S PE 009-14 Guide to Good
Manufacturing Practice for Medicinal Products [66]
9. Guidelines on Good Distribution Practice of Medicinal Products for Human Use – 2013/C 343/01 (EU) [67]
10. ICH E6 (R2) Guideline for Good Clinical Practice (tripartite guideline EMA/CHMP/ICH) [45]
11. EU No. 536/2014 on Clinical Trials on Medicinal Products for Human Use [68]
Ref: 1, 21 CFR Part 11 [10], Part 11.10(c): Protection of records to enable their accurate and ready retrieval throughout the records retention period.

Ref: 2, 21 CFR Part 58 [59], 58.190(b): There shall be archives for orderly storage and expedient retrieval of all raw data, documentation, protocols, specimens, and interim and final reports. Conditions of storage shall minimize deterioration of the documents or specimens in accordance with the requirements for the time period of their retention and the nature of the documents or specimens. A testing facility may contract with commercial archives to provide a repository for all material to be retained. Raw data and specimens may be retained elsewhere provided that the archives have specific reference to those other locations.
Ref: 3, 21 CFR Part 211 [29], 211.180(c): All records required under this part, or copies of such records, shall be readily available for authorized inspection during the retention period at the establishment where the activities described in such records occurred. These records or copies thereof shall be subject to photocopying or other means of reproduction as part of such inspection. Records that can be immediately retrieved from another location by computer or other electronic means shall be considered as meeting the requirements of this paragraph.

Ref: 3, 21 CFR Part 211 [29], 211.180(d): Records required under this part may be retained either as original records or as true copies such as photocopies, microfilm, microfiche, or other accurate reproductions of the original records. Where reduction techniques, such as microfilming, are used, suitable reader and photocopying equipment shall be readily available.

Ref: 4, 21 CFR 606 [61], 606.160(d): Records shall be retained for such interval beyond the expiration date for the blood or blood component as necessary to facilitate the reporting of any unfavorable clinical reactions. You must retain individual product records no less than 10 years after the records of processing are completed or 6 months after the latest expiration date for the individual product, whichever is the later date. When there is no expiration date, records shall be retained indefinitely.
Ref: 5, 21 CFR Part 820 [62], 820.180(b): Record retention period. All records required by this part shall be retained for a period of time equivalent to the design and expected life of the device, but in no case less than 2 years from the date of release for commercial distribution by the manufacturer.

Ref: 6, Commission Directive 2003/94/EC [63], Article 9.1: For a medicinal product, the batch documentation shall be retained for at least one year after the expiry date of the batches to which it relates or at least five years after the certification referred to in Article 51(3) of Directive 2001/83/EC, whichever is the longer period.

Ref: 7, Directive 2001/83/EC [64], Title IV Article 51(3): In all cases and particularly where the medicinal products are released for sale, the qualified person must certify in a register or equivalent document provided for that purpose, that each production batch satisfies the provisions of this Article; the said register or equivalent document must be kept up to date as operations are carried out and must remain at the disposal of the agents of the competent authority for the period specified in the provisions of the Member State concerned and in any event for at least five years.

Ref: 7, Directive 2001/83/EC [64], Title VII Article 80(f): Holders of the distribution authorization must fulfil the following minimum requirements: they must keep the records referred to under (e) available to the competent authorities, for inspection purposes, for a period of five years.
Ref: 8, EudraLex Volume 4 [65] / PIC/S PE 009-14 (Part 1) [66], Chapter 4, 4.12: For other types of documentation, the retention period will depend on the business activity which the documentation supports. Critical documentation, including raw data (for example relating to validation or stability), which supports information in the Marketing Authorisation should be retained whilst the authorization remains in force. It may be considered acceptable to retire certain documentation (e.g., raw data supporting validation reports or stability reports) where the data has been superseded by a full set of new data. Justification for this should be documented and should take into account the requirements for retention of batch documentation; for example, in the case of process validation data, the accompanying raw data should be retained for a period at least as long as the records for all batches whose release has been supported on the basis of that validation exercise.
Ref: 8, EudraLex Volume 4 [65] / PIC/S PE 009-14 (Annexes) [66], Annex 2 (Biological substances and products), 28: Where human cell or tissue donors are used full traceability is required from starting and raw materials, including all substances coming into contact with the cells or tissues through to confirmation of the receipt of the products at the point of use whilst maintaining the privacy of individuals and confidentiality of health related information. Traceability records must be retained for 30 years after the expiry date of the medicinal product.

Ref: 8, EudraLex Volume 4 [65] / PIC/S PE 009-14 (Annexes) [66], Annex 11, Section 17: Data may be archived. This data should be checked for accessibility, readability and integrity. If relevant changes are to be made to the system (e.g. computer equipment or programs), then the ability to retrieve the data should be ensured and tested.
Ref: 8, EudraLex Volume 4 [65] / PIC/S PE 009-14 (Annexes) [66], Annex 14 (Blood and plasma), 4.3:
(EudraLex) Data needed for full traceability must be stored for at least 30 years, according to Article 4 of Directive 2005/61/EC and Article 14 of Directive 2002/98/EC.
(PIC/S) Data needed for full traceability must be stored according to national legislation. For EU/EEA this is for at least 30 years according to Article 4 of Directive 2005/61/EC and Article 14 of Directive 2002/98/EC.

Ref: 9, 2013/C 343/01 [67], 3.3.1 (Computerized systems): Data should only be entered into the computerised system or amended by persons authorised to do so.

Ref: 9, 2013/C 343/01 [67], 4.2 (General): Documents should be retained for the period stated in national legislation but at least five years. Personal data should be deleted or anonymised as soon as their storage is no longer necessary for the purpose of distribution activities.

Ref: 10, ICH E6 (R2) [45], 3.4: The IRB/IEC should retain all relevant records (e.g., written procedures, membership lists, lists of occupations/affiliations of members, submitted documents, minutes of meetings, and correspondence) for a period of at least 3 years after completion of the trial and make them available upon request from the regulatory authority(ies).
Ref: 10, ICH E6 (R2) [45], 5.5.6: The sponsor, or other owners of the data, should retain all of the sponsor-specific essential documents pertaining to the trial.

Ref: 10, ICH E6 (R2) [45], 8.1 Addendum: The sponsor should ensure that the investigator has control of and continuous access to the CRF data reported to the sponsor. The sponsor should not have exclusive control of those data.

Ref: 11, EU No. 536/2014 [68], Article 58: Unless other Union law requires archiving for a longer period, the sponsor and the investigator shall archive the content of the clinical trial master file for at least 25 years after the end of the clinical trial.
Table 16.2 lists extracts from non-regulatory guidances that propose requirements for how data is archived and for
how long.
12. PIC/S Guide PI 011-3 Good Practices for Computerised Systems in Regulated “GxP” Environments [69]
13. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring No. 1 OECD Principles on
Good Laboratory Practice (as revised in 1997) ENV/MC/CHEM(98)17 [70]
14. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring No. 17 Application of GLP
Principles to Computerised Systems (2016) ENV/JM/MONO(2016)13 [71]
Ref: 12, PIC/S Guide PI 011-3 [69], 14.4 Validation Strategies and Priorities: GxP compliance evidence is essential for the following aspects and activities related to computerized systems:
• Data input (capture and integrity), data filing, data-processing, networks, process control and monitoring, electronic records, archiving, retrieval, printing, access, change management, audit trails, and decisions associated with any automated GxP related activity.
• In this context, examples of GxP related activities might include: regulatory submissions, R&D, clinical trials, procurement, dispensing/weighing, manufacturing, assembly, testing, quality control, quality assurance, inventory control, storage and distribution, training, calibration, maintenance, contracts/technical agreements and associated records and reports.

Ref: 12, PIC/S Guide PI 011-3 [69], 21.1 ERES: EC Directive 91/356 sets out the legal requirements for EU GMP. The GMP obligations include a requirement to maintain a system of documentation...The main requirements here being that the regulated user has validated the system by proving that the system is able to store the data for the required time, that the data is made readily available in legible form and that the data is protected against loss or damage.
Ref: 12, PIC/S Guide PI 011-3 [69], 21.10 ERES: Issues to consider where electronic records are used to retain GxP data:
• Documentary evidence of compliance exists
• Archiving procedures are provided and records of use exist
• Procedures exist to ensure accuracy, reliability and consistency in accordance with the validation exercise reported for the electronic record system
• System controls and detection measures (supported by procedures) exist to enable the identification, quarantining and reporting of invalid or altered records
• Procedures exist to enable the retrieval of records throughout the retention period
• The ability exists to generate accurate and complete copies of records in both human readable and electronic form
• Access to records is limited to authorised individuals
• Secure, computer-generated, time-stamped audit trails to independently record GxP related actions following access to the system are used

Ref: 13, OECD ENV/MC/CHEM(98)17 [70], 1.1 Test Facility Organization and Personnel: At a minimum it [the test facility organization and personnel] should:
l) Ensure that an individual is identified as responsible for the management of the archive(s).

Ref: 13, OECD ENV/MC/CHEM(98)17 [70], 3.4 Archive Facilities: Archive facilities should be provided for the secure storage and retrieval of study plans, raw data, final reports, samples of test items and specimens. Archive design and archive conditions should protect contents from untimely deterioration.
Ref: 13, OECD ENV/MC/CHEM(98)17 [70], 10.1 Storage and Retention of Records and Materials: The following should be retained in the archives for the period specified by the appropriate authorities:
a) The study plan, raw data, samples of test and reference items, specimens, and the final report of each study;
b) Records of all inspections performed by the Quality Assurance Program, as well as master schedules;
c) Records of qualifications, training, experience, and job descriptions of personnel;
d) Records and reports of the maintenance and calibration of apparatus;
e) Validation documentation for computerized systems;
f) The historical file of all Standard Operating Procedures;
g) Environmental monitoring records.

Ref: 13, OECD ENV/MC/CHEM(98)17 [70], 10.2 Storage and Retention of Records and Materials: Material retained in the archives should be indexed so as to facilitate orderly storage and retrieval.

Ref: 13, OECD ENV/MC/CHEM(98)17 [70], 10.3 Storage and Retention of Records and Materials: Only personnel authorized by management should have access to the archives. Movement of material in and out of the archives should be properly recorded.

Ref: 13, OECD ENV/MC/CHEM(98)17 [70], 10.4 Storage and Retention of Records and Materials: If a test facility or an archive contracting facility goes out of business and has no legal successor, the archive should be transferred to the archives of the sponsor(s) of the study(s).

Ref: 14, OECD ENV/JM/MONO(2016)13 [71], 3.2 Storage of data, Article 73: When data (raw data, derived data or metadata) are stored electronically, requirements for back-up and archiving purposes should be defined. Back-up of all relevant data should be carried out to allow recovery following failure which compromises the integrity of the system.

Ref: 14, OECD ENV/JM/MONO(2016)13 [71], 3.2 Storage of data, Article 74: Stored data should be secured by both physical and electronic means against loss, damage and/or alteration. Stored data should be verified for restorability, accessibility, readability and accuracy. Verification procedures of stored data should be risk based. Access to stored data should be ensured throughout the retention period.
Ref: 14, OECD ENV/JM/MONO(2016)13 [71], 3.2 Storage of data, Article 77: Regarding procedures, the test facility management should describe how electronic records are stored, how record integrity is protected and how readability of records is maintained. For any GLP-relevant time period, this includes, but may not be limited to:

Ref: 14, OECD ENV/JM/MONO(2016)13 [71], 3.11 Archiving, Article 110: Any GLP-relevant data may be archived electronically. The GLP Principles for archiving must be applied consistently to electronic and non-electronic data. It is therefore important that electronic data is stored with the same levels of access control, indexing and expedient “retrieval” as non-electronic data.
Appendix O5
Software
17.1 Introduction
Legacy software may be needed to read archived dynamic data. This appendix looks at the different approaches
currently available to run legacy software after system retirement. Apply critical thinking to identify the most
appropriate, robust, and long-term solution to readability of the archived data.
Companies lacking a retention strategy and/or an adequate system retirement process may simply opt to leave
the retired system connected and available as a means to read data. This is a “do nothing” approach and is not
viable long term as any hardware failure or software corruption is likely to render the system unusable and the data
unreadable.
Should the system be running an unsupported operating system and be connected to the company network, this
introduces a serious vulnerability to cyberattacks. PI 041-1 (Draft 3) Good Practices for Data Management and
Integrity in Regulated GMP/GDP Environments [24] specifically recommends any outdated systems should be
isolated from the company network for this reason, for example, by the introduction of an additional firewall.
The latest version of some operating systems may support the installation of legacy software. There will still be
challenges in keeping older software running as updates are installed, since incremental changes over time can
interfere as much as a major upgrade.
Consider that the system’s validated state was maintained with the previous OS and therefore formal documented
testing should be conducted in the new environment to ensure that it is operational and suitable for its intended use
should the need arise to review the data during an inspection.
Documented periodic checks need to be conducted to ensure that OS updates have not caused problems with the
opening and viewing of records; a test data set should be retained to enable a quick check that the software still
opens the records and remains suitable for its intended use. The periodic review of the data needs to be clearly
defined in a procedure, along with the expected interval and the documentation of the outcome that would be generated.
Some applications may require the original, compatible operating system to work properly. A practical solution is to
have a complete Virtual Machine (VM) image containing the application software and any supporting software needed,
all running on the compatible operating system version. This can be achieved by virtualizing the physical system before
it is retired, or by re-installing the application and supporting software into a new VM when required. If the vendor
software and compatible OS are archived together it will make the VM environment setup simple, and the image
can be loaded and run when needed to view archived data. A VM solution may have a longer viable lifespan than a
hardware museum (see Section 17.5) but there will still be limits to which OS can be prolonged this way. One particular
risk with running a legacy OS in a VM is that the unsupported OS is neither patched nor patchable against security
vulnerabilities, and therefore the VM is highly susceptible to malware attacks if directly connected to the main network.
Cooperation may be required between IT and the process or data owner to manage and maintain the VM ongoing.
This could include, but is not limited to, system access controls, security, periodic review, disaster recovery
processes, etc.
A hardware museum is a planned, structured approach to storing legacy hardware to run its associated software and
view records. This is the most difficult of the options and is relatively uncommon. It is a last resort where there is no
other alternative to maintain readability for dynamic data, and the risks to patient safety and product quality resulting
from converting to static data are unacceptable. It is particularly impractical for enterprise systems, and even for a
stand-alone system there is no guarantee that the legacy hardware will operate when required as old components
may fail with no replacements available. A hardware museum only has a limited viable lifespan and should only be
used as a short- to medium-term solution; there should be a clear plan that addresses record retention beyond this.
The physical storage space required also poses challenges and is compounded in the rare case of the instrument
needing to be retained along with the computer for the software to run properly.
One location for such storage would be in the archive with the paper records because those storage conditions
would also be ideal to preserve the hardware and software from degradation. Another potential location would be
in the server room as those conditions would also be good for preserving the hardware, but the associated vendor
software disks or OS disks should be separated and stored elsewhere for disaster recovery purposes. In any case,
the hardware should not be kept running because that would wear it out prematurely.
17.6 Conclusion
There is no perfect solution to ongoing operability for legacy software. At some point, as discussed in Section 7.2,
maintaining the readability of the old dynamic records may become unfeasible and conversion to static format is the
only remaining option.
Note: A hardware museum is also known as mothballing, a time capsule, or a computer museum.
Appendix S1
Machine Learning
18.1 Introduction
Artificial Intelligence (AI) is a broad field of study within computer science that includes technologies such as Machine
Learning (ML), as well as deep and continuous learning. This appendix examines the area of ML and the importance
and implications of data and data integrity for what “machines” are able to process and learn from the data made
available to them.
18.2 Background
ML is a method of data analysis that builds and automates mathematical models (i.e., algorithms) based on data in
order to make predictions or decisions.
As previously stated, it is an area or branch of AI based on the idea that systems can learn from data, identify
patterns and make decisions with minimal human intervention and/or explicit programming.
In order to understand and properly appreciate the inherent importance of data to ML, and its integrity, we must first
understand and grasp the use of data within ML. For a “machine” or software system to “learn” there must be data
available to train the system (e.g., algorithms). But that is not the only reason for which data is used within such an
intelligent machine.
In general, there are four primary success factors to any good ML effort; these include:
• Data
• Algorithms
• Computations
• Predictions
As data integrity is the focus of this Good Practice Guide, this appendix focuses on the data portion of ML.
18.3 Scope
This appendix first identifies the lifecycle of data within a typical ML framework. This includes where data is required
to be input into the lifecycle, how data is split for various activities, and where data is created or reintroduced to
be used for additional learnings. This appendix then equates the typical ISPE GAMP® 5 [9] software phases (e.g.,
concept, project, and operation) to those of the ML model and those activities required to integrate this technology
into an overarching application and/or software product.
The vast majority of data is required prior to any model or algorithm selection; additionally, new data is used
throughout the iterative lifecycle of ML (see Figure 18.1). It is used for fully understanding and defining the business
case, data engineering activities, model training, tuning and selection, in addition to model evaluation (i.e., validation
and testing). Production data is also used to refine models (i.e., retraining), improve scoring/performance
(i.e., precision, accuracy, etc.), and allow the “machine” to keep learning, whether through supervised,
unsupervised, reinforcement, deep, or continuous learning.
The idea or concept phase to implement ML comes, as with any AI based technology, with the need to understand
the problem(s) to be addressed at a business level and develop a use case.
Once this is accomplished, the primary task is to build a data set based upon what the organization is trying to
achieve, not forgetting to identify assumptions. However, in order to build an appropriate set of data, an organization
must understand what information is important to them (i.e., meaningful) and find where that data/information
is available along with how to exclude unwanted/irrelevant data. This is defined by the desired features and/or
parameters to be directly involved in the ML, which can be small and/or even limited in quantity.
This process should begin with, and/or be supported by, a solid data governance strategy and a realization that not all
data is created equal. This includes awareness for data size, quality, and prevalence.
There are many places an organization can begin to look in order to acquire data. The first place is typically internal
to the company and its operations, often an existing data warehouse or data lake where the company's data has
hopefully already been organized, leveraging a standardized data nomenclature (see Section 4.7). Ideally, the
organization's internal data is not siloed, as it may not be possible to converge all data streams. Where data is siloed,
data drawn from multiple sources is usually in an “unorganized” format and requires proper preparation and labeling
to become useful to an ML model.
It is also possible to obtain data external to one’s own organization to serve as the basis for ML. This data may
come from various sources of which some may be structured, unstructured, and/or semi-structured. Regardless, as
previously mentioned, a data selection and governance strategy should be in place that aims for a diverse and non-
biased data set. The data set should also aim for a large number of data points, keeping in mind a reduction of overall
data complexity.
One must consider data classification (see Section 3.3), clustering (e.g., including data privacy needs), regression,
and ranking, as well as what values and/or metadata are “critical” or which simply add more complexity. It is important
to not just have the “right” data but also to have it in the “right” form or format, which is one of the major challenges in
the use of external data sources (see Section 1.6.4 Data Quality).
If obtaining and/or using external data (i.e., open source or public information) an organization must ensure that they
have the right to use such data and that Personal Information (PI) and/or Personally Identifiable Information (PII)
considerations are appropriately taken to comply with regulations such as the EU GDPR [17]. If this is not the case,
then certain data attributes may need to be anonymized.
• Privacy and Controls: Data classifications, data use, risk, mitigation controls
• Augmenting to diversify
Data sets are often inaccurate, and thus they need to be “prepared” prior to making them available for ML activities.
The quality and integrity of the data (see Section 1.6.4 and Appendix M2) make all the difference (the quality of
output is determined by the quality of input), and the data must be suitably classified and labeled (i.e., assigning tags
to make it more identifiable for predictive analysis). The data needs to be free of bias toward certain regions,
demographics, etc.
This is one of the most time-consuming efforts in the process and should ideally be done by dedicated data scientists
that possess distinct domain knowledge and expertise with the data. This aids in their ability to decide upon relevant
data structuring, cleaning (i.e., removal or replacement of missing values), labeling, annotating, and preparation for
further “processing” and use. This is imperative in order to achieve “good results” as the understanding and proper
preparation of data is directly attributable to proper selection, build, and testing of models.
Putting together the data in an optimal format is known as “feature transformation.” This includes the format (e.g.,
differing files), data cleaning (i.e., removing missing values), and feature extraction (i.e., identifying which features or
data elements are most important for prediction speed and accuracy). Normalizing the data set and reducing its
dimensionality can also help.
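As a simple illustration of these preparation steps, the following sketch uses scikit-learn to impute missing values, normalize features, and reduce dimensionality; the file name, column names, and parameter choices are assumptions for illustration, not a recommended configuration.

```python
# Illustrative sketch only: a typical "feature transformation" step -- cleaning
# missing values, normalizing numeric features, and reducing dimensionality --
# using scikit-learn. The CSV file and column names are assumptions for illustration.
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

raw = pd.read_csv("training_data.csv")
features = raw.drop(columns=["label"])  # assumes a 'label' column holds the expected output

preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),   # replace missing values
    ("scale", StandardScaler()),                    # normalize feature ranges
    ("reduce", PCA(n_components=0.95)),             # retain components explaining 95% of variance
])

prepared = preprocess.fit_transform(features)
print(prepared.shape)
```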
During the project phase of ML, a model is selected based upon the question it is expected to answer. Common
models include, but are not limited to:
Once a model has been selected, the model and algorithms are configured to allow the machine to learn using the
“case data” selected during the concept phase, which has been divided into two sub-datasets to train, and validate/
test the machine (e.g., 80% training and 20% validation/testing).
The training data set is used to train the model for performing various actions and/or teaching it how to apply certain
concepts. This requires data input along with defined and expected outputs (i.e., supervised learning).
The validation/test data set is used to evaluate how well the model was trained and assists in fine tuning the model.
For instance, does the defined input generate the correct output? This may often require human verification and/
or the use of tools. Validation is performed to provide evidence that the accuracy of the model and its associated
algorithms delivers against output expectations. It is important to consider the need for multiple data sets, as use of
the data may alter the original data set, making it unavailable for re-validation.
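The following minimal sketch illustrates the 80%/20% split and hold-out evaluation described above, using scikit-learn; the data set, model choice, and acceptance threshold are assumptions for illustration only.

```python
# Illustrative sketch only: the 80/20 split and hold-out evaluation step described
# above, using scikit-learn. The data set, model choice, and acceptance threshold
# are assumptions for illustration, not a recommended configuration.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("training_data.csv")
X, y = data.drop(columns=["label"]), data["label"]

# 80% of the case data trains the model; 20% is held back for validation/testing
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

model = RandomForestClassifier(random_state=42).fit(X_train, y_train)

# Evaluate whether the defined inputs generate the expected outputs
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Hold-out accuracy: {accuracy:.3f}")
assert accuracy >= 0.9, "Model does not meet the (hypothetical) predefined acceptance criterion"
```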
Finally, there may be additional data used for the confirmation of acceptable performance (i.e., verification) within
the overarching application’s User Acceptance Testing (UAT) prior to deployment. This data is important from a
confidence perspective, in order to verify that certain performance scoring and/or defined outcomes are in fact being
achieved.
In the operational phase it is possible to integrate, enrich, and prepare new data to further refine existing models,
develop new predictive models, and establish performance monitoring measures to track ongoing effectiveness.
It is important to note and understand that model performance goes beyond statistics and includes items mentioned
such as applicability, reproducibility, and interpretability. There need to be QC standards and/or measures to ensure
acceptable performance throughout the model's usable life.
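As an illustration of such ongoing performance monitoring, the sketch below compares the deployed model's monthly accuracy on production data (where the true outcome has since been confirmed) against a predefined alert threshold; the file name, column names, and threshold are assumptions for illustration.

```python
# Illustrative sketch only: ongoing performance monitoring in the operational phase,
# comparing the deployed model's monthly accuracy on production data (where the true
# outcome has since been confirmed) against a predefined alert threshold. The file
# name, column names, and threshold are assumptions for illustration.
import pandas as pd
from sklearn.metrics import accuracy_score

ALERT_THRESHOLD = 0.85  # hypothetical limit agreed during validation

production = pd.read_csv("production_outcomes.csv", parse_dates=["timestamp"])
production["month"] = production["timestamp"].dt.to_period("M")

# Accuracy of the model's recorded predictions versus confirmed outcomes, per month
monthly = production.groupby("month").apply(
    lambda grp: accuracy_score(grp["confirmed_outcome"], grp["prediction"]))

for period, accuracy in monthly.items():
    flag = "ALERT - review and consider retraining" if accuracy < ALERT_THRESHOLD else "within limits"
    print(f"{period}: accuracy={accuracy:.3f} ({flag})")
```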
Poor data and a good model is a bad combination that can ruin the success of ML.
Risk management helps to drive much of the process and controls needed because it gives a direct indication of
possible harm that could happen in different scenarios. ML is about increasing efficiency while decreasing potential
harm, and it is imperative to outline possible scenarios and gaps from the traditional risk management approach; this
means performing risk assessments and then assessing the additional risks of having ML intervention. This must
also account for the volume and scope of the initial data set in ensuring all scenarios, features, etc. were properly
trained, and that the training data was correctly labeled prior to use, as this could lead to additional requirements to
be evaluated.
• Benefits: What does the intervention of ML mitigate from a traditional electronic system?
• Costs: What potential new risks have been brought about? Are there mitigation plans?
• System performance, down time, etc. (i.e., must consider business continuity and potential data loss scenarios)
Change management should continue following the current change process in the company and adhere to regulatory
requirements, taking into consideration possible mitigation efforts brought about in the risk management efforts
(Section 18.7.2) and focusing on how to maintain the integrity of the data.
User experience and system controls along with other human intervention are a great place to start; drafting a visual
of the process with possible scenarios and mitigation can help identify weak points in the process.
• Input: Internal procedures that apply to regulatory needs and how those may need to be adjusted or managed
differently with ML
• Output: Necessary regulatory deliverables required for change management, for example, if it is Software as a
Medical Device (SaMD) or a specific GxP software (i.e., a clinical system or a laboratory system can have very
different approaches and documents needed for changes)
For example:
• Data inputs and the process of ML: Details about how a change is initiated and what impacts on the data that
could have
For example:
- What are the rollback plans or separate environments that allow for issue detection and protection of data
sets?
ML may be used in a highly specific part of a process but could have large reaching effects; knowing the boundaries
of a change impact allows for effective mitigation controls.
FDA Discussion Paper: Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning
(AI/ML) – Based Software as a Medical Device (SaMD) [72]
BSI and AAMI Position Paper 2019: The emergence of artificial intelligence and machine learning algorithms in
healthcare: Recommendations to support governance and regulation [73]
ISO IEC 62304: 2006 Medical device software — Software life cycle processes [74]
Thinking on its own: AI in the NHS Section 5 – Overcoming System Challenges [75]
Appendix S2
Assurance
19.1 Introduction
As described in ISPE GAMP® Guide: Records and Data Integrity [8], Section 1.5.6 GxP Computerized System Life Cycle:

“Data integrity is underpinned by well-documented, validated GxP computerized systems, and the application of appropriate controls throughout both the system and data lifecycles.”

This foundational principle is often misunderstood and used to justify the creation of non-value adding documentation.

Note: There are tools used to automate and supplement the tracking and testing of non-product systems such as application lifecycle management systems, comparison tools, software testing tools, and bug tracking tools. The use of these tools is not the focus of the regulations, and good engineering, software engineering, and IT practices should be applied to ensure they are acceptable for use rather than a separate validation or qualification effort.
A system lifecycle approach, such as described in ISPE GAMP® 5 [9], should be applied to each GxP computerized
system. Record and data integrity should be built-in and maintained throughout the GxP computerized system
lifecycle phases, from concept through project and operations, to retirement. The GxP computerized system lifecycle
activities should be scaled based on the complexity and novelty of the system, and potential impact on patient safety,
product quality, and data integrity.
This appendix describes the application of the FDA Center for Devices and Radiological Health (CDRH) Computer
Software Assurance (CSA) [4] concepts within an ISPE GAMP® 5 [9] system lifecycle framework, in order to
help clarify what it means to have a well-documented, validated GxP computerized system that effectively and
efficiently achieves data integrity by design.
While CSA concepts apply broadly to the validation for intended use of a computerized system, in this appendix they
are specifically applied to data integrity, which is itself foundational to ensuring patient safety and product quality.
This appendix shows how a combination of critical thinking and CSA approaches can achieve data integrity through
effectively and efficiently managing data integrity risks in support of product quality and patient safety by focusing
efforts on the areas of highest risk to data integrity.
Computer Software Assurance was born from CDRH’s Case for Quality [4], which enhances and incentivizes the
adoption of practices and behaviors to improve medical safety, responsiveness, and how patients experience
products. As part of that effort, the FDA supports and encourages the use of automation, information technology, and
data solutions throughout the product lifecycle in the design, manufacturing, service, and support of life sciences.
Automated systems benefit product quality and patient safety by reducing errors, reducing patient risk, optimizing
resources, and increasing business value.
While the FDA planned guidance [4] uses the term “Computer Software Assurance” there is no intent to limit the
application of CSA to software only. Software is only one component within a computerized system, which in turn is
part of a wider operating environment. A holistic approach is needed to ensure the overall computerized system is fit
for intended use in support of the business process.
Risk-based testing models have been developed following good software engineering principles and practices.
Application of such approaches supports the effective achievement of data integrity objectives. CSA outlines a
risk-based approach to testing and details the intended uses the FDA considers as high risk. In addition, a potential
framework for testing methods and an acceptable record of objective evidence is provided. CSA emphasizes that
quality, not documentation, is the primary focus, and that the record demonstrating that the system performs as
intended should be primarily of value to the regulated organization.
A cornerstone of CSA is the application of critical thinking and risk-based principles when developing the computerized
system lifecycle strategy in support of data integrity. A regulated company should focus on understanding the intended
use of the system in support of a regulated business process, and the risks introduced by the system not performing
as intended with respect to patient safety, product quality, and data integrity. This allows a knowledgeable and
experienced team of SMEs to select and apply the appropriate strategy to evaluate the process, the data flows
through the process, the computerized systems supporting the process, and the degree of human interaction
with the system. Systems or functions directly impacting patient safety, product quality, and data integrity are likely to
require a rigorous assurance effort and recorded results. Conversely, if the system or function has only an indirect⁵
impact, the effort and recorded results should be appropriately scaled to be as least burdensome as possible.
CSA is consistent with General Principles of Software Validation [78], which states:
“The level of validation effort should be commensurate with the risk posed by the automated operation.”
CSA is also consistent with robust lifecycle methodologies such as ISPE GAMP® 5 [9] or AAMI TIR36 [79]. Like ISPE
GAMP® 5, CSA takes a lifecycle approach where activities and controls at earlier or later stages, such as a robust
risk assessment based on product and process understanding, can be leveraged during the verification stage. ISPE
GAMP® 5 presented the concept of risk-based testing to focus on testing functions with high-risk priorities. CSA builds
on the ISPE GAMP® 5 principles by directly acknowledging the acceptability of concepts from the software testing
industry, such as exploratory testing, which neither rely on scripted test cases nor extensively capture screenshots as
test records.
Ineffectual application of the risk-based approach combined with unskilled validation/quality practitioners has resulted in:
• Dry running of tests to reduce execution errors caused by the inordinate level of detail in the test instructions
• Focus on error-free execution of the formal script instead of software/system error detection
⁵ Examples of indirect impact are situations where the system feature, operation, or function is: (1) collecting and recording data from the process for monitoring, review, or statistical process control; or (2) related to quality system integrity, e.g., electronic signatures, logs of system configuration changes.
Instead, practitioners should spend more time actively testing to find defects and less time generating specifications in advance.

The ultimate objective of testing should always be the reduction of defects before the system goes live; the CSA approach expedites this in support of patient safety, product quality, and data integrity.

CSA is a balanced approach promoting a thorough understanding of the intended use of the system by the organization, an understanding of risk, use of more efficient and effective testing methods, and an appropriate level of objective evidence to improve and accelerate the use of technology in the regulated system landscape. This approach is aligned with the way systems are reviewed during any regulatory audit or inspection. The validation plan and summary are reviewed. The functional behavior of the system is primarily confirmed in production, for example, "show me the Qualified Person approval for this lot." In the event that testing is reviewed, it is either a high-risk area (where scripted testing should have been used) or an actual production failure that should have been caught before final acceptance and release.

Note: Validation is often erroneously regarded as requiring a rigorous formal structure and direct regulated company QA oversight to fulfill a "compliance" checkbox requirement. In reality, the underlying objective of validation is to support the intended use of the computerized system by the regulated company in support of a business process, and this can be best achieved by leveraging scripted and unscripted testing approaches within the validation activities. Assurance activities can occur outside of rigorous pre-defined test specifications and still support patient safety, product quality, and data integrity. ISPE GAMP® 5 Appendix D5 [9] states that "Unnecessary supporting documentation that does not add value to the normal test results should be avoided." (emphasis added) The principles of unscripted testing are discussed later in this appendix.

19.1.1 Current State and Potential Barriers
Other regulated and non-regulated industries have
increasingly moved forward and adopted frameworks
for modern testing and modern lifecycles. Technology
is continually evolving, creating new opportunities and expectations; therefore, the approaches for validation for
intended use should adapt accordingly to support companies to deliver products to market fast while continuing to
ensure patient safety, product quality, and data integrity. The life sciences industry has an opportunity to break away
from the past and change the perception of computerized systems validation. The conjunction of ISPE GAMP® 5 [9]
and CSA enables computerized systems to be validated for intended use with increased quality and speed.
Computerized system validation practices are often anchored in the past, and greatly influenced by CDRH’s General
Principles of Software Validation [78], which was finalized in 2002. This document was referenced by the FDA
Guidance for Industry: Part 11, Electronic Records; Electronic Signatures – Scope and Application (2003) [30] and
again in 2007 in the FDA’s Computerized Systems Used in Clinical Investigation [80]. General Principles of Software
Validation [78] covers both software that is itself, or is part of, a regulated medical device, as well as systems used
as part of production or the quality system (non-product systems). The requirements for the lifecycle and validation of
the regulated medical device software itself do not directly apply to such non-product systems, but this distinction has
not always been clearly and correctly applied by regulated companies. The average publication date of the software
engineering references in General Principles of Software Validation is 1993 [78]; and testing methodologies, tools,
and techniques have advanced significantly since then. Consequently, the current state of validation is influenced by
the misapplication of guidance for product systems and by long outdated software engineering practices.
During the same time period, the pharmaceutical industry implemented lifecycle models such as ISPE GAMP® 5
[9] and AAMI TIR36 [79]. ISPE GAMP® 5 [9] supports a patient-centric and QRM approach to the assurance of
computerized systems. Unfortunately, in many cases ISPE GAMP® 5 has not been effectively implemented in the field
due to the absence of critical thinking and the use of unskilled practitioners. There have been deficiencies in historical
approaches to validation that have led to data integrity risks, such as a lack of process definitions and data flows,
poorly defined configuration settings, gaps in the development of procedures and training for review of electronic
data and audit trails, and gaps in other aspects of validation for intended use and governance of automated business
processes. In the absence of critical thinking, validation typically becomes a template activity applying a formulaic risk
assessment per ISPE GAMP® 5 Appendix M3 [9]. Testing is then performed irrespective of the outcome of the risk
assessment (minimal or excessive, depending on the company) to suit the internal company ethos. In some regulated
companies, this has resulted in over-simplified, table-driven approaches with no application of critical thinking and a lack
of meaningful consideration of aspects such as complexity, novelty, and supplier development testing in the definition
of the appropriate system lifecycle activities.
All of the above means industry has not embraced advances in development and testing methodologies, techniques,
and tools; risk-based testing is just one of many techniques and tools available in modern software testing.
An area many regulated companies continue to struggle with is the continued use of detailed test scripts requiring
screenshots, without applying critical thinking. Some companies make the mistake of focusing on regulatory risk
over patient risk, and hence the major goal becomes documentation that will pass an inspection. In some cases, test
specifications are repeatedly executed in advance (dry run) until they can be executed without detecting a single
issue, at which point they have lost the ability to find any new defects (which is a primary goal of effective testing).
The ultimate aim of effective validation is that the computerized system (including people, process, procedures) is
fit for intended use. Testing focus is often on ensuring the automation feature, operation, or function performs as
intended without compromising patient safety, product quality, and data integrity, but it is important to verify the wider
functionality in terms of the underlying business process and data flows in support of data integrity. The regulated
companies and people performing the assurance activities should apply critical thinking and ensure the assurance
activities are value added and meaningful instead of an inspection-readiness task. Focusing on the regulated
company’s business needs first, and then leveraging work already performed, will lead to a higher quality system; the
validation effort is reduced when it values work already performed instead of adding an unnecessary burden
or redundant activity.⁶ The generation of supporting documentation for an inspection is a by-product of the system
lifecycle.
The level of effort should be commensurate to the risk acceptable within the organization as defined in its policies,
procedures, and plans. The regulated company determines the assurance activities based on their own need to
ensure systems are fit for intended use. The key is to determine the regulated company’s level of risk acceptance,
based on the intended use of the software, and factoring in the technical or procedural controls that are currently in
place or that will be put in place.
⁶ An example of a redundant activity is repeating functional testing (SQA) already performed in another, but still adequately controlled, environment because it lacked independent quality assurance unit review. The adequacy of testing is a technical issue as opposed to a quality assurance/compliance issue.
There are strong benefits for the life science industry in applying tools that enable test automation and efficiency gains within the system lifecycle (e.g., requirements, business process maps, test case management, test automation, traceability, and comparison utility). Teams can embed how they work and the tools they use as part of the process and no longer add manual steps to produce documentation and testing evidence. It should be noted that the use of incidental tools to aid in the validation effort does not trigger a separate validation effort; automated tools only require a documented assessment for their adequacy [11, 25], which should have been completed prior to their use in a validation project. A tool is not a regulated system, and its acceptability should be documented using the company’s non-GxP business practices, for example, by the application of good engineering practices and the availability of evidence of proper selection, installation, and control.

The regulatory requirements define that the software must be fit for its intended use [11, 30], but not how this is achieved. Testing documentation can vary in depth and detail yet still provide assurance. Each company needs to define how to implement the most effective, least-burdensome mitigation of the risks to patient safety, product quality, and data integrity as part of their overall validation process.

Note: Examples of quality risk management tools include:
• Failure Mode Effects Analysis (FMEA) is effective when real data is available and there are many components; a bottom-up, cause-and-effect approach can be helpful at identifying risks
• Fault-Tree Analysis (FTA) is a "top down" approach for identifying risk iteratively using a cause and effect model
• Hazard Analysis and Critical Control Point (HACCP) is widely used as a preventive food safety system where hazards are identified and controlled at specific points in the process
• Hazard Operability Analysis (HAZOP) may be used to identify potential hazards in a system and identify operability problems likely to lead to nonconforming products.
There are many tools, and regulated companies should apply critical thinking to identify and apply them effectively. A data integrity risk assessment should be included as part of the quality risk management approach, and should leverage detailed business process mapping and data flow diagrams to identify potential data integrity risks through the end-to-end process.

⁷ Indirect leveraging is predicated on an assessment and ongoing monitoring of the vendor’s quality management system and practices, including contracts. Direct leveraging of test results does not require this, as the test results of the system are directly reviewed and deemed acceptable.
This section describes how ISPE GAMP® 5 [9] principles can be combined with CSA concepts, and applies the
combined approach to a data integrity focused example.
ISPE GAMP® 5 [9] and CSA advocate a lifecycle approach with the application of computerized system best practices
within the QMS framework. See Figure 19.1.
The starting point within the regulated company’s system lifecycle is to ask the question: “What is needed to
be confident that the system is fit for intended use and meets the regulated company’s needs?”, and to define
an appropriate response. Confidence in system functionality can be achieved by leveraging supplier activities
demonstrating that the system performs as expected, based on the regulated company’s supplier management
practices. This includes activities performed to select the supplier, supplier activities performed within their own
lifecycle to produce the system, factory acceptance testing, etc. The regulated company will implement the
computerized system for its intended use in their operating environment, including the integration with other systems,
people, processes, and procedures. Regulated companies should identify risks associated with the intended use of
the system, and define and complete additional assurance in the form of validation activities to mitigate these risks, for example:
• Use of controls earlier or later in the project stage to mitigate risks (e.g., informal reviews, walk-throughs,
inspections, and static analysis)
• Use of the most appropriate and effective testing techniques at every point including unscripted testing (see
Section 19.3.2)
• Ongoing monitoring of existing and new risks, and the effectiveness of the current risk mitigation controls
These lifecycle activities provide a comprehensive assurance approach that can reduce the need for testing within the
regulated company’s system lifecycle. The need for additional assurance through testing is determined by assessing
the system’s impact on safety and quality, to answer the question: “What is needed to ensure the system does no
harm by addressing impact to patient safety, product quality, and data integrity?”
Features, operations, or functions with a direct impact to patient safety, product quality, and data integrity may require
the most rigorous assurance efforts and objective evidence, and indirect impacts require the least amount of rigor
and objective evidence. CSA introduces the use of risk-based documentation, following good software engineering
principles and good software testing practices that include unscripted and scripted testing to address this risk-based
approach.
This section provides a reference model for the additional assurance activities noted in Section 19.2.
In addition to the risks to patient safety, product quality, and data integrity, testing and assurance activities should
consider the following:
• The expected or feared failure modes and the primary undesirable operational outcomes
• How many tests are required to cover the scope under consideration?
• What is the specific testing technique applicable in this case (negative, positive, functional, structural, unit,
integration, acceptance, performance, regression, error-handling, boundary, etc.)?
• The role the testing plays in demonstrating fitness for intended use and discovering defects
A formulaic approach to always applying a particular testing technique to a particular risk priority may not achieve
the maximum test effectiveness. The effectiveness of testing primarily aimed at defect identification, for instance,
is directly related to the nature of the function to be tested, the nature of the likely defects, the architecture of the
component to be tested, which tools are being used for testing, and the logic of the process supported.
An effective testing and assurance strategy cannot be defined based solely on risk priority, and should additionally
consider the factors listed above (a list which is itself not exhaustive).
Testing activities fall under two broad categories: static techniques and dynamic techniques (not related to static and
dynamic data). Static techniques test the software without executing it, for example, code reviews, walk-throughs,
and static analysis. Static techniques are typically used in the development of systems and may reduce the amount of
dynamic testing undertaken by the supplier.
Dynamic testing occurs when the system functionality is confirmed through execution. CSA is primarily focused on
dynamic testing and a risk-based application through scripted and unscripted techniques. Scripted testing is testing
carried out following a documented sequence of test cases, that is, the tester’s actions are prescribed by written
instructions within a test case. Unscripted testing is dynamic testing where tester’s actions are not prescribed step-
by-step within written instructions and are experience based in nature, and may include ad hoc, error guessing, and
exploratory testing. See Table 19.1 for further details.
CSA identifies these as possible ways of providing assurance; they are not intended to be prescriptive or all-inclusive.
There are many possible ways and techniques for classifying and structuring testing, and software professionals
and SME testers should choose the appropriate testing approaches, structure, and tools. Regulated companies may
leverage any of the approaches or a combination of approaches that they determine will verify the intended use and/
or mitigate the risk most appropriately.
Scripted test cases are the traditional validation test cases, which typically include execution instructions, expected
results, independent review, and approval of test cases. Scripted testing is not restricted to manual execution of pre-
defined test specifications and can and should leverage automated test tools where appropriate.
Limited and robust scripted testing are respectively reserved for medium- to high-risk, and highest-risk features,
functions, or operations. For example, robust scripted testing may include positive, negative, and alternate path
testing; limited scripted testing may be used to test positive scenarios explicitly documented due to the risk to patient
safety and/or product quality. It should be noted that CSA leverages risk-based documentation and test rigor is not
automatically linked to documentation rigor – see Figure 19.5.
In scripted testing, it is a common misconception that extensive screenshots are required to provide objective
evidence to demonstrate validation of computer systems used in automation of business processes. CSA addresses
this misconception by clarifying the purpose of the objective evidence and what constitutes an acceptable record
demonstrating confidence in the system reliably and repeatedly performing as intended.
In the age of modern browsers and freely available image editors, the screenshot, once the gold standard of
supporting evidence, no longer has the same irrefutable status. For example, modern web browsers have built-in
tools that can edit text and replace images prior to taking screenshots in an undetectable fashion, as the edits
are indistinguishable from natively generated system records. Similarly, image editors can be used to manipulate
screenshots after creation, in an undetectable fashion, especially when the images are printed.
While extensive screenshots are now considered to bring little value to verification activities, there are cases where
a screenshot may still have merit when collected for areas with high impact to patient safety and/or product quality.
These are explained in the ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Testing of GxP Systems
(Second Edition) [22]:
“Requirements for additional test evidence in the form of printouts, screenshots, etc., should be clearly defined in
the test method and focus on:
• Test steps which produce complex results, which may be difficult or time consuming to record manually
• When it is faster than manual recording of sufficient evidence to allow independent review
• When the result is something essentially visual and easier to review from a screenshot”
CSA identifies three unscripted testing approaches: (1) ad hoc testing; (2) error guessing; and (3) exploratory testing.
Ad hoc testing is unscripted testing performed without planning or pre-defined documentation, and may occur in an
informal test environment or a controlled environment, depending on the purpose of the testing. CSA recommends
ad hoc testing for the lower risk areas or as a precursor to defining scripted testing for higher risk functions. Ad hoc
tests may be experience based and are conducted randomly and informally with a minimal record of the activity (such
as the objectives and conclusions of the test activity and the identity of the tester). Figure 19.2 shows the least-
burdensome nature of the testing.
Ad-hoc testing was performed on deletion permissions when configured for out-of-the-box (OOB) roles and fields/screens. Testing executed by John Doe (JD) on 3rd June 2020. Issues 005, 006, and 007 were found and logged in the bug tracker. Ad-hoc testing concluded with all issues resolved and the system functioning as expected.

Initials: JD    Date: 3 June 2020
Error guessing and exploratory testing are both experience-based testing techniques and rely on the skill and
expertise of the tester. It is important to use testers who understand the supporting business process and can
anticipate how real users will/might use the software.
Error guessing is another unscripted test design technique in which test cases are designed to expose anticipated
errors based on the tester’s experience and general knowledge of failure modes. Tests may challenge the quality
of the system by injecting invalid entries and errors into the system to evaluate behavior, especially reliability, of the
system. A structured approach to error guessing is to list common failure modes and attempt to produce them.
Exploratory testing is unscripted testing in which the tester actively controls the test design as those tests are
executed. Initial test design is based on the tester’s existing relevant knowledge, prior exploration of the test item
(including results from previous tests), and critical thinking regarding common software behaviors and types of failure.
During test execution, the tester uses information gained while testing to design new and better tests dynamically.
A structured approach to exploratory testing is to list the use cases/scenarios or specific operations that need to be
covered. Figure 19.3 is a sample of exploratory testing where the specific use cases are noted and explicitly covered.
(Figure 19.3 column headings: Test Overview – capture at a high level the test activities executed; Use Cases; Issues; Tester Initials and Date)
Table 19.1 summarizes the acceptable assurance approaches and records discussed in the previous sections.
Unscripted Testing: Error guessing
- Test approach: Testing of requirement or function failure modes, with optional listing of expected failure modes in advance
- Test results captured: Details regarding any failures/deviations found
- Record: Summary description of failure modes tested; issues found and disposition; conclusion statement; record of who performed testing and date

Scripted Testing: Robust
- Test approach: Test objectives; test cases (step-by-step procedure); expected results and values; independent review and approval of test cases
- Test results captured: Pass/fail for each test case; details regarding any failures/deviations found and their disposition
- Record: Detailed report of assurance activity; result for each test case, including any critical values and indication of pass/fail; issues found and disposition; conclusion statement; record of who performed testing and date; record of who reviewed testing and date
These are all possible ways of providing assurance and are not intended to be prescriptive or all inclusive. Similarly,
different SMEs may be more appropriate for different test types; for example, interface testing is best done by a
system SME, whereas exploratory or error guessing may need both system and business expertise for maximum
defect detection.
The use of tools and automation for as many of the activities mentioned as possible is strongly encouraged over the
use of documents, for example, test management and test automation systems and tools, and the use of traceability
tools rather than traditional traceability matrices.
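Where a dedicated traceability tool is not available, even a lightweight script can replace a manually maintained matrix. The sketch below uses hypothetical requirement IDs and test-record fields to cross-check requirements against executed tests and report any that remain untraced; it is illustrative only.

```python
def untraced_requirements(requirements: set[str], test_records: list[dict]) -> set[str]:
    """Return requirement IDs with no passing test record tracing to them."""
    covered = {req
               for record in test_records
               if record.get("result") == "pass"
               for req in record.get("traces_to", [])}
    return requirements - covered


if __name__ == "__main__":
    reqs = {"URS-001", "URS-002", "URS-003"}
    tests = [
        {"id": "TC-01", "traces_to": ["URS-001"], "result": "pass"},
        {"id": "TC-02", "traces_to": ["URS-002"], "result": "fail"},
    ]
    # URS-002 has only a failed test and URS-003 has none, so both are reported.
    print(untraced_requirements(reqs, tests))
```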
19.4 Example: Applying Risk-based Approach from ISPE GAMP® 5 and CSA
This section contains an example model for integrating ISPE GAMP® 5 [9] with CSA, however regulated companies
should apply critical thinking to develop their own model. An effective testing or assurance strategy should consider
(with full and effective application of critical thinking) what is being tested, how it is built, who is performing the testing,
when, with what tools, and for what purpose. It is no longer enough to base the strategy solely on the potential residual
risk of the function, component, or attribute under test. Merging ISPE GAMP® 5 [9] with CSA begins with combining the
lifecycle approach (defined in Section 19.1.1) and risk-based assurance (Section 19.1.2).
Table 19.2 contains a simplistic example of a LIMS system using current conventional thinking, presented as a
teaching model.
Conventional approaches combine the primary scenarios with a few elements listed in the additional scenarios into
a formal validation protocol. Regulated companies often include the additional scenarios based on lessons learned
from production defects that escaped prior testing efforts on earlier releases or other systems. Given that exhaustive
testing (all combinations of preconditions and inputs) is impossible, formally documenting everything unnecessarily
creates a priority inversion in which documentation becomes more important than the quality of the system.
With CSA, the regulated company applies a risk-based approach to documentation and includes the primary
scenarios as part of the validation package. The additional scenarios are tested when the primary scenarios are
tested, but do not warrant formal validation documentation (e.g., risk assessments, scripts); the goal is to reduce the
probability of undiscovered defects remaining in the system that could impact intended use. Should the additional
scenarios warrant formal documentation and traceability, they should be moved from additional scenarios to primary.
Applying the ISPE GAMP® 5 Appendix M3 [9] risk methodology shown in Figure 19.4, the company should focus its
validation effort on areas of higher risk priority.
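The sketch below shows how such a two-step prioritization (severity and probability giving a risk class, then risk class and detectability giving a risk priority) could be encoded. The specific lookup tables are illustrative assumptions chosen only to reproduce the worked example in this appendix; they are not the Appendix M3 tables themselves, and each company should substitute the mappings defined in its own QRM procedures.

```python
# Illustrative two-step risk prioritization in the style of ISPE GAMP 5 Appendix M3.
# The lookup tables below are assumptions for illustration only.

RISK_CLASS = {  # (severity, probability) -> risk class
    ("High", "High"): 1, ("High", "Medium"): 2, ("Medium", "High"): 2,
    ("High", "Low"): 2, ("Low", "High"): 2, ("Medium", "Medium"): 2,
    ("Medium", "Low"): 3, ("Low", "Medium"): 3, ("Low", "Low"): 3,
}

RISK_PRIORITY = {  # (risk class, detectability) -> risk priority
    (1, "Low"): "High", (1, "Medium"): "High", (1, "High"): "Medium",
    (2, "Low"): "High", (2, "Medium"): "Medium", (2, "High"): "Low",
    (3, "Low"): "Medium", (3, "Medium"): "Low", (3, "High"): "Low",
}


def risk_priority(severity: str, probability: str, detectability: str) -> str:
    """Determine risk priority from the two illustrative lookup tables above."""
    risk_class = RISK_CLASS[(severity, probability)]
    return RISK_PRIORITY[(risk_class, detectability)]


# Example requirement from this appendix: Severity High, Probability Medium,
# Detectability Medium -> Medium priority under this illustrative mapping.
print(risk_priority("High", "Medium", "Medium"))  # Medium
```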
With CSA, the level of documentation detail needed can also be scaled based on risk priority, as shown in Figure 19.5.
Figure 19.5: Scaling the Test Rigor and Documentation Based on Risk Priority
Applying an integrated ISPE GAMP® 5 [9] and CSA approach that leverages critical thinking to the example above,
the validation activities can be determined, as shown in the example in Table 19.3.
Table 19.3 (extract):
Requirement: Only Privileged Users (PU) can delete laboratory data
Risk assessment: Severity: High; Probability: Medium; Detectability: Medium; Risk Priority: M
Assurance approach: Unscripted Testing – recommend exploratory testing
Test scenarios: Primary scenarios as determined by the regulated company
PS 1. PU can delete laboratory data
PS 2. General users cannot delete laboratory data
PS 3. …
A full listing of all lifecycle testing activities for that requirement is shown in Table 19.4.
PS 3. … Exploratory testing …
AS 6. … … …
*These can be tested before validation testing begins, or when the primary scenarios are tested. Should they
require formal documentation and traceability, they should be changed to primary scenarios.
The emphasis should always be on testing to ensure patient safety, product quality, and data integrity rather than the
generation of documentation for inspection purposes alone. In this case, only the primary scenarios are documented,
which provides more time to test using modern techniques (e.g., automation). The additional scenarios are tested
as determined by the manufacturer to ensure they work properly for their business and not just for inspection
purposes alone. The level of documentation should take into account any future needs for regression testing (i.e.,
is the documentation sufficiently detailed that a repeat iteration would follow exactly the same test procedure?) and
whether it provides sufficient assurance to justify leveraging these tests rather than repeating them as formal scripted
testing in a later environment. The use of ongoing metrics and monitoring is recommended to ensure the system is
operating as expected.
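As one hedged illustration of covering the primary scenarios with automation rather than manual scripts, the sketch below checks PS 1 and PS 2 from the example above. The LIMS interface, role names, and record IDs are hypothetical; a real implementation would exercise the actual system.

```python
# Automated check of the example primary scenarios (hypothetical LIMS interface).
class FakeLims:
    """Toy in-memory stand-in so the sketch runs; real tests would target the actual system."""
    def __init__(self):
        self.records = {"LAB-001": "8.15 mg/mL"}

    def delete_record(self, user_role: str, record_id: str) -> bool:
        if user_role != "privileged":          # only privileged users may delete
            raise PermissionError("deletion not permitted for role: " + user_role)
        return self.records.pop(record_id, None) is not None


def test_ps1_privileged_user_can_delete():
    lims = FakeLims()
    assert lims.delete_record("privileged", "LAB-001") is True


def test_ps2_general_user_cannot_delete():
    lims = FakeLims()
    try:
        lims.delete_record("general", "LAB-001")
        assert False, "general user was able to delete laboratory data"
    except PermissionError:
        assert "LAB-001" in lims.records  # the record must still exist
```

Run under a test runner such as pytest, these checks produce a repeatable record of the primary scenarios without manual screenshots.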
19.5 Example: Applying ISPE GAMP® 5 and CSA Using Direct Leveraging of Testing
Throughout the System Lifecycle
This section revisits the example in Section 19.4, but here uses the lifecycle approach endorsed by ISPE GAMP® 5
[9] and CSA to leverage the efforts from earlier stages and increase the amount of testing to ensure patient safety
and product quality. Specifically, this example presents a method for leveraging functional and UAT performed in an
earlier environment (for example, in a development or evaluation environment) instead of repeating the same testing
in a formal validation environment. As noted in Section 19.4, the regulated companies should apply critical thinking to
develop their own model.
19.5.1 Background
The regulated company applies a software testing process as part of implementing a new computerized system. This
testing (undertaken as Software Quality Assurance (SQA)) is separate and distinct from the development testing
performed by the vendor during the original development of the computerized system. The intent of this testing is risk
awareness through defect prevention and defect detection; it is led by the test team with support from the business
team and technical team; all three own the accountability for the quality of the system through reviews and analysis to
prevent defects, and scripted and unscripted testing to detect defects. Close collaboration ensures the SQA activity is
aligned with intended use and technical complexity. The test team should plan and execute the testing approach once
this alignment is achieved.
While non-GxP in nature, the earlier environments are still controlled and governed by good engineering
practices (e.g., configuration management, release management); for example, all deployments should be approved
before execution; all deployments are assessed for the impact of changes; test planning should be performed in
advance; test scenarios/objectives should be reviewed before execution; and the test results should be documented
in a fashion that can be reviewed.
The validation plan provides the rationale and justification for using testing from the earlier environments, and the
validation SOP specifically allows leveraging testing from an earlier environment.
Upon completion of the internal test cycle, the test artifacts are evaluated by the validation team⁸ to confirm each
requirement has been tested to the level of rigor commensurate with the risk priority (e.g., as determined by ISPE
GAMP® 5 Appendix M3 [9]). The focus is functional acceptability of the requirements based on the artifacts, that is, is
the function, operation, or feature working as required? If yes, trace the requirement to the functional testing effort in
the traceability matrix. If no, additional testing is performed to remediate the gap before establishing traceability.
The regulated company’s GxP controls begin at the point of leveraging, that is, when the earlier environment test
artifacts are evaluated for inclusion as part of the overall validation package via the traceability matrix. The regulated
company should refrain from applying GxP documentation standards on the earlier environment testing. For example,
as part of unscripted testing the test evidence may be generated before the test documentation is completed or test
cases may be executed by automation. The question is: “Has the right testing been performed?”, not “Has the work
been documented to good documentation standards?” QA should review the traceability matrix to ensure that the test
strategies were followed and implemented as planned.
The validation summary report documents the extent of testing from the earlier environment that was leveraged, with
reference back to the justification for this in the validation plan.
⁸ The test team and validation team may be one and the same.
Similarly, the regulated company should determine the level of risk-based documentation from a CSA perspective.
Areas of higher risk should have more detailed test documentation, as noted in Figure 19.5. Successful application
of CSA depends on the ability to recognize that different testing techniques can be combined to achieve risk-based
assurance; for example, even the highest-risk requirement can be confirmed based on unscripted testing (e.g., a
collection of exploratory test scenarios that cover the feature fully). See Table 19.5.
Table 19.5 (extract):
Requirement: Only Privileged Users (PU) can delete laboratory data
Risk assessment: Severity: High; Probability: Medium; Detectability: Medium; Risk Priority: M
Leveraged assurance: A retrospective review of the SQA cycle documentation identified that the following test scenarios had been covered:
Ad hoc/Basic Assurance:
1. Deletion permissions work when configured for Out-of-the-Box (OOB) roles and fields/screens
Automated testing:
1. User roles (including lab user) configuration programmatically confirmed
2. …
Exploratory testing:
1. Deletion functionality is disabled versus hidden, i.e., inaccessible using keyboard shortcuts during a transaction
2. Permissions are enforced when users attempt to delete via the API (without the user interface)
3. …
*This represents one middle-of-the-road example of leveraging, and many other techniques may be used. For
example: (1) the traceability matrix from the SQA cycle can be used directly without creating a separate one; or
(2) the business team, technical team, and test team can define a test design specification to prospectively guide
the test effort (as opposed to the retrospective guidance in the example), which assures the scenarios for leveraging
are documented sufficiently.
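The second exploratory scenario above (permissions enforced when deletion is attempted via the API, bypassing the user interface) could be probed with a short script along the following lines. The base URL, endpoint, token, and expected status codes are assumptions for illustration only and would be replaced by the details of the actual system under test.

```python
# Probe API-level enforcement of deletion permissions (illustrative only).
# The base URL, endpoint, and token are hypothetical placeholders.
import requests

BASE_URL = "https://lims.example.invalid/api/v1"
GENERAL_USER_TOKEN = "<token for a non-privileged account>"


def attempt_delete_as_general_user(record_id: str) -> int:
    """Attempt a deletion via the API with a non-privileged credential and return the HTTP status."""
    response = requests.delete(
        f"{BASE_URL}/lab-results/{record_id}",
        headers={"Authorization": f"Bearer {GENERAL_USER_TOKEN}"},
        timeout=10,
    )
    return response.status_code


if __name__ == "__main__":
    status = attempt_delete_as_general_user("LAB-001")
    # The expectation is an authorization failure (e.g., 401/403), never a 2xx success.
    print("Deletion blocked" if status in (401, 403) else f"UNEXPECTED status: {status}")
```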
The leveraging of UAT may occur in analogous fashion, where the business users can also provide their test
scenarios and test data for inclusion during the testing stages to minimize or obviate the need for a separate user
acceptance test cycle or PQ. The business users can review the output of the testing to ensure their needs are met.
19.6 Conclusion
CSA describes a risk-based approach to the validation of computerized systems that reinforces ISPE GAMP® 5 [9]
principles and key concepts. The foundation is the application of critical thinking by a knowledgeable and experienced
team of SMEs. The assurance activities within the quality system should be meaningful and add value, applying a
pragmatic, least-burdensome approach to ensuring patient safety, product quality, and data integrity.
20 Appendix G1 – References
1. EudraLex Volume 4 – Guidelines for Good Manufacturing Practice for Medicinal Products for Human and
Veterinary Use, Chapter 1: Pharmaceutical Quality System, January 2013, http://ec.europa.eu/health/documents/
eudralex/vol-4/index_en.htm.
2. 21 CFR Part 211.160 – Current Good Manufacturing Practice for Finished Pharmaceuticals; General
requirements, Code of Federal Regulations, US Food and Drug Administration (FDA), www.fda.gov.
3. 21 CFR Part 211.194 – Current Good Manufacturing Practice for Finished Pharmaceuticals; Laboratory records,
Code of Federal Regulations, US Food and Drug Administration (FDA), www.fda.gov.
4. US FDA Center for Devices and Radiological Health (CDRH), Case for Quality, Food and Drug Administration
(FDA), https://www.fda.gov/medical-devices/quality-and-compliance-medical-devices/case-quality.
5. Wyn, S., Reid, C.J., Clark, C., Rutherford, M.L., Watson, H.D., Vuolo-Schuessler, L.L., Perez, A., “Why ISPE
GAMP® Supports the FDA CDRH: Case for Quality Program,” Pharmaceutical Engineering, November/December
2019, Vol. 39, No. 6, pp. 37-41, www.ispe.org.
6. “Understanding Barriers to Medical Device Quality,” Center for Devices and Radiological Health (CDRH),
Food and Drug Administration (FDA), www.fda.gov/about-fda/cdrh-reports/understanding-barriers-medical-
device-quality.
7. ISPE GAMP® Good Practice Guide Series, International Society for Pharmaceutical Engineering (ISPE),
www.ispe.org.
8. ISPE GAMP® Guide: Records and Data Integrity, International Society for Pharmaceutical Engineering (ISPE),
First Edition, March 2017, www.ispe.org.
9. ISPE GAMP® 5: A Risk-Based Approach to Compliant GxP Computerized Systems, International Society for
Pharmaceutical Engineering (ISPE), Fifth Edition, February 2008, www.ispe.org.
10. 21 CFR Part 11 – Electronic Records; Electronic Signatures, Code of Federal Regulations, US Food and Drug
Administration (FDA), www.fda.gov.
11. EudraLex Volume 4 – Guidelines for Good Manufacturing Practices for Medicinal Products for Human and
Veterinary Use, Annex 11: Computerized Systems, June 2011, http://ec.europa.eu/health/documents/eudralex/
vol-4/index_en.htm.
12. MHRA Guidance: ‘GXP’ Data Integrity Guidance and Definitions, Revision 1, March 2018, Medicines &
Healthcare products Regulatory Agency (MHRA), www.gov.uk/government/organisations/medicines-and-
healthcare-products-regulatory-agency.
13. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring, Draft Advisory Document of
the Working Group on Good Laboratory Practice on GLP Data Integrity, Organisation for Economic Cooperation
and Development (OECD), August 2020, www.oecd.org/chemicalsafety/testing/draft-glp-guidance-documents-
public-comments.htm.
14. DAMA International, DAMA International’s Guide to the Data Management Body of Knowledge (DAMA-DMBOK2), Second Edition, ISBN (PDF): 9781634622363, DAMA International, https://technicspub.com/dmbok/.
15. ISPE GAMP® RDI Good Practice Guide: Data Integrity – Manufacturing Records, International Society for
Pharmaceutical Engineering (ISPE), First Edition, May 2019, www.ispe.org.
16. ISPE GAMP® RDI Good Practice Guide: Data Integrity – Key Concepts, International Society for Pharmaceutical
Engineering (ISPE), First Edition, October 2018, www.ispe.org.
17. Regulation (EU) 2016/679 of the European Parliament and of the Council of 27 April 2016 on the protection of
natural persons with regard to the processing of personal data and on the free movement of such data, and
repealing Directive 95/46/EC (General Data Protection Regulation), GDPR Regulation (EU) 2016/679, (General
Data Protection Regulation), gdpr-info.eu/.
18. Health Insurance Portability and Accountability Act of 1996 (HIPAA), U.S. Department of Health & Human
Services, Centers for Disease Control (CDC), www.cdc.gov/phlp/publications/topic/hipaa.html.
19. ISPE GAMP® Good Practice Guide: IT Infrastructure Control and Compliance, International Society for
Pharmaceutical Engineering (ISPE), Second Edition, August 2017, www.ispe.org.
21. ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Operation of GxP Computerized Systems,
International Society for Pharmaceutical Engineering (ISPE), First Edition, January 2010, www.ispe.org.
22. ISPE GAMP® Good Practice Guide: A Risk-Based Approach to Testing of GxP Systems, International Society for
Pharmaceutical Engineering (ISPE), Second Edition, December 2012, www.ispe.org.
23. WHO Technical Report Series, No. 996, Annex 5: Guidance on Good Data and Record Management Practices,
World Health Organization (WHO), 2016, http://apps.who.int/medicinedocs/en/d/Js22402en/.
24. PIC/S Draft Guidance: PI 041-1 (Draft 3) Good Practices for Data Management and Integrity in Regulated GMP/
GDP Environments, November 2018, Pharmaceutical Inspection Co-operation Scheme (PIC/S),
www.picscheme.org/.
25. PIC/S Guide to Good Manufacturing Practice for Medicinal Products, Annex 11: Computerised Systems, PE 009-
14 (Annexes), July 2018, Pharmaceutical Inspection Co-operation Scheme (PIC/S), www.picscheme.org/.
26. Amazon, “Amazon Compute Service Level Agreement,” Last Updated: 22 July 2020, https://aws.amazon.com/
compute/sla/#:~:text=AWS%20will%20use%20commercially%20reasonable,the%20%E2%80%9CService%20
Commitment%E2%80%9D.
27. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Quality Risk Management –
Q9, Step 4, 9 November 2005, www.ich.org.
28. ISO 14971:2019 Medical Devices -- Application of Risk Management to Medical Devices, International
Organization for Standardization (ISO), www.iso.org.
29. 21 CFR Part 211 – Current Good Manufacturing Practice for Finished Pharmaceuticals, Code of Federal
Regulations, US Food and Drug Administration (FDA), www.fda.gov.
30. FDA Guidance for Industry: Part 11, Electronic Records; Electronic Signatures – Scope and Application, August
2003, US Food and Drug Administration (FDA), www.fda.gov.
31. EudraLex Volume 4 – Guidelines for Good Manufacturing Practice for Medicinal Products for Human and
Veterinary Use, Chapter 4: Documentation, January 2011, http://ec.europa.eu/health/documents/eudralex/vol-4/
index_en.htm.
32. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Pharmaceutical Quality
System – Q10, Step 4, 4 June 2008, www.ich.org.
33. EudraLex Volume 4 – Guidelines for Good Manufacturing Practices for Medicinal Products for Human and
Veterinary Use, Annex 2: Manufacture of Biological Active Substances and Medicinal Products for Human Use,
http://ec.europa.eu/health/documents/eudralex/vol-4/index_en.htm.
34. PIC/S Guide to Good Manufacturing Practice for Medicinal Products, Annex 2: Manufacture of Biological
active substances and Medicinal Products for Human Use, PE 009-14 (Annexes), June 2018, Pharmaceutical
Inspection Co-operation Scheme (PIC/S), www.picscheme.org/.
35. FDA Guidance for Industry: Data Integrity and Compliance With Drug CGMP Questions and Answers, December
2018, US Food and Drug Administration (FDA), www.fda.gov.
36. ISPE GAMP® Good Practice Guide: Validation and Compliance of Computerized GCP Systems and Data (Good
eClinical Practice), International Society for Pharmaceutical Engineering (ISPE), First Edition, December 2017,
www.ispe.org.
39. Japanese Pharmacopoeia (JP), Pharmaceuticals and Medical Devices Agency (PMDA), https://www.pmda.go.jp/
english/rs-sb-std/standards-development/jp/0005.html.
40. FDA Warning Letters, US Food and Drug Administration (FDA), www.fda.gov.
41. ISPE GAMP® Good Practice Guide: A Risk-Based Approach to GxP Compliant Laboratory Computerized
Systems, International Society for Pharmaceutical Engineering (ISPE), Second Edition, October 2012,
www.ispe.org.
42. ISPE GAMP® Good Practice Guide: A Risk-Based Approach to GxP Process Control Systems, International
Society for Pharmaceutical Engineering (ISPE), Second Edition, February 2011, www.ispe.org.
43. ISPE GAMP® Good Practice Guide: Manufacturing Execution Systems – A Strategic and Program Management
Approach, International Society for Pharmaceutical Engineering (ISPE), First Edition, February 2010, www.ispe.org.
44. MHRA/HRA, Joint statement on seeking consent by electronic methods, September 2018, Medicines and
Healthcare products Regulatory Agency (MHRA) and Health Research Authority (HRA), https://www.hra.nhs.uk/
planning-and-improving-research/best-practice/informing-participants-and-seeking-consent/#:~:text=Joint%20
HRA%20and%20MHRA%20statement%20on%20seeking%20consent,for%20seeking%20and%20documenti-
ng%20consent%20using%20electronic%20methods.
45. International Council for Harmonisation (ICH), ICH Harmonised Guideline, Integrated Addendum to ICH E6(R1):
Guideline for Good Clinical Practice E6(R2), Step 4, 9 November 2016, www.ich.org.
46. Rowley, J., “The Wisdom Hierarchy: Representations of the DIKW Hierarchy,” Journal of Information Science,
2007, 33(2), pp.163-180.
47. Ackoff, R. L., “From Data to Wisdom,” Journal of Applied Systems Analysis, 1989, 16(1), pp. 3-9.
49. Kane, P., A Blueprint for Knowledge Management in the Biopharmaceutical Sector, Doctoral thesis, Dublin
Institute of Technology (DIT), 2018. doi.org/10.21427/aex5-5p19.
Note on Figure 8.2: Diagram by Dr. Juan C. Dürsteler, adapted from “An Overview of Understanding” by
N. Shedroff in the book Information Anxiety 2 by R. S. Wurman (2000), further adapted by Dr. Paige Kane.
50. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Pharmaceutical
Development – Q8(R2), Step 5, August 2009, www.ich.org.
51. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Development and
Manufacture of Drug Substances (chemical entities and biotechnological/biological entities) – Q11, Step 4,
1 May 2012, www.ich.org.
52. International Council for Harmonisation (ICH), ICH Harmonised Tripartite Guideline, Technical and Regulatory
Considerations for Pharmaceutical Product Lifecycle Management – Q12, Final Version, Adopted 20 November
2019, www.ich.org.
53. Trees, L., Improving the Flow of Organizational Knowledge, 2018, Houston: APQC Knowledge Base,
www.APQC.org.
55. ISPE Cultural Excellence Report, International Society for Pharmaceutical Engineering (ISPE), Fifth Edition,
April 2017, www.ispe.org.
56. Newton, M. E., and White, C. H., “Data Quality and Data Integrity: What is the Difference?” iSpeak Blog, 15 June
2015, International Society for Pharmaceutical Engineering (ISPE), www.ispe.org.
57. Perez, A.D., Canterbury, J., Hansen, E., Samardelis, J.S., Longden, H., Rambo, R.L., “Application of the SOC2+
Process to Assessment of GxP Suppliers of IT Services,” Pharmaceutical Engineering, July/August 2019, Vol.
39, No. 4, pp. 14-20, www.ispe.org.
58. ISO/IEC 27001:2013 Information Technology -- Security Techniques -- Information Security Management
Systems -- Requirements, ISO/IEC JTC1, International Organization for Standardization (ISO), www.iso.org, and
International Electronical Commission (IEC), www.iec.ch.
59. 21 CFR Part 58 – Good Laboratory Practice for Nonclinical Laboratory Studies, Code of Federal Regulations, US
Food and Drug Administration (FDA), www.fda.gov.
60. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring, Number 15, Establishment
and Control of Archives that Operate in Compliance with the Principles of GLP, ENV/JM/MONO(2007)10,
Organisation for Economic Cooperation and Development (OECD), July 2007, www.oecd-ilibrary.org/
environment/oecd-series-on-principles-of-good-laboratory-practice-and-compliance-monitoring_2077785x.
61. 21 CFR Part 606 – Current Good Manufacturing Practice for Blood and Blood Components, Code of Federal
Regulations, US Food and Drug Administration (FDA), www.fda.gov.
62. 21 CFR Part 820 – Quality System Regulation; General, Code of Federal Regulations, US Food and Drug
Administration (FDA), www.fda.gov.
63. Commission Directive 2003/94/EC of 8 October 2003 laying down the principles and guidelines of good
manufacturing practice in respect of medicinal products for human use and investigational medicinal products for
human use, Official Journal of the European Union, www.legislation.gov.uk/eudr/2003/94/adopted.
64. Directive 2001/83/EC of the European Parliament and of the Council of 6 November 2001 on the Community
Code Relating to Medicinal Products for Human Use, Official Journal L – 311, 28/11/2004, pp. 67-128, www.ema.
europa.eu/en/documents/regulatory-procedural-guideline/directive-2001/83/ec-european-parliament-council-6-
november-2001-community-code-relating-medicinal-products-human-use_en.pdf.
65. EudraLex Volume 4 – Guidelines for Good Manufacturing Practice for Medicinal Products for Human and
Veterinary Use, http://ec.europa.eu/health/documents/eudralex/vol-4/index_en.htm.
66. PIC/S Guide to Good Manufacturing Practice for Medicinal Products, PE 009-14, July 2018, Pharmaceutical
Inspection Co-operation Scheme (PIC/S), www.picscheme.org/.
67. Guidelines on Good Distribution Practice of medicinal products for human use – 2013/C 343/01, https://eur-lex.
europa.eu/LexUriServ/LexUriServ.do?uri=OJ:C:2013:343:0001:0014:EN:PDF.
68. Regulation (EU) No 536/2014 of the European Parliament and of the Council of 16 April 2014 on clinical trials on
medicinal products for human use, and repealing Directive 2001/20/EC, ec.europa.eu/health/human-use/clinical-
trials/regulation_en.
69. PIC/S Guidance: PI 011-3 Good Practices for Computerised Systems in Regulated “GXP” Environments, 25
September 2007, Pharmaceutical Inspection Co-operation Scheme (PIC/S), www.picscheme.org.
70. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring, Number 1, OECD
Principles on Good Laboratory Practice (as revised in 1997) ENV/MC/CHEM(98)17, Organisation for Economic
Cooperation and Development (OECD), January 1998, www.oecd-ilibrary.org/environment/oecd-series-on-
principles-of-good-laboratory-practice-and-compliance-monitoring_2077785x.
71. OECD Series on Principles of Good Laboratory Practice and Compliance Monitoring, Number 17, Application of
GLP Principles to Computerised Systems, ENV/JM/MONO(2016)13, Organisation for Economic Cooperation and
Development (OECD), April 2016.
72. FDA Discussion Paper: Proposed Regulatory Framework for Modifications to Artificial Intelligence/Machine Learning (AI/ML)-Based Software as a Medical Device (SaMD), US Food and Drug Administration (FDA), www.fda.gov/media/122535/download.
73. BSI and AAMI Position Paper 2019: “The emergence of artificial intelligence and machine learning algorithms in
healthcare: Recommendations to support governance and regulation,” https://www.bsigroup.com/globalassets/
localfiles/en-gb/about-bsi/nsb/innovation/mhra-ai-paper-2019.pdf.
74. IEC 62304:2006 Medical device software – Software life cycle processes, International Electrotechnical Commission (IEC), www.iec.ch, and International Organization for Standardization (ISO), www.iso.org.
75. Harwich, E., Laycock, K., “Thinking on its own: AI in the NHS Section 5 – Overcoming System Challenges,”
Reform, Reform Research Trust, https://reform.uk/sites/default/files/2018-11/AI%20in%20Healthcare%20report_
WEB.pdf.
76. Taulli, T., Artificial Intelligence Basics: A Non-Technical Introduction, Apress L. P., 2019. ISBN-13: 978-1-4842-
5028-0 / ISBN-13: 978-1-4842-5027-3.
77. 21 CFR Part 820.70 – Production and Process Controls, Code of Federal Regulations, US Food and Drug
Administration (FDA), www.fda.gov.
78. FDA Guidance: General Principles of Software Validation; Final Guidance for Industry and FDA Staff, January 2002, US Food and Drug Administration (FDA), www.fda.gov.
79. AAMI TIR36:2007, Validation of Software for Regulated Processes, December 2007, Association for the Advancement of Medical Instrumentation (AAMI), www.aami.org.
80. FDA Guidance for Industry: Computerized Systems Used in Clinical Investigations, May 2007, US Food and Drug Administration (FDA), www.fda.gov.
81. ASTM Standard E2500-13, “Standard Guide for Specification, Design, and Verification of Pharmaceutical and
Biopharmaceutical Manufacturing Systems and Equipment,” ASTM International, West Conshohocken, PA,
www.astm.org.
21 Appendix G2 – Glossary
21.1 Acronyms and Abbreviations
21.2 Definitions
Archive
A designated secure area or facility (e.g., cabinet, room, building, or computerised system) for the long-term retention of data and metadata for the purposes of verification of the process or activity.
Biometrics
A method of verifying an individual’s identity based on measurement of the individual’s physical feature(s) or repeatable action(s) where those features and/or actions are both unique to that individual and measurable.
Computerized System
A broad range of systems including, but not limited to, automated manufacturing equipment, automated laboratory equipment, process control and process analytical, manufacturing execution, laboratory information management, manufacturing resource planning, clinical trials data management, vigilance, and document management systems. The computerized system consists of the hardware, software, and network components, together with the controlled functions and associated documentation.
Critical Thinking (ISPE GAMP® Guide: Records and Data Integrity [8])
A systematic, rational, and disciplined process of evaluating information from a variety of perspectives to yield a
balanced and well-reasoned answer.
Data Governance
The arrangements to ensure that data, irrespective of the format in which they are generated, are recorded, processed, retained, and used to ensure a complete, consistent, and accurate record throughout the data lifecycle.
Data integrity is the degree to which data are complete, consistent, accurate, trustworthy, reliable and that these
characteristics of the data are maintained throughout the data life cycle. The data should be collected and maintained
in a secure manner, so that they are attributable, legible, contemporaneously recorded, original (or a true copy) and
accurate. Assuring data integrity requires appropriate quality and risk management systems, including adherence to
sound scientific principles and good documentation practices.
Data migration is the process of moving stored data from one durable storage location to another. This may include
changing the format of data, but not the content or meaning.
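As an illustrative sketch only (the file names, fields, and format choice below are hypothetical and not taken from this Guide), a format-only migration might rewrite CSV records as JSON and then confirm that the content is unchanged:

# Minimal sketch of a format-only data migration: the storage format changes
# from CSV to JSON while the record content and meaning are preserved.
import csv
import json

def migrate_csv_to_json(source_path: str, target_path: str) -> None:
    # Read the records exactly as captured in the source file.
    with open(source_path, newline="") as src:
        records = list(csv.DictReader(src))

    # Write the same records in the new format; the values are not altered.
    with open(target_path, "w") as dst:
        json.dump(records, dst, indent=2)

    # Verify that the migrated data are identical in content to the source.
    with open(target_path) as dst:
        if json.load(dst) != records:
            raise ValueError("Content changed during migration")

migrate_csv_to_json("batch_records.csv", "batch_records.json")

A content check of this kind reflects the definition above: the format may change during migration, but the content and meaning may not.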
The assurance that data produced is exactly what was intended to be produced and fit for its intended purpose. This
incorporates ALCOA.
Data quality is the assurance that the data produced are generated according to applicable standards and fit for
intended purpose in regard to the meaning of the data and the context that supports it. Data quality affects the value
and overall acceptability of the data in regard to decision-making or onward use.
Data transfer is the process of transferring data between different data storage types, formats, or computerized
systems.
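As a further hedged illustration (the paths and checksum approach are hypothetical examples, not a prescribed method), a transfer between storage locations can be verified by comparing checksums of the source and destination copies:

# Minimal sketch of a verified data transfer: the file is copied to a new
# storage location and a checksum comparison confirms the contents match.
import hashlib
import shutil

def sha256_of(path: str) -> str:
    # Compute the SHA-256 digest of a file, reading it in chunks.
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

def transfer_with_verification(source: str, destination: str) -> None:
    shutil.copy2(source, destination)  # copy the file and its timestamps
    if sha256_of(source) != sha256_of(destination):
        raise RuntimeError("Transfer verification failed: checksums differ")

transfer_with_verification("results.xml", "/archive/results.xml")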
Electronic Record
Any combination of text, graphics, data, audio, pictorial, or other information representation in digital form that is created, modified, maintained, archived, retrieved, or distributed by a computer system.
Electronic Signature
A computer data compilation of any symbol or series of symbols executed, adopted, or authorized by an individual to be the legally binding equivalent of the individual’s handwritten signature.
GxP Regulation (ISPE GAMP® Guide: Records and Data Integrity [8])
The underlying international pharmaceutical requirements, such as those set forth in the US FD&C Act, US PHS
Act, FDA regulations, EU Directives and guidelines, Japanese regulations, or other applicable national legislation or
regulations under which a company operates. These include but are not limited to:
• Good Manufacturing Practice (GMP) (pharmaceutical, including Active Pharmaceutical Ingredient (API),
veterinary, and blood)
Raw data is defined as the original record (data) which can be described as the first-capture of information, whether
recorded on paper or electronically. Information that is originally captured in a dynamic state should remain available
in that state.
600 N. Westshore Blvd., Suite 900, Tampa, Florida 33609 USA
Tel: +1-813-960-2105, Fax: +1-813-264-2816
www.ISPE.org