\section*{Reporting of codes in the current literature}
Primary care database (PCD) studies make up a large component of total EMR research, and UK PCDs are among the most researched in the world. Figure \ref{figure1_articles_per_year} shows that research output from UK PCDs appears to be increasing at an exponential rate, while Figure \ref{figure2_PCD_map} shows that research using UK PCDs is conducted in universities, pharmaceutical companies and research hospitals around the world, and is not limited to the UK. Since UK PCDs are among the largest and most important resources for EMR-based research, it seems reasonable to expect reporting of code lists in UK PCD-based studies to be at least as comprehensive as in other EMR studies. To evaluate levels of transparency in the reporting of clinical code lists, we took a representative sample of UK PCD studies and assessed each study on its extent of reporting of the clinical codes used.
We took a sample of 450 papers from the original 1359 identified from a PubMed search. Of these, 392 (87\%) both had full text accessible to the University of Manchester library and were examples of primary PCD research. Only 35 (9\% of 392) studies published the entire set of clinical codes needed to reproduce the study (usually in an online appendix), while only an additional 47 (12\% of 392) stated explicitly that the clinical codes are available upon request (Table \ref{tab:table1_percentages}).
\section*{The need for transparency in clinical code usage}
We identify four main consequences of a lack of transparency in clinical code lists. First, if code lists are not made available or not published alongside the primary research using them, they represent an important part of a study methodology that is not subject to scrutiny or peer review. In the extreme case, there is no way of assessing the validity of the diagnosis definition used in a study, and clinical decisions could be based on invalid results derived from an incorrect patient base. This could happen despite rigorous downstream statistical analysis. Second, the effective replication of EMR studies depends on the availability of the clinical codes from the original study. If the full set of codes is not available, it is impossible to tell whether differences found in study replications are genuine or due to artifactual differences in code lists. Third, if code lists are unknown, comparisons between studies addressing the same clinical question are potentially invalidated. Condition definitions change over time, and GP coding practice may also change with respect to regulations and incentives \cite{Hippisley-Cox2006}. Different studies may also use different types of codes for a condition; some studies, for example, include medication and monitoring codes as part of their definition of a patient with diabetes (e.g. \cite{Mulnier2006}) while others do not (e.g. \cite{Kontopantelis2014}). Without access to code lists, it is difficult to know whether fair comparisons are being made between studies. Fourth, building code lists is a time-consuming process; having access to historical code lists would mean that new lists could be built incrementally and iteratively, saving much `reinvention of the wheel' while increasing consistency, and potentially accuracy, of definitions across studies.
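The third consequence can be made concrete with a minimal sketch. The Read-style codes and the two study definitions below are purely illustrative (not taken from any actual study): two "diabetes" definitions that differ only in whether medication codes are included will select different patient sets, so an overlap measure such as the Jaccard index gives a quick check of how comparable two published code lists really are.

```python
# Illustrative sketch: comparing two hypothetical "diabetes" code lists.
# Study A includes medication codes in its definition; study B uses
# diagnosis codes only. All codes here are made up for illustration.
study_a = {"C10..", "C100.", "C1000", "f41..", "f42.."}  # diagnosis + medication
study_b = {"C10..", "C100.", "C1000"}                    # diagnosis only

def jaccard(a: set, b: set) -> float:
    """Overlap between two code lists (1.0 means identical definitions)."""
    return len(a & b) / len(a | b)

print(f"Jaccard overlap: {jaccard(study_a, study_b):.2f}")   # 0.60
print(f"Codes unique to study A: {sorted(study_a - study_b)}")
```

A low overlap does not by itself mean either definition is wrong, but without both lists being published, this comparison cannot be made at all.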
\section*{Acknowledgments}
We thank Matt Ford for extensive technical support, and the Research team at CPRD for fruitful discussions during the development stage.
%\subsection*{Funding statement}
%This work is funded by the National Institute for Health Research (NIHR) School for Primary Care Research (SPCR).
%\subsection*{Disclaimer}
%This article presents independent research funded by the National Institute for Health Research (NIHR). The views expressed are those of the author(s) and not necessarily those of the NHS, the NIHR or the Department of Health.
%\section*{Author Contributions}
%Conceived, designed and built the website and software: DAS. Data collection: DAS, DR, EK, IO, RP, DA, EC. Data Analysis: DAS. Wrote the manuscript DAS. Edited the manuscript DAS, DR, EK, IO, RP, DA, EC.
\signature{David A Springate \\ Research Fellow \\ Institute of Population Health \\ University of Manchester}
\begin{document}
\begin{letter}{Institute of Population Health \\ University of Manchester \\ UK}
\opening{Dear Sirs,}
We would like the editors to consider our article entitled ``ClinicalCodes: An online clinical codes repository to improve the validity and reproducibility of research using electronic medical records'' for publication in PLoS ONE.
In this manuscript, we describe a new online database of clinical code lists (www.clinicalcodes.org) for use by researchers working with electronic medical records (EMRs). This resource will allow clinical researchers to better validate EMR studies, build on previous clinical code lists and compare condition definitions across studies. It will also assist health informaticians in replicating database studies, tracking changes in disease definitions or clinical coding practice through time, and sharing clinical code information across platforms and data sources as research objects.
Although accurate definitions of medical conditions are a prerequisite for valid EMR studies, and these definitions depend upon careful selection of clinical codes, the publication of clinical codes is rarely, if ever, a requirement for obtaining grants, validating protocols or publishing research. We evaluated the levels of transparency in the reporting of clinical code lists in a representative sample of UK primary care database studies. Of the 392 studies we examined, only 35 (9\%) published the entire set of clinical code lists needed to reproduce or validate the study. These were most often published in online appendices.
We identify four main consequences of a lack of transparency in clinical code lists:
\begin{enumerate}
\item Code lists are not subject to scrutiny or peer review
\item It is impossible to tell if differences found in study replications are genuine or due to artifactual differences in code lists
\item Comparisons between studies of the same clinical conditions are potentially invalidated
\item Lack of access to historical code lists leads to much wasted effort on the part of researchers
\end{enumerate}
The database described here will provide a centralised repository for EMR researchers to deposit their codes and this will lead to greater transparency, reproducibility and validity in this important area of research.
We believe this submission fits all the PLoS ONE criteria for database papers, namely utility, validity and availability. The resource will be of great use to the EMR community, and we expect the paper to be highly referenced and the ClinicalCodes database to become the de facto repository for clinical code lists across EMR research. The database is an effective repository for clinical code lists, and we are aware of no similar open repositories for clinical codes. The database is written entirely using open source software and is freely available for access, upload and download. In addition, we have developed open source software to access the database programmatically and to download research objects for integration with other systems.
We would like to recommend Irene Petersen from UCL as an Academic Editor.