Commit c3087d9

Update README.md
1 parent ca1ef35 commit c3087d9

1 file changed: +28 -2 lines changed

README.md

Lines changed: 28 additions & 2 deletions
@@ -135,14 +135,40 @@ That means, when you submit your paper, the reviewers and the rest of the world
the analyses from raw data all the way to final results. If you are trying to be efficient, you will likely perform
some summarization/data analysis steps before the data can be considered tidy.

The ideal thing for you to do when performing summarization is to create a computer script (in R, Python, or something else)
that takes the raw data as input and produces the tidy data you are sharing as output. Run your script
a couple of times and check that it produces the same output each time.
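
The "run it a couple of times" check above can be automated. The following is a minimal sketch, not a prescribed method: `tidy_from_raw`, the per-sample averaging it performs, and the raw values are all hypothetical stand-ins for your own summarization step and data. It runs the summarization twice and compares checksums of the two outputs.

```python
import csv
import hashlib
import io
from collections import defaultdict


def tidy_from_raw(raw_rows):
    """Hypothetical summarization: average the measurements for each sample."""
    values = defaultdict(list)
    for sample_id, measurement in raw_rows:
        values[sample_id].append(measurement)
    # Sort by sample ID so the row order does not depend on input order.
    return [(sid, sum(v) / len(v)) for sid, v in sorted(values.items())]


def to_csv_text(tidy_rows):
    """Serialize the tidy rows to CSV text, one row per sample."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["sample_id", "mean_measurement"])
    writer.writerows(tidy_rows)
    return buffer.getvalue()


def checksum(text):
    """SHA-256 digest of the output text, used to compare runs."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Made-up raw data, for illustration only.
raw = [("s1", 1.0), ("s2", 4.0), ("s1", 3.0), ("s2", 6.0)]

first_run = to_csv_text(tidy_from_raw(raw))
second_run = to_csv_text(tidy_from_raw(raw))
outputs_match = checksum(first_run) == checksum(second_run)
print("Outputs identical across runs:", outputs_match)
```

If your summarization involves randomness (for example, resampling), you would also need to fix the random seed for the outputs to match.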

In many cases, the person who collected the data has an incentive to make it tidy for a statistician to speed up the process
of collaboration. They may not know how to code in a scripting language. In that case, what you should provide the statistician
is something called pseudocode. It should look something like:

1. Step 1 - take the raw file, run version 3.1.2 of summarize software with parameters a=1, b=2, c=3
2. Step 2 - run the software separately for each sample
3. Step 3 - take column three of outputfile.out for each sample and that is the corresponding row in the output data set

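
For readers who can script, the three steps above might translate into something like the sketch below. Everything concrete here is an assumption made for illustration: the `summarize` command name and its `--a/--b/--c` flag syntax are hypothetical (the command is only built, never run), the output is assumed to be whitespace-delimited, and the sample outputs are made up.

```python
SOFTWARE_VERSION = "3.1.2"  # version named in the recipe
PARAMS = {"a": 1, "b": 2, "c": 3}  # parameters named in the recipe


def build_command(raw_file):
    """Step 1: the command for one raw file (hypothetical CLI, not run here)."""
    flags = [f"--{name}={value}" for name, value in PARAMS.items()]
    return ["summarize", *flags, raw_file]


def column_three(output_text):
    """Step 3: column three of one sample's output becomes one tidy row."""
    return [line.split()[2] for line in output_text.splitlines() if line.strip()]


def tidy_rows(outputs_by_sample):
    """Step 2: handle each sample separately, giving one row per sample."""
    return {
        sample: column_three(text)
        for sample, text in sorted(outputs_by_sample.items())
    }


# Made-up stand-ins for each sample's outputfile.out (assumed whitespace-delimited).
outputs = {
    "sample1": "g1 0.1 10\ng2 0.2 20\n",
    "sample2": "g1 0.3 30\ng2 0.4 40\n",
}

command = build_command("sample1.raw")
tidy = tidy_rows(outputs)
print("summarize version:", SOFTWARE_VERSION)
print(command)
print(tidy)
```

Writing the version and parameters as named constants keeps the recipe's details in one place, where a collaborator can check them at a glance.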

You should also include information about which system (Mac/Windows/Linux) you used the software on and whether you
tried it more than once to confirm it gave the same results. Ideally, you will run this by a fellow student/labmate
to confirm that they can obtain the same output file you did.
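
If part of your processing runs in Python, one way to capture the system information is with the standard-library `platform` module. This is only a sketch: the field names in the record are arbitrary choices, and you would still need to note the version of any external software by hand.

```python
import platform


def environment_record():
    """Collect basic facts about the machine the processing ran on."""
    return {
        "system": platform.system(),  # e.g. "Darwin" (Mac), "Windows", "Linux"
        "release": platform.release(),
        "machine": platform.machine(),
        "python_version": platform.python_version(),
    }


record = environment_record()
for key, value in record.items():
    print(f"{key}: {value}")
```

Saving this record alongside the tidy data gives the statistician the context they need if results differ between machines.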

What you should expect from a statistician
====================

When you turn over a properly tidied data set, it dramatically decreases the workload on the statistician, so hopefully
they will get back to you much sooner. But most careful statisticians will check your recipe, ask questions about
the steps you performed, and try to confirm that they can obtain the same tidy data that you did with, at minimum, spot
checks.

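
A spot check of this kind could look like the following sketch. It assumes, purely for illustration, that the tidy data is one numeric value per sample ID; the function name, tolerance, and values are all made up.

```python
import random


def spot_check(received, recomputed, n_checks=3, seed=0, tolerance=1e-9):
    """Compare a random subset of samples and return the IDs that disagree."""
    rng = random.Random(seed)  # fixed seed so the same samples are checked each time
    sample_ids = sorted(received)
    chosen = rng.sample(sample_ids, min(n_checks, len(sample_ids)))
    return [
        sid
        for sid in chosen
        if sid not in recomputed or abs(received[sid] - recomputed[sid]) > tolerance
    ]


# Made-up values: what the scientist sent versus what the statistician recomputed.
received = {"s1": 2.0, "s2": 5.0, "s3": 7.5}
recomputed = {"s1": 2.0, "s2": 5.5, "s3": 7.5}

mismatches = spot_check(received, recomputed, n_checks=3)
print("Samples that disagree:", mismatches)
```

Any sample that turns up in the list is a prompt for a question about the recipe, not proof that either side made a mistake.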

You should then expect from the statistician:
1. An analysis script that performs each of the analyses (not just instructions)
2. The exact computer code they used to run the analysis
3. All output files/figures they generated.


This is the information you will use in the supplement to establish reproducibility and precision of your results. Each
of the steps in the analysis should be clearly explained, and you should ask questions when you don't understand
what the analyst did. It is the responsibility of both the statistician and the scientist to understand the statistical
analysis. You may not be able to perform the exact analyses without the statistician's code, but you should be able
to explain why the statistician performed each step to a labmate/your principal investigator.
