-
Notifications
You must be signed in to change notification settings - Fork 4
ENH: Add a script to query fMRI compliant data from OpenNeuro #89
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
ENH: Add a script to query fMRI compliant data from OpenNeuro #89
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #89 +/- ##
=======================================
Coverage 71.35% 71.35%
=======================================
Files 23 23
Lines 1138 1138
Branches 139 139
=======================================
Hits 812 812
Misses 283 283
Partials 43 43 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
806c6c5
to
736e276
Compare
I revisited this. In 736e276 I split the OpenNeuro querying and BOLD feature extraction/analysis/selection. Querying OpenNeuro dataset information and querying the files of each dataset is also split into two files: I've only tested Next is to develop/refactor the necessary parts for extracting the relevant features and selecting those datasets/files complying with the criteria into different scripts. |
063bf1a
to
0bcbf15
Compare
Done in subsequent commits. I think I am done with this. Thoughts:
Once these have been addressed and the scripts do not need updates, I'll squash all commits. Additionally, as I've split all this into 4 different scripts, I think it has become apparent that most of the scripts in the |
e7c394b
to
4f9b1e9
Compare
Re: the commit message 4f9b1e9, re-run earlier commits again; 9f66a50 still incurs a deadlock; 2567252 brings |
OK, this is working. Takes like 11 hours, but it is working, as opposed to less than two hours with a headlong rush that only does
or
(the latter only solving part of the There is probably things to improve, but I think it serves the purpose and I've spent enough time on this. The retries are nice as they demonstrate that we've made efforts to get the widest possible set of cases: e.g. in the last run, all datasets were queried successfully (then getting the number of bold runs in the selection script fails for reasons related to malformed NIfTI files or other reasons, i.e. unrelated to this). I will go ahead and squash all commits into one and then fix the style issues. At that point this will be ready to be merged. Apologies the noise generated by all the commits here. The decision to transfer all these scripts to another repository (x-ref #89 (comment)) can be made once this is merged and be applied to all scripts. Edit: I see that there is some issue when returning the failed queries in |
b1fc859
to
6d580fc
Compare
6d580fc
to
abf9856
Compare
abf9856
to
eeeed67
Compare
Last force push (eeeed67) solves this. |
05a22af
to
401b43c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall. I missed having examples in the initial docstring of every file.
Considering that all these scripts are heavily related, and the shared static variables, would it make sense to aggregate all of them into a single script, and select the behavior with click
commands? Something like:
$ openneuro_query.py profile
$ openneuro_query.py datasets
$ openneuro_query.py files
$ openneuro_query.py select
I like the idea, but I'd propose to do it in a separate PR in order to have a working version at least. I know merging this clutters the repository, and in other circumstances, I'd be for adopting the approach here, but in this case I believe that I've spent too much time on this PR, so I'd rather have something merged and work from there. Also, as said, I think these scripts, together with the rest of the scripts that are not essential for the package, should dwell in a separate repository. So I'd vote for merging this as is to have a record of something that works, then transferring to another repository and converting to a |
f8ef0ca
to
e0bb868
Compare
Examples added. |
Add a series of scripts to query fMRI compliant data from OpenNeuro: - A script that queries the existing datasets. - A script that, based on the results of the previous one, queries the human (f)MRI files. - A script that, based on the results of the previous one, identifies files constituting BOLD runs and characterizes them in terms of the number of timepoints they contain. - A script that, based on the results of the previous one, selects the relevant BOLD runs for the study according to some criteria. Notes: - Querying the human (f)MRI files takes a few hours. If a 502 response is obtained from the server, the request is posted again after a delay. If unsuccessful with yet another 502 error, this is repeated a fixed number of times. - When characterizing the BOLD runs, S3 is used instead of rely on URLs. It is observed that some `urls` fields contain multiple items, and rather than picking one of them (and even if the assumption that all URLs are the same or point to the same resource), prefer to use S3. It is experimentally observed that it is twice as fast for the queries done within the context of the script.
Improve descriptions of OpenNeuro query scripts: - Improve the wording. - Add examples on how to call the scripts. - Add empty lines for easier reading. Co-authored-by: Oscar Esteban <[email protected]>
e0bb868
to
726a6ea
Compare
So, feel free to open a new repository to specifically host only these scripts. I will push the scripts to there when that is done, then work on the |
Add a script to query fMRI compliant data from OpenNeuro.