-
Notifications
You must be signed in to change notification settings - Fork 75
Bug/220 spice makes bsk sims not thread safe #994
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
4a1af4d
to
aeed048
Compare
src/simulation/environment/spiceInterface/_UnitTest/test_spiceThreadSafety.py
Outdated
Show resolved
Hide resolved
def2f18
to
8f6bf30
Compare
8f6bf30
to
a9ef12b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work! All tests passed on my system. Just some minor comments to address
import sys | ||
import time | ||
import hashlib | ||
import random |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
random is not used, please delete this import
import multiprocessing as mp | ||
import pytest | ||
import inspect | ||
import numpy as np |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
numpy is not used, please delete this import
import inspect | ||
import numpy as np | ||
import traceback | ||
import signal |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signal is imported, if needed, later on. We can delete this import.
# Import Basilisk modules | ||
from Basilisk import __path__ | ||
from Basilisk.simulation import spiceInterface | ||
from Basilisk.utilities import unitTestSupport |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are not using unitTestSupport, we can delete this import
=========================================== | ||
|
||
This script tests the thread safety of the SPICE interface, specifically addressing | ||
GitHub issue #220 where parallel simulations using SPICE were causing deadlocks and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
make this part of the documentation a hyperlink to issue 220?
Nice work, looks like we are there. @juan-g-bonilla , as you commented on the original issue with the suggestion to use a mutex, would you have time to take a look at this code? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nicely done @Natsoulas ! Many of my comments are actually encouraging you to fix existing issues or even the original fix I proposed so we can get this class to its highest quality possible.
Thanks for the tag @schaubh .
static std::mutex kernel_manipulation_mutex; //!< -- Mutex to protect SPICE kernel loading/unloading operations | ||
static std::map<std::string, int> kernel_reference_counter; //!< -- Reference counter for SPICE kernels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should probably make these variables camelCase. (sorry I originally suggested snake_case!).
static std::mutex kernel_manipulation_mutex; //!< -- Mutex to protect SPICE kernel loading/unloading operations | ||
static std::map<std::string, int> kernel_reference_counter; //!< -- Reference counter for SPICE kernels |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::unordered_map
is probably better here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, variables should be lower camel case.
@@ -40,7 +41,7 @@ class SpiceInterface: public SysModel { | |||
public: | |||
SpiceInterface(); | |||
~SpiceInterface(); | |||
|
|||
void UpdateState(uint64_t CurrentSimNanos); | |||
int loadSpiceKernel(char *kernelName, const char *dataPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should change these methods into:
int loadSpiceKernel(std::string_view kernelName, std::string_view dataPath);
int unloadSpiceKernel(std::string_view kernelName, std::string_view dataPath);
or at least std::string
. There's no reason why we should be passing char*
.
This will also save you a lot of headache on the implementation (see later comment).
// Create vector instead of variable-length array for Windows compatibility | ||
std::vector<char> kernelNameVec(this->charBufferSize); | ||
char* kernelName = kernelNameVec.data(); | ||
|
||
// Unload the SPICE kernels in reverse order of loading | ||
strcpy(kernelName, "de430.bsp"); | ||
unloadSpiceKernel(kernelName, this->SPICEDataPath.c_str()); | ||
|
||
strcpy(kernelName, "de-403-masses.tpc"); | ||
unloadSpiceKernel(kernelName, this->SPICEDataPath.c_str()); | ||
|
||
strcpy(kernelName, "pck00010.tpc"); | ||
unloadSpiceKernel(kernelName, this->SPICEDataPath.c_str()); | ||
|
||
strcpy(kernelName, "naif0012.tls"); | ||
unloadSpiceKernel(kernelName, this->SPICEDataPath.c_str()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you change the signature of unloadSpiceKernel
(see previous comment), you'll be able to call:
unloadSpiceKernel("de430.bsp", this->SPICEDataPath);
std::vector<char> kernelNameVec(this->charBufferSize); | ||
char* kernelName = kernelNameVec.data(); | ||
|
||
// Load the required SPICE kernels - they will only be loaded once per kernel | ||
// across all threads due to our reference counting mechanism | ||
strcpy(kernelName, "naif0012.tls"); | ||
if(loadSpiceKernel(kernelName, this->SPICEDataPath.c_str())) { | ||
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", kernelName); | ||
} | ||
if(loadSpiceKernel((char *)"pck00010.tpc", this->SPICEDataPath.c_str())) { | ||
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", "pck00010.tpc"); | ||
|
||
strcpy(kernelName, "pck00010.tpc"); | ||
if(loadSpiceKernel(kernelName, this->SPICEDataPath.c_str())) { | ||
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", kernelName); | ||
} | ||
if(loadSpiceKernel((char *)"de-403-masses.tpc", this->SPICEDataPath.c_str())) { | ||
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", "de-403-masses.tpc"); | ||
|
||
strcpy(kernelName, "de-403-masses.tpc"); | ||
if(loadSpiceKernel(kernelName, this->SPICEDataPath.c_str())) { | ||
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", kernelName); | ||
} | ||
if(loadSpiceKernel((char *)"de430.bsp", this->SPICEDataPath.c_str())) { | ||
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", "de430.tpc"); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might want to refactor this into:
auto kernelNames = {
"naif0012.tls", "pck00010.tpc", "de-403-masses.tpc", "de430.bsp"
};
for (auto kernelName : kernelNames)
{
if(loadSpiceKernel(kernelName, this->SPICEDataPath)) {
bskLogger.bskLog(BSK_ERROR, "Unable to load %s", kernelName);
}
}
actually a similar refactor is also possible above for unloadSpiceKernels
. You might even make the kernelNames
a class-wide static variable so we don't need to update both loading and unloading methods to keep the kernels in sync.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, this is a cleaner solution.
// Create vectors instead of raw pointers for Windows compatibility | ||
std::vector<char> fileNameVec(this->charBufferSize); | ||
char* fileName = fileNameVec.data(); | ||
|
||
std::vector<char> nameVec(this->charBufferSize); | ||
char* name = nameVec.data(); | ||
|
||
//! - The required calls come from the SPICE documentation. | ||
//! - The most critical call is furnsh_c | ||
strcpy(name, "REPORT"); | ||
erract_c("SET", this->charBufferSize, name); | ||
// Create the full filepath | ||
strcpy(fileName, dataPath); | ||
strcat(fileName, kernelName); | ||
furnsh_c(fileName); | ||
|
||
//! - Check to see if we had trouble loading a kernel and alert user if so | ||
strcpy(name, "DEFAULT"); | ||
erract_c("SET", this->charBufferSize, name); | ||
delete[] fileName; | ||
delete[] name; | ||
if(failed_c()) { | ||
return 1; | ||
std::string filepath(fileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you refactor the inputs into std::string_view
, then:
std::string filepath = std::string{dataPath} + std::string{kernelName);
if (kernel_reference_counter.at(filepath) <= 0) { | ||
// The required calls come from the SPICE documentation. | ||
// The most critical call is furnsh_c | ||
strcpy(name, "REPORT"); | ||
erract_c("SET", this->charBufferSize, name); | ||
furnsh_c(fileName); | ||
|
||
// Check to see if we had trouble loading a kernel | ||
strcpy(name, "DEFAULT"); | ||
erract_c("SET", this->charBufferSize, name); | ||
|
||
if(failed_c()) { | ||
return 1; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice job fetching the error from SPICE.
std::vector<char> fileNameVec(this->charBufferSize); | ||
char* fileName = fileNameVec.data(); | ||
|
||
//! - The required calls come from the SPICE documentation. | ||
//! - The most critical call is furnsh_c | ||
strcpy(name, "REPORT"); | ||
erract_c("SET", this->charBufferSize, name); | ||
std::vector<char> nameVec(this->charBufferSize); | ||
char* name = nameVec.data(); | ||
|
||
// Create the full filepath | ||
strcpy(fileName, dataPath); | ||
strcat(fileName, kernelName); | ||
unload_c(fileName); | ||
delete[] fileName; | ||
delete[] name; | ||
if(failed_c()) { | ||
return 1; | ||
std::string filepath(fileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment as above.
std::vector<char> spiceOutputBufferVec(allowedOutputLength); | ||
char* spiceOutputBuffer = spiceOutputBufferVec.data(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand you're using std::vector
to handle the memory of the underlying char*
. This might just be personal preference, but I think:
std::unique_ptr<char[]> spiceOutputBuffer(new char[allowedOutputLength]);
is a bit more expressive and safe. Just a thought.
Maybe actually we should make allowedOutputLength
a constexpr size_t
so that we can simply do:
char spiceOutputBuffer[allowedOutputLength];
and thus we get rid of nasty dynamically allocated memory.
@@ -0,0 +1,339 @@ | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very comprehensive test, nicely done.
Just for own curiosity, have you run this test without your fix? We would want it to fail on that case, so that we're sure the test is actually correctly stressing the system. I remember this error is infamously tricky to reproduce, so we want to make sure this test is actually testing as we need.
Description
The changes implement thread safety for the SPICE interface to address GitHub issue #220, which reported deadlocks and data corruption when running parallel Monte Carlo simulations.
This solution uses a mutex and reference counting approach to protect SPICE kernel loading and unloading operations (inspired by Juan's suggestion comment on the issue):
kernel_manipulation_mutex
in the SpiceInterface classkernel_reference_counter
to track kernel usageloadSpiceKernel()
to only load kernels once and use reference countingunloadSpiceKernel()
to only unload kernels when no longer neededclearKeeper()
with the mutexThis ensures that when multiple simulations run in parallel, they won't encounter race conditions or deadlocks when accessing SPICE kernels.
Verification
The changes were verified with a dedicated thread safety unit test:
test_spiceThreadSafety.py
unit test in the SPICE interface's_UnitTest
directoryThe test specifically reproduces the conditions described in issue #220.
Documentation
Added code comments explaining the thread safety mechanism in:
spiceInterface.h
for the static mutex and reference counterspiceInterface.cpp
for the mutex implementation and kernel reference counting logicNo existing documentation was invalidated.
Future work