-
Notifications
You must be signed in to change notification settings - Fork 177
RUST-871 Serialize directly to BSON bytes in insert operations #406
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RUST-871 Serialize directly to BSON bytes in insert operations #406
Conversation
@@ -444,7 +444,7 @@ fn parse_ids(matches: ArgMatches) -> Vec<bool> { | |||
ids | |||
} | |||
|
|||
#[cfg_attr(feature = "tokio-runtime", tokio::main)] | |||
#[cfg_attr(feature = "tokio-runtime", tokio::main(flavor = "current_thread"))] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the benchmarks are single threaded, this ends up giving a nice boost by avoiding the tokio overhead of managing a thread pool.
@@ -164,128 +169,30 @@ where | |||
.ok_or_else(|| D::Error::custom(format!("could not deserialize u64 from {:?}", bson))) | |||
} | |||
|
|||
pub fn doc_size_bytes(doc: &Document) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These were no longer needed as we now just use the length of the serialized vec.
Some(val) => val, | ||
None => break, | ||
}; | ||
fn num_decimal_digits(mut n: usize) -> u64 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the old implementation of this method was incorrect actually so I rewrote it.
/// If the BSON is not properly formatted, an internal error would be returned. | ||
/// | ||
/// TODO: RUST-924 replace this with raw document API usage. | ||
pub(crate) fn raw_get(doc: &[u8], key: &str) -> Result<Option<Bson>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately this function needs to live in the driver for now as we need to be able to comb through the serialized command for _id
values, but it can go away shortly once RUST-924 is complete.
Error::from(ErrorKind::Internal { | ||
message: "failed to read from serialized document".to_string(), | ||
}) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documents queried here should always be valid as they're coming straight out of our serializer, so any error encountered here is mapped to an internal error.
f9597f7
to
e2db5d7
Compare
|
||
#[derive(Debug)] | ||
pub(crate) struct Insert<T> { | ||
pub(crate) struct Insert<'a, T> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Previously this T
was always actually a reference. This now makes it required to be a reference, which allows us to clone the input.
|
||
let doc_size = bson_util::array_entry_size_bytes(i, &doc); | ||
let mut doc = bson::to_vec(d)?; | ||
let id = match bson_util::raw_get(doc.as_slice(), "_id")? { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we can't inspect the T
for an _id
without requiring it implement a trait, we scan through the raw document for it instead.
Some(b) => b, | ||
None => { | ||
let oid = ObjectId::new(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If it's not in there, we write the new element into the beginning of the document. Once we have the raw document API we could update this to use it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a TODO comment for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Ok(Command::new("insert".to_string(), self.ns.db.clone(), body)) | ||
} | ||
|
||
fn serialize_command(&mut self, cmd: Command<Self::Command>) -> Result<Vec<u8>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since we already have raw BSON for the documents, we can't just use serde to serialize it, as it'll attempt to do so as a a Binary
. Instead, we serialize the rest of the command and manually append the serialized documents. Again, this will be easier with the raw document API.
type Response = CommandResponse<WriteConcernOnlyBody>; | ||
|
||
const NAME: &'static str = "abortTransaction"; | ||
|
||
fn build(&mut self, _description: &StreamDescription) -> Result<Command> { | ||
fn build(&mut self, _description: &StreamDescription) -> Result<Command<Self::Command>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think the addition of Self::Command
here is necessary because the default T
for Command
is already Document
-- can we revert this change or change the rest of the operations whose build
signature wasn't updated (e.g. ListCollections
) for consistency?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done.
Some(b) => b, | ||
None => { | ||
let oid = ObjectId::new(); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we add a TODO comment for this?
pub server_api: Option<ServerApi>, | ||
} | ||
|
||
fn default_uri() -> String { | ||
DEFAULT_URI.clone() | ||
} | ||
|
||
pub fn deserialize_server_api<'de, D>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why was this custom implementation needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The server expects the fields to be named apiVersion
, apiStrict
, and apiDeprecationErrors
, but the test format drops all of the api-
prefixes. Now that we're using the Serialize
implementation for ServerApi
in the driver, we needed to introduce a custom one for deserializing from the test format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is only used within this test file, but for clarity can we rename this serialization function? Maybe deserialize_server_api_for_testing
or deserialize_server_api_test_format
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah in that case I think you should be able to change the rename
attributes on the fields in ServerApi
to #[serde(rename(serialize = "apiVersion"))]
so that the renames only apply to serialization, and then add #[serde(rename_all = "camelCase", deny_unknown_fields)]
to the top of the struct for deserialization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh cool, I didn't know that was possible. Unfortunately, it looks like we'll still need the separate implementation though, since deny_unknown_fields
can't be used with flatten
(see https://serde.rs/attr-flatten.html). flatten
is needed with server API since they're all top level fields of commands rather than in a subdocument.
I did update the fuinction to be more precisely named (went with the test_format
suggestion)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! Only a few small things from me.
|
||
let serialized = op.serialize_command(cmd)?; | ||
let raw_cmd = RawCommand { | ||
name: cmd_name.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this is the only place cmd_name
is being used, so this seems like a spurious clone
to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought so too, but it's actually also used below when emitting the command succeeded / failed event.
pub server_api: Option<ServerApi>, | ||
} | ||
|
||
fn default_uri() -> String { | ||
DEFAULT_URI.clone() | ||
} | ||
|
||
pub fn deserialize_server_api<'de, D>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is only used within this test file, but for clarity can we rename this serialization function? Maybe deserialize_server_api_for_testing
or deserialize_server_api_test_format
?
RUST-871
This PR leverages the
bson::to_vec
function that was introduced in mongodb/bson-rust#279 to greatly increase the performance of inserts. It also introduces the framework for updating other operations to skip theDocument
step during serialization, though that is left as future work for once the raw document API is introduced.