
Conversation

vinniefalco

A few odds and ends from a first reading of the code.

// operations
//
// Example:
// https://github.com/vinniefalco/Beast/blob/b8e5a21bfd46d7e912c0e89bd6fcf0901b26ed0a/include/beast/websocket/stream.hpp#L274
Owner

Done


// Consider `boost::string_ref` instead of `std::string`
template <typename C = std::initializer_list<string_t>>
Owner

Already have that.

    using string_t = boost::string_ref;

Author

I was referring to the type of the cmd parameter.
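
For example, a sketch of the shape I had in mind (push_command is just a placeholder name):

    // Taking the command name by string_ref lets callers pass string
    // literals without constructing a std::string.
    template <typename C = std::initializer_list<string_t>>
    void push_command(boost::string_ref cmd, C &&args = C{});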

// What is the reason for constraining the type of S this way?
static_assert(std::is_same<protocol_type_t, boost::asio::ip::tcp>::value ||
                  std::is_same<protocol_type_t,
                               boost::asio::local::stream_protocol>::value,
Owner

Redis uses either TCP or Unix domain sockets as the transport layer, so the restriction is there to prevent using UDP sockets here.

Author (@vinniefalco, Apr 22, 2017)

That is true, but constraining the choice of the next layer type prevents interesting use-cases. For example, let's say that someone wants to measure the bytes transferred in and out on the bredis connection. They don't need to modify your code; they could instead create a SyncStream or AsyncStream wrapper around a real TCP/IP socket which counts bytes in its calls to read_some and write_some. Your static assertion would prevent this use, even though it doesn't break any invariants. A more sophisticated version of this wrapper could implement bandwidth control. Your users could extend the functionality of your connection objects without changing the source code.

This is beyond theoretical; Beast already comes with a very useful fail_stream wrapper:
https://github.com/vinniefalco/Beast/blob/master/extras/beast/test/fail_stream.hpp#L22

With it, you could declare bredis::AsyncConnection<beast::test::fail_stream<boost::asio::ip::tcp::socket>> in a test, and exercise all the possible failure modes to increase test coverage.
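
As an illustration, a byte-counting wrapper might look roughly like this (a sketch only; counting_stream is a hypothetical name, and the throwing and async overloads are omitted for brevity):

    #include <boost/system/error_code.hpp>
    #include <cstddef>
    #include <utility>

    // Hypothetical byte-counting wrapper: it meets the SyncReadStream and
    // SyncWriteStream requirements whenever NextLayer does. The
    // async_read_some/async_write_some overloads needed for AsyncStream
    // would follow the same forwarding pattern.
    template <class NextLayer>
    class counting_stream
    {
        NextLayer next_layer_;
        std::size_t bytes_in_ = 0;
        std::size_t bytes_out_ = 0;

    public:
        template <class... Args>
        explicit counting_stream(Args &&... args)
            : next_layer_(std::forward<Args>(args)...)
        {
        }

        NextLayer &next_layer() { return next_layer_; }
        std::size_t bytes_in() const { return bytes_in_; }
        std::size_t bytes_out() const { return bytes_out_; }

        template <class MutableBufferSequence>
        std::size_t read_some(MutableBufferSequence const &buffers,
                              boost::system::error_code &ec)
        {
            auto const n = next_layer_.read_some(buffers, ec);
            bytes_in_ += n;
            return n;
        }

        template <class ConstBufferSequence>
        std::size_t write_some(ConstBufferSequence const &buffers,
                               boost::system::error_code &ec)
        {
            auto const n = next_layer_.write_some(buffers, ec);
            bytes_out_ += n;
            return n;
        }
    };

With such a wrapper, bredis::AsyncConnection<counting_stream<boost::asio::ip::tcp::socket>> would count traffic without any change to the library.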

Owner

Good idea!

namespace bredis {

// Rename `S` to `NextLayer` or `AsyncStream`
template <typename S> class AsyncConnection {
Owner

done

//
// and use perfect forwarding to initialize `socket_`
//
AsyncConnection(S &&socket)
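
For instance, a minimal sketch (assuming a socket_ member of type S):

    // Forward the caller's stream into the member instead of copying it.
    AsyncConnection(S &&socket)
        : socket_(std::forward<S>(socket)) {}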
Owner

done

// This file can't be included by itself,
// consider renaming it to have the .ipp extension
// to make that clear to readers.
//
Owner

done

// `lowest_layer` member functions, callers can use `next_layer().cancel()` or
// `lowest_layer().cancel()` instead, allowing more choices for
// the `S` template type here.
//
Owner

done

// given no opportunity to provide "back pressure",
// nor is the caller able to detect the unbounded
// growth and impose its own policies on it.
//
Owner

@vinniefalco The idea is cool, i.e. it allows more fine-grained control.

I have tried to implement it on a separate branch, try-avoid-queuing.

The resulting performance for 100k pings dropped from 0.27s to 2.04s. The rationale is that it half-abandoned pipelining:

https://redis.io/topics/pipelining

i.e. for each async_write of a 6-7 byte ping command it performs a write syscall (or something like it). Meanwhile, in the original implementation the tx_buffer grew ad infinitum, which allowed write syscalls to be issued much less frequently.

Trying to find a compromise...

Author

Okay, I have studied your code a little more closely. Please tell me if I have it right. In this code:
https://github.com/basiliscos/cpp-bredis/blob/master/bredis/impl/async_connection.hpp#L42

You are atomically taking ownership of the entire write queue, and then in a single network call you are sending all of the requests that used to be in that queue?

If you send a ping command asynchronously in a loop, you should get the same result. This is because the Nagle algorithm will hold back sending of the TCP/IP packet until the kernel buffer has enough pings in it to form a complete packet. You just need to make sure that the Nagle algorithm can do its job. You shouldn't wait for the corresponding read to complete before sending the next ping. That's how pipelining works. So I think you can rewrite your test to accomplish this, without needing to change the implementation.
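
A sketch in plain Boost.Asio (not the bredis API) of what I mean: each completed write immediately starts the next ping, without waiting for the reply, so Nagle's algorithm can coalesce the tiny payloads into full packets:

    #include <boost/asio.hpp>
    #include <string>

    // Each completed async_write starts the next ping; no read is awaited
    // in between, so the kernel is free to batch the 6-byte payloads.
    void pipelined_ping(boost::asio::ip::tcp::socket &socket, int remaining)
    {
        static std::string const ping = "PING\r\n";
        if (remaining == 0)
            return;
        boost::asio::async_write(
            socket, boost::asio::buffer(ping),
            [&socket, remaining](boost::system::error_code ec, std::size_t) {
                if (!ec)
                    pipelined_ping(socket, remaining - 1);
            });
    }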

Owner

> You shouldn't wait for the corresponding read to complete before sending the next ping.

https://github.com/basiliscos/cpp-bredis/blob/try-avoid-queuing/t/11-multi-ping.cpp#L62

In the new (sample) implementation I don't, i.e. the next ping is pushed as soon as the previous one has been written.

On the other hand, I think the new async_write can be generalised to accept multiple commands with their arguments, e.g.:

push_write(command_with_args, callback);
push_write(vector<command_with_args>, callback);

It will do exactly the same as I did in v0.01, i.e. create one big TX buffer and push it to the system.

Will try to experiment with disabling the Nagle algorithm, though.

Author (@vinniefalco, Apr 22, 2017)

I wasn't suggesting disabling Nagle's algorithm! You want Nagle's algorithm; in almost all cases it results in better performance.

If you want to create an interface that allows multiple commands, consider this signature:

template<class FwdCommandRange, ...>
void
write_range(FwdCommandRange const& list, ...);

Where FwdCommandRange is a type that meets the requirements of ForwardRange (see http://www.boost.org/doc/libs/1_63_0/libs/range/doc/html/range/concepts/forward_range.html) and has a value_type convertible to your command_t (or whatever it's called).

This way you are not forcing callers to use only std::vector.
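
A minimal sketch of that shape (the Serializer callable is a stand-in for your real command-to-RESP conversion):

    #include <boost/range/concepts.hpp>
    #include <sstream>
    #include <string>

    // Accept any ForwardRange of commands and serialize them all into one
    // TX buffer, so a single write covers the whole batch.
    template <class FwdCommandRange, class Serializer>
    std::string
    make_tx_buffer(FwdCommandRange const &commands, Serializer to_resp)
    {
        BOOST_CONCEPT_ASSERT((boost::ForwardRangeConcept<FwdCommandRange>));
        std::ostringstream os;
        for (auto const &cmd : commands)
            os << to_resp(cmd);
        return os.str();
    }

A std::list, std::deque, or a boost::iterator_range over a plain array would all satisfy the concept, not just std::vector.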

Owner

After a few meditations, I have come to the conclusion that the plurality of single command / multiple commands should be pushed out of AsyncConnection into command_wrapper_t:

https://github.com/basiliscos/cpp-bredis/blob/try-avoid-queuing/include/bredis/Command.hpp

https://github.com/basiliscos/cpp-bredis/blob/try-avoid-queuing/include/bredis/AsyncConnection.hpp#L57

which is a type-erased boost::variant for a single command / multiple commands. Then it became possible to write one large buffer. The performance became comparable:

./t-11-multi-ping  0.61s user 0.06s system 11% cpu 5.692 total

vs old:

./t-11-multi-ping  0.27s user 0.01s system 87% cpu 0.322 total

Actually, I think this difference is due to the fact that in v0.01 callbacks are not allocated during the main processing, while a new read_callback is allocated every time the previous one finishes.

That can also be managed (or even improved) if the number of expected replies is specified on async_read. Beyond the synthetic tests this might be very useful for MULTI/EXEC transactions in Redis (https://redis.io/topics/transactions), where the reply to a single queued command is useless.
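
For example, a hypothetical signature (the name and shape are illustrative only):

    // Hypothetical: fire the callback once replies_count replies have been
    // parsed, so the intermediate "+QUEUED" replies of a MULTI/EXEC block
    // need no per-reply callback allocation.
    template <class ReadCallback>
    void async_read(std::size_t replies_count, ReadCallback &&callback);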

What do you think?

Author

I think it looks like your library is getting better and better by the day!

@basiliscos (Owner)

I think this PR is a bit outdated now. Thank you again, anyway :)

@basiliscos closed this Jun 24, 2017