Skip to content

Perform partition checks from OCF HA script #939

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

binarin
Copy link
Contributor

@binarin binarin commented Aug 26, 2016

Partitioned nodes are ordered to restart by master. It may sound like
autoheal, but the problem is that OCF script and autoheal are not
compatible because concepts of master in pacemaker and winner in
autoheal are completely unrelated.

Alexey Lebedeff added 2 commits August 26, 2016 14:35
Partitioned nodes are ordered to restart by master. It may sound like
`autoheal`, but the problem is that OCF script and `autoheal` are not
compatible because concepts of master in pacemaker and winner in
autoheal are completely unrelated.
@michaelklishin
Copy link
Collaborator

@dmitrymex @bogdando waiting for a go-ahead from you :)

@michaelklishin michaelklishin self-assigned this Aug 26, 2016
node_name=$(rabbit_node_name $1)

local seen_as_running
seen_as_running=$(erl_eval "lists:member('%s', rabbit_mnesia:cluster_nodes(running))." "$node_name")
Copy link

@bogdando bogdando Aug 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we as well use this approach in the check_need_join_to()? Your change seems like the get_running_nodes() is not good :(

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or may be we need update get_nodes__base()'s eval "mnesia:system_info(${infotype})." as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By approach do you mean using lists:member/2 or checking both running and partitioned states?

I've added lists:member/2 because I needed to add another check and did it just for unification with new piece of code. It's slightly cleaner IMO, but it's not worth changing it everywhere. And for partition check it's just way more harder to implement it using grep.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As for cheek_need_join_to() it got me thinking. Is the check for running state is a correct check here? Or should we check list of known nodes instead?

@bogdando
Copy link

I'm not sure which problem you solve by replacing eval "mnesia:system_info(running)." to "lists:member('%s', rabbit_mnesia:cluster_nodes(running)).". And how to test this.

ocf_log err "${LH} Failed to check whether '$node_name' is considered running by us"
# XXX Or should we give remote node benefit of a doubt?
return 1
elif [ "$seen_as_running" != true ]; then
Copy link

@bogdando bogdando Aug 26, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it "${seen_as_running}" != "true" ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's correct. I'm checking here that given node is missing from the list of running nodes.

@binarin
Copy link
Contributor Author

binarin commented Aug 26, 2016

I'm not sure which problem you solve by replacing eval "mnesia:system_info(running)." to "lists:member('%s', rabbit_mnesia:cluster_nodes(running)).". And how to test this.

This are actually equivalent. Functions return the same values and can be used interchangeably, and wrapping result in lists:member/2 allows us to get rid of grep call in script itself, so we can just compare with true or false using if statement.

@michaelklishin
Copy link
Collaborator

Is this ready to be merged?

@bogdando
Copy link

LGTM and the standard check I do with jepsen (https://github.com/bogdando/rabbitmq-cluster-ocf-vagrant) works for me

@michaelklishin michaelklishin added this to the 3.6.6 milestone Aug 29, 2016
@michaelklishin michaelklishin merged commit 3b0f276 into rabbitmq:stable Aug 29, 2016
openstack-gerrit pushed a commit to openstack-archive/fuel-library that referenced this pull request Aug 30, 2016
Partitioned nodes are ordered to restart by master. It may sound like
`autoheal`, but the problem is that OCF script and `autoheal` are not
compatible because concepts of master in pacemaker and winner in
autoheal are completely unrelated.

Upsream change: rabbitmq/rabbitmq-server#939

Change-Id: I79bc2054a0ea0f04917130779e3380777960b1d6
Closes-Bug: 1616581
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants