Skip to content

Application restart breaks WebAssembly Components #1179

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
lcrilly opened this issue Mar 7, 2024 · 8 comments · Fixed by #1181
Closed

Application restart breaks WebAssembly Components #1179

lcrilly opened this issue Mar 7, 2024 · 8 comments · Fixed by #1181
Assignees
Labels
T-Defect Bugs, crashes, hangs, and other problems

Comments

@lcrilly
Copy link
Contributor

lcrilly commented Mar 7, 2024

According to the documentation I should be able to restart any application by making a GET request to the appropriate /control/applications/… API endpoint.

Unit handles the rollover gracefully, allowing the old processes to deal with existing requests and starting a new set of processes (as defined by the processes option) to accept new requests.

This is not working for WebAssembly Components. The restart request is accepted but subsequent calls to the application timeout.

Here is my reproducer based on this simple hello world component.

$ unitd --no-daemon --log /dev/stderr &
[1] 10450
$ 2024/03/07 09:48:42 [warn] 10450#5044676 Unit is running unprivileged, then it cannot use arbitrary user and group.
2024/03/07 09:48:42 [info] 10450#5044676 unit 1.32.0 started
2024/03/07 09:48:42 [info] 10451#5044678 discovery started
2024/03/07 09:48:42 [notice] 10451#5044678 module: wasm 0.1 "/opt/homebrew/lib/unit/modules/wasm.unit.so"
2024/03/07 09:48:42 [notice] 10451#5044678 module: wasm-wasi-component 0.1 "/opt/homebrew/lib/unit/modules/wasm_wasi_component.unit.so"
2024/03/07 09:48:42 [info] 10450#5044676 controller started
2024/03/07 09:48:42 [notice] 10450#5044676 process 10451 exited with code 0
2024/03/07 09:48:42 [info] 10453#5044681 router started
2024/03/07 09:48:42 [info] 10453#5044681 OpenSSL 3.2.1 30 Jan 2024, 30200010

$ tee /dev/stderr < minimal.json | unitc /config
{
	"listeners": {
		"*:9000": {
			"pass": "applications/hello"
		}
	},
	"applications": {
		"hello": {
			"type": "wasm-wasi-component",
			"component": "/home/lcrilly/wasm/hello_wasi_http.wasm"
		}
	},
	"settings": {
		"http": {
			"log_route": true
		}
	}
}
2024/03/07 09:49:02 [info] 10508#5044950 "hello" prototype started
2024/03/07 09:49:02 [info] 10509#5044951 "hello" application started
{
  "success": "Reconfiguration done."
}
$ curl localhost:9000
2024/03/07 09:49:10 [notice] 10453#5044687 *15 http request line "GET / HTTP/1.1"
Hello, wasi:http/proxy world!
$ unitc /control/applications/hello/restart
2024/03/07 09:49:26 [alert] 10450#5044676 sendmsg(8, -1, -1, 1) failed (54: Connection reset by peer)
2024/03/07 09:49:26 [alert] 10509#5044951 recvmsg(6, 2) failed (9: Bad file descriptor)
{
  "success": "Ok"
}
$ curl localhost:9000
2024/03/07 09:49:33 [notice] 10453#5044689 *25 http request line "GET / HTTP/1.1"
^C
@lcrilly lcrilly added the T-Defect Bugs, crashes, hangs, and other problems label Mar 7, 2024
@lcrilly
Copy link
Contributor Author

lcrilly commented Mar 7, 2024

Update.

This may or may not be related but after configuring a wasm-wasi-component application and then replacing the configuration with a different (single) application, the original Wasm app processes (prototype and app) persist (when they should not).

@ac000
Copy link
Member

ac000 commented Mar 7, 2024

Yes, sounds like something funky going on, looking at it...

@ac000 ac000 self-assigned this Mar 7, 2024
@ac000
Copy link
Member

ac000 commented Mar 7, 2024

The processes weren't exit(2)ing..

Try this

diff --git a/src/wasm-wasi-component/src/lib.rs b/src/wasm-wasi-component/src/lib.rs
index 3ee40c4f..47c94bdc 100644
--- a/src/wasm-wasi-component/src/lib.rs
+++ b/src/wasm-wasi-component/src/lib.rs
@@ -4,6 +4,7 @@ use http_body_util::combinators::BoxBody;
 use http_body_util::{BodyExt, Full};
 use std::ffi::{CStr, CString};
 use std::mem::MaybeUninit;
+use std::process;
 use std::ptr;
 use std::sync::OnceLock;
 use tokio::sync::mpsc;
@@ -126,7 +127,7 @@ unsafe extern "C" fn start(
         bindings::nxt_unit_run(unit_ctx);
         bindings::nxt_unit_done(unit_ctx);
 
-        Ok(())
+        process::exit(0x0);
     })
 }

@lcrilly
Copy link
Contributor Author

lcrilly commented Mar 7, 2024

Thanks for looking!

I've done a cursory test with this patch and at first glance it works as expected. I can cargo component build && unitc /control/applications/…/restart with impunity.

Side note: it would be more convenient if libwasm_wasi_component.so was compiled with a _unit.so suffix so that it can be disovered without first renaming (or requiring make install to do that).

@danielledeleo
Copy link
Contributor

I've done a cursory test with this patch and at first glance it works as expected. I can cargo component build && unitc /control/applications/…/restart with impunity.

I also replicated that the fix works. 👍

Side note: it would be more convenient if libwasm_wasi_component.so was compiled with a _unit.so suffix so that it can be disovered without first renaming (or requiring make install to do that).

I believe I ran into this too. Make doesn't appear pick up on changes to lib.rs.

@ac000
Copy link
Member

ac000 commented Mar 7, 2024

I also replicated that the fix works. 👍

Thanks!, I'll add both your Tested-by:'s

Side note: it would be more convenient if libwasm_wasi_component.so was compiled with a _unit.so suffix so that it can be disovered without first renaming (or requiring make install to do that).

You have to really fight against cargo to get it named differently, we do that for macOS... I suppose we could do it for the others, but I'm not sure if there is any other practical differences between cargo build ... and cargo rust ...

NXT_OS=$(uname -s)

if [ $NXT_OS = "Darwin" ]; then
        NXT_CARGO_CMD="cargo rustc --release --manifest-path src/wasm-wasi-component/Cargo.toml -- --emit link=target/release/libwasm_wasi_component.so -C link-args='-undefined dynamic_lookup'"
else
        NXT_CARGO_CMD="cargo build --release --manifest-path src/wasm-wasi-component/Cargo.toml"
fi

I believe I ran into this too. Make doesn't appear pick up on changes to lib.rs.

Heh, yes, a slightly different issue, but I also have that fixed. PR coming shortly...

ac000 added a commit to ac000/unit that referenced this issue Mar 7, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(2)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(2).

So that's what we do.

We want to use the standard EXIT_SUCCESS mnemonic, which in rust is
available in external crates such as libc, however rather than bringing
in a whole nother crate just for this, we just define it locally
ourselves.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
@ac000 ac000 linked a pull request Mar 7, 2024 that will close this issue
ac000 added a commit to ac000/unit that referenced this issue Mar 11, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(2)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(2).

So that's what we do.

We want to use the standard EXIT_SUCCESS mnemonic, which in rust is
available in external crates such as libc, however rather than bringing
in a whole nother crate just for this, we just define it locally
ourselves.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
ac000 added a commit to ac000/unit that referenced this issue Mar 11, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(2)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(2).

So that's what we do.

We want to use the standard EXIT_SUCCESS mnemonic, which in rust is
available in external crates such as libc, however rather than bringing
in a whole nother crate just for this, we just define it locally
ourselves.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
ac000 added a commit to ac000/unit that referenced this issue Mar 12, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(3)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(3).

So that's what we do. We use the return value of nxt_unit_run() as the
exit status.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
@ac000
Copy link
Member

ac000 commented Mar 12, 2024

If either of you want to test the latest version of the patch, particularly if you have arm64 hardware... I'm really interested in if the language module builds without error.

@ac000
Copy link
Member

ac000 commented Mar 12, 2024

OK, I found an arm64 machine to test on and I suspected, it fails to build there...

ac000 added a commit to ac000/unit that referenced this issue Mar 12, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(3)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(3).

So that's what we do. We use the return value of nxt_unit_run() as the
exit status.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
ac000 added a commit to ac000/unit that referenced this issue Mar 14, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(3)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(3).

So that's what we do. We use the return value of nxt_unit_run() as the
exit status.

Due to exit(3)ing we also need to now explicitly handle the return on
error case.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
ac000 added a commit to ac000/unit that referenced this issue Mar 14, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(3)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(3).

So that's what we do. We use the return value of nxt_unit_run() as the
exit status.

Due to exit(3)ing we also need to now explicitly handle the return on
error case.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Co-developed-by: Dan Callahan <[email protected]>
Signed-off-by: Dan Callahan <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
@ac000 ac000 closed this as completed in e75f8d5 Mar 14, 2024
andrey-zelenkov pushed a commit that referenced this issue Mar 15, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(3)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(3).

So that's what we do. We use the return value of nxt_unit_run() as the
exit status.

Due to exit(3)ing we also need to now explicitly handle the return on
error case.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: #1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Co-developed-by: Dan Callahan <[email protected]>
Signed-off-by: Dan Callahan <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
pkillarjun pushed a commit to pkillarjun/unit that referenced this issue May 29, 2024
Liam reported a problem when trying to restart wasm-wasi-component based
applications using the /control/applications/APPLICATION_NAME/restart
endpoint.

The application would become unresponsive.

What was happening was the old application process(es) weren't
exit(3)ing and so while we were starting new application processes, the
old ones were still hanging around in a non-functioning state.

When we are terminating an application it must call exit(3).

So that's what we do. We use the return value of nxt_unit_run() as the
exit status.

Due to exit(3)ing we also need to now explicitly handle the return on
error case.

Reported-by: Liam Crilly <[email protected]>
Fixes: 20ada4b ("Wasm-wc: Core of initial Wasm component model language module support")
Closes: nginx#1179
Tested-by: Liam Crilly <[email protected]>
Tested-by: Danielle De Leo <[email protected]>
Co-developed-by: Dan Callahan <[email protected]>
Signed-off-by: Dan Callahan <[email protected]>
Signed-off-by: Andrew Clayton <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-Defect Bugs, crashes, hangs, and other problems
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants