Add TrainingModule and SGD JNI + PTE-only Training Workflow #12247

georgehong · 2025-07-07T17:52:55Z

Summary

Adds JNI bindings for SGD and TrainingModule, including a unit test that mirrors train.cpp for a simple XOR example. Also makes the following changes:

Refactors the jni_layer.cpp JTensor <--> Tensor conversion into a general TensorHybrid utility. This is useful for TrainingModule classes that move maps of Tensors around.
Updates android_test_setup.sh to use pushd/popd directory movement for consistency and flexibility. This also fixes errors when generating the XOR files.

Training dependencies are already enabled for the Java JNI library, so we skip adding additional guard flags.

Test plan

Updated XOR tests that check the .pte-only convergence workflow.

sh scripts/build_android_library.sh
sh executorch_android/android_test_setup.sh  # Creates xor.ptd, xor.pte, and xor_full.pte files.

./gradlew :executorch_android:connectedAndroidTest  # Added unit test checks that the toy model converges to loss < 0.01

For the XOR tests, the device logs will show convergence values:

I testTrainXOR: Step 0, Loss 0.683540, Input [1, 0], Prediction 1, Label 1
...
I testTrainXOR: Step 4500, Loss 0.000994, Input [0, 0], Prediction 0, Label 0
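
For readers unfamiliar with what the optimizer step does between the forward/backward call and the next iteration, here is a minimal sketch of an SGD-with-momentum update over maps of named parameters and gradients. It follows the conventional rule (velocity seeded with the first gradient, then `v = momentum * v + g`, `p = p - lr * v`) and is an assumption about the semantics, not the code added in this PR. `SgdSketch` and the use of `double[]` in place of Tensors are hypothetical and chosen only to keep the example self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; this is not the SGD class added in this PR. */
final class SgdSketch {
  private final double learningRate;
  private final double momentum;
  // One momentum buffer per named parameter, created lazily on the first step.
  private final Map<String, double[]> velocity = new HashMap<>();

  SgdSketch(double learningRate, double momentum) {
    this.learningRate = learningRate;
    this.momentum = momentum;
  }

  /** Updates each parameter array in place using the gradient stored under the same name. */
  void step(Map<String, double[]> params, Map<String, double[]> grads) {
    for (Map.Entry<String, double[]> entry : params.entrySet()) {
      double[] p = entry.getValue();
      double[] g = grads.get(entry.getKey());
      if (g == null) {
        continue; // No gradient recorded for this parameter.
      }
      double[] v = velocity.get(entry.getKey());
      boolean firstStep = (v == null);
      if (firstStep) {
        v = new double[p.length];
        velocity.put(entry.getKey(), v);
      }
      for (int i = 0; i < p.length; i++) {
        // The first step seeds the buffer with the raw gradient.
        v[i] = firstStep ? g[i] : momentum * v[i] + g[i];
        p[i] -= learningRate * v[i];
      }
    }
  }
}
```

With a learning rate of 0.1, momentum of 0.9, a single weight of 1.0, and a constant gradient of 0.5, the first step moves the weight to 0.95 and the second to 0.855, since the velocity grows from 0.5 to 0.95.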

pytorch-bot · 2025-07-07T17:52:58Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/12247

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

❌ 1 New Failure, 2 Cancelled Jobs

As of commit 50f5032 with merge base defa089:

NEW FAILURE - The following job has failed:

pull / test-eval_llama-mmlu-linux / linux-job (gh)
RuntimeError: Command docker exec -t 0b85879e2613485818c03f658c039f13b3779189edf3b8827b3b60f39a2ea51a /exec failed with exit code 1

CANCELLED JOBS - The following jobs were cancelled. Please retry:

pull / test-models-linux (mobilebert, portable, linux.2xlarge) / linux-job (gh)
##[error]The operation was canceled.
pull / test-models-linux (mobilebert, xnnpack-quantization-delegation, linux.2xlarge) / linux-job (gh)
##[error]The operation was canceled.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

github-actions · 2025-07-07T17:53:34Z

This PR needs a `release notes:` label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:. This helps us keep track and include your important work in the next release notes.

To add a label, you can comment to pytorchbot, for example
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

facebook-github-bot · 2025-07-08T15:49:57Z

@georgehong has imported this pull request. If you are a Meta employee, you can view this in D77939473.

JacobSzwejbka · 2025-07-08T17:55:19Z

extension/android/jni/jni_layer_training.cpp

+      facebook::jni::alias_ref<TensorHybrid::javaobject> jtensor);
+};
+
+class JEValue : public facebook::jni::JavaClass<JEValue> {


Hmm, does this not already exist for inference?

Yes, these are just forward declarations referencing what exists in jni_layer.cpp so that it doesn't all have to be in a single file. The actual definitions are in that same file.

Could we just put them in a header?

Currently, all of the JNI code apart from jni_layer_constants.h lives in implementation files. Would this be suitable for a follow-up diff? The refactor would further increase the size of this PR and add more surfaces where the build files need updating.

JacobSzwejbka · 2025-07-08T17:59:43Z

extension/android/executorch_android/src/main/java/org/pytorch/executorch/SGD.java

+   * @param nesterov Whether to use Nesterov momentum
+   * @return new {@link org.pytorch.executorch.SGD} object
+   */
+  public static SGD create(


Doesn't have to be in this diff, but would it be more "Java-y" to have builder classes?

new SGDBuilder().learning_rate().buildSGD();

Yes, that sounds good - having an SGDBuilder() sounds like a great follow-up to me.
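
To make the builder suggestion concrete, here is a rough sketch of what such a follow-up could look like. Everything in it is hypothetical: `SgdOptions`, its `Builder`, the option names other than `nesterov`, and the default values are illustrative assumptions and are not taken from this PR or from the existing `SGD.create` signature.

```java
/** Hypothetical builder sketch; names and defaults are illustrative, not part of this PR. */
final class SgdOptions {
  final double learningRate;
  final double momentum;
  final double weightDecay;
  final boolean nesterov;

  private SgdOptions(Builder b) {
    this.learningRate = b.learningRate;
    this.momentum = b.momentum;
    this.weightDecay = b.weightDecay;
    this.nesterov = b.nesterov;
  }

  static Builder builder() {
    return new Builder();
  }

  static final class Builder {
    // Assumed defaults, chosen only for the example.
    private double learningRate = 0.01;
    private double momentum = 0.0;
    private double weightDecay = 0.0;
    private boolean nesterov = false;

    Builder learningRate(double value) {
      this.learningRate = value;
      return this;
    }

    Builder momentum(double value) {
      this.momentum = value;
      return this;
    }

    Builder weightDecay(double value) {
      this.weightDecay = value;
      return this;
    }

    Builder nesterov(boolean value) {
      this.nesterov = value;
      return this;
    }

    /** Validates the collected options and returns an immutable snapshot. */
    SgdOptions build() {
      if (learningRate < 0.0) {
        throw new IllegalArgumentException("learningRate must be non-negative");
      }
      return new SgdOptions(this);
    }
  }
}
```

A caller would then write something like `SgdOptions.builder().learningRate(0.1).nesterov(true).build()` and only name the options that differ from the defaults. In a real follow-up, `build()` would presumably forward to the existing static `create` factory along with the named parameters.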

JacobSzwejbka · 2025-07-08T18:00:49Z

extension/android/executorch_android/src/main/java/org/pytorch/executorch/TrainingModule.java

+  }
+
+  @DoNotStrip
+  private native EValue[] executeForwardBackwardNative(String methodName, EValue... inputs);


Noob question: what are these "native" APIs?

Each of these native methods maps to a C++ definition in the JNI jni_layer_training.cpp file.

  static void registerNatives() {
    registerHybrid({
        makeNativeMethod("initHybrid", ExecuTorchTrainingJni::initHybrid),
        makeNativeMethod(
            "executeForwardBackwardNative",
            ExecuTorchTrainingJni::executeForwardBackward),
        makeNativeMethod(
            "namedParametersNative", ExecuTorchTrainingJni::namedParameters),
        makeNativeMethod(
            "namedGradientsNative", ExecuTorchTrainingJni::namedGradients),
    });
  }
};
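
As a small illustration of the Java side of this mapping: a `native` method is declared without a body, and the JVM only resolves it once a C/C++ implementation has been bound to that name (by symbol lookup in a loaded library, or by explicit registration as in the snippet above). The sketch below uses a hypothetical `NativeDemo` class and deliberately loads no library, so the call fails with `UnsatisfiedLinkError`; that is the error you would see if the registration step were missing.

```java
/** Hypothetical class; no native library is loaded, so the call below cannot resolve. */
final class NativeDemo {
  // Declared without a body: the JVM expects a native function to be bound to this name.
  private static native int addNative(int a, int b);

  /** Returns true if the native call failed because no implementation is bound. */
  static boolean isUnbound() {
    try {
      addNative(1, 2);
      return false;
    } catch (UnsatisfiedLinkError e) {
      // Thrown at call time when the JVM cannot find a matching native implementation.
      return true;
    }
  }
}
```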

facebook-github-bot · 2025-07-08T20:02:47Z

@georgehong has imported this pull request. If you are a Meta employee, you can view this in D77939473.

JacobSzwejbka · 2025-07-08T20:44:12Z

extension/android/jni/jni_layer_training.cpp

+#include <string>
+#include <vector>
+
+#include <fbjni/ByteBuffer.h>


What happens with these fbjni bindings in OSS?

Does ET have a dep on https://github.com/facebookincubator/fbjni?

I think we're already using FBJNI in jni_layer_runtime.cpp:

executorch/extension/android/jni/jni_layer_runtime.cpp

Lines 9 to 10 in ed9c4de

#include <fbjni/fbjni.h>
#include <jni.h>

and in the Gradle build:

executorch/extension/android/executorch_android/build.gradle

Lines 46 to 51 in ed9c4de

dependencies {
  implementation 'com.facebook.fbjni:fbjni:0.5.1'
  implementation 'com.facebook.soloader:nativeloader:0.10.5'
  implementation libs.core.ktx
  testImplementation 'junit:junit:4.12'

GregoryComer · 2025-07-08T23:13:26Z

Would it be possible to split out the new Buck selective JNI dependencies so they're only pulled in when needed? There are some binary-size critical users currently in Meta apps.

facebook-github-bot · 2025-07-09T07:01:39Z

@georgehong has imported this pull request. If you are a Meta employee, you can view this in D77939473.

facebook-github-bot · 2025-07-09T08:09:42Z

@georgehong has imported this pull request. If you are a Meta employee, you can view this in D77939473.

As title, adds wrappers together with unit test based on XOR train.cpp example.

Address the comment on JNI binary size sensitivity. Rather than adding to the existing JNI Buck targets, introduce a new executorch_training_jni target, and use EXECUTORCH_BUILD_TRAINING_JNI to further modularize the JNI build.

facebook-github-bot · 2025-07-09T18:58:34Z

@georgehong has imported this pull request. If you are a Meta employee, you can view this in D77939473.

georgehong · 2025-07-09T20:37:28Z

Split the training components into a separate BUCK target and added a training JNI flag so that the training JNI is modular, which addresses the binary size concerns.

facebook-github-bot added the CLA Signed label Jul 7, 2025

georgehong force-pushed the gh/georgehong/training_jni_pte_only branch from a6e15a7 to c008409 Compare July 8, 2025 07:56

georgehong changed the title ~~Update and test JNI Training entrypoints slightly to allow for PTE-only workflows~~ [RFC] Add TrainingModule and SGD JNI + PTE-only Training Workflow Jul 8, 2025

georgehong force-pushed the gh/georgehong/training_jni_pte_only branch from c008409 to aba87ed Compare July 8, 2025 08:05

georgehong requested a review from JacobSzwejbka July 8, 2025 08:05

JacobSzwejbka reviewed Jul 8, 2025

georgehong force-pushed the gh/georgehong/training_jni_pte_only branch 3 times, most recently from d410f9d to 7557781 Compare July 8, 2025 20:02

JacobSzwejbka reviewed Jul 8, 2025

georgehong requested a review from GregoryComer July 8, 2025 20:53

georgehong changed the title ~~[RFC] Add TrainingModule and SGD JNI + PTE-only Training Workflow~~ Add TrainingModule and SGD JNI + PTE-only Training Workflow Jul 8, 2025

georgehong force-pushed the gh/georgehong/training_jni_pte_only branch from 7557781 to 5fdf9cd Compare July 9, 2025 06:57

georgehong force-pushed the gh/georgehong/training_jni_pte_only branch from 5fdf9cd to b646e29 Compare July 9, 2025 08:08

georgehong added 2 commits July 9, 2025 11:19

Add TrainingModule and SGD JNI

956124e

As title, adds wrappers together with unit test based on XOR train.cpp example.

georgehong force-pushed the gh/georgehong/training_jni_pte_only branch from b646e29 to 50f5032 Compare July 9, 2025 18:56

georgehong marked this pull request as ready for review July 9, 2025 20:35

georgehong requested review from larryliu0820 and kirklandsign as code owners July 9, 2025 20:35

georgehong requested a review from swolchok as a code owner July 9, 2025 20:35

