TuriCreate: Human Activity Classifier Model Deployment and result on unseen test dataset

Hi,

I've collected approximately 5 GB of time series data from my Apple Watch, labeled into two activity classes (for example, walking and not walking). 
The model trained successfully, and the attached screenshot shows the results of the best model. 
However, when I test it on an unseen dataset, the performance is unsatisfactory.

Despite high accuracy, F1, precision, and recall values during training, the model gives disappointing results when deployed for real-time activity classification on the Apple Watch. 
The only loss value I can see is log_loss; I could not find a training/validation loss graph.

While I understand my dataset might not be considered real Big Data, I believe that 5 GB of data should suffice for distinguishing between two activities, such as walking or not. 


I think metrics like accuracy, recall, precision, and F1 aren't sufficient on their own to justify taking this model into practical deployment. How should I handle this to build a trustworthy model?

How can I ensure the reliability of my model in such a scenario?
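
To make the question concrete, here is a minimal sketch of how the four metrics mentioned above can be recomputed on a held-out set, so that training-time and unseen-data numbers can be compared side by side. It is plain Python and not the TuriCreate API; the function name `binary_metrics` and the label strings `"walking"` and `"other"` are hypothetical.

```python
from collections import Counter


def binary_metrics(y_true, y_pred, positive="walking"):
    """Compute accuracy, precision, recall, and F1 for one positive class.

    `y_true` and `y_pred` are equal-length sequences of labels.
    The label names are illustrative assumptions, not taken from the dataset.
    """
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            counts["tp" if truth == positive else "fp"] += 1
        else:
            counts["fn" if truth == positive else "tn"] += 1

    tp, fp, fn, tn = (counts[k] for k in ("tp", "fp", "fn", "tn"))
    total = tp + fp + fn + tn

    # Guard every division so an empty class yields 0.0 instead of an error.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / total if total else 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["walking", "walking", "other", "other"]
    preds = ["walking", "other", "other", "other"]
    print(binary_metrics(truth, preds))
```

Running the same function on the training data and on the unseen recordings would show whether the gap is in precision, recall, or both.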

P.S. TuriCreate: the training, validation, and testing pipeline followed the human activity classifier documentation. 
After that, deploying the model in the watch app gave unsatisfactory results.
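
One thing that may be worth checking in a pipeline like this is whether the train and test sets share recording sessions, since overlapping sessions can inflate offline metrics. Below is a minimal sketch of a session-level split in plain Python. It is an illustration under assumptions, not the TuriCreate implementation: the helper `split_by_session` and the field name `session_id` are hypothetical.

```python
import random


def split_by_session(rows, session_key="session_id", test_fraction=0.2, seed=0):
    """Split rows so that every recording session lands entirely in one set.

    `rows` is a list of dicts; `session_key` names the field that identifies
    a recording session (a hypothetical column name for this sketch).
    """
    sessions = sorted({row[session_key] for row in rows})
    rng = random.Random(seed)
    rng.shuffle(sessions)

    # Hold out at least one whole session for testing.
    n_test = max(1, round(len(sessions) * test_fraction))
    test_sessions = set(sessions[:n_test])

    train = [row for row in rows if row[session_key] not in test_sessions]
    test = [row for row in rows if row[session_key] in test_sessions]
    return train, test


if __name__ == "__main__":
    # Toy data: 5 sessions with 3 samples each.
    data = [{"session_id": s, "sample": i} for s in range(5) for i in range(3)]
    train_rows, test_rows = split_by_session(data)
    print(len(train_rows), len(test_rows))
```

Splitting on the session identifier rather than on individual samples keeps neighboring windows of the same recording out of both sets at once.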

![Screenshot 2023-11-30 at 1 55 42 PM](https://github.com/apple/turicreate/assets/5468765/6e594fc1-278a-478b-8d68-a228e26f221e)




TuriCreate: Human Activity Classifier Model Deployment and result on unseen test dataset #3482
