Skip to content

Read and write binary file documentation #2811

Closed
wants to merge 14 commits into from
Update based off PR feedback
jwood803 committed Mar 5, 2019
commit 3d20a995f4f53db82be05b33da566ab596c61ecf
27 changes: 15 additions & 12 deletions docs/code/MlNetCookBook.md
@@ -1025,33 +1025,36 @@ using (var fs = File.OpenRead(modelPath))
```

## How can I read and write binary data?
Contributor

@rogancarr rogancarr Mar 4, 2019

Cookbook examples all have corresponding tests in Tests/Scenarios/Api/CookbookSamples/. Could you please add one for this too? #Resolved

Contributor Author

@jwood803 jwood803 Mar 5, 2019

I must be missing something when I try to add the example. It's not finding the CreateDataView method on the ML context. It may be missing a reference for it, but I'm not sure which one it needs. #Resolved

Contributor Author

@jwood803 jwood803 Mar 5, 2019

@rogancarr I switched to LoadFromEnumerable which is what the sample is already using. Hopefully the test looks good now. #Resolved

-Other than using text files ML.NET will allow you to read and write binary data.
+In addition to text files, ML.NET can read and write binary data. Binary files have a few advantages: you do not have to specify a schema, they can be faster to read, and they are generally smaller than text files.

To write binary data, you first need some data to save. Specifically, you need an instance of an `IDataView`. Below is a code snippet that uses the iris data as an example.

```csharp
// Data model for the iris data
public class IrisData
{
-    public float Label;
-    public float SepalLength;
-    public float SepalWidth;
-    public float PetalLength;
-    public float PetalWidth;
+    public float Label { get; set; }
+    public float SepalLength { get; set; }
+    public float SepalWidth { get; set; }
+    public float PetalLength { get; set; }
+    public float PetalWidth { get; set; }
}

// An array of iris data points
-var dataArray = new[] {
-    new IrisData{Label=1, PetalLength=1, SepalLength=1, PetalWidth=1, SepalWidth=1},
-    new IrisData{Label=0, PetalLength=2, SepalLength=2, PetalWidth=2, SepalWidth=2}
+var dataArray = new[]
+{
+    new IrisData { Label=1, PetalLength=1, SepalLength=1, PetalWidth=1, SepalWidth=1 },
+    new IrisData { Label=0, PetalLength=2, SepalLength=2, PetalWidth=2, SepalWidth=2 }
};

// Create the ML.NET context.
var context = new MLContext();

// Create the data view.
-// This method will use the definition of IrisData to understand what columns there are in the
-// data view.
+// This method will use the definition of IrisData to understand what columns there are
+// in the data view. However, the objects in ML.NET are only "promises" of data since
+// ML.NET operations are lazy. One way to get a look at the data is with Schema Comprehension.
+// Refer to this document for more information - https://github.com/dotnet/machinelearning/blob/master/docs/code/SchemaComprehension.md
var data = context.CreateDataView(dataArray);
Contributor

@rogancarr rogancarr Mar 4, 2019

Use the language of Schema Comprehension here, as done above in "How do I look at the intermediate data?". Also can link to this doc: https://github.com/dotnet/machinelearning/blob/master/docs/code/SchemaComprehension.md #Resolved


// Use a FileStream to create a file. Use the stream and the data view in the "SaveAsBinary" method.
@@ -1061,7 +1064,7 @@ using(var stream = new FileStream("./iris.idv", FileMode.Create))
}
```
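
For reference, the write step can be combined with a read-back into one self-contained program. This is only a sketch under assumptions: it uses the ML.NET 1.x method names (`LoadFromEnumerable`, `LoadFromBinary`) rather than the `CreateDataView` and `ReadFromBinary` names that appear in this diff, and the `BinaryRoundTrip` class name is hypothetical. Adjust the calls to whichever ML.NET version you build against.

```csharp
using System;
using System.IO;
using Microsoft.ML;

// Data model for the iris data (same shape as in the cookbook snippet).
public class IrisData
{
    public float Label { get; set; }
    public float SepalLength { get; set; }
    public float SepalWidth { get; set; }
    public float PetalLength { get; set; }
    public float PetalWidth { get; set; }
}

// Hypothetical wrapper class, used only for this illustration.
public static class BinaryRoundTrip
{
    public static void Main()
    {
        var dataArray = new[]
        {
            new IrisData { Label = 1, PetalLength = 1, SepalLength = 1, PetalWidth = 1, SepalWidth = 1 },
            new IrisData { Label = 0, PetalLength = 2, SepalLength = 2, PetalWidth = 2, SepalWidth = 2 }
        };

        var context = new MLContext();

        // Assumption: ML.NET 1.x name. Older previews used different method names.
        IDataView data = context.Data.LoadFromEnumerable(dataArray);

        // Write the data view to a binary .idv file.
        const string path = "./iris.idv";
        using (var stream = new FileStream(path, FileMode.Create))
        {
            context.Data.SaveAsBinary(data, stream);
        }

        // Read it back. No schema is passed in; it is stored in the file.
        // Assumption: ML.NET 1.x name for the binary loader.
        IDataView loaded = context.Data.LoadFromBinary(path);

        // Print the column names and types recovered from the file.
        foreach (var column in loaded.Schema)
        {
            Console.WriteLine($"{column.Name}: {column.Type}");
        }
    }
}
```

Printing the schema of the loaded view is a quick way to confirm that the column definitions travel with the binary file.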

-To read a binary file, simply use the `context.Data.ReadFromBinary` method and pass in the path of the binary file to read in.
+To read a binary file, simply use the `context.Data.ReadFromBinary` method and pass in the path of the binary file to read in. Notice that the schema of the data does not need to be defined here.

```csharp
var data = context.Data.ReadFromBinary("./iris.idv");