Skip to content

DateTimeTransformer featurizer #4521

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Dec 17, 2019

Conversation

michaelgsharp
Copy link
Member

This change adds in the DateTimeTransformer into the new Featurizers project. It is the first of a series of PR's that will go in. The DateTimeTransformer is implemented in native code, so this is mostly just a wrapper around that with the appropriate entrypoints for NimbusML as well. Since all the new estimator/transformers follow the same patterns, once this one is reviewed and checked in I will create the other PR's.

@michaelgsharp michaelgsharp requested a review from a team December 4, 2019 18:11
@michaelgsharp michaelgsharp self-assigned this Dec 4, 2019
Year = 1, Month, Day, Hour, Minute, Second, AmPm, Hour12, DayOfWeek, DayOfQuarter, DayOfYear,
WeekOfMonth, QuarterOfYear, HalfOfYear, WeekIso, YearIso, MonthLabel, AmPmLabel, DayOfWeekLabel,
HolidayName, IsPaidTimeOff
};
Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend explicitly numbering each enum name. This lets us insert new values later without affecting the enum numeric value.

This is more important if the enum is serialized within the model, which I expect you're doing since the model needs to know which features to produce. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point. They are now explicitly numbered. The Countries enum (now HolidayList) is also numbered now.


In reply to: 353910235 [](ancestors = 353910235)

/// <param name="catalog">Transform catalog</param>
/// <param name="inputColumnName">Input column name</param>
/// <param name="columnPrefix">Prefix to add to the generated columns</param>
/// <param name="columnsToDrop">List of columns to drop, if any</param>
Copy link
Member

@eerhardt eerhardt Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One drawback to using columnsToDrop is that if a user is only interested in the DayOfYear and DayOfWeekLabel, they need to explicitly exclude all the rest. And then if we ever add a new output column in a future release, they would start getting it, and would need to add the new column to the "to drop" list. #Resolved

Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I might recommend a bitwise use of an enum (flags) to say which columns to include.

Default would be DateTimeTransformerEstimator.ColumnsProduced.All = 0xFFFFFFFFFFFFFFFF (uint64).

Then users can bitwise or/and-not to select the columns they want to keep/drop.

Example usage

All output except DayOfYear: (subtractive column selection)

var pipeline = mlContext.Transforms.DateTimeTransformer("Date", "DTC", 
    DateTimeTransformerEstimator.ColumnsProduced.All & ~DateTimeTransformerEstimator.ColumnsProduced.DayOfYear)

Only DayOfYear and DayOfWeekLabel: (additive column selection)

var pipeline = mlContext.Transforms.DateTimeTransformer("Date", "DTC", 
    DateTimeTransformerEstimator.ColumnsProduced.DayOfYear | DateTimeTransformerEstimator.ColumnsProduced.DayOfWeekLabel)

#Resolved

Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently, there's a [Flags] attribute in C# -- https://docs.microsoft.com/en-us/dotnet/api/system.flagsattribute?view=netcore-3.0

[Flags] public enum ColumnsProduced : uint64
{
    Year = 1 << 0,
    Month = 1 << 1,
    Day = 1 << 2,
    Hour = 1 << 3, 
    WeekOfMonth = 1 << 4,
    ...
}
             

#Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have a way you think would be better? We had also thought about not even including that ability here and just letting the user use the DropColumns transformer to remove any columns they didn't want. That would keep this code cleaner and not duplicate that type of functionality. Thoughts on that?


In reply to: 353911746 [](ancestors = 353911746)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont think that doing flags is a great idea personally. I think enums are more clear (if more verbose) than flags usually. As I mentioned to Eric in this same thread though, we can remove this capability from here and just let the user use the DropColumn transformer to remove anything they dont want. What is your opinion on that?


In reply to: 353968394 [](ancestors = 353968394)

Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expect this featurizer isn't aware of which columns are being requested of it

Maybe it should be made aware? #Resolved

Copy link
Contributor

@justinormont justinormont Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the usability of the flag method.

There is a user need of not including all outputs as overfitting occurs, mainly on the larger time scales, like year number. Including the year in a classification or regression model often leads to the model memorizing the label value specifically for a date range.

For instance the model will memorize that 2017-03-01 to 2018-01-05, the label value was high. This information is not valuable to memorize when predicting future data. Instead we want to ensure it memorizes that traffic volumes (the label) are higher on weekends (extracting useful patterns).

Hence it is nice to encourage users (as part of the timedate featurizer API itself) to think about which columns to include/exclude. This is as opposed to a user needing a separate DropColumns step in their pipeline. #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So so far we have had 3 suggestions.
1 - Dont support dropping columns as a native part of this transformer.
2 - Use Enums.
3 - Use Flags.

Can we finalize on the decision? I personally agree with Harish that since we already have a drop columns transformer that we can just use that and not have that functionality be present here. #Resolved

Copy link
Contributor

@justinormont justinormont Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd recommend a way to proactively select the output columns to create during the setup of the transform. Instead of dropping later, just don't create them.

Generally, we should assume users won't think (or know) to drop later. For usability, placing it in the constructor exposes users to the idea, and forces them to consider which columns will be beneficial. #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, since this functionality already exists inside of ML.NET as a separate transformer we will be leaving this as is. We can revisit this later as the need arises. Even if this transformer produces all the columns, if no downstream transformers request them then they will never be realized. It shouldn't cause any performance impact this way. #Resolved

HolidayName, IsPaidTimeOff
};

public enum Countries : byte
Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

May want a UINT given that there are ~200 current countries (and they keep changing), and in the future we may a more specific break down, like UnitedStatesBankingHolidays vs. UnitedStatesFederalHolidays

Suggested change
public enum Countries : byte
public enum Countries : uint
``` #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a good point. I have changed the enum to be uint as suggested.


In reply to: 353916304 [](ancestors = 353916304)

public ColumnsProduced[] ColumnsToDrop;

[Argument(ArgumentType.AtMostOnce, HelpText = "Country to get holidays for. Defaults to none if not passed", Name = "Country", ShortName = "ctry", SortOrder = 4)]
public Countries Country = Countries.None;
Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Countries may be a bit restrictive of a term for the future. Perhaps name as HolidayList.

Perhaps in the future we'll want UnitedStatesCaliforniaEducationHolidays (see: list). We may also want more specific lists, like UnitedStatesMajorHolidays. Or localized lists like GermanyBavaria (see: list).

Or perhaps in the future we'll want to version the holiday list Germany2019 (as holidays are declared and dropped). Versioning helps old models run in current versions of ML.NET. #Resolved

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. I have renamed it to HolidayList, but we can revisit it if needed in the future as well.


In reply to: 353920532 [](ancestors = 353920532)

public int DTCYearIso { get; set; }
public string DTCMonthLabel { get; set; }
public string DTCAmPmLabel { get; set; }
public string DTCDayOfWeekLabel { get; set; }
Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be needed to be added to the code being wrapped here, but when featurizing DateTime fields, I prefer polar (Sin/Cos) transforms (along with the others listed).

Clocks and times/dates are cyclic. The current feature set doesn't include this fact.

The main gain of polar date transforms is that it places times like 11:50PM and 12:10AM right next to each other (both mapping to ~0.999 in the cosine transform of percent of day). This lets a model make a decision boundary at CosTimeOfDay > 0.9 to treat these two times as very similar.

Contrast this to the DTCHour feature which will map these to 23 and 0 (issue: non-continuous range, and not possible to group both the 0 and 23 in a single split point, which to a human is similar).

Info:

Code example I use for DateTime feature engineering:

DateTime dt;
var invalidDate = !DateTime.TryParse(input.Date, out dt);
var percentOfDay = (invalidDate ? Single.NaN : (float)(new TimeSpan(dt.Hour, dt.Minute, 0).TotalMinutes) / 1440 f);
var percentOfWeek = (invalidDate ? Single.NaN : ((float) dt.DayOfWeek + percentOfDay) / 7 f);
var percentOfMonth = (invalidDate ? Single.NaN : (float)(dt.Day + percentOfDay) / (float) DateTime.DaysInMonth(dt.Year, dt.Month));
var percentOfYear = (invalidDate ? Single.NaN : (float)((dt.DayOfYear + percentOfDay) / (new DateTime(dt.Year + 1, 1, 1) - new DateTime(dt.Year, 1, 1)).TotalDays));

// Time features like the current ones
output.DateWeekday = (invalidDate ? Single.NaN : (float) dt.DayOfWeek);
output.DateMonth = (invalidDate ? Single.NaN : (float) dt.Month);
output.DateHour = (invalidDate ? Single.NaN : dt.Hour);
output.DatePartOfDay = (invalidDate ? Single.NaN : (float) Math.Round(dt.AddHours(-3).Hour / 6 f));
output.InvalidDate = (invalidDate ? 1.0f : 0.0f);

// Polar date/time transforms for percent of day & week & year 
output.CosTimeOfDay = (float) Math.Cos(2 * Math.PI * percentOfDay);
output.SinTimeOfDay = (float) Math.Sin(2 * Math.PI * percentOfDay);
output.CosTimeOfWeek = (float) Math.Cos(2 * Math.PI * percentOfWeek);
output.SinTimeOfWeek = (float) Math.Sin(2 * Math.PI * percentOfWeek);
output.CosTimeOfMonth = (float) Math.Cos(2 * Math.PI * percentOfMonth);
output.SinTimeOfMonth = (float) Math.Sin(2 * Math.PI * percentOfMonth);
output.CosTimeOfYear = (float) Math.Cos(2 * Math.PI * percentOfYear);
output.SinTimeOfYear = (float) Math.Sin(2 * Math.PI * percentOfYear);
``` #Resolved

Copy link
Contributor

@justinormont justinormont Dec 4, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/cc @davidbrownellWork #Resolved

Copy link

@davidbrownellWork davidbrownellWork Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's add this as a potential feature for Mn. #Resolved

Copy link
Contributor

@gvashishtha gvashishtha Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You mean add to the ML.NET plan? Any estimate on how much additional dev time this would take? #Resolved

Copy link
Contributor

@justinormont justinormont Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding the new features is pretty simple. I expect more of the work is in testing. #Resolved

{
ValueGetter<T> result = (ref T dst) =>
{
long dateTime = default;
Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the input dateTime column a long and not a DateTime? #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long to match a unix timestamp. We will be adding in the ability to use C# DateTime and string in an ISO format in the future, but the initial implementation is just the unix timestamp. #Resolved

Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My concern here is that .NET developers expect to have their date time data using System.DateTime. That is the natural type in .NET. Forcing the user to convert from System.DateTime to a long unix timestamp doesn't seem like a good experience. It would make more sense if the input column was a System.DateTime. #Resolved

Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not do the conversion yourself here? The user's input column can be DateTime, and then you convert it to a long unix timestamp before passing it to the C++ code? #Resolved

Copy link

@davidbrownellWork davidbrownellWork Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We want to ensure that the process of converting from a string to "Date time" is consistent across frameworks - especially considering that a DateTime in and of itself isn't sufficient in some scenarios, as we lose (potentially important) time zone information in the process. #Resolved

Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not asking for string. I'm asking for the ML.NET data type that a user would feed into this transform to be the canonical System.DateTime type and not a long type.

Specifically, this transform should require the input column to be of this type:

/// <summary>
/// The standard date time type. This has representation type of <see cref="DateTime"/>.
/// Note this can have only one possible value, accessible by the singleton static property <see cref="Instance"/>.
/// </summary>
public sealed class DateTimeDataViewType : PrimitiveDataViewType

It doesn't make sense for a user to have to convert a DateTime to a long before calling this transform. #Resolved

Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specifically, look at the sample code being checked in under samples:

            // Create a small dataset as an IEnumerable.
            // Future Date - 2025 June 30
            var samples = new[] { new DateTimeInput() { Date = 1751241600 } };

Instead, that sample code should be:

            // Create a small dataset as an IEnumerable.
            var samples = new[] { new DateTimeInput() { Date = new DateTime(2025, 6, 30) } };
``` #Resolved

Copy link

@davidbrownellWork davidbrownellWork Dec 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed - eventually a C# developer will be able to pass System:DateTime, string, or long. #Resolved

Copy link
Contributor

@natke natke Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly agree on using System.DateTime from the beginning. This will make the transform consistent with other parts of the API, as well as more usable to .NET developers #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is on our todo list and will be added in in the future. We will eventually be supporting DateTime and string as well, but initial support will just be for long. #Resolved

ctx.Writer.Write(data);
}

public void Dispose()
Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Who calls this Dispose method? We have a problem with tranformers that need to be disposed. See #906 and my comment on #4223

cc @codemzs #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just be called automatically when the pipeline teardown happens as mentioned in that comment chain. I am not manually calling it anywhere. Is it happening during pipeline teardown ok? Or is there more desirable behavior? #Resolved

Copy link
Member

@eerhardt eerhardt Dec 5, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should just be called automatically when the pipeline teardown happens as mentioned in that comment chain.

There is no call of it when the pipeline teardown happens. That is why that bug was opened. I'm still not certain on why it was closed. Maybe @codemzs can comment. #Resolved

@michaelgsharp
Copy link
Member Author

michaelgsharp commented Dec 9, 2019

/azp run #Resolved

@azure-pipelines
Copy link

azure-pipelines bot commented Dec 9, 2019

Azure Pipelines successfully started running 2 pipeline(s).

#Resolved

@michaelgsharp
Copy link
Member Author

michaelgsharp commented Dec 9, 2019

Going to close and reopen the PR to reset the stuck builds. #Resolved

@codecov
Copy link

codecov bot commented Dec 10, 2019

Codecov Report

Merging #4521 into master will increase coverage by 0.03%.
The diff coverage is 88.11%.

@@            Coverage Diff             @@
##           master    #4521      +/-   ##
==========================================
+ Coverage   75.11%   75.15%   +0.03%     
==========================================
  Files         908      913       +5     
  Lines      160204   160891     +687     
  Branches    17254    17292      +38     
==========================================
+ Hits       120340   120911     +571     
- Misses      35050    35153     +103     
- Partials     4814     4827      +13
Flag Coverage Δ
#Debug 75.15% <88.11%> (+0.03%) ⬆️
#production 70.54% <86.24%> (+0.02%) ⬆️
#test 90.29% <91.41%> (+0.03%) ⬆️
Impacted Files Coverage Δ
....ML.Tests/Transformers/DateTimeTransformerTests.cs 100% <100%> (ø)
...crosoft.ML.Core.Tests/UnitTests/TestEntryPoints.cs 98.36% <100%> (ø) ⬆️
src/Microsoft.ML.Featurizers/Common.cs 19.58% <15%> (ø)
...estFramework/Attributes/NotCentOS7FactAttribute.cs 26.08% <26.08%> (ø)
...rc/Microsoft.ML.Featurizers/DateTimeTransformer.cs 90.57% <90.57%> (ø)
....ML.AutoML/PipelineSuggesters/PipelineSuggester.cs 83.19% <0%> (-3.37%) ⬇️
...rc/Microsoft.ML.LightGbm/WrappedLightGbmDataset.cs 74.21% <0%> (-2.12%) ⬇️
...rc/Microsoft.ML.LightGbm/WrappedLightGbmBooster.cs 83.8% <0%> (-0.63%) ⬇️
...soft.ML.TestFramework/DataPipe/TestDataPipeBase.cs 76.08% <0%> (-0.4%) ⬇️
...ML.Transforms/Text/StopWordsRemovingTransformer.cs 86.1% <0%> (-0.16%) ⬇️
... and 21 more
#Resolved

@codemzs codemzs requested a review from natke December 12, 2019 18:27
WeekIso = 15,
YearIso = 16,
MonthLabel = 17,
AmPmLabel = 18,
Copy link
Contributor

@natke natke Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we do AmPm label, would it make sense to also do weekend/weekday? (Often used in feature engineering) #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think thats not a bad idea, but the initial ask we got was just for these columns. We will start with just these and then can add in more if the requirement arises. #Resolved

{
ValueGetter<T> result = (ref T dst) =>
{
long dateTime = default;
Copy link
Contributor

@natke natke Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly agree on using System.DateTime from the beginning. This will make the transform consistent with other parts of the API, as well as more usable to .NET developers #Resolved

/// <param name="inputColumnName">Input column name</param>
/// <param name="columnPrefix">Prefix to add to the generated columns</param>
/// <returns><see cref="DateTimeEstimator"/></returns>
public static DateTimeEstimator DateTimeTransformer(this TransformsCatalog catalog, string inputColumnName, string columnPrefix)
Copy link
Contributor

@natke natke Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The naming of the extension methods are not consistent with the other transformers (I'll leave it to others to judge whether the original scheme makes sense :-), but consistency is probably a more important consideration now)

If you look at the other transforms they mostly have the form Verb-Object e.g. CopyColumns, MapValueToKey etc. This makes sense when you are chaining them together in a pipeline.

Suggestions for this one: FeaturizeDateTime, SplitDateTime, TokenizeDateTime, ExtractDateTime ...

#Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Of the ones you listed I would vote FeaturizeDateTime. Anyone have any objections if I rename the extension method to that? #Resolved

case DateTimeEstimator.ColumnsProduced.MonthLabel:
case DateTimeEstimator.ColumnsProduced.AmPmLabel:
case DateTimeEstimator.ColumnsProduced.DayOfWeekLabel:
case DateTimeEstimator.ColumnsProduced.HolidayName:
Copy link
Contributor

@justinormont justinormont Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Usability question, why are we producing multiple output columns of instead of a single vector type column?

The next step after this DateTime transform is:

  • Convert all the INTs/USHORTs/BYTEs columns to Float32
  • One-hot encode the STRING columns ("Friday" to [ 0, 0, 0, 0, 0, 1, 0 ])
  • Concatenate all of these feature to single vector of Float32

Why aren't we just outputting the final vector type column? Then the user can directly use its output, like our other transforms. #Resolved

Copy link
Contributor

@justinormont justinormont Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there's a usecase for the individual columns, perhaps just a boolean parameter produceRawColumns, which then produces the individual outputs instead of a singular vector column? #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We did it this way just because thats how the original python code works. It splits everything up into different columns and doesn't combine them into a vector. #Resolved

Copy link
Contributor

@natke natke Dec 13, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having a single vector is consistent with FeaturizeText. #Resolved

Copy link
Member Author

@michaelgsharp michaelgsharp Dec 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The original code we were bringing to ML.NET returned all individual columns. Because of that, for now, we will be leaving them as individual columns. If we learn that we never need individual columns and only use this as a vector then we can revisit this then. #Resolved

@azure-pipelines
Copy link

azure-pipelines bot commented Dec 17, 2019

No pipelines are associated with this pull request.

#Resolved

@michaelgsharp michaelgsharp merged commit 290e069 into dotnet:master Dec 17, 2019
@michaelgsharp michaelgsharp deleted the datetime-transformer branch December 17, 2019 21:35
@@ -219,5 +220,31 @@ internal static TypeId GetNativeTypeIdFromType(this Type type)

throw new InvalidOperationException($"Unsupported type {type}");
}

// The Native Featurizers do not currently support CentOS7, this method checks the OS and returns true if it is CentOS7.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about RedHat 7?

{
if (RuntimeInformation.IsOSPlatform(OSPlatform.Linux))
{
using (Process process = new Process())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A more performant way of doing this check would be to read from the file directly. Starting a new process just to get this info is too heavy-weight IMO.

See https://github.com/dotnet/runtime/blob/6f445d6dc237f59d730e0df47f8630b18887d776/src/installer/managed/Microsoft.DotNet.PlatformAbstractions/Native/PlatformApis.cs#L140-L142

@ghost ghost locked as resolved and limited conversation to collaborators Mar 19, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants