Precompute the normalized type names of DataTypes and use string.Create on NS2.1 for faster normalization. #364

ahsonkhan · 2019-12-05T07:27:33Z

I noticed a few places where the implementation could be faster: avoiding LINQ and caching the normalized type names for the simple and complex DataTypes, so they don't have to be recomputed on every access. I am not sure whether these code paths are perf critical; if they aren't, I would understand if this PR isn't useful. If that's the case, please let me know.

I also added a call to string.Create to make normalizing the type names a bit faster (and allocate less). If adding a netstandard2.1 TFM isn't feasible, please let me know and I can revert that part of the change.
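
For readers unfamiliar with the API, here is a minimal sketch of what a string.Create-based normalization could look like. It assumes that normalizing means dropping a trailing "Type" suffix and lowercasing the rest, which is not spelled out in this thread. TypeNameNormalizer and Normalize are hypothetical names, and the invariant culture is used only to keep the output deterministic (the commit message mentioning the current culture suggests the actual change may differ).

```csharp
using System;

// Hypothetical helper for illustration; not the PR's actual code.
public static class TypeNameNormalizer
{
    // Length of the assumed "Type" suffix.
    private const int SuffixLength = 4;

    public static string Normalize(string typeName)
    {
#if NETSTANDARD2_0
        // string.Create is unavailable on this target: Substring and
        // ToLowerInvariant can each allocate a string.
        return typeName.Substring(0, typeName.Length - SuffixLength).ToLowerInvariant();
#else
        // Allocate the result once and write the lowercased chars directly into it.
        return string.Create(typeName.Length - SuffixLength, typeName, (span, name) =>
        {
            name.AsSpan(0, span.Length).ToLowerInvariant(span);
        });
#endif
    }
}
```

Under these assumptions, Normalize("IntegerType") yields "integer" with a single string allocation on targets that have string.Create.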

The behavior isn't being changed at all, hence existing tests should be sufficient.

BenchmarkDotNet=v0.11.5.1159-nightly, OS=Windows 10.0.18362
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.100-preview1-014459
  [Host]     : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
  Job-PDGMZC : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT

PowerPlanMode=00000000-0000-0000-0000-000000000000  IterationTime=250.0000 ms  MaxIterationCount=10  
MinIterationCount=5  WarmupCount=1

Accessing the TypeName property for the known simple/complex types got much faster:

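| Method | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| TypeName_old | 41.71 ns | 0.684 ns | 0.304 ns | 41.61 ns | 41.31 ns | 42.22 ns | 1.00 | 0.0095 | - | - | 40 B |
| TypeName_new | 14.52 ns | 0.360 ns | 0.188 ns | 14.53 ns | 14.25 ns | 14.72 ns | 0.35 | - | - | - | - |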

Normalizing a new type name (using string.Create) reduces allocations:

| Method | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| NormalizeTypeName_old | 42.60 ns | 1.359 ns | 0.899 ns | 42.54 ns | 41.42 ns | 43.86 ns | 1.00 | 0.0190 | - | - | 80 B |
| NormalizeTypeName_new | 38.42 ns | 0.762 ns | 0.453 ns | 38.22 ns | 38.02 ns | 39.35 ns | 0.90 | - | - | - | 40 B |

cc @eerhardt

eerhardt · 2019-12-06T01:19:44Z

src/csharp/Microsoft.Spark/Microsoft.Spark.csproj

@@ -1,7 +1,7 @@
 <Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
-    <TargetFramework>netstandard2.0</TargetFramework>
+    <TargetFrameworks>netstandard2.0; netstandard2.1</TargetFrameworks>


Do we also want netcoreapp2.1, so we get the perf improvements there? Or is getting them only on .NET Core 3.0+ ok?

I don't mind adding the netcoreapp2.1 TFM if there are no concerns with doing so. If there are other perf benefits of targeting netcoreapp2.1 specifically, then let's do it. Otherwise, this change alone might not warrant it.

eerhardt · 2019-12-06T01:21:40Z

src/csharp/Microsoft.Spark/Sql/Types/DataType.cs

+                Debug.Assert(s_complexTypeNormalizedNames == null);
+                BuildNormalizedStringMapping();
+            }
+            return s_simpleTypeNormalizedNames.AsSpan().IndexOf(typeName);


Why AsSpan here? Why not use Array.IndexOf?

Habit/personal preference to use spans :)

Array.IndexOf works fine (and is effectively equivalent). Shall I change it?

Your call. It was just surprising to get a span just for IndexOf.

eerhardt

ahsonkhan · 2019-12-06T03:17:43Z

ahsonkhan dismissed eerhardt’s stale review via b33d6b7 1 hour ago

Looks like GitHub is asking for another approval :)

imback82 · 2019-12-06T03:31:55Z

Thanks @ahsonkhan for the PR! This code is not in a critical path, but I will evaluate whether the gain is worth the extra complexity. @suhsteve can you also take a look? Thanks!

suhsteve · 2019-12-06T23:01:12Z

src/csharp/Microsoft.Spark/Sql/Types/DataType.cs

+        private static string[] s_simpleTypeNormalizedNames = null;
+        private static string[] s_complexTypeNormalizedNames = null;


Can we replace this with something like

    private static Lazy<string[]> s_simpleTypeNormalizedNames = new Lazy<string[]>(
        () => s_simpleTypes.Select(t => NormalizeTypeName(t.Name)).ToArray());
    private static Lazy<string[]> s_complexTypeNormalizedNames = new Lazy<string[]>(
        () => s_complexTypes.Select(t => NormalizeTypeName(t.Name)).ToArray());

With this you should be able to get rid of

    private static int SimpleTypeIndex(string typeName)
    private static int ComplexTypeIndex(string typeName)
    private static void BuildNormalizedStringMapping()

as well as remove the if check in

    private static string NormalizeTypeName(Type type)
    {
        if (s_simpleTypeNormalizedNames == null)

> With this you should be able to get rid of

Even with lazy initialization, how can we get rid of SimpleTypeIndex and ComplexTypeIndex? We still need to figure out what index the incoming type maps to in the normalized names array, no?

Can you call s_simpleTypeNormalizedNames.Value.IndexOf(typeName) at the call site instead?

Sure. I was thinking of keeping it a method to add doc comments.
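
To make the lazy-initialization idea from this thread concrete, here is a rough sketch under stated assumptions. The nested StringType, IntegerType, and BooleanType classes are hypothetical stand-ins for the library's simple DataTypes, and NormalizeTypeName is assumed to drop a "Type" suffix and lowercase the rest. Because arrays have no instance IndexOf method, the sketch calls Array.IndexOf, and it keeps the lookup in a small method so that it can carry a doc comment.

```csharp
using System;
using System.Linq;

public static class SimpleTypeNames
{
    // Hypothetical stand-ins for the library's simple DataTypes.
    public sealed class StringType { }
    public sealed class IntegerType { }
    public sealed class BooleanType { }

    private static readonly Type[] s_simpleTypes =
    {
        typeof(StringType), typeof(IntegerType), typeof(BooleanType)
    };

    // Built once, on first access; Lazy<T> is thread-safe by default.
    private static readonly Lazy<string[]> s_simpleTypeNormalizedNames =
        new Lazy<string[]>(
            () => s_simpleTypes.Select(t => NormalizeTypeName(t.Name)).ToArray());

    // Assumed normalization: drop the "Type" suffix and lowercase.
    private static string NormalizeTypeName(string typeName) =>
        typeName.Substring(0, typeName.Length - 4).ToLowerInvariant();

    /// <summary>
    /// Returns the index of the normalized name in the cached array,
    /// or -1 if it is not a known simple type.
    /// </summary>
    public static int SimpleTypeIndex(string typeName) =>
        Array.IndexOf(s_simpleTypeNormalizedNames.Value, typeName);
}
```

With this shape, the explicit null check and the separate build method are no longer needed, since Lazy<T> handles the one-time initialization.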

suhsteve · 2019-12-06T23:01:15Z

src/csharp/Microsoft.Spark/Sql/Types/DataType.cs

        /// <summary>
        /// Normalized type name.
        /// </summary>
-        public string TypeName => NormalizeTypeName(GetType().Name);
+        public string TypeName => NormalizeTypeName(GetType());


Instead of having a new method for this, it may be simpler to normalize the name and cache the result. Something like

    private string _typeName;

    /// <summary>
    /// Normalized type name.
    /// </summary>
    public string TypeName
    {
        get
        {
            return _typeName ?? (_typeName = NormalizeTypeName(GetType().Name));
        }
    }

ahsonkhan · 2019-12-07T02:01:07Z

Fyi, the ParseSimpleType and ParseDataType(JToken json) methods got faster and allocate less (when the type was found in the cache):

| Method | Mean | Error | StdDev | Median | Min | Max | Ratio | Gen 0 | Gen 1 | Gen 2 | Allocated |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ParseSimpleType_old | 465.22 ns | 35.651 ns | 23.581 ns | 461.61 ns | 429.68 ns | 507.12 ns | 1.00 | 0.0875 | - | - | 368 B |
| ParseSimpleType_new | 88.88 ns | 7.526 ns | 4.978 ns | 87.10 ns | 84.02 ns | 96.57 ns | 0.19 | 0.0057 | - | - | 24 B |

I suspect ParseDataType(string) will get even faster once this change is in (and we avoid using JToken.Parse): #358

imback82

LGTM, thanks @ahsonkhan

imback82 · 2019-12-07T03:48:07Z

@eerhardt, before I merge this PR, I have a naive question. Do you see any cons of targeting multiple frameworks going forward? I don't see any, but just wanted to confirm since you are an expert on this. Thanks!

suhsteve · 2019-12-07T03:55:53Z

@ahsonkhan Just a general comment, but if you really wanted to, we could hardcode a mapping from type name to type and use that instead of doing a linear search of the array for the index of the type.
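
As an illustration of that suggestion, a hardcoded mapping might look roughly like the sketch below. The nested type classes, the three entries, and the names SimpleTypeLookup and TryGetSimpleType are all hypothetical; the real set of types and their normalized names would come from the library.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleTypeLookup
{
    // Hypothetical stand-ins for the library's simple DataTypes.
    public sealed class StringType { }
    public sealed class IntegerType { }
    public sealed class BooleanType { }

    // Hardcoded normalized name => type mapping (illustrative entries only).
    private static readonly Dictionary<string, Type> s_simpleTypesByName =
        new Dictionary<string, Type>(StringComparer.Ordinal)
        {
            ["string"] = typeof(StringType),
            ["integer"] = typeof(IntegerType),
            ["boolean"] = typeof(BooleanType),
        };

    // Hash lookup instead of a linear scan over an array of names.
    public static bool TryGetSimpleType(string typeName, out Type type) =>
        s_simpleTypesByName.TryGetValue(typeName, out type);
}
```

For a handful of entries a linear scan may well be just as fast, so whether this is worth it would need measuring.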

eerhardt · 2019-12-09T16:27:17Z

> Do you see any cons of targeting multiple frameworks going forward?

The only real con is that you need to build multiple times, but since this repo is small, it won't matter.

This change will only help future code that wants to take advantage of new APIs (typically for perf reasons, like this change).

Precompute the normalized type names of DataTypes and use string.Create on NS2.1 for faster normalization. (9bf4db0)

imback82 requested review from imback82, eerhardt and suhsteve December 5, 2019 07:36

Add using directive for System.Globalization (needed for current culture). (a899556)

eerhardt reviewed Dec 6, 2019

eerhardt previously approved these changes Dec 6, 2019

Address feedback (add nca2.1) and fix typo to use the right types. (b33d6b7)

ahsonkhan dismissed eerhardt’s stale review via b33d6b7 December 6, 2019 02:03

suhsteve reviewed Dec 6, 2019

ahsonkhan added 2 commits December 6, 2019 17:22

Address PR feedback. Use Lazy<T>. (84f7d3a)

Remove unused using directive (Diagnostics). (859837f)

imback82 approved these changes Dec 7, 2019

imback82 merged commit e650769 into dotnet:master Dec 9, 2019

ahsonkhan deleted the NormalizeTypeNameImprovements branch December 9, 2019 21:47

imback82 mentioned this pull request Jan 6, 2020: Support .NET Core 3.1 #291 (Merged)
