Skip to content

Extension types #2727

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mit-mit opened this issue Dec 16, 2022 · 173 comments
Closed

Extension types #2727

mit-mit opened this issue Dec 16, 2022 · 173 comments
Assignees
Labels
feature Proposed language feature that solves one or more problems

Comments

@mit-mit
Copy link
Member

mit-mit commented Dec 16, 2022

[Edit, mit-mit, Feb 2024 This feature launched in Dart 3.3; we have docs here.]

[Edit, eernstg: As of July 2023, the feature has been renamed to 'extension types'. Dec 2023: Adjusted the declaration of IdNumber—that example was a compile-time error with the current specification.]
[Edit, mit: Updated the text here too; see the history for the previous Inline Class content]

An extension type wraps an existing type into a new static type. It does so without the overhead of a traditional class.

Extension types must specify a single variable using a new primary constructor syntax. This variable is the representation type being wrapped. In the following example the extension type Foo has the representation type int.

extension type Foo(int i) { // A new type that wraps int.
 
  void function1() {
    print('my value is $i');
  }
}

main() {
  final foo = Foo(42);
  foo.function1(); // Prints 'my value is 42'
}

Extension types are entirely static. This means there is no additional memory consumption compared to using the representation type directly.

Invocation of a member (e.g. a function) on an extension type is resolved at compile-time, based on the static type of the receiver, and thus allows a compiler to inline the invocation making the extension type abstraction zero-cost. This makes extension types great for cases where no overhead is required (aka zero-cost wrappers).

Note that unlike wrapper creates with classes, extension types do not exist at runtime, and thus have the underlying representation type as their runtime type.

Extension types can declare a subtype relation to other types using implements <type>. For soundness, implements T where T is a non-extension type is only allowed if the representation type is a subtype of T. For example, we could have implements num in the declaration of IdNumber below, but not implements String, because int is a subtype of num, but it is not a subtype of String.

Example

// Create a type `IdNumber` which has `int` as the underlying representation.
extension type IdNumber(int i) {
  // Implement the less-than operator; smaller means assigned before.
  bool operator <(IdNumber other) => i < other.i;

  // Implement the Comparable<IdNumber> contract.
  int compareTo(IdNumber other) => this.i - other.i;

  // Verify that the IdNumber is allocated to a person of given age.
  bool verify({required int age}) => true; // TODO: Implement.
}

class Foo implements Comparable<Foo> {
  @override
  int compareTo(Foo other) => 1;
}

void main() {
  int myId = 42424242; // Storing an id as a regular int.
  myId = myId + 10;    // Allowed; myId is just a regular int.

  var safeId = IdNumber(20004242); // Storing an id using IdNumber.
  myId = safeId + 10; // Compile-time error, IdNumber has no operator `+`.
  myId = safeId;      // Compile-time error, type mismatch.

  print(safeId.verify(age: 22)); // Prints true; age 22 matches.
  myId = safeId as int; // Extension types support type casting.
  print(safeId.i);      // Prints 20004242; the representation value can be read.

  final ids = [IdNumber(20001), IdNumber(200042), IdNumber(200017)];
  ids.sort();
  print(ids);

  dynamic otherId = safeId;
  print(otherId.i); // Causes runtime error: extension types are entirely static, ..
  // .. the static type is `dynamic`, and the dynamic type of `otherId` has no `i`.
}

Specification

Please see the feature specification for more details.

This feature realizes a number of requests including #1474, #40

Experiment

A preview of this feature is available under the experiment flag inline-class.

Implementation tracking

See dart-lang/sdk#52684

@mit-mit mit-mit added the feature Proposed language feature that solves one or more problems label Dec 16, 2022
@ds84182
Copy link

ds84182 commented Dec 19, 2022

I like the simplicity of this proposal in particular. It has all the semantics expected of a wrapper/newtype and in its current form does not seem to have many caveats.

How does this interact with generics? I'm guessing <int>[] as List<IdNumber> succeeds but assignment without a cast is a compile time error. At runtime the cast is still a no-op.

Also, wondering if this could avoid the field. Like:

inline class IdNumber on int {
  IdNumber(super);

  operator <(IdNumber other) => super < other.super;
}

This may be breaking though, as it'll start treating super as an expression and not as a receiver.

@eernstg
Copy link
Member

eernstg commented Dec 19, 2022

I'm guessing <int>[] as List<IdNumber> succeeds but assignment without a cast is a compile time error. At runtime the cast is still a no-op.

Exactly.

wondering if this could avoid the field

That would certainly be possible. The use of an on clause was part of some earlier proposals for this feature, and we also had some proposals for using super to denote "the on object" (now known as the representation object). By the way, it shouldn't create any big issues to treat super as an expression.

However, the main reason why the proposal does not do these things today is that we gave a lot of weight to the value of having an inline class declaration which is as similar to a regular class declaration as possible. For instance, the use of constructors makes it possible to establish the representation object in many different ways, using a mechanism which is well-known (constructors, including redirecting and/or factory constructors).

We may very well get rid of the field in a different sense: We are considering adding a new kind of constructor known as a 'primary' constructor. This is simply a syntactic abbreviation that allows us to declare a constructor with a parameter, and implicitly get a final field with the parameter's type. This is similar to the primary constructors of Scala classes: It is part of the header of the declaration, it goes right after the name, and it looks like a declaration of a list of formal parameters.

inline class IdNumber(int value) {
  bool operator <(IdNumber other) => value < other.value;
}

The likely outcome is that primary constructors will be added to Dart, and they will be available on all kinds of declarations with constructors (that's classes and inline classes, for now). However, the design hasn't been finalized (and, of course, at this point we can't promise that Dart will have primary constructors at all), so they aren't available at this point.

@purplenoodlesoop
Copy link

Does this feature replace the previous proposal of views?

@torellifr
Copy link

Probably you have already considered it, but I wonder if it could be useful to extend the specification to support also inline classes with multiple fields.
Similarly to an inline class with a single field, no additional runtime instance would exist, but only the values of the fields would exist at runtime. A variable which static type is an inline class having n fields, would correspond at runtime to a set of n (hidden) variables (one per field of the inline class).
The effect would be simular to an inline class with a single field that wraps an immutable record, but without the need to have the immutable record and without the disadvantage of having the actual fields nested in another field.
I guess that inline classes with multiple fields could be useful to create efficient code that imposes a discipline on the usage of multiple objects created by third party libraries (e.g. to relate objects created by different libraries), in the same way that an inline class with a single field imposes a discipline on the usage of a single object.

@lrhn
Copy link
Member

lrhn commented Jan 2, 2023

An inline class with multiple fields cannot be assignable to Object, which requires (the reference to) the value to be stored in a single memory location. That means you cannot have a List<MultiValueInlineClass>.

With records, you can store several values in one object, so an inline class on a record type could behave like an inline class on multiple fields, with an extra indirection.

We could consider allowing multiple fields, and then implicitly making the representation type into a record, but then we'd have to define the record structure (all positional in source order, or all named?).
By requiring the class author to choose a record shape, it becomes more configurable.

@eernstg
Copy link
Member

eernstg commented Jan 3, 2023

@purplenoodlesoop wrote:

Does this feature replace the previous proposal of views?

Yes. The main differences are that the construct has been renamed, and the support for primary constructors has been taken out (such that we can take another iteration on a more general kind of primary constructors and then, if that proposal is accepted, have them on all kinds of classes).

@mateusfccp
Copy link
Contributor

@eernstg It seems that this new proposal doesn't have the possibility to show or hide members of the original type. Or am I missing something?

@eernstg
Copy link
Member

eernstg commented Jan 4, 2023

this new proposal doesn't have the possibility to show or hide members of the original type

True. I'm pushing for an extension to the inline class feature to do those things. It won't be part of the initial version of this feature, though.

@timsneath
Copy link
Contributor

I'm curious whether a future version of this feature might include some standardized support for enforcing that the type is a subset of all available values? For example, the IdNumber type used in the example might be required to be a positive eight digit integer, or a US social security identifier might match against a String-based regex.

Even without a formal ranged value feature, it would be nice to have a convention for asserting correctness of a value, for example, having something like bool get isValid that could be overridden by an inline class.

@lrhn
Copy link
Member

lrhn commented Jan 18, 2023

The current version definitely do not provide any such guarantee.
That's what allows us to erase the type at runtime. Because we do that, there is no way to distinguish a List<int> from a List<IdNumber> at runtime, and if we allow a cast of as List<IdNumber> at all, then it has mean as List<int> at runtime where the IdNumber type has been erased.

An alternative where we retain the inline-class type at runtime, so the type List<IdNumber> is different from, and incompatible with, a List<int>, and where we simply do not allow object as IdNumber (or force it to go through a check/constructor invocation defined on IdNumber) would be a way to keep the restriction on which values are accepted at runtime.

I don't think it's impossible. I think making is T go through an implicit operator is constructor-like validator is more viable than disallowing as T entirely when T is a reified inline class type, because that would be hard on generics.
(Unless inline class types are moved outside of the Object hierarchy, and we only allow is/as checks on Object?s, but that'll just make lots of generic classes not work with them, and I don't even know what covariant parameters would do. I guess the inline class types would also have to be invariant.)

That'd be a feature with much different performance tradeoffs, and would need deep hooks into the runtime system in order to have types (as sets of values) that are detached from our existing types (runtime types of objects).

@eernstg
Copy link
Member

eernstg commented Jan 18, 2023

@timsneath wrote:

enforcing that the type is a subset of all available values

We could choose to use a different run-time representation of an inline type (so the representation of IdNumber would be different from the representation of int, but the former probably has a pointer to the latter).

We could then make 2 as IdNumber throw (or it could be a compile-time error), and we could make 2 as X throw when X is a type variable whose value is IdNumber. Similarly, List<IdNumber> could be made incomparable to List<int>, and a cast from one to the other (involving statically known types or type variables in any combination) would throw.

I proposed a long time ago that we should create a connection between casts and execution of user-written code (so e as IdNumber and e as X where X has the value IdNumber would cause an isValid getter to be executed), but this is highly controversial and didn't get much support from the rest of the language team. One reason for this is that it is incompatible with hot reload (where the whole heap may need to be type checked, and runtimes like the VM do not tolerate execution of general Dart code during this process).

It would probably be possible to maintain a discipline where every value of an inline type is known to have been vetted by calling one of the constructors of the inline type, and similarly it might be possible to prevent that the representation object leaks out (with its "original" type).

So there are several things we could do, but we haven't explored this direction in the design very intensively. With the proposal that has been accepted, the run-time representation of an inline type is exactly the same as the run-time representation of the underlying representation type (for instance identical(IdNumber, int) is true, and so is 2 is IdNumber as well as myIdNumber is int).

I believe it would be possible to introduce a variant of the inline class feature which uses a distinct representation of the type at run time and gives us some extra guarantees. We could return to that idea at some point in the future (or perhaps not, if it doesn't get enough support from all the stakeholders ;-).

@timsneath
Copy link
Contributor

Thanks to both of you. Yeah, the feature you describe sounds pretty exciting.

I guess I'm wondering whether the current proposal leads us to having two slightly different solutions to the same use case: inline classes and type aliases. In both cases, the type is erased at runtime; the former is available at compile-time.

I'm very much speaking as a layperson here, so my comments don't have a ton of weight. But if we can only have two of the three (erased at compile/runtime, hybrid, distinct at compile/runtime), would we want to pick the two that have the greatest distinction in usage scenarios, to maximize their value? Inline classes seem to make type aliases redundant, at least for the things I use them for today.

@purplenoodlesoop
Copy link

@timsneath

Both type aliases and inline classes can be directly mapped to Haskell's Type synonyms and Newtypes, I think I even saw direct references in the feature specifications.

I think both have their uses – typedefs carry a semantic value when inline classes explicitly offer new functionality. I see it as two separate use cases.

@eernstg
Copy link
Member

eernstg commented Jan 20, 2023

@timsneath, I agree with @purplenoodlesoop that the two constructs are different enough to make them relevant to different use cases. Here's a bit more about what those use cases could be:

An inline class allows us to perform a replacement of the set of members available on an existing object (a partial or complete replacement as needed, except members of Object). We do this by giving it an inline type rather than its "original" type, aka its representation type (typically: a regular class type).

So we can make a decision to treat a given object in a customized manner—because that's convenient; or because we want to protect ourselves from doing things to that object which are outside the boundary of the inline class interface; or because we want to raise a wall between different objects with the same representation type, so that we don't confuse objects that are intended to mean different things (e.g., SocialSecurityNumber vs. MaxNumberOfPeopleInThisElevator, both represented as an int).

A typedef makes absolutely no difference for the interface, and it doesn't prevent assignment from the typedef name to the denoted type, or vice versa.

On the other hand, we can use a typedef as a simple type-level function, and this makes it possible to simplify code that needs to pass multiple type arguments in some situations. Here is an example where we also use the typedef'd name in an instance creation:

class Pair<X, Y> {
  final X x;
  final Y y;
  Pair(this.x, this.y);
}

typedef FunPair<X, Y, Z> = Pair<Y Function(X), Z Function(Y)>;

void main() {
  var p = FunPair((String s) => s.length, (i) => [i]);
  print(p.runtimeType);
  // 'Pair<(String) => int, (int) => List<int>>'.
}

FunPair provides a customized version of the Pair type which is suitable for creating pairs of functions that we can compose (so we're making sure that we don't make mistakes by writing Y Function(X), Z Function(Y) again and again).

In the first line of main we create a new FunPair. We need to write the type of the first argument (String), but the type inference machinery ensures that we can get the other two types correct without writing any more types. If you change FunPair to Pair in that expression, the second argument gets the type List<dynamic> Function(dynamic).

I think this illustrates that typedefs and inline classes can at least be used in ways that are quite different, and my gut feeling is that there won't be that many situations where we could use one or we could use the other, and there's real doubt about which one is better.

@timsneath
Copy link
Contributor

This is a really helpful explanation, thanks @eernstg! (We should make sure that we add something like this to the documentation when we get to that point...)

@yjbanov
Copy link

yjbanov commented Feb 19, 2023

I could be missing a bunch of aspects to this, but I think given Dart's direction towards a sound language, it's worth exploring how much the soundness can be preserved with this feature. Specifically, the most useful soundness guarantee would be during the int => SocialSecurityNumber conversion. The inverse - SocialSecurityNumber => int - can be relaxed to enable compatibility with general purpose containers, and Object/Object?. There's not much to validate when going from a subset to a superset anyway, since by definition the superset allows all elements of the subset.

One issue we'd want to avoid is reintroduce the wrapper object problem for containers, since to get a type-safe conversion from List<int> one would need to wrap it into a List<SocialSecurityNumber>, e.g. by calling map<SocialSecurityNumber>(...).toList().

This is where having as invoke the constructor of the inline class would be useful. However, we only need a limited one. Namely, as would only invoke the constructor of the outer type. It would not include any magic for converting generic types. Since List<int> as List<SocialSecurityNumber> is disallowed, one could create an inline class that validates the contents of List<int> and produces a soundly checked collection of SocialSecurityNumber. For example:

inline class InlineList<F, inline T on F> {
  InlineList(this._it);

  final List<F> _it;

  T operator[] (int index) => _it[index] as T;

  List<T> cast() => List.generate(_it.length, (i) => _it[index] as T);
}

Here InlineList is a wrapperless view on List<F>. It denotes that type T is an inline class type that wraps F. This is so that expression _it[index] as T inform the compiler that T's constructor must be called and that T is an inline type that accepts F as its underlying representation. Note that the language does not need to specify, and the compiler does not need to implement, how to iterate over and type check the contents of the underlying representation (which would be the case if we demanded support for List<int> as List<SocialSecurityNumber> expression to be supported and be sound). That would be difficult to do since there could be any number of collection types, including those not API-compatible with those from the standard libraries.

Now we can safely view a List<int> as a collection of SocialSecurityNumber:

typedef SocialSecurityNumberList = InlineList<int, SocialSecurityNumber>;

void main() {
  final rawList = <int>[1, 2, 3, 4, 5];
  final ssnList = rawList as SocialSecurityNumberList;
  print(ssnList[3]); // SocialSecurityNumber constructor invoked here
}

The following key properties are preserved:

  • int => SocialSecurityNumber conversion does not bypass validation.
  • There are no wrapper objects, with one caveat:
    • If SocialSecurityNumberList must be passed to a utility that takes List, it either needs to be cast back to List<int> or wrapped using InlineList.cast() method.

I'm not sure what the issue is with hot reload. Object representations are still the same. If the issue is with soundness, then for hot reload it can be relaxed. Hot reload already has type related issues. After all, all we do is validate the permitted value spaces. This is already an issue with hot reload. Consider expression:

list.where((i) => i.isOdd)

It extracts odd numbers out of a list, and potentially stores it somewhere in the app's state. Now what happens when you change it to:

list.where((i) => !i.isOdd)

Then hot reload. Any new invocations will produce even numbers. However, all existing outputs of this function stored in the app state will continue holding onto odd numbers. All bets are off at this point, and you have to restart the app (luckily there's hot-restart).

So I think it's totally reasonable for hot reload not preserve that level of soundness.

@lrhn
Copy link
Member

lrhn commented Feb 20, 2023

The as you're talking about here would be like a "default constructor" or default coercion operator, from representation type to inline class type, carried by the inline class somehow.

It cannot work with the current design. When we completely erase the inline class at runtime, the _it[index] as T cannot see that T is an inline class type, because it isn't. The type of T at runtime is the representation type.

Even if we know that the T was originally an inline class, from the inline T on F bound, we don't know which one, not unless we let the T carry extra information at runtime, which is precisely what we don't do. All that information is erased. It ceased to be. It's an ex-information.

In short: You cannot abstract over inline classes, because abstracting over types is done using type variables, which only get their binding at runtime, and inline classes do not exist at runtime.
Same reason extension methods of a subclass doesn't work with type parameters, all you know statically is the bound of the type parameter, not the actual value, and anything statically resolved will see only that.

We could make all inline classes automatically subtype a platform type InlineClass<T> where T is their representation type. Just like we make all enums implement Enum, functions implement Function and records implement Record.

That will allow one to enforce that a static type is an inline class type, and be used to bound type parameters like suggested above. (No need for special syntax like inline T on F.)
Not sure how useful that is going to be in practice with the current inline class design, but it can allow us to add operations that work on all inline class types.

We can make InlineClass<T> assignable to T by "implicit side-cast" if we want to, without making the actual inline types so. Not particularly necessary, since you can always do an explicit as T instead. Probably not worth it.

So, basically, the current inline class design does not provide a way to guard a type against some values of the representation type, no way to create subsets of types. Anything which requires calling a validation operation at runtime is impossible to enforce, because we can skirt it using the completely unsafe as, and going through generics.

The only security you get against using an incompatible int as a social security number is that there is no subtype relation between the two, and no assignability, so you can't just assign a List<int> to List<SocialSecurityNumber>. You need to do a cast. Which makes the SocialSecurityNumber abstraction safe, but not sound. (Normal casts are sound, not safe. They throw if the conversion isn't valid. Casts from representation type to inline class types always succeeds, but doesn't necessarily preserve all notions of soundness that the inline class might want to preserve, because it cannot do checks at runtime.)

If your program contains no as casts, or dynamic values, it's probably type sound.

@yjbanov
Copy link

yjbanov commented Feb 22, 2023

@lrhn

Thanks for the detailed explanation! Yeah, this would need extra RTTI to work.

If your program contains no as casts, or dynamic values, it's probably type sound.

Cool. If we require explicit casts, then maybe this is safe enough as is.

Anyway, looking forward to trying this out! This is my favorite upcoming language feature so far 👍

@Peng-Qian
Copy link

Is this feature will be shipped in Dart 3.0? It seems perfect to create value objects by the inline class.

@mraleph
Copy link
Member

mraleph commented Feb 24, 2023

Is this feature will be shipped in Dart 3.0? It seems perfect to create value objects by the inline class.

It is not part of 3.0.

@eernstg
Copy link
Member

eernstg commented Feb 24, 2023

@yjbanov wrote:

it's worth exploring how much the soundness can be preserved with this feature.

Note that we had some discussions about a feature which was motivated by exactly this line of thinking:

In an early proposal for the mechanism which is now known as inline classes, there is a section describing 'Protected extension types'. The idea is that a protected extension type is given a separate representation at run time when used as a part of a composite type (that is, as a type argument or as a parameter/return type of a function type). This means that we can distinguish a List<SocialSecurityNumber> from a List<int> at run time (but we still don't have a wrapper object for an individual instance).

The main element of this feature is that an inline class can be protected, which means that it must have a bool get isValid getter which will be invoked implicitly at the beginning of every generative constructor body, and there's a dynamic error if the given representation object isn't valid.

protected inline class SocialSecurityNumber {
  final int value;
  SocialSecurityNumber(this._value); // Implicit body: { if (!isValid) throw SomeError(...); }
  bool get isValid => ...; // Some validation.
}

void main() {
  var ssn = SocialSecurityNumber(42); // Let's just assume that the validation succeeds.

  // Single object.
  int i = ssn.value; // The inline class can always choose to leak the value.
  i = ssn as int; // We can also force the conversion: Succeeds at run time.
  ssn = i as SocialSecurityNumber; // Will run `isValid`.

  // Higher-order cases.
  List<int> xs = [42];
  List<SocialSecurityNumber> ys = xs; // Compile-time error.
  ys = xs as List<SocialSecurityNumber>; // Throws at run time.
  xs.forEach((i) => ys.add(i as SocialSecurityNumber)); // Runs `isValid` on each element.

  void f<Y>(List xs, List<Y> ys) {
    xs.forEach((x) => ys.add(x as Y)); // Runs `Y.isValid` on `x` when `Y` is a protected inline type.
  }
  f(xs, <SocialSecurityNumber>[]); // OK.
  f([41], <SocialSecurityNumber>[]); // Throws if 41 isn't valid.
}

This mechanism would rely on changing the semantics of performing a dynamic type check. This is the action taken by evaluation of i as T for some T, but also the action taken when executing ys.add(e as dynamic) when ys is a List<I> where I is a protected inline type, and also the action taken during covariance related type checks, etc.

The fact that the invocation of isValid occurs whenever there is a dynamic check for a protected inline type is quite promising, in the sense that it would probably suffice to ensure that no object o will ever have static type I where I is a protected inline type, unless o.isValid has returned true at some point.

Of course, if o is mutable and isValid depends on mutable state then o.isValid can be false in the current state. This is an issue that developers need to reason about "manually".

This would work also in the case where the source code has a type variable X whose value is a protected inline type: A run-time check against a given type which is a type variable is already performed using a thunk (a low-level function), at least on the VM. This thunk is specific to the given value of the type variable. This means that there is no additional cost for checking against any type which is not a protected inline type (they would just have a thunk which does exactly the same thing as today), and the thunk for a protected inline type can run isValid on the given object.

We could express the InlineList as follows:

inline class InlineList<X, I> {
  final List<X> _it;
  InlineList(this._it);
  I operator [](int index) => _it[index] as I; // Runs `isValid` if `I` is protected.
  List<I> cast() => List.generate(_it.length, (i) => _it[index] as I);
}

The whole idea about protected inline classes was discussed rather intensively for a while, e.g., in #1466, but it did not get sufficient support from the rest of the language team. Surely there would still be a need to sort out some details, and in particular it would be crucial that it is a pay-as-you-go mechanism, not an added cost for every dynamic type check.

@yjbanov
Copy link

yjbanov commented Mar 1, 2023

I totally agree that it should be pay-as-you-go. The zero-cost abstraction is the main feature of this. Otherwise, the current classes would be just fine. The kind of safety I'm looking for is mostly development-time safety, where performance is not as important as ability to catch bugs. There's not much recovery to do if isValid returns false while one is trying to stuff a value in a list. It's like a RangeError and indicates a bug in the program. In practice, having isValid contain assertions only, and when compiled in release mode, evaporate entirely (e.g. so that a huge JSON tree can be wrapped in a typed data structure in O(1) time in release mode, even if it's validated in O(N) time during development).

But yeah, if this makes the language too complex, then this is fine as specified.

@eernstg
Copy link
Member

eernstg commented Mar 1, 2023

Silly idea: We could have a special variant of inline classes: assert(protected) inline class I .... It would be required to declare a bool get isValid member, and it would work exactly as a protected inline class as described above when assertions are enabled. In particular, it would implicitly run isValid to check the validity of every dynamic type check for being an I, and it would throw if the given object isn't valid and the type is required (as), or it would return false if the type isn't required (is).

When assertions are not enabled it would be a plain inline class. isValid would no longer be invoked implicitly, and the run-time representation of the inline type I would now be identical to the run-time representation of the representation type.

This means that myObject as I would be a no-op (and recognized as such by the compiler) when the static type of myObject is the representation type of I. For higher-order "conversions" we'd use a few helper functions:

List<I> castList<X, I>(List<X> xs) {
  if (X == I) return xs as List<I>;
  var ys = <I>[];
  for (var x in xs) ys.add(x as I);
}

During a production execution we would have X == I (considering the intended invocation where X would be int and I would be SocialSecurityNumber or something like that), and hence we would shortcut the conversion. During a development execution we would have X != I, and the validation would occur on each element of the list.

It is a silly idea, of course, because we would make production execution substantially different from development execution (some types are different in one mode and identical in another), and that is always a serious source of difficulties, e.g., because some bugs occur in one mode and not in the other. But it's still kind of fun. ;-)

@jodinathan
Copy link

is currently possible to test this?

@eernstg
Copy link
Member

eernstg commented Mar 7, 2023

The inline class feature can be enabled (using an experimental and not entirely complete implementation) by giving the option --enable-experiment=inline-class to a recent dev-channel version of Dart.

@eernstg
Copy link
Member

eernstg commented Feb 14, 2025

If I clone extension_type_unions and make it all classes, would you prefer that? (Assuming that you'd even consider to use it in the first place). My assumption is that the run-time cost will matter too much.

@ghost
Copy link

ghost commented Feb 14, 2025

@eernstg:
I did some further experiments with the code of NumberC.
I modified the code and played with different mask values: (from 0x3 to 0xFFFF). (so we are kind of creating the objects in batches of varying size).

void testC(int mask) {
  var numbers = <NumberC>[];
  print('Measure `NumberC` $mask:');
  for (int i = 0; i < count; ++i) {
    if ((i&mask==0)) numbers.clear();
    numbers.add(NumberC(i));
  }
  speak('${numbers[0]}');
}

The size of the batch affects the performance greatly (we already saw this effect earlier). My theory is that in dart, the GC is configured so that the "young generation" is limited to some value between 8K and 16K objects (maybe 10K?).

Here's the thing: in a typical use case for union types (passing parameters to functions) the lifetime of the object is short. The tests show that the overhead of allocating/deallocating such objects is very small - probably negligible in the context of the application.
I can't imagine the purpose of creating huge lists of such objects, or using them for any reason other than passing parameters. But even if somebody does it - it's not worse than any other kind of object.

In javascript, the GC is configured differently (I ran similar tests to find out). The performance on batches drops sharply much later - after 64K or even later). I speculate that in dart, GC configuration was fine-tuned based on some tests of flutter apps or something.

I don't think we have to put too much emphasis on runtime costs for unions. We pay a reasonable price for a valuable feature.

EDIT: forgot to mention: the overhead of NumberC vs int is 1.6:1 for batch size < N, where N is approx. 10K objects.

@TekExplorer
Copy link

If I clone extension_type_unions and make it all classes, would you prefer that? (Assuming that you'd even consider to use it in the first place). My assumption is that the run-time cost will matter too much.

I would say that such a thing already exists - its called sealed types, and we wouldnt need a package for it.

extension_type_unions is fine as it is, since we dont have union types, and the way to do it now is with Object?/dynamic so its an improvement.

what we need is proper unions, but thats the burden of the language team. the most we can do is this, and maybe some linter assistance, if you intend on going that far.

@eernstg
Copy link
Member

eernstg commented Feb 17, 2025

I created class_unions in order to make it easy to experiment with the choice between an extension type based union type emulation and a class based one. It might even be useful to use the extension type version in production, and then switch to the class based one during a debugging session, if there is some hard-to-find location in a large program that creates an invalid value (class_unions has no invalid values).

@TekExplorer wrote:

I would say that such a thing already exists - its called sealed types

Right, if you can commit to always taking the exact same union of types for each of its operands, and you have the ability to edit each of the operand types, and you can manage to declare them in the same library—then you're better off just writing a small sealed hierarchy with the wannabe union type at the top and each of the operands as immediate subtypes.

sealed class UnionOfABC {}

class A implements UnionOfABC ... {...}
class B implements UnionOfABC ... {...}
class C implements UnionOfABC ... {...}

However, if you want to create a union where one or more operands is an existing type that you can't edit then I don't think it would be very helpful.

You could of course use the sealed hierarchy as a set of wrapper objects holding the desired operand types, but that's pretty much a reimplementation of class_unions.

@eernstg
Copy link
Member

eernstg commented Feb 17, 2025

@tatumizer wrote:

I did some further experiments with the code of NumberC

You could try to play around with new_gen_semi_initial_size and similar settings (dart --help --verbose). However, it remains a very tricky exercise to interpret all the settings and their effects, not to mention the subject which is being studied in the first place.

In the end, it's the application level performance that matters. If developers have a choice like "use extension_type_unions" or "use class_unions" then they can make their own experiments (and they can choose their preferred trade-off between the extension type approach that I still think is cheaper, and the class based approach which is guaranteed to be free of invalid values).

@TekExplorer
Copy link

@TekExplorer wrote:

I would say that such a thing already exists - its called sealed types

Right, if you can commit to always taking the exact same union of types for each of its operands, and you have the ability to edit each of the operand types, and you can manage to declare them in the same library—then you're better off just writing a small sealed hierarchy with the wannabe union type at the top and each of the operands as immediate subtypes.

sealed class UnionOfABC {}

class A implements UnionOfABC ... {...}
class B implements UnionOfABC ... {...}
class C implements UnionOfABC ... {...}

However, if you want to create a union where one or more operands is an existing type that you can't edit then I don't think it would be very helpful.

You could of course use the sealed hierarchy as a set of wrapper objects holding the desired operand types, but that's pretty much a reimplementation of class_unions.

Perhaps, but doing that myself means I can give each class a semantically relevant name, and define interface methods as desired.

This is what we call "tagged unions" and thanks to sealed classes, it's viable.

As a parameter, it's annoying because you can't provide a value directly - which you also can't do with extension types anyway.

The only fix for that is either proper unions, or implicit constructors.

As a return type, an extension type is actually pretty good since it supports the existing interface, so that something such as jsonDecode could cleanly support it, while still allowing is and as checks. Of course, untagged unions are still better here, since it would be exhaustive.

@ghost
Copy link

ghost commented Feb 17, 2025

@eernstg: I think you are overlooking the argument that union types are mostly used for passing parameters, in which case the objects are always short-lived. So by introducing a "cheap" version of union type, you might be targeting a use case that will rarely (or never) materialize. Why everything in dart has to come in 2 colors? :-)

I also have 2 specific questions:

  • when in our benchmarks we test pure int values, they exhibit the same kind of performance degradation as the size of a batch size grows. Which means that even the int values get "boxed". But they are still faster than NumberC. I wonder why. How boxing a single value of type int is different? Maybe there's a special optimization for 1-field classes, which gets applied only to primitive types? What stops the compiler from applying the optimization to ANY single-field class?

  • could you please comment on the feasibility of recursive union types like

sum type Encodable (int | String | List<Encodable>); // strawman

@eernstg
Copy link
Member

eernstg commented Feb 17, 2025

@TekExplorer wrote:

Perhaps, but doing that myself means I can give each class a semantically relevant name, and define interface methods as desired.

Right, and it should also be noted that a sealed hierarchy allows for built-in exhaustiveness analysis. So if it fits into your use case to use a sealed hierarchy then by all means do so!

@tatumizer wrote:

even the int values get "boxed"

As everything that's got to do with performance, it gets more tricky if we try to understand more details. For example, the virtual machine which is running if you do dart main.dart will represent values of type int as "Smi's" ("small integers"), unless they are really big (I think the encoding takes 2 bits away from a pointer, so basically anything which isn't the maximal value will fit). This means that int values are (in practice) always unboxed, because they are simply stored in the location where we'd otherwise store a pointer to a heap allocated object.

On web platforms compiling to JavaScript, the int values are JavaScript numbers, and the performance depends on the JavaScript vm which is running it.

With dart compile exe, int values are (...IIUC, the story is probably a lot longer...) 64 bit two's complement representations which can be inlined in many situations, but must be heap allocated (and boxed) in other situations, e.g., if a variable of type Object or a top type gets an integer value assigned.

About the recursive union types:

There is no problem if we want to have algebraic data types similar to the basic ones in SML:

datatype arith =
    Plus of arith * arith
  | Times of arith * arith
  | Negate of arith
  | Var of string
  | Num of int

This is very similar to the following (using primary constructors in order to remain somewhat concise):

sealed class const Arith;
class const Plus(Arith a, Arith b) implements Arith;
class const Times(Arith a, Arith b) implements Arith;
class const Negate(Arith a) implements Arith;
class const Var(String s) implements Arith;
class const Num(int i) implements Arith;

However, the point is that this kind of types will reify the structure of the type declaration, in the sense that if you have a Plus value then it's going to be something that identifiers the "Plusness" (here: an object whose type is Plus), and some other objects available from there which will be other Arith subtypes.

With an untagged union type we want to specify a type which contains the operand type values directly, not as a value which can only be accessed indirectly through reifications of the entire union type structure.

I'd assume that you want to be able to test at run time whether a given object is Encodable with the following declaration:

sum type Encodable (int | String | List<Encodable>);

// Another notation that we've used.
typedef Encodable = int | String | List<Encodable>;

Consider the situation where we do not reify Encodable, we just treat the declaration of Encodable as a (recursive) specification of other types that we are lumping together.

If we insist that this type includes 42 and 'Hello!' and [42, [[], 'Hello!']] and so on, then e is Encodable will be forced to perform an unbounded amount of work at run time in order to know whether or not this is true.

In particular, if you build an Encodable that includes a large tree (lists inside lists inside lists, with all leaves being int or String values, or an empty list), say, a million objects, then you can't know whether it's actually an Encodable before you have traversed the entire tree.

Note that it is possible that someone changes an int leaf in this tree to a Symbol or some other type which is not included in the Encodable family, which means that you can never know whether a thing is an Encodable even though it was an Encodable when you last looked at it.

If Encodable isn't reified then the next obvious question is "What's the actual type argument of [] in that expression of type Encodable? It would basically have to be Object, because it must be a supertype of int, String, and List<Encodable>. But then the code that "destroys the Encodableness" as mentioned above could very well do it as follows: myList.add(#foo), where myList has static type List<Object>.

All these difficulties are a direct consequence of the fact that we do not want to reify the Encodable type itself.

In JavaScript, types are not reified at run time (roughly, it is still possible to determine that 1 is a number and such), and there is no support for run-time enforcement of statically known types (that is, soundness).

As far as I can see, there's no other way to support recursive type aliases in Dart than to reify them. That's a whole other can of worms. ;-)

@ghost
Copy link

ghost commented Feb 17, 2025

@eernstg: WRT smi, things don't add up. :-)
Either there's a problem with your explanation, or with my head, or both. :-)

Yes, I'm aware of the smi thing. That's why I'm surprised to see the results of the benchmark. And that's exactly why I'm asking.
I'm runnting on VM, no javascript, no compile (in the CMD terminal outside VSCode, running dart test.dart.) And I'm getting the following results:

  Measure `int` 65535:
    1498 ms, result= 99942400
  Measure `int` 32767:
    1323 ms, result= 99975168
  Measure `int` 16383:
    1070 ms, result= 99991552
  Measure `int` 8191:
    254 ms, result= 99999744
  Measure `int` 511:
    287 ms, result= 99999744
void testInt(int mask) {
  var numbers = <int>[];
  print('Measure `int` $mask:');
  for (int i = 0; i < count; ++i) {
    if ((i&mask==0)) numbers.clear();
    numbers.add(i);
  }
  speak("${numbers[0]}");
}

The value of int doesn't exceed 100000000, so it certainly fits in smi. If nothing is boxed, where does the degradation come from?


Later in the day, I conducted a crucial experiment: modified the program so that neither int not NumberC variant fill the list at all: in the loop, they just sum up the values: sum+=func(i) for int vs sum+=anotherFunc(NumberC(i)).value for NumberC, both functions returning the argument after doing some stupid things, but not as obvious for compiler to optimize them away.
And you know what happened? The performance was absolutely the same for int and NumberC! I can't explain that.
I don't attach the code here for the purity of the experiment - please try it yourself!

BTW, the absolute performance was about 5x better for int, and 8 times better for NumberC, compared with the previous test where we used to fill the list, even with N=511. Maybe I did something wrong though - the results are too good to be true 😄

@eernstg
Copy link
Member

eernstg commented Feb 18, 2025

😁

Running something that would be similar to your example (I used count and speak and other declarations from here), I can see that the five invocations of testInt give rise to a similar number of GCs: 1696, 1869, 1928, 1593, 1564 (in total 8650 GCs). The old generation collections are as follows: 749, 739, 567, 0, 0 (total 2055 old GCs). So you're basically measuring that old GC is costly.

The _Smi manipulations do not generate garbage, so it's all about List.add, List.clear, and internal allocations in the list when the fixed size backing array is filled up and a new one twice the size is allocated (except that we can't know whether or not it's implemented exactly like that).

I then added NumberC and ran the same test on both:

Measure `int` 65535:
  4445: 99942400
Measure `int` 32767:
  4724: 99975168
Measure `int` 16383:
  4140: 99991552
Measure `int` 8191:
  721: 99999744
Measure `int` 4095:
  655: 99999744
Measure `NumberC` 65535:
  8440: 99942400
Measure `NumberC` 32767:
  7790: 99975168
Measure `NumberC` 16383:
  6228: 99991552
Measure `NumberC` 8191:
  1200: 99999744
Measure `NumberC` 4095:
  1093: 99999744

As far as I can see, it does make a difference.

The performance was absolutely the same for int and NumberC!
the results are too good to be true 😄

Sounds like the optimizer somehow managed to remove all the code that makes those two cases different. That is probably not going to be true when you scale up the complexity of the program.

Here are the results I get with the following code:

Code
const count = 1_000_000_000;

late Timer timer;

class Timer {
  int start;
  Timer(): start = _now();
  int get time => _now() - start;
  void clear() => start = _now();
  static int _now() => DateTime.now().millisecondsSinceEpoch;
}

void speak(String message) {
  print('  ${timer.time}: $message');
  timer.clear();
}

class NumberC {
  final int value;
  const NumberC(this.value);
  NumberC operator +(NumberC other) => NumberC(value + other.value);
}

// Example from tatumizer -- I guessed the code, wasn't included

void testInt(int mask) {
  int sum = 0;
  print('Measure `int` $mask:');
  for (int i = 0; i < count; ++i) {
    if ((i&mask==0)) sum = 0;
    sum += func(dummy = i);
  }
  speak("$sum");
}

// Added by me

dynamic dummy = 0;

int func(int arg) {
  if (1 > 2) arg = dummy;
  return arg;
}

NumberC anotherFunc(NumberC arg) {
  if (1 > 2) arg = NumberC(dummy);
  return arg;
}

void testNumberC(int mask) {
  int sum = 0;
  print('Measure `NumberC` $mask:');
  for (int i = 0; i < count; ++i) {
    if ((i&mask==0)) sum = 0;
    sum += anotherFunc(dummy = NumberC(i)).value;
  }
  speak("$sum");
}

void main() {
  timer = Timer();
  testInt(0xFFFF);
  testInt(0x7FFF);
  testInt(0x3FFF);
  testInt(0x1FFF);
  testInt(0x0FFF);
  testNumberC(0xFFFF);
  testNumberC(0x7FFF);
  testNumberC(0x3FFF);
  testNumberC(0x1FFF);
  testNumberC(0x0FFF);
}

Results:

Measure `int` 65535:
  2220: 51710662908672
Measure `int` 32767:
  1902: 18943820552960
Measure `int` 16383:
  1629: 2559996721920
Measure `int` 8191:
  1668: 2559996721920
Measure `int` 4095:
  1646: 2559996721920
Measure `NumberC` 65535:
  4231: 51710662908672
Measure `NumberC` 32767:
  4207: 18943820552960
Measure `NumberC` 16383:
  4102: 2559996721920
Measure `NumberC` 8191:
  4120: 2559996721920
Measure `NumberC` 4095:
  4059: 2559996721920

If we eliminate the list allocation work (by using the same backing store all the time and storing directly at the desired index), we get the following results:

Measure `int` 65535:
  183: 99942400
Measure `int` 32767:
  249: 99975168
Measure `int` 16383:
  163: 99991552
Measure `int` 8191:
  163: 99999744
Measure `int` 4095:
  164: 99999744
Measure `NumberC` 65535:
  1389: 99942400
Measure `NumberC` 32767:
  1354: 99975168
Measure `NumberC` 16383:
  525: 99991552
Measure `NumberC` 8191:
  496: 99999744
Measure `NumberC` 4095:
  506: 99999744

Here's the code:

Code
const count = 100000000;

late Timer timer;

class Timer {
  int start;
  Timer(): start = _now();
  int get time => _now() - start;
  void clear() => start = _now();
  static int _now() => DateTime.now().millisecondsSinceEpoch;
}

void speak(String message) {
  print('  ${timer.time}: $message');
  timer.clear();
}

class NumberC {
  final int value;
  const NumberC(this.value);
  NumberC operator +(NumberC other) => NumberC(value + other.value);
}

// Example from tatumizer.

void testInt(int mask) {
  var numbers = List<int>.filled(mask + 1, 0);
  var index = 0;
  print('Measure `int` $mask:');
  for (int i = 0; i < count; ++i) {
    if (i & mask == 0) index = 0;
    numbers[index] = i;
    ++index;
  }
  speak("${numbers[0]}");
}

// Added by me

void testNumberC(int mask) {
  var numbers = List<NumberC>.filled(mask + 1, const NumberC(0));
  print('Measure `NumberC` $mask:');
  var index = 0;
  for (int i = 0; i < count; ++i) {
    if (i & mask == 0) index = 0;
    numbers[index] = NumberC(i);
    ++index;
  }
  speak("${numbers[0].value}");
}

void main() {
  timer = Timer();
  testInt(0xFFFF);
  testInt(0x7FFF);
  testInt(0x3FFF);
  testInt(0x1FFF);
  testInt(0x0FFF);
  testNumberC(0xFFFF);
  testNumberC(0x7FFF);
  testNumberC(0x3FFF);
  testNumberC(0x1FFF);
  testNumberC(0x0FFF);
}

This again illustrates that NumberC allocation does make a difference compared with non-allocating computations (using _Smis), and old GC is more expensive than young GC.

[NB: This is still somewhat relevant to the topic 'extension types' because they are treated like the representation object at run time, and that could be an int. Note that if the representation object is a big heap allocated thing then the relative difference will get correspondingly smaller, but the absolute cost of having the extra wrapper object would presumably remain about the same.]

@ghost
Copy link

ghost commented Feb 18, 2025

Yeah, I missed the trick with dynamic.
But the difference you measured is minuscule. For the case where objects remain in the young gen (most likely scenario), the difference is about 3 nanoseconds per iteration - nothing to worry about.

I'm thinking of a different test though. When we pass a union as a parameter, the function has to implement a switch over all member types. So any test should incorporate the whole thing, with the switch. I'm going to figure out how to simulate int vs NumberC in this scenario. Will report my findings here.


At least one effect got demistified. I changed a growable list to a fixed list of size (mask+1), and the data now agrees with common sense. The culprit behind degrading performance on int was not GC - most likely, it was cache. The cache lines of all L1, L2, L3 caches are all of the same size (64 bytes). As we grow a list, we have more L1 cache misses, then L2 cache misses, etc. (they become progressively more expensive). That's my theory. As usual, I can be wrong.


On every realistic scenario I've tried, the compiler optimizes away NumberC while calling a function f(NumberC c). I tried also to simulate the case where NumberC is defined as a wrapper for an Object, with a switch over 3 variants (int, double, String) - the compiler still optimizes it away. It means that we have the most common case covered for free.
We can introduce the syntax typedef C=int | double | String; and allow the variables to have this type.
With implicit constructors (which can be automatically defined for such a class), this can work fine.
Just disallow the type arguments of union types (no List<IntOrString>) - you can say the feature will come later, and see how many people complain. The feature would still be quite useful IMO.

@eernstg
Copy link
Member

eernstg commented Feb 19, 2025

But the difference you measured is minuscule.

Consider the experiment that uses a pre-allocated list, such that we're focusing more precisely on the cost of object allocation and creation (NumberC only), plus the cost of performing int computations vs. NumberC computations. It is about 163ms for int and a bit more than 500ms on average for NumberC. That's 3x.

The relative cost is relevant if we're considering all kinds of representation objects: int is particularly well optimized, so we get a speedup of about 3x. If the representation object is some other object (perhaps even a large one) then the relative difference in GC pressure (that is: the amount of memory allocated and managed) will of course be smaller, even though the wrapper object may have exactly the same size as NumberC.

In short, if you recommend that developers stop using int and start using something like NumberC (with the full interface such that everything can still be done as usual, and renamed to int such that you just need to import the wrapper class, and assuming implicit constructors such that integer literals continue to work), I do think those developers will tell you that the performance difference matters, too. ;-)

On every realistic scenario I've tried, the compiler optimizes away NumberC while calling a function f(NumberC c).

I'd suspect that those scenarios are exactly the opposite: They are unrealistic, in the sense that this optimization will not be possible in an actual application, with the increase in size and complexity.

We can introduce the syntax typedef C=int | double | String; and allow the variables to have this type.
With implicit constructors (which can be automatically defined for such a class), this can work fine.

Built-in, nominal, non-recursive union types with reification is indeed a possible way ahead for union types in Dart. It's a huge effort (like everything that introduces a new kind of types), and controversial (e.g., because it's likely to make type inference a lot slower), but I think it is possible. That's for #83.

@ghost
Copy link

ghost commented Feb 19, 2025

The test of int vs NumberC is a bit contrived: we are comparing the weight of infusoria with the weight of infusoria-in-a-box - it's not completely unexpected that the box weighs more than its contents.
But let's recall that we are discussing union types. Union type is unlikely to contain only infusorias: the members could be any animals, some of them real behemots. There will always be the overhead of creating these objects themselves before wrapping them in an extra box.
In an attempt to get an idea of relative costs, I used a String object with and without an extra box (using your program as a template).
In one case, I populated the list with Strings like '$i', in another - StringInABox('$i').
The results:

Measure String 65535:
1512: 99942400
Measure StringInABox 65535:
1845: 99942400
Measure String 32767:
1467: 99975168
Measure StringInABox 32767:
1830: 99975168
Measure String 16383:
1330: 99991552
Measure StringInABox 16383:
1617: 99991552
Measure String 8191:
1325: 99999744
Measure StringInABox 8191:
1615: 99999744
Measure String 511:
1305: 99999744
Measure StringInABox 511:
1552: 99999744

The difference is just ~15%. There's no optimization here - these are realistic figures.

Code ```dart void testString(int mask) { var strings = List.filled(mask + 1, ''); var index = 0; print('Measure `String` $mask:'); for (int i = 0; i < count; ++i) { if (i & mask == 0) index = 0; strings[index] = '$i'; ++index; } speak(strings[0]); } class StringInABox { final String value; const StringInABox(this.value); } void testStringInABox(int mask) { var boxedStrings = List.filled(mask + 1, const StringInABox('')); print('Measure `StringInABox` $mask:'); var index = 0; for (int i = 0; i < count; ++i) { if (i & mask == 0) index = 0; boxedStrings[index] = StringInABox('$i'); ++index; } speak(boxedStrings[0].value); } ```

My point was not to question the fact that the boxes affect performance. I was just trying to determine how much they affect performance in the case of reified union types.

If we agree that 15% is not too much (for bigger behemoths the overhead in % terms will be much lower), then the next question is:
what happens if we introduce reified union types, but don't allow union types as type arguments? Will this be a major scandal, or a small annoyance, or nothing to worry about?

I remember that union types were requested as a way of interfacing with Javascript APIs, which often come with parameters of essentially union types. This goal can still be achieved with our stripped-down union types. It's not necessarily "all or nothing at all". For example, dart currently supports a very limited subset of constant expressions. But even this limited subset is useful!


My theory from the previous message was that the performance degradation we see on testInt while increasing the mask (and with it, the size of the list) is attributable to cache misses. I put this theory to test. In one run, I used NumberC as it is. In another test, I added 3 int fields to NumberC object (all final ints initialized to 0). If the theory is correct, then the timing should be worse at least by a factor of 2. And that's indeed what I observed. (I checked the default configuration of GC - the new gen is much bigger than whatever we use on mask=65535, so the GC is not a culprit).

It tells us that it the expected slowdown factor while creating a boxed object can be estimated roughly as (S+B+1)/(S+1), where

  • S - the size of original object in 64-bit words
  • B - the size of the boxed object, which is independent of S, because boxed object contains only the standard object header (1 word) plus a single reference. 2 words in total.
  • 1 is added because when we store the reference in the list, we have to store it regardless,

For the testInt experiment, we have to be more pedantic in our calculations because int is the only kind of "object" which is stored in unboxed format in the list. It occupies exactly 1 word. But NumberC takes 2 words, plus we still have to store a reference to it in the list. In total, slowdown factor is 3/1, which is exactly what we see. (*)

Even if GC comes into play at bigger sizes, this ratio will remain more or less the same - because the size of new gen is configured in terms of MB, not the total number of objects).

(*) these estimates are valid only in the scenario when the list is big enough to hit the limit of L1 cache, which is 64KB.
For the lists of smaller sizes, the cost of the allocation/deallocation becomes the dominant factor. After hitting cache limit, this factor is dwarfed by the costs of cache misses.

@ghost
Copy link

ghost commented Feb 21, 2025

I'm thinking along the following lines: what will happen if we support union types like

typedef MyUnion=int | String;
// or, better yet
union type MyUnion = int | String;

such that MyUnion is not an Object?
We also prohibit subclassing of union types.
The idea is to support a Type object for MyUnion (so that List<MyUnion> is a valid type), but not create any wrappers. (Union type therefore becomes a ghost type for the compiler),
The reason for the wrappers to become unnecessary is that the compiler's view on any object of type MyUnion never diverges from the runtime's view: indeed, the type MyUnion cannot be anything other than MyUnion - neither for the compiler, nor for the runtime. It becomes impossible to cast anything into MyUnion, or cast MyUnion to anything else. In memory, the object of MyUnion type occupies exactly one word (no matter what is "inside").
To get access to the contents of union type, we pretend there's a getter called "value".

MyUnion u1 = 1; // OK
MyUnion u2 = 'abc'; // OK
Object o = whatever;
o is MyUnion; // warning : expression is always false
o as MyUnion; // ERROR impossible cast: o cannot have type MyUnion
List<MyUnion> list = [1, 'abc']; // OK
list as List<Object>; // ERROR impossible cast: 
u1 is int; // ERROR: impossible cast
u1.value is int; // OK
print(u1.value);

Do you know any counterexamples?

(I can see one problem with this: if somebody asks "What is the static type of u1.value?", what is the answer?
We have to either introduce the type "unknown", or conjure up some generated "$$$MyUnion" or something)

@TekExplorer
Copy link

TekExplorer commented Feb 21, 2025

That seems unnecessary. Why would we need another non-Object?

What you're suggesting is just a pointer, which provides no real benefit. Especially when the whole point of the union is to be able to treat it as the type we expect.

Your suggestion would also prevent adoption by interfaces that need it, like jsonDecode's return type, and the websocket listener type (String | List)

Even internal stuff like in _Future's listener/value

Let alone allowing for accepting multiple types, making stuff like addString and addBytes reduce to add(String | List<int>)

Not to mention that making it not an object for some reason would prevent it from being used in most places.

@ghost
Copy link

ghost commented Feb 21, 2025

Why would we need another non-Object?

This is necessary if we want to implement recursive types like Encodable.
To illustrate, suppose union types implement Object. Then consider

Object map=<int, String>{};
map is Encodable; // how do you implement that?

To be able to answer the question whether a given object is Encodable, the runtime has to have enough information passed in Type objects. With the current structure of Type objects, there's no way to do that. So, to implement this feature alone, dart would have to come up with a different, much more complicated, design of Type objects1, which will hurt the performance of everything.
(Not to mention that the compiler probably needs to implement an additional pass to form these Type objects)
Now, consider the alternative:

f(Encodable e) {};
var map = <int, String>{}; // static type <int, String>
f(map); // OK - because compiler can statically verify that Map<int, String> is encodable
f(<Foo, int>{}); // STATIC error

Footnotes

  1. otherwise, how would the runtime know that Map<int, String> is Encodable, and Map<int, Foo> is not?

@eernstg
Copy link
Member

eernstg commented Feb 24, 2025

@tatumizer wrote:

what happens if we introduce reified union types, but don't allow union types as type arguments? Will this be a major scandal, or a small annoyance, or nothing to worry about?

First, this is the 'extension type' issue and it does make sense to talk about specific usages (including extension_type_unions). However, union types as a language construct should be discussed in #83.

That said—it could still be considered relevant to the overall topic of extension types to discuss reified vs. non-reified types, and their inherent capabilities and limitations.

what will happen if we support union types like

typedef MyUnion=int | String;
// or, better yet
union type MyUnion = int | String;

such that MyUnion is not an Object?

(or was that "not an Object?"? ;-)

The most crucial question here is whether or not MyUnion will be reified. The answer seems to be implied here:

The idea is to support a Type object for MyUnion (so that List<MyUnion> is a valid type), but not create any wrappers. (Union type therefore becomes a ghost type for the compiler),

If the union type becomes a ghost type for the compiler, my interpretation would be that it is not reified (at least, that's what I meant when I said 'ghost type'). That is, MyUnion has a run-time representation which is some other type (perhaps Object, Object?, or the standard upper bound of all the operands), and the only constraints that go beyond the regular constraints on the representation of MyUnion at run time would be compile-time errors.

I think this implies that the type declarations named MyUnion above would be very nearly the same thing as extension_type_unions. The main differences would be that (1) because this is a language feature and not just a library, it would be possible to use structural rules to determine subtype relationships among union types (e.g., with typedef AnotherUnion = String | int;, AnotherUnion and MyUnion could be subtypes of each other). I'm not sure I'd recommend that, though, because it may be rather costly in terms of the performance of type inference. (2), we could make operand types assignable to the union type (that is, we could avoid the getters like u21 which are needed with extension_type_unions—as long as we don't have implicit constructors, at least).

In any case, List<MyUnion> can be a valid type with or without reification: If E is an extension type then List<E> is a valid type whose run-time representation is List<R>, where R is the instantiated representation type corresponding to E. Any approach to union types without reification would presumably do something similar.

This kind of type is not useless, but E should either (1) be an appropriate type for every instance of type R (standard example: FancyString - it's OK for every string to be treated as a FancyString), or (2) the code which is executed where E is passed as a type argument should be parametric (that is, it shouldn't try to use run-time type queries to "discover" the actual value of that type parameter).

An important example is that the system provided collection types (Iterable, List, Set, Map) are using their type parameter(s) in a parametric manner. Another example would be any generic function that doesn't use run-time type queries on the corresponding type parameter:

extension type Name(String _it) {}
extension on String { Name get toName => Name(this); }

List<X> uniquify<X>(List<X> list) {
  final set = list.toSet();
  return set.toList();
}

void main() {
  final names = [
    'Phileas Misst'.toName,
    'Sherlock Isletes'.toName,
    'Phileas Misst'.toName
  ];
  final uniqueNames = uniquify(names);
  List<Name> _ = uniqueNames; // OK.
}

The point is that even without reification, the invocation of uniquify is able to preserve the static information that those strings have the type Name, not just String.

I proposed in #3620 that we should support parametric type parameters (that is, the compiler should check that the run-time value of the actual type argument is actually never used), but even without such a mechanism it may be useful to be aware of the fact that non-reified types can safely and usefully be used as actual type arguments.

Agreeing with @TekExplorer on ..

Why would we need another non-Object?

.. I don't immediately understand how it would help. We already have that kind of type, by the way: Every extension type that doesn't declare that it implements T where T is Object or some subtype of Object will actually be a subtype of Object?, and not a subtype of Object.

If we're talking about "another non-Object?" it's a different matter. We don't have any types that aren't subtypes of Object?.

The language team has discussed that kind of type a number of times. They could be used to disable a lot of OO features for specific types, which could then be very low-level. In particular, we could support bit fields as parts of bit vectors (perhaps int/double could be split into smaller bit vectors, or the contents of a Uint8List), or it could be used to access foreign language data directly. The point is that we can't have a "normal pointer" to 3 bits in the middle of a byte in memory, so we need to make sure that this kind of type is always directly represented in programs (in particular, they're never the value of a type variable).

If that's the point ("they are never the value of a type variable, or a subterm in a composite type which is the value of a type variable") then we could just make it an error for such types to occur in those ways. Another way to say this is that union types are not first class types. I suspect that we won't get anything else out of introducing a non-Object? type just for union types.

However, I do think somebody is going to say "but I want to have a List<MyUnion>!", and that wouldn't be possible if we do anything that prevents the union types from being first class types.

@tatumizer wrote (about non-Object or non-Object? types):

This is necessary if we want to implement recursive types like Encodable.

I already argued here that there's no way we can have recursive union types without reification.

If we have a declaration like this:

typedef Encodable = int | String | List<Encodable>;

where Encodable has its own run-time representation different from all other types(that is, Encodable is reified) then we can rather easily test whether a given object is an Encodable:

If it's an int then the answer is yes. If it's a String then the answer is yes. If it is a List<T> such that T is Encodable or a subtype thereof then the answer is yes (we'd probably want to say that a List<Never> is a List<Encodable>). Otherwise the answer is no.

To be able to answer the question whether a given object is Encodable, the runtime has to have enough information passed in Type objects.

Right, this is precisely the difference between the non-reified and the reified version of Encodable.

I don't think there is anything particularly complex about the run-time representation of the reified Encodable. We would need to use code to determine whether any given object has that type (such that we can do o is X in a situation where X is Encodable, or some composite type containing Encodable like Map<String, Encodable>). But that's no worse than function types.

Now, consider the alternative:

class Foo {} // Just a class, unrelated to `Encodable`.

f(Encodable e) {}

void main() {
  var map = <int, String>{}; // Static type `Map<int, String>`.
  f(map); // OK - because compiler can statically verify that Map<int, String> is encodable.
  f(<Foo, int>{}); // Compile-time error.
}

I adjusted the code in a couple of locations. I couldn't find a declaration of Encodable that would make Map<int, String> an Encodable, but I'm assuming that we have something like this:

typedef Encodable = int | String | List<Encodable> | Map<Encodable, Encodable>;

We might have subtype relationships like int <: Encodable and String <: Encodable (that's a lot harder than just allowing for assignability). If we do have these subtype relationships then Map<int, String> <: Map<Encodable, Encodable> <: Encodable. So we will indeed get 'OK' and 'compile-time error' as mentioned in the code snippet above (this would be true for a reified as well as a non-reified version of the union type, because the type Encodable always occurs explicitly in this example, not as the value of a type variable).

@ghost
Copy link

ghost commented Feb 24, 2025

I admit I made a mistake when I mentioned the type object for Encodable - indeed, if union types don't extend Object, then this assumption was unnecessary.

Here's why I was thinking about union types that don't implement Object.

I assumed that while evaluating the predicate obj is Encodable, the runtime doesn't want to run any complex algorithms, especially the recursive ones 1. Such was my impression. With the recursive algorithm running in runtime, filling the List with different variants of Encodable may become very expensive. But again, I thought that the runtime didn't want to do that out of principle. Was I wrong?

If the union type is not an Object, then the predicate like obj is Encodable can always be evaluated in compile time: if obj has a static type conforming to Encodable, the predicate is true, otherwise it's false. That's the idea.
But now I see - the idea might indeed be untenable for other reasons (unless we introduce more restrictions).

If we allow a union to be an Object (that is, a union becomes a fully reified type), the compiler will often be able to eliminate redundant checks based on static information. This would cover most of the real-life scenarios. I overlooked this argument, too.

Footnotes

  1. I thought the runtime check must be always feasible in O(1), or at worst O(N) where N is the number of interfaces implemented by an object (this could become O(1) via a hashtable lookup).

@eernstg
Copy link
Member

eernstg commented Feb 25, 2025

I think it would be useful to make a couple of things explicit:

If a union type like Encodable is reified as a type argument then an expression like Encodable == Object? will surely be false; if it is not reified as a type argument then Encodable == Object? or Encodable == Object may very well be true. The point is that when it is reified, it can be recognized as Encodable at run time; when it is not reified it will have a representation at run time which is some other existing type (say, Object?), and then we can't decide at run time whether it was specified as Encodable in source code, or it was specified as Object?.

In any case, I would not expect a union type to be reified in any way at run time in an instance of any of the non-recursive operand types:

typedef Encodable = int | String | List<Encodable> | Map<Encodable, Encodable>;
Encodable e = "Hello!";

The value of e is just an instance of String (or some secret platform subtype of String), and this string has no information whatsoever about the fact that it's being viewed as an Encodable when it is obtained as the value of e. Nada, zilch.

On the other hand, an <Encodable>[] contains a representation of the actual type argument Encodable, which means that we can directly (in time O(1)) determine that this is a value of type Encodable. It follows (by soundness) that it only contains other objects whose type is also Encodable.

In other words, when a union type (recursive or not) is reified as a type argument, it is possible to determine in time O(1) whether any given object does or does not have that type.

The need to traverse an arbitrarily large data structure in order to be able to say "yes" or "no" only arises when the union type is not reified as a type argument.

I don't see any connection to the subtype relationship: Encodable <: Object may be true or false both when Encodable is reified as a type argument and when it's not. (But, surely, it wouldn't work to have Encodable <: S where S is any type which isn't a supertype of every operand; for instance, we can't have Encodable <: int because a value of type Encodable might be a String).

An entirely separate question is whether a value of type Encodable would be a value whose type is one of the operand types, or it would be a wrapper object that contains an object whose type is one of the operand types. I don't think it makes sense to consider an approach with wrapper objects if we're talking about union types as a language feature. For example, it wouldn't be possible to have subtype relationships from the operand type to the union type (that is, we'd need some kind of injection, like u21 and u22 in extension_type_unions)

Y upcast<X extends Y, Y>(X x) => x;

void main() {
  Encodable e = upcast<String, Encodable>("Hello!");
}

The upcast occurs in the body of upcast when x is returned, but there's no way we can expect the runtime to "know" that it must create a wrapper object at that time.

Wrapper objects are relevant to some emulations of union types, but they wouldn't be part of a real language mechanism.

If the union type is not an Object, then the predicate like obj is Encodable can always be evaluated in compile time: if obj has a static type conforming to Encodable, the predicate is true, otherwise it's false. That's the idea.

That wouldn't suffice:

void main() {
  dynamic d = someCondition ? "Hello!" : #aSymbol;
  if (d is Encodable) { // Is true if and only if `d` is the string.
    ...
  }
}

We can't know at compile time whether d will or will not be the string (assuming that the value of someCondition isn't known at compile time), so we can't evaluate the expression d is Encodable at compile time.

However, we can compile the expression to a bigger expression which will do the right thing:

void main() {
  dynamic d = someCondition ? "Hello!" : #aSymbol;
  if (d is int || d is String || d is List<Encodable> || d is Map<Encodable, Encodable>) {
    ...
  }
}

If Encodable is reified as a type argument then we can also perform the same kind of test when Encodable is the value of a type variable at run time. (Again: this is no worse than handling a function type in a similar situation.)

If we allow a union to be an Object (that is, a union becomes a fully reified type) ...

This is where I get a little bit confused—does 'allow a union to be an Object' mean that Encodable is a subtype of Object? As I mentioned, this can be true for a version of union types which are not reified as a type argument, and also for a version which is reified as a type argument. So I don't really see a connection between being reified as a type argument and being a subtype of Object (or any other particular type).

I don't think any approach that includes a wrapper object will be an acceptable foundation for union types as a language feature, but the phrase 'a union becomes a fully reified type' does sound a bit like there is a wrapper object?

The wrapper object can be part of an emulation of union types (in particular, that's true for class_unions), but this will definitely prevent subtype relationships (at compile time and at run time) from the operand types to the emulated union type. This again implies that every value of that emulated union type must be "injected" into the union type by means of some sort of computation (e.g., "Hello!".u21 will inject "Hello!" into the union type Union2<String, ...>, which is an instance of the class Union2 in class_unions).

@ghost
Copy link

ghost commented Feb 25, 2025

The "connection" between Encodable and Object I had in my head was an illusion - I failed to find the counterexamples like the one with dynamic. It wasn't a good idea regardless.

For the reified variant, one option I had in mind is the following.

The compiler can see that Encodable is a union type (doesn't matter if it's recursive or not in this context)
For such a type, the compiler implements an extra pass: for every Encodable type (including parameterized ones) encountered in the program, adds an item to the list of implemented interfaces to a corresponding type object. That's how the object of type String becomes Encodable both statically and in runtime . And int, too. And Map<String, int>. But the runtime type representing Map<RandomFoo, int> won't have an Encodable in its list of implemented interfaces. Now 1) the compiler and the runtime are always in agreement 2) the predicate obj is Encodable is resolved in O(1) in runtime.

Examples:

5 as Encodable; // compiler says: unnecessary cast
Object five = 5;
five as Encodable; // runtime cast successful
var list=<Encodable>[1, 'A'];
list.add(1); // OK
list.add('a'); // OK
list.add(RandomFoo()); // static error: type mismatch
List<Object> list1 = <Encodable>[1, 'A'];
list1.add(5); // OK in comptime, OK in runtime
list1.add(RandomFoo()); // accepted by compiler, but fails in runtime

Can you find any contradictions in this model?

If not, there's another observation: unless the program loosens the type of Encodable to Object, all the checks are implemented in comp. time and have zero runtime cost 1. And, without the wrapper, the whole thing has zero cost also in terms of memory.

Footnotes

  1. assuming there exists a mechanism for the compiler to tell the runtime: "don't check this value - I've already checked it". It can be some boolean flag or something. Otherwise the standard runtime costs apply.

@eernstg
Copy link
Member

eernstg commented Feb 27, 2025

Can you find any contradictions in this model?

I'm considering a union type mechanism which would be reified as a type argument, using no wrapper objects, having only the subtype relationships with its operands (so StringOrInt x = 'Hello'; would be OK because String is a subtype of StringOrInt), plus any explicitly declared superinterfaces (which could be other union types or regular non-union types), and allowing recursion.

This seems to be compatible with your assumptions. I think the 'extra pass' in the compiler would not be needed: If we can determine whether a subtype relationship exists then there's nothing that forces this query to rely on data in the subtype (so String doesn't have to "know" that it is a subtype of StringOrInt, as long as we can test whether String is a subtype of StringOrInt and get true).

// We can say `implements Object` because every operand type is a subtype.
union type Encodable implements Object =
    int | String | List<Encodable> | Map<Encodable, Encodable>;

void main() {
  // `unnecessary_cast` reported, no error, yields `true`.
  5 as Encodable;

  Object five = 5;

  // Does not throw.
  five as Encodable;

  // OK: `int <: Encodable` and `String <: Encodable`.
  var list = <Encodable>[1, 'A'];
  list.add(1); // OK.
  list.add('a'); // OK.
  list.add(RandomFoo()); // Compile-time error, not assignable to `Encodable`.

  // OK, `Encodable <: Object` based on explicit `implements` clause.
  List<Object> list1 = <Encodable>[1, 'A'];
  list1.add(5); // No error at compile time, does not throw at run time.
  list1.add(RandomFoo()); // No error at compile time, throws at run time.
}

Yes, going through the cases I do get the exact same outcome. So the models we're considering are at least compatible.

all the checks are implemented in comp. time and have zero runtime cost

That wouldn't be true with the model I'm considering: For instance, five as Encodable must perform work which is equivalent to the following:

five is int 
    ? five
    : five is String
        ? five
        : five is List<Encodable>
            ? five
            : five is Map<Encodable, Encodable>
                ? five
                : throw new TypeError(...)

In the situation where Encodable is statically resolved we can simply generate code like this. However, when Encodable is the value of a type parameter (e.g., when we're adding an element to a List<Encodable>, it is necessary to have runtime support for querying whether the actual argument of add is a subtype of the actual type argument which was given to this list when it was created (which was Encodable). That's where we need a thunk which is passed along with an actual type argument, in a way that can presumably be very similar to the approach which is currently used with function types.

So we can't expect to settle all questions about type queries involving union types at run time, or even to generate code that simply unfolds every given union type to its operand types. But we can perform the tasks that are needed for those queries at run time.

@ghost
Copy link

ghost commented Feb 27, 2025

I'm considering a union type mechanism which would be reified as a type argument, using no wrapper objects

The same is true with my model. I see you have an issue with the "extra pass", right? It's in fact a "micro-pass", it will be very fast.
Another point of contention might be: retrofitting the String's RTO with an extra "implements Encodable" (that is, adding a reference to Encodable to the list of interfaces implemented by a String). That's what you are referring to when saying "String KNOWS that it implements Encodable". But it's not the only way to implement the feature. Let's say: String doesn't know that. but someone knows it in runtime, and when we invoke myObjectThatIsAStringDowncastToObject is Encodable, this someone handles it. But my model doesn't include the infinite switch!
This "someone" can compute `myObject is Encodable" by a hashtable lookup, in O(1). But this hashtable is precomputed 1. That's why we need an extra "micro-pass": to populate said hashmap statically. Please think about it. I feel that there's some misunderstanding either on your part or on mine, or maybe both, but I don't know how to pinpoint it.


Suppose the problem of detecting whether type is Encodable is solvable - maybe we have an oracle or something.
In your earlier posts, you mentioned "a can of worms" associated with union types. Can you give an example of one such worm?

Footnotes

  1. the hashtable in question is rather a hashset: it contains the runtime types that conform to Encodable. E.g. it contains String, List<List<int>> etc. We can fully precompute this table because we know statically that List<List<int> is indeed used by a program somewhere, but, say, List<List<List<anything> is not. So the size of the table is finite, and in fact rather small.
    Maybe you have a counterargument that shows that precomputing such a hashset is impossible? Even then, it's not the end of the story - we can discuss it further: the existence of specially constructed edge cases never occurring in practice is not a show-stopper.

@eernstg
Copy link
Member

eernstg commented Feb 28, 2025

retrofitting the String's RTO with an extra "implements Encodable"

The point is that this is not necessary. Consider two function types likes int Function(int?) and num Function(int); the representation of the former does not contain anything like a reference to the latter saying "I'm a subtype of that guy". Instead, each of the function types just have a representation that makes it possible to take them apart at run time. So we can determine that the first one has return type int and the latter has return type num, and similarly for each parameter type, and for the structure of the formal parameter list (that is, the name of a named parameter, required or not for a named parameter, and being optional or not for positional parameters). This allows the runtime to compute the subtype relationship (namely: int Function(int?) <: num Function(int)) when the question is asked, based on their parts and their structure.

The same thing would be true for union types: Each union type has a representation that allows us to find the list of operand types at run time. We may then use the operand types to determine whether or not there is a subtype relationship with any other type. So it's simply not necessary (and not even a good idea) to try to represent the subtype relationships directly in the run-time representation of structural types (such as union types or function types).

But my model doesn't include the infinite switch!

The crucial distinction here is "being reified as a type argument". If we have that property then the run-time subtype and instance-of queries will have complexity O(1) in the size of the structure of the type, if we don't have it then there's no limit on the amount of work we may need to perform during an instance-of query (and the subtype query will probably be equivalent to graph isomorphism on the structure of the types, which is also expensive). But that doesn't even matter much, because it won't work in the first place (as I mentioned here: immediately after we've traversed an object structure consisting of a million objects and confirmed that it satisfies the constraints associated with a specific type, the requirements could already be violated.

In other words, being reified as a type argument is an absolute requirement. If we have that then the subtype and instance-of queries will trivially be O(1). If we don't have it then nothing works anyway, and it may take an unbounded amount of time to do that. ;-)

@ghost
Copy link

ghost commented Feb 28, 2025

Your example of Functions is not convincing. I understand that for functions, it's an expensive operation. This doesn't mean it's OK to make the cast an expensive operation for any other type.

In other words, being reified as a type argument is an absolute requirement. If we have that then the subtype and instance-of queries will trivially be O(1).

That's what I don't quite understand.
Consider 2 examples:

List<String> list=['a', 'b'];
list is Encodable; // unnecessary cast!

Here, the predicate list is Encodable must have a runtime cost of exactly 0 (not even O(1))
vs

List<Object> list=['a', 'b'];
list is Encodable; // runtime cast

How are you going to have a runtime cost of O(1) here, without precomputing the set of Encodable types in comptime and passing it to the runtime? If that information is not encoded in a kind of hashset, and not encoded in the RTO of List<String>, then how would the cost become O(1)? (And not just O(1), but trivially so!)

@eernstg
Copy link
Member

eernstg commented Feb 28, 2025

I understand that for functions, it's an expensive operation.

I don't think it's particularly expensive (we need to perform 2 subtype checks in order to determine whether or not R1 Function(S1) is a subtype of R2 Function(S2), and then a bit of work in order to see that the two types have the same shape _ Function(_)). Not a big deal, and O(size_of_the_type_structure). If you're comparing two function types where each of them take 10.000 parameters and everything succeeds except the last parameter then it's more work, but types usually aren't particularly big so we just don't worry about that.

List<String> list=['a', 'b'];
list is Encodable; // unnecessary cast!

True, that's an unnecessary cast, but it will still be performed at run time whenever your program reaches that point in the code. The lint just says that this is wasteful, that doesn't mean we won't do it.

So it's still O(1) work (where "1" is considered to represent the number of operand types in the union type Encodable). The amount of work is small, and fixed at compile time (it doesn't depend on any computations during the execution of the program).

List<Object> list=['a', 'b'];
list is Encodable; // runtime cast

This will give rise to a small amount of work at run time (namely: one subtype check per operand type, and we have 4 of those). The result is false, because Object isn't a subtype of Encodable, so List<Object> isn't a subtype of List<Encodable> (and it is also not a subtype of int or String or Map<Encodable, Encodable>).

I think it's fair to say that 4 subtype tests is O(1) even though it is strictly O(n) where n is the number of operands in the union type. But for any given program we can scan it and find all union types and choose a constant k such that no union type has more than k operands (I'd be surprised if k is bigger than 9), and then the work needed to decide T <: MyUnionType will be at most k subtype queries, which is definitely O(1) according to the standard definition of the O notation.

not encoded in the RTO of List<String>

I think it's highly unrealistic that the subtype relationship from List<String> would ever encode such a thing as "this type is a subtype of Encodable". It's not a problem to check it if the query is posed at run time, we just need to have a subtype checking routine associated with Encodable which is able to check whether any given type is or is not a subtype. In this case it would reach the case List<Encodable>, and then it would call itself recursively to ask whether String is a subtype of Encodable, and that's a base case which is computed to true, which is then also the result of the original query.

@ghost
Copy link

ghost commented Feb 28, 2025

Oh, I made a mistake in my example, I meant

List<Object> list=<String>['a', 'b']; // born with the type List<String>
list is Encodable; // runtime cast, returns true

Now it should return true, right?

You remember, we started the discussion with the question: can we express union types such that they have zero extra cost?
The trick with the hashset was supposed to reduce the extra cost to an actual zero.
With your treatment, it's not exactly zero. Not a high cost, nothing special, but still...
From a different perspective, the model has zero extra cost in terms of memory, so it least we don't incur an extra penalty there.

Are you satisfied with your model? Do you expect any "worms" that are still unaccounted for?

BTW, I'm surprised there's no mechanism for the compiler to pass a flag to the runtime: I've checked this value statically, I guarantee it conforms to the runtime type, so don't insert runtime checks.

@eernstg
Copy link
Member

eernstg commented Mar 1, 2025

Now it should return true, right?

Indeed!

The trick with the hashset was supposed to reduce the extra cost to an actual zero.

Oh, but the run-time representation certainly needs to somehow encode each of the operand types such that it is possible for the runtime to iterate over them when it is testing whether there is a subtype relation to some other type. The work is still proportional to the number of operand types, except that in the case where an operand type is recursive (like union type Encodable ... = ... List<Encodable> ...;), there will be an extra invocation of the same algorithm on the type arguments, but that's still bounded by the number of recursive occurrences of the type (e.g., how many times the word Encodable occurs in the body of the declaration of Encodable). I really don't think that's anything to be worried about, it's essentially constant time.

Are you satisfied with your model?

I'd like to have a more firm opinion about how to specify supertypes of a union type. I think it's really, really useful for performance reasons (during inference, in particular) to say that there are no other direct supertypes than the ones that the union type declaration specifies explicitly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature Proposed language feature that solves one or more problems
Projects
Status: Done
Development

No branches or pull requests