Skip to content

FluentBundle.formatPattern #380

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 28 commits into from
Jul 17, 2019

Conversation

stasm
Copy link
Contributor

@stasm stasm commented Jun 26, 2019

Rather than hide the runtime representation of messages, let's document it and commit to only changing it in major version releases.

  • FluentResource now derives from Array rather than a Set. I've noticed a perf win related to this change.

  • Messages and terms are now always stored as a {value, attributes:{}} object. The value maybe null or may be a pattern (more on that below). attributes is an Object.create(null) whose keys are attribute names and values are patterns.

  • Patterns represented either as primitive strings (when they don't have any placeable) or as arrays of primitive strings or expressions.

  • Expressions are stored as {type, ...} objects. The exact fields depend on the expression.

  • The mindset of formatting shifts from formatting a message or formatting an attribute to simply: formatting a pattern. To make it explicit, I renamed FluentBundle.format to .formatPattern.

  • To format a message with attributes, something like the following should be used:

    let message = bundle.getMessage(id);
    if (message.value) {
        bundle.formatPattern(message.value, args, errors);
    }
    for (let [name, value] of Object.entries(message.attributes)) {
        bundle.formatPattern(value, args, errors);
    }

    This pattern supports all possible methods of iterating over the keys of an object, as well as formatting only a set of known attributes instead of all of them.

  • I still need to figure out how to document or even formally describe the now-public data representation of messages. Perhaps this could be left as a follow up and taken care of together with or after Migrate to TypeScript? #376, if we decide to do it.

@stasm
Copy link
Contributor Author

stasm commented Jun 26, 2019

This PR could be viewed as a fix to #358.

@stasm
Copy link
Contributor Author

stasm commented Jun 26, 2019

The performance looks good, with a significant win in create-runtime related to FluentResource now extending Array. Here's make perf-compare-jsshell against master:

parse-runtime/simple:
  mean:   4.82 (+0%)
  stdev:  0.47
  sample: 30
create-runtime/simple:
  mean:   0.74 (-60%)
  stdev:  0.07
  sample: 30
resolve-runtime/simple:
  mean:   1.55 (-14%)
  stdev:  0.21
  sample: 30

parse-runtime/preferences:
  mean:   18.3 (-2%)
  stdev:  3.17
  sample: 30
create-runtime/preferences:
  mean:   1.19 (-63%)
  stdev:  0.28
  sample: 30
resolve-runtime/preferences:
  mean:   18.95 (+2%)
  stdev:  1.95
  sample: 30


parse-runtime/menubar:
  mean:   9.59 (-7%)
  stdev:  1.04
  sample: 30
create-runtime/menubar:
  mean:   0.85 (-56%)
  stdev:  0.17
  sample: 30
resolve-runtime/menubar:
  mean:   3.7 (-13%)
  stdev:  0.67
  sample: 30


parse-runtime/workload-low:
  mean:   10.09 (0%)
  stdev:  1.21
  sample: 30
create-runtime/workload-low:
  mean:   0.87 (-56%)
  stdev:  0.07
  sample: 30
resolve-runtime/workload-low:
  mean:   3.25 (+2%)
  stdev:  0.32
  sample: 30

@eemeli
Copy link
Member

eemeli commented Jun 27, 2019

With this design, how deep does the definition go? Specifically, could/should a pattern be considered an opaque blob, or is there some benefit in defining e.g. expressions as well?

@stasm
Copy link
Contributor Author

stasm commented Jun 27, 2019

I'd like to document all of the runtime AST, including expressions. I would not, however, expose additional methods like formatExpression. I think treating patterns as the only entry points to translations should be sufficient.

@eemeli
Copy link
Member

eemeli commented Jun 27, 2019

I would prefer patterns being left opaque, or at least documented as internal to the implementation. The output of fluent-compiler is a bit different from that level down, and needing to match the runtime API for the patterns and potentially expressions as well would be rather tedious, especially if there aren't any real use cases for them outside of formatPattern.

@stasm
Copy link
Contributor Author

stasm commented Jun 28, 2019

Would you say that the goal of fluent-compiler is to be interoperable wrt. the runtime AST with fluent (soon to be called @fluent/bundle, so I'll use that as the name)? I think it would be nice if it was a drop-in replacement for @fluent/bundle, implementing the same API methods. At the same time, I wouldn't expect a message retrieved from fluent-compiler to work in @fluent/bundle and vice versa. Is this something that would be helpful?

(I can see how the format/compound API sidesteps this problem, BTW.)

When I say documented and public, I don't mean part of the Fluent specification. I only propose that the shape of the runtime AST be documented as part of @fluent/bundle's implementation, and any changes to it would require major version bumps of @fluent/bundle.

@eemeli
Copy link
Member

eemeli commented Jun 28, 2019

My intent is to get the compiler to a state where tools that are built on top of @fluent/bundle could use the output of the compiler as a drop-in replacement, so that the Resource and Bundle of both implementations would provide the same APIs. I don't mean that you'd be able to take a Resource from one and plug it into a Bundle of another. Hence the desire to limit the depth into patterns/expressions that might be included in the published API, as that limits what the compiler would need to do as well.

Maybe another way of expressing this would be to ask whether anything else except for Bundle#formatPattern() would be expected to care about the shape or internals of a pattern?

@zbraniecki
Copy link
Collaborator

Maybe another way of expressing this would be to ask whether anything else except for Bundle#formatPattern() would be expected to care about the shape or internals of a pattern?

I second this. While I'm not strictly opposed to Stas' approach, I think that specifying the AST as public and stable is a major effort for very symbolic gain.
I'd rather keep it as private.

@stasm
Copy link
Contributor Author

stasm commented Jul 4, 2019

(@eemeli) My intent is to get the compiler to a state where tools that are built on top of @fluent/bundle could use the output of the compiler as a drop-in replacement, so that the Resource and Bundle of both implementations would provide the same APIs. I don't mean that you'd be able to take a Resource from one and plug it into a Bundle of another.

This sounds great, thank you for clarifying.

Hence the desire to limit the depth into patterns/expressions that might be included in the published API, as that limits what the compiler would need to do as well.

I see what you mean. If all of the runtime AST it is public, than the criteria for creating drop-in replacements widen.

Maybe another way of expressing this would be to ask whether anything else except for Bundle#formatPattern() would be expected to care about the shape or internals of a pattern?

Thanks for rephrasing. In the formatPattern proposal it has been my intent that only formatPattern cares and works with the internals of a pattern.

(@zbraniecki) I second this. While I'm not strictly opposed to Stas' approach, I think that specifying the AST as public and stable is a major effort for very symbolic gain. I'd rather keep it as private.

I'd like for it to be private too, but the reality and past experience is that there's no true private for data which is meant to be serializable to JSON. We've seen people use it for use-caes which should really use fluant-syntax (example).

Would it be enough to slightly obfuscate the internals of patterns to hint to consumers that they're not supposed to access them? Something like {_type: "var", _name: "user"}, {@@type: "var", @@name: "user"}, {_type: NodeKind.VariableReference, _name: "user"} (if we migrate to TypeScript), or even a tuple: ["var", "user"]?

@stasm
Copy link
Contributor Author

stasm commented Jul 4, 2019

I think the question about what's public and what's private can be broken down into two questions:

  1. How should changes to the Pattern type impact the version number of @fluent/bundle?

    Originally I proposed that any changes are signaled by a major version bump. This would effectively make the entire runtime AST public, which has consequences for Question 2.

    Would it be OK to allow for changes inside of the Pattern type in minor releases?

  2. What level of compatibility is expected from drop-in replacements?

    My original thinking was that replacements must implement the same methods as @fluent/bundle's FluentBundle (i.e. addResource, hasMessage, getMessage, formatPattern) and the object returned by getMessage must have exactly the same shape as returned by FluentBundle.getMessage.

    After @eemeli's comments, I'm now partial to requiring that getMessage return {value: Pattern, attributes: {[name: string]: Pattern}} shapes, where Pattern is an implementation-specific type.

@zbraniecki
Copy link
Collaborator

I'm not sure how this will work. At some place we need to expose a stable public API, and while working on Rust and Gecko implementations I have two strong cases where I have to explicitly decide on what I expose.

In Rust, it's a bit more flexible, I can create a type called Pattern and expose it. But in WebIDL it's a bit more tricky. I can't imagine specifying IDL for a complex structure that I think you want to make Pattern to be.

What I have right now is:

Message:

L10nKey:

Those two are relatively simple to encode and carry around, and even can be easily serialized if needed.
L10nKey serves as input, and Message serves as output.

I know FluentBundle is meant to be lower level and maybe it won't public IDL-like APIs, but I'm concerned about writing it in JS as "Object" and then trying to reverse-engineer into more specific structures later.

Additionally, if you make the FluentBundle output implementation-specific opaque types you basically make the FluentBundle and its higher-level wrapper bound to each other.
It may sound like a theoretical issue, but I'm right this moment experimenting with replacing JS FluentBundle and FluentResource with Rust FluentBundle and FluentResource in Gecko without touching L10nRegistry and/or Localization APIs. It seems very promising and is only possible because the API of FluentBundle takes "simple" structs and returns "simple" structs.
It seems to me like your approach will make us lose it.

p.s. The number of places where this conversation is being held is a bit overwhelming for me.

@eemeli
Copy link
Member

eemeli commented Jul 6, 2019

(@stasm) I think the question about what's public and what's private can be broken down into two questions:

  1. How should changes to the Pattern type impact the version number of @fluent/bundle?
    Originally I proposed that any changes are signaled by a major version bump. This would effectively make the entire runtime AST public, which has consequences for Question 2.
    Would it be OK to allow for changes inside of the Pattern type in minor releases?

I would say that Pattern changes should be allowed in minor releases, as/if its internals are not a part of the public API. It's also good to note here that it's much easier to make private things public, rather than the reverse.

  1. What level of compatibility is expected from drop-in replacements?
    My original thinking was that replacements must implement the same methods as @fluent/bundle's FluentBundle (i.e. addResource, hasMessage, getMessage, formatPattern) and the object returned by getMessage must have exactly the same shape as returned by FluentBundle.getMessage.

    After @eemeli's comments, I'm now partial to requiring that getMessage return {value: Pattern, attributes: {[name: string]: Pattern}} shapes, where Pattern is an implementation-specific type.

This would be perfect for me. fluent-compiler for instance encodes a Pattern as a function of the message arguments, returning an array of parts that are then stringified and joined by formatPattern.

(@zbraniecki) Additionally, if you make the FluentBundle output implementation-specific opaque types you basically make the FluentBundle and its higher-level wrapper bound to each other.
It may sound like a theoretical issue, but I'm right this moment experimenting with replacing JS FluentBundle and FluentResource with Rust FluentBundle and FluentResource in Gecko without touching L10nRegistry and/or Localization APIs. It seems very promising and is only possible because the API of FluentBundle takes "simple" structs and returns "simple" structs.
It seems to me like your approach will make us lose it.

As I see it, this binding only applies if the opaque output of the bundle is meant to be consumed by anything other than the bundle itself. I'm facing pretty much the same issue with fluent-compiler, and I don't see a problem here.

@stasm
Copy link
Contributor Author

stasm commented Jul 8, 2019

Message:

* Rust - https://github.com/projectfluent/fluent-rs/blob/master/fluent-bundle/src/bundle.rs#L21
* WebIDL - https://searchfox.org/mozilla-central/rev/040aa667f419932adf425d92c7438f03230ad96b/dom/chrome-webidl/Localization.webidl#32-40

Both of these describe a formatted message. The formatPattern proposal also requires that there be a raw message shape, returned by FluentBundle.getMessage. This raw message also has the value and attributes keys, whose values are a Pattern and a map of attribute names to Patterns, respectively. Those patterns can then be passed to FluentBundle.formatPattern to get a formatted string.

Additionally, if you make the FluentBundle output implementation-specific opaque types you basically make the FluentBundle and its higher-level wrapper bound to each other.

I don't think this is true. Higher-level wrappers will continue to use FluentBundle.getMessage and FluentBundle.formatPattern in unison. As long as both of these methods come from the same implementation, things should work fine.

It seems very promising and is only possible because the API of FluentBundle takes "simple" structs and returns "simple" structs. It seems to me like your approach will make us lose it.

The formatPattern proposal specifically doesn't change the fact that FluentBundle takes "simple" structs and returns "simple" structs, unless I misunderstand what you mean by that. In your experiments, how do you describe the struct passed to the current FluentBundle.format method? The Localization.webidl you pointed to describes a formatted message, but FluentBundle.format takes a runtime AST structure.

// a FluentType instance without incurring the cost of creating one.
function resolveExpression(scope, expr) {
// A special case for VariantKeys.
// XXX Should not go through bundle._transform.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #387 for this.

*
* @returns {Iterator}
*/
get messages() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll remove addMessages in a fix to #328.

@@ -34,7 +34,7 @@ suite('Reference bombs', function() {
// https://bugzil.la/1307126
test.skip('does not expand all placeables', function() {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #388 to unskip this test.

if (Array.isArray(expr)) {
return Pattern(scope, expr);
}

switch (expr.type) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #386 to consider renaming fields to indicate that they're not part of the public API.

* If formatting fails, it will return a partially resolved entity.
*
* In both cases, an error is being added to the errors array.
* If the message doesn't have a value, return `null`.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is how it worked before, too. The comment was wrong.

test('returns undefined for terms and missing messages', function() {
assert.strictEqual(bundle.getMessage('-bar'), undefined);
assert.strictEqual(bundle.getMessage('baz'), undefined);
assert.strictEqual(bundle.getMessage('-baz'), undefined);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I filed #389 to switch to using strictEqual in all tests.

if (typeof message.value === "string") {
return this._transform(message.value);
}
throw new TypeError("Invalid Pattern type");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not really sure what's the right thing to do here. Should we guard against unexpected types of patterns? This could be helpful when someone forgets to check if message.value is not null before calling formatPattern. If I don't throw here, resolveComplexPattern likely will, anyways, but with a more cryptic error.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should throw, but rather push something to errors, and return undefined.

Copy link
Contributor Author

@stasm stasm Jul 12, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My rationale for throwing here is that this isn't a formatting error. It's not caused by a bug in the translation which resulted in an incompletely formatted output. As such, I don't think it fits into errors either. This is an error in how the API has been used by the consumer. If the consumer takes care to always only pass Patterns to formatPattern, this error will never happen.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess your last comment also addresses your initial concern if developers should guard their calls with try catch ?

Also, we're not really protecting this API by checking if the input is an Array against being fed garbage, [{foopy:true}] is still bad ;-)

If Pattern was a subclass of Array, we might have a slightly stronger assertion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess your last comment also addresses your initial concern if developers should guard their calls with try catch ?

Oh! Yes, I think so! :)

Let me try to rephrase this to make sure I understand your point: If I'm fine expecting developers to use a try catch, I should also be fine expecting them to check if (message.value) before calling formatPattern. Is this what you meant?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what you're trying to achieve. If this is just about message.value being null unexpectedly, I'd consider null to be actually an OK return value.

I read this to be about creating a defined error scenario when people pass in garbage like Math.random or so. I wouldn't expect good users of the API to try-catch to proof them for API abuse.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, thanks for clarifying. I'd like to consider adding null as an OK return value in a follow-up. I'd prefer if this method only returned strings, so that it's not necessary to check for non-null output after every formatPattern.

For now, I'll keep the throwing on non-string, non-array pattern in order to help developers find buggy callsites faster.

@stasm stasm requested review from Pike and zbraniecki July 11, 2019 16:37
@stasm
Copy link
Contributor Author

stasm commented Jul 11, 2019

@Pike This is now ready for your review.

The PR changes the FluentResource to be a subclass of Array; all runtime parser fixtures had to be regenerated. I also had to update almost all tests, because of the formatformatPattern change.

I left a few comments above. I have a few ideas for direct follow-ups, but I didn't want to make this PR even bigger. I filed new issues for each of the planned follow-ups.

My biggest question is: should formatPattern throw on invalid pattern types? I'd like to avoid forcing consumers of the API to wrap each call to formatPattern in a try...catch block, mostly for performance reasons. If you only pass value and attributes that you got from getMessage, the pattern type is guaranteed to be correct. So as long as you use the API right, no throwing will happen. However, I also see value in throwing early on invalid types. Otherwise, the error is way more cryptic since it happens deep inside the resolver.

@stasm
Copy link
Contributor Author

stasm commented Jul 11, 2019

@zbraniecki Would you mind taking a look at the changes to fluent-dom, please?

@stasm stasm changed the title FluentBunde.formatPattern FluentBundle.formatPattern Jul 11, 2019
Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a couple of comments, surprisingly enough, the react one bothers me most.

I only glanced at fluent-dom, leaving that one to zibi for real.

I'd also like to push out the optional parameter definition for formatPattern.

if (typeof message.value === "string") {
return this._transform(message.value);
}
throw new TypeError("Invalid Pattern type");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should throw, but rather push something to errors, and return undefined.

@stasm stasm requested a review from Pike July 15, 2019 18:46
@stasm
Copy link
Contributor Author

stasm commented Jul 15, 2019

@Pike I've updated the PR to address your feedback. I also added two tests to fluent-react, as mentioned earlier. Thanks again for catching the isValidElement problem.

I kept throw in formatPattern when the argument is not a string nor an array, because I'd like to avoid adding null as a valid return value. At least for now. My rationale is that doing so would effectively force every call to formatPattern to be accompanied by a !== null check. Perhaps we can find a better way in #390.

Copy link
Contributor

@Pike Pike left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, lgtm. Still need @zbraniecki to look at the changes in fluent-dom, aka Fluent in gecko in spe.

@zbraniecki
Copy link
Collaborator

Thanks, lgtm. Still need @zbraniecki to look at the changes in fluent-dom, aka Fluent in gecko in spe.

I think that Fluent in Gecko currently becomes more of a does it work in fluent-rs than fluent-dom.

I'm tackling that in projectfluent/fluent-rs#119 and trying to find a way to massage this API into Rust and Gecko.
I'm not sure if this is the best approach - I can also just review this patch as-is, but then I don't think there's another place to say "From Gecko perspective, it's not enough that the API proposal works in JS, and we need to make it work in Gecko" and I believe it should block this.

@stasm stasm changed the base branch from master to release-fluent-zero-thirteen July 16, 2019 16:20
@stasm
Copy link
Contributor Author

stasm commented Jul 16, 2019

I created a release branch called release-fluent-zero-thirteen and re-targeted this PR to it. I hope to be able to land this PR there soon and move on to working on the follow-up for the rest of the week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants