-
Notifications
You must be signed in to change notification settings - Fork 17
Allow reporting the type of arbitrary expressions on demand #50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Just to be clear, this issue is just a fact-finding ticket right now; we'd like to get an idea of how difficult it would be to provide this support so we can decide if we want to extend our feature-set to include it. |
ghc-mod sources https://github.com/kazu-yamamoto/ghc-mod/blob/master/Info.hs#L53) |
Note that this either means keeping the typechecked AST in memory at all times (very expensive) or else re-typecheck the module on every invocation of this new function (also expensive). |
It should be possible to typecheck an arbitrary expression, just based on a textual representation that the IDE provides us with, given that we already have all identifier information. That's the theory -- practice may be different :) |
In practice ghc-mod's API allows this. I think it would have to be re-checked every time because there's no trivial way to know what's been changed in a file's types because of one small tweak, or even across different modules… So the only advantage to keeping the checked AST around would just be more or less if they didn't change any files yet. How expensive is it to run the type-check function? If we're already doing a Flymake-style thing (which devs will expect) there should already be a GHC session open, so there's no (additional) GHC initialization overhead. So what's the actual type-checking overhead? Also, I wonder if we can take advantage of .hi files so we don't have to re-check all dependent files unless we need to. FWIW we also get much more than just types but also exports, instances, breakpoints and more:
Could be additionally useful elsewhere in the IDE. Also notice the definition of
That could also be super useful for doing lookups of reverse-dependencies (“who calls this function?”) and doing refactoring (renaming of all instances of this variable), etc. It seems to me that at some point we will have to use this part of the API (or re-implement it ourselves), if we're going to provide features that Java/C++/C# IDEs do out of the box. Just my two cents. |
The overhead for checking an entire module would be quite large, however ghc does also provide means of checking the type of a single expression -- and that would suffice for this case, as long as we can pass in the right typing environment. (And yes, this assumes things haven't changed. When things change another type check cycle needs to be initiated anyway.) |
Do we get the typing environment from GHC's "load" function? |
(With a patch) we get access to the same environment that you'd get from typecheckModule, that's to say, top-level identifiers only. However, we traverse the AST and collect type information for all identifiers so we can construct our own, more complete, environment. |
But we get more than top-level identifiers only with
Make a
And use like this:
Don't we have everything we need above? We get types for more than just identifiers, but expressions. Anyhoo, this is the backend so I guess it's not my area of interest/expertise. |
Hi Chris, the code you've got there is pretty similar to what we're doing already, in that it's walking over the typechecked AST and pulling out the types and spans in various places. So far we have only needed the types at the identifier leaves, but we could also collect the spans and types at expression nodes. From that we could search up/down to find the various enclosing expressions and their types. So difficulty: we would have to alter our traversal a bit and accumulate an extra structure for the spans & types of expression nodes. We would have to accumulate that into an appropriate search structure (probably an interval tree) so that we can quickly find the minimum enclosing source span and all the enclosing spans (& corresponding types). Obviously keeping this info around would increase our memory requirements, compared to just keeping it for the leaves. So yes, totally doable, and I'd guess roughly two weeks work. |
The really ideal situation would be that we would have types for sub-expressions even in the presence of type errors. I've asked about this before, and got the answer that it wouldn't be easy. I think it might be good to revisit the question, though. @edsko @dcoutts How hard would it be to do this? @BartoszMilewski Do we want to do this anytime soon? IMHO, it would be good to start work on this soon, because this is one area where more partialness in compilation leads to the greatest user benefit. When we have that info, we can really start powerfully helping the user resolve their type errors. Afterall, much of what we're doing when we're figuring out a tricky type error is running typechecking in our head or asking GHCI questions about subexpressions. Originally posted here: https://github.com/fpco/fpco/issues/637#issuecomment-16634545 |
I'd really like the leaf type information to be polished and tested with users before we do any more advanced stuff. The ultimate goal is to be able to say "Everything GHCi can do, we can do better" so let's consider it in this larger context. I'd rather fist work out REPL functionality because I bet users will be clamoring for it. |
As of fe64d22 the internal IdMap type is now a proper interval map, and we never expose it's actual concrete representation. |
@BartoszMilewski Well, asking GHCI for the type of a sub-expression is something you do all the time. The difference is that usually you'd have to re-construct / pun the local bindings before doing so. So having more than just leaf type info fits right into "Everything GHCi can do, we can do better". I do agree a repl would be very nice. Maybe for beta, especially if our alpha users mention it. |
We record the subexpression types for identifiers, application, lambda abstraction, (overloaded) literals, and let expressions. The approach seems to work well and is efficient (every node in the AST is considered only once). Just left to do is to add support for more expressions, and the order of the "dominators" seems somewhat haphazard at the moment.
@snoyberg, @dcoutts, @jwiegley This feature is now ready for testing on the experimental branch. Most expression types are supported, with two notable exceptions: arrow syntax (just not implemented, can add if desired), and TH. The most important problem with TH is that the typechecked tree contains no traces of any TH splices (but only their expanded forms), and moreover, every subterm of those expansions is assigned the location of the entire original TH splice. So, given something like module A where
qComp x xs = [| x : xs |] and module B where
import A
t2 = $(qComp 'a' "bc") then we report all of The other problem currently is that the order of the types appears a bit random at the moment (i.e., not in domination order). I'm not sure why that is, I will look into that now. |
@snoyberg The order of the dominators comes straight from the |
Note: Please wait for @snoyberg to sign off on my suggestions below before doing anything about them: @edsko Awesome!! Using
I think it's /ok/ for now that TH has non-ideal results. However, I have a suggestion for a cheap way to fix this problem:
This would give us the type of the splice / QQ, which is what we want. It would also avoid storing a ton of information for generated code. |
Thanks @edsko, I've experimented with this briefly and it seems to be working nicely. I've set up a new isolation-runner command for getting this information, I'll open up a new issue in the fpco repo to get this integrated into the UI. |
Problem reported in https://github.com/fpco/fpco/issues/2609, looking at that now. |
@snoyberg, @mgsloan Note that the issue of overlapping ranges in the subexpression type extends beyond just TH. For instance, consider type family TestTypeFamily a
type instance TestTypeFamily [a] = Maybe a
t2 :: TestTypeFamily [a]
t2 = Nothing at the location of forall a1. Maybe a1
Maybe a
TestTypeFamily [a] reported, all at the same location. The first type is the polymorphic type of |
@edsko Is the information about where these different types came from available at? It's awesome that all of this information is available, but it's difficult to leverage if there's a lot of it, without ways to discriminate between the different results. These are coming from an annotated AST, right? So it seems like it would be possible to distinguish the two cases of "coincident types, on the same AST node", and "coincident AST nodes". I'm not very familiar with how this works, though, so I could easily be wrong about that. |
They come from traversing the AST, yes -- not sure what kind of additional information you would like us to make available though? |
@edsko It's not additional information, but less - I'd prefer that the types from TH-generated ASTs only reported the type(s) of the outermost AST node. My proposed scheme for doing this is to discard type results when they're from a child node that has the same span as its parent - see this comment: #50 (comment) Anyway, I don't think this is the focus for now, just something to consider at some point. |
Closing this issue as we now support getting types for subexpressions. Please open new issues for specific problems/enhancements (for instance, the generalization of types discussed in https://github.com/fpco/fpco/issues/3043). |
The ghc-mod utility is able to use the GHC API to determine the type of any identifier or expression within a well-formed Haskell module. At present, the ide-backend only provides the type for identifiers.
We'd like to be able to request the type of arbitrary expressions, with the understanding that such information would only be gathered as it is requested.
Pinging @snoyberg @edsko @dcoutts
The text was updated successfully, but these errors were encountered: