Skip to content

Fix mathematical notation in conditional_logprob docstrings #6821

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

amyoshino
Copy link
Member

@amyoshino amyoshino commented Jul 9, 2023

What is this PR about?
Closes #6798

This PR Fix mathematical notation in conditional_logprob docstrings

The changes are pointed out by @larryshamalama in #6798. The notation changes are:

  • Given that $\Sigma$ is commonly used for covariance matrices, it is replaced by $\sigma$ which is a more common notation.
  • $P(Y = y)$ is replaced by $P_{Y}(y)$ to avoid incorrect notation in the case of continuous-valued random variables.
image

Checklist

Major / Breaking Changes

  • ...

New features

  • ...

Bugfixes

  • ...

Documentation

  • ...

Maintenance

  • ...

📚 Documentation preview 📚: https://pymc--6821.org.readthedocs.build/en/6821/

@amyoshino
Copy link
Member Author

amyoshino commented Jul 9, 2023

@larryshamalama I have followed your suggestions in the issue #6798 but I am not sure if the way I represented the line L438 is correct: $\log p(Y = y \mid \Sigma^2)$ --> $\log p_{Y \mid s^2}(y \mid s^2)$. It looks odd to me because I am repeating the $s$ in the subscript and in the conditional side. Let me know if this is not the best way to represent it.

In the lines L448-L449 I shifted the $\Sigma$ with $s$ in your suggestion because I believe this is how the original version is described after replacing $\Sigma$ (subscript) with $s$ (value). Let me know if I am mistaken here as well.
https://www.pymc.io/projects/docs/en/stable/api/generated/pymc.logprob.conditional_logp.html#pymc.logprob.conditional_logp

@codecov
Copy link

codecov bot commented Jul 9, 2023

Codecov Report

Merging #6821 (c44a5d2) into main (659f177) will increase coverage by 0.00%.
The diff coverage is n/a.

Additional details and impacted files

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #6821   +/-   ##
=======================================
  Coverage   92.04%   92.04%           
=======================================
  Files          95       95           
  Lines       16261    16262    +1     
=======================================
+ Hits        14967    14968    +1     
  Misses       1294     1294           
Impacted Files Coverage Δ
pymc/logprob/basic.py 94.28% <ø> (ø)

... and 1 file with indirect coverage changes

@amyoshino amyoshino marked this pull request as ready for review July 9, 2023 01:54
Copy link
Member

@larryshamalama larryshamalama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @amyoshino for opening this PR, you're on a roll!

Apologies if the initial issue was confusing. I wrote a long text ahead, none of this is major but rather a clarification of what I previously meant. I can get quite picky with mathematical notation...

@larryshamalama I have followed your suggestions in the issue #6798 but I am not sure if the way I represented the line L438 is correct: log⁡p(Y=y∣Σ2) --> log⁡pY∣s2(y∣s2). It looks odd to me because I am repeating the s in the subscript and in the conditional side. Let me know if this is not the best way to represent it.

Your instinct is correct, we shouldn't have s as a subscript (random variable) and as an argument (value variable). In the hierarchical model definition, we just want random variables. In the mathematical function representing the log-density, arguments are the "value variables". I hope that this clarifies things.

We ideally want to adopt a notation that is more "standard" and easier to understand. That's mostly subjective... 1) I prefer using $\sigma^2$ to denote the variance. 2) Often, using both random variables as indices and arguments to denote what densities we are referring to can be too much... Here, I think that it's important, because we would like to delineate random variables from value variables in PyMC/PyTensor.

It seems that you use $\sigma^2$ as an argument and $s^2$ as the stochastic variable. (*) I personally would change the order, especially that we define sigma2_rv to be the PyTensor random variable and s_vv to be the valued variable. (*) We should probably even write s2_vv instead of s_vv. If you have any preferences, you can add your suggestion. Otherwise, I provided mine in this review.

In short, your instinct correctly addresses an error in mathematical notation, but the solution can be entirely subjective. Actionable items are presented by a (*) for clarity :)

amyoshino and others added 4 commits July 10, 2023 21:02
Accepting change in s. It becomes sigma

Co-authored-by: larryshamalama <[email protected]>
Accepting change in s. It becomes sigma part 2

Co-authored-by: larryshamalama <[email protected]>
@amyoshino
Copy link
Member Author

@larryshamalama thank you for the comments and review! I think it makes way more sense to me now, my apologies for not making sense of your initial comments and changing the use of $\sigma$ and $s$. I believe it is now very clear to the readers.

I have commited your suggestions and added a last commit to cover the changes from s_vv to s2_vv. Sorry the number of commits as well, I started accepting one by one instead of accepting all in one batch, lesson learned for the next PR :)

I think everything is ok now, but let me know if I forgot to address anything or if you have any other suggestions or comments.

@larryshamalama
Copy link
Member

@larryshamalama thank you for the comments and review! I think it makes way more sense to me now, my apologies for not making sense of your initial comments and changing the use of σ and s. I believe it is now very clear to the readers.

Ah please don't be sorry, I'm very happy to help :)

I have commited your suggestions and added a last commit to cover the changes from s_vv to s2_vv. Sorry the number of commits as well, I started accepting one by one instead of accepting all in one batch, lesson learned for the next PR :)

Don't worry about the number of commits! We often squash merge, but there is a certain "art" to writing nice commits. I'm still learning myself, @ricardoV94 writes great commits if you want to have some inspiration. Around the beginning of my open-source involvement, I was referred to this website, which I found somewhat helpful: https://wiki.openstack.org/wiki/GitCommitMessages#Structural_split_of_changes

Copy link
Member

@larryshamalama larryshamalama left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thanks @amyoshino!

@larryshamalama larryshamalama merged commit 13e7c88 into pymc-devs:main Jul 11, 2023
@amyoshino
Copy link
Member Author

amyoshino commented Jul 11, 2023

@larryshamalama Thank you for your guidance!! Thanks for also pointing out resources for me to improve, I found the linked document very informative, will carry it to the next PRs as best as I can.

@amyoshino amyoshino deleted the conditional_logprob_docstrings branch July 11, 2023 15:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

DOC: Fix mathematical notation in conditional_logprob docstrings
2 participants