-
-
Notifications
You must be signed in to change notification settings - Fork 2.1k
Add start of dimensionality notebook doc #5437
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Check out this pull request on See visual diffs & provide feedback on Jupyter Notebooks. Powered by ReviewNB |
Codecov Report
@@ Coverage Diff @@
## main #5437 +/- ##
=======================================
Coverage 80.23% 80.23%
=======================================
Files 82 82
Lines 13941 13941
=======================================
Hits 11185 11185
Misses 2756 2756 |
@@ -0,0 +1,709 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A general advice regarding shape, probably worth adding at the top: __Pick prime numbers for dimension lenghts.__
Then one can always reconstruct how, for example, two variables were concatenated.
Also makes it easier to pinpoint errors.
In this example, using eye(3)
would make it clear which output dimensions correspond to the support dim.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I dont quite get what you mean here, though I took a guess and added a new cell to show the limit of where Im losing the understanding
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What Michael meant is that instead of having a (2, 2) distribution (repeated dimensions), you should use something like (2, 3), so that it's obvious what is base and what is implied/replicated dimensionality.
Your example now needs a size 3 vectors for mu though. That's why it's failing
also, more of a PSA, I am not subscribed to notifications for pymc repo, only for pymc-examples. If there are docs PR please tag me so I see them asap. I try to review new issues and PRs 1-2 a week and I saw this one relatively early, but it is often too late already |
Noted @OriolAbril, sorry about that and thanks for reviewing |
@@ -0,0 +1,742 @@ | |||
{ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mu=[0, 0]
Technically, it's incorrect to specify a scalar mu for the multivariate normal, although for backwards compatibility we reshape mu behind the scenes
Reply via ReviewNB
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If its just for backwards compability why dont we drop it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop things after warning about them for long enough. If you want to drop supporting the automatic broadcast, then we'll need FutureWarning in a release version for about a year. After that we can make the break and argue that it's okay to not bump the major version.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But we have a major release coming up where breaks are expected.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can drop it with a useful ValueError about the changed behavior. See also #5440
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with Thomas, we're releasing a major version this is our chance to reform the code without the need for FutureWarnings.
This is good for both devs and the community. For example in this case some of these incorrect but sort of works situations makes using the library more challenging. As a dev when I try and figure out what's happening I have to navigate the maze of conditionals and reference calls in the backend to figure out what is happening. I also need to read through so many much more code and tests to make a useful contribution.
In the nicest way I'm finding you (Ricardo and Michael) are PyMC/Aesara/AePPL power users and you're not inside the maze of code. You guys are so smart you have this elevated view and can see this maze from above, avoid all the traps, and know all the shortcuts. For noobs like me I run into all the deadends and then have to backtrack and try another route.
My strong pitch is we make the maze simpler. As with all codebases the complexity will increase over time, look at Theano. But we have this one opportunity right to make things simpler so please lets please do it, if not for you for me ❤️
This is a bigger discussion than this PR so if you have thoughts let move it to more conversational medium.
In the meanwhile though, I really appreciate you both giving me handwritten notes of how to navigate things. Think of it likes clues for the maze that help me, and hopefully future users, navigate things as well as you two :)
Updated per comments. Verified that its compiling locally. @OriolAbril tagging you per request, also to ensure you feel like the changes youd like have been made |
still have mixed feelings about the "named array" term, the caption looks good otherwise, but we should wait until fixing CI to check the preview on readthedocs. I have one question though. Are those terms/definitions used elsewhere in the documentation? We could include those in the glossary and here so that when they appear somewhere else we can use |
Great Idea ill add them to the glossary |
@OriolAbril actually can we create an issue ticket to add the terms to the glossary and merge this as is? This will be a great issue for the data umbrella sprint, and worst case I'll just get back to it afterwards. RE: Name arrays that has been changed by Michael so we should be good to go there as well |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OriolAbril actually can we create an issue ticket to add the terms to the glossary and merge this as is? This will be a great issue for the data umbrella sprint, and worst case I'll just get back to it afterwards.
We can take this over in a follow up PR, and if you have time please open an issue. But I don't think it's a good issue for the sprint. It's not only a matter of copying definitions from one place to another but about making sure definitions appear in both places (pulling them from a single source of truth though) as they are key to understanding the doc, and having to click a link to read them would be a disservice.
Ill definitely open an issue to track. As for this not being good for the sprint curious why would it wouldnt be? Its a straightforward change and takes folks through all the steps of compiling the docs and making a pr? Not implying you need to change your mind, just curious what I'm missing |
Note, I'll merge this after a rebase on main to make sure Ci is passing. Will get to it in next ~72 hours |
The key is in the from a single source of truth syntagma. IMO both the glossary and this page should show the complete definitions, and then other pages using these terms should link to the glossary using the The first idea that comes to mind is using myst substitutions for this which would mean writing the definitions in conf.py and then having this page and the glossary. None of the webinars cover this and there are no references on how to do this anywhere in our docs, you need to know myst. Sphinx is amazing and very flexible, but I think one should first use it with a provided configuration, and once you know how to take advantage of the available features and settings and only then look at the conf.py file. I think this is also the main reason the jupyterbook people decided to use a yaml file instead of trying to teach juypterbook users to use conf.py. I find the yaml too limiting and wouldn't use it (and you can do the same jupyterbook does and more -or less!- with conf.py directly) but I see the yaml as a very valuable resource to get people to use sphinx straight away without getting lost in conf.py which can be overwhelming and is also very tempting place for premature optimization. |
extra note: I don't think CI will pass until after we merge #5449 |
Co-authored-by: Michael Osthege <[email protected]>
b2a691e
to
e4e80b7
Compare
Thanks, makes sense now. Regarding CI if it wont pass I'll just merge this as theres zero chance broke anything in the main library |
Merging |
Adding start of shapes doc. Note for this PR I am not intending to reach "feature exhaustive" but more minimally useful that can be built upon.
There's two primary areas I could use with in this PR in ranked order
Huge thanks to @michaelosthege who spent 2 hours in his Friday evening explaining this to me very patiently and kindly