Skip to content

Add start of dimensionality notebook doc #5437

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Feb 17, 2022

Conversation

canyon289
Copy link
Member

@canyon289 canyon289 commented Feb 1, 2022

Adding start of shapes doc. Note for this PR I am not intending to reach "feature exhaustive" but more minimally useful that can be built upon.

There's two primary areas I could use with in this PR in ranked order

  • Feedback on the notebook to ensure the documentation is correct and useful
  • Help getting it to actually show up in the rendered html. The things Ive tried so far haven't seemed to work

Huge thanks to @michaelosthege who spent 2 hours in his Friday evening explaining this to me very patiently and kindly

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@codecov
Copy link

codecov bot commented Feb 1, 2022

Codecov Report

Merging #5437 (e4e80b7) into main (3d958ad) will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #5437   +/-   ##
=======================================
  Coverage   80.23%   80.23%           
=======================================
  Files          82       82           
  Lines       13941    13941           
=======================================
  Hits        11185    11185           
  Misses       2756     2756           

@@ -0,0 +1,709 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reiterate this is the "implied" situation again?


Reply via ReviewNB

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A general advice regarding shape, probably worth adding at the top: __Pick prime numbers for dimension lenghts.__

Then one can always reconstruct how, for example, two variables were concatenated.

Also makes it easier to pinpoint errors.

In this example, using eye(3) would make it clear which output dimensions correspond to the support dim.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I dont quite get what you mean here, though I took a guess and added a new cell to show the limit of where Im losing the understanding

Copy link
Member

@ricardoV94 ricardoV94 Feb 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What Michael meant is that instead of having a (2, 2) distribution (repeated dimensions), you should use something like (2, 3), so that it's obvious what is base and what is implied/replicated dimensionality.

Your example now needs a size 3 vectors for mu though. That's why it's failing

@OriolAbril
Copy link
Member

also, more of a PSA, I am not subscribed to notifications for pymc repo, only for pymc-examples. If there are docs PR please tag me so I see them asap. I try to review new issues and PRs 1-2 a week and I saw this one relatively early, but it is often too late already

@canyon289
Copy link
Member Author

also, more of a PSA, I am not subscribed to notifications for pymc repo, only for pymc-examples. If there are docs PR please tag me so I see them asap. I try to review new issues and PRs 1-2 a week and I saw this one relatively early, but it is often too late already

Noted @OriolAbril, sorry about that and thanks for reviewing

@canyon289 canyon289 changed the title Add start of shapes doc Add start of dimensionality notebook doc Feb 2, 2022
@@ -0,0 +1,742 @@
{
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mu=[0, 0]

Technically, it's incorrect to specify a scalar mu for the multivariate normal, although for backwards compatibility we reshape mu behind the scenes


Reply via ReviewNB

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If its just for backwards compability why dont we drop it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop things after warning about them for long enough. If you want to drop supporting the automatic broadcast, then we'll need FutureWarning in a release version for about a year. After that we can make the break and argue that it's okay to not bump the major version.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But we have a major release coming up where breaks are expected.

Copy link
Member

@ricardoV94 ricardoV94 Feb 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can drop it with a useful ValueError about the changed behavior. See also #5440

Copy link
Member Author

@canyon289 canyon289 Feb 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with Thomas, we're releasing a major version this is our chance to reform the code without the need for FutureWarnings.

This is good for both devs and the community. For example in this case some of these incorrect but sort of works situations makes using the library more challenging. As a dev when I try and figure out what's happening I have to navigate the maze of conditionals and reference calls in the backend to figure out what is happening. I also need to read through so many much more code and tests to make a useful contribution.

In the nicest way I'm finding you (Ricardo and Michael) are PyMC/Aesara/AePPL power users and you're not inside the maze of code. You guys are so smart you have this elevated view and can see this maze from above, avoid all the traps, and know all the shortcuts. For noobs like me I run into all the deadends and then have to backtrack and try another route.

My strong pitch is we make the maze simpler. As with all codebases the complexity will increase over time, look at Theano. But we have this one opportunity right to make things simpler so please lets please do it, if not for you for me ❤️

This is a bigger discussion than this PR so if you have thoughts let move it to more conversational medium.

In the meanwhile though, I really appreciate you both giving me handwritten notes of how to navigate things. Think of it likes clues for the maze that help me, and hopefully future users, navigate things as well as you two :)

@canyon289
Copy link
Member Author

Updated per comments. Verified that its compiling locally. @OriolAbril tagging you per request, also to ensure you feel like the changes youd like have been made

image

@OriolAbril
Copy link
Member

OriolAbril commented Feb 8, 2022

still have mixed feelings about the "named array" term, the caption looks good otherwise, but we should wait until fixing CI to check the preview on readthedocs.

I have one question though. Are those terms/definitions used elsewhere in the documentation? We could include those in the glossary and here so that when they appear somewhere else we can use term role and render it as a link to its glossary definition

@canyon289
Copy link
Member Author

still have mixed feelings about the "named array" term, the caption looks good otherwise, but we should wait until fixing CI to check the preview on readthedocs.

I have one question though. Are those terms/definitions used elsewhere in the documentation? We could include those in the glossary and here so that when they appear somewhere else we can use term role and render it as a link to its glossary definition

Great Idea ill add them to the glossary

@canyon289
Copy link
Member Author

@OriolAbril actually can we create an issue ticket to add the terms to the glossary and merge this as is? This will be a great issue for the data umbrella sprint, and worst case I'll just get back to it afterwards.

RE: Name arrays that has been changed by Michael so we should be good to go there as well

Copy link
Member

@OriolAbril OriolAbril left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@OriolAbril actually can we create an issue ticket to add the terms to the glossary and merge this as is? This will be a great issue for the data umbrella sprint, and worst case I'll just get back to it afterwards.

We can take this over in a follow up PR, and if you have time please open an issue. But I don't think it's a good issue for the sprint. It's not only a matter of copying definitions from one place to another but about making sure definitions appear in both places (pulling them from a single source of truth though) as they are key to understanding the doc, and having to click a link to read them would be a disservice.

@canyon289
Copy link
Member Author

Ill definitely open an issue to track.

As for this not being good for the sprint curious why would it wouldnt be? Its a straightforward change and takes folks through all the steps of compiling the docs and making a pr? Not implying you need to change your mind, just curious what I'm missing

@canyon289
Copy link
Member Author

canyon289 commented Feb 13, 2022

Note, I'll merge this after a rebase on main to make sure Ci is passing. Will get to it in next ~72 hours

@OriolAbril
Copy link
Member

The key is in the from a single source of truth syntagma. IMO both the glossary and this page should show the complete definitions, and then other pages using these terms should link to the glossary using the term role.

The first idea that comes to mind is using myst substitutions for this which would mean writing the definitions in conf.py and then having this page and the glossary. None of the webinars cover this and there are no references on how to do this anywhere in our docs, you need to know myst. Sphinx is amazing and very flexible, but I think one should first use it with a provided configuration, and once you know how to take advantage of the available features and settings and only then look at the conf.py file.

I think this is also the main reason the jupyterbook people decided to use a yaml file instead of trying to teach juypterbook users to use conf.py. I find the yaml too limiting and wouldn't use it (and you can do the same jupyterbook does and more -or less!- with conf.py directly) but I see the yaml as a very valuable resource to get people to use sphinx straight away without getting lost in conf.py which can be overwhelming and is also very tempting place for premature optimization.

@OriolAbril
Copy link
Member

extra note: I don't think CI will pass until after we merge #5449

@canyon289
Copy link
Member Author

The key is in the from a single source of truth syntagma. IMO both the glossary and this page should show the complete definitions, and then other pages using these terms should link to the glossary using the term role.

The first idea that comes to mind is using myst substitutions for this which would mean writing the definitions in conf.py and then having this page and the glossary. None of the webinars cover this and there are no references on how to do this anywhere in our docs, you need to know myst. Sphinx is amazing and very flexible, but I think one should first use it with a provided configuration, and once you know how to take advantage of the available features and settings and only then look at the conf.py file.

I think this is also the main reason the jupyterbook people decided to use a yaml file instead of trying to teach juypterbook users to use conf.py. I find the yaml too limiting and wouldn't use it (and you can do the same jupyterbook does and more -or less!- with conf.py directly) but I see the yaml as a very valuable resource to get people to use sphinx straight away without getting lost in conf.py which can be overwhelming and is also very tempting place for premature optimization.

Thanks, makes sense now.

Regarding CI if it wont pass I'll just merge this as theres zero chance broke anything in the main library

@canyon289
Copy link
Member Author

Merging

@canyon289 canyon289 merged commit 0f08c9e into pymc-devs:main Feb 17, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants