Skip to content

Add moments for CAR distribution #5220

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jan 26, 2022
Merged

Conversation

zoj613
Copy link
Contributor

@zoj613 zoj613 commented Nov 22, 2021

This adds moments for the CAR distribution discussed in #5078 (comment)

@zoj613
Copy link
Contributor Author

zoj613 commented Nov 22, 2021

The tests are based on

def test_car_rng_fn(sparse):
delta = 0.05 # limit for KS p-value
n_fails = 20 # Allows the KS fails a certain number of times
size = (100,)
W = np.array(
[[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
)
tau = 2
alpha = 0.5
mu = np.array([1, 1, 1, 1])

@ckrapu @aerubanov @ricardoV94 Could you please take a look at this when you can.

@codecov
Copy link

codecov bot commented Nov 22, 2021

Codecov Report

Merging #5220 (260586b) into main (b6f76e5) will increase coverage by 0.00%.
The diff coverage is 100.00%.

Impacted file tree graph

@@           Coverage Diff           @@
##             main    #5220   +/-   ##
=======================================
  Coverage   79.02%   79.02%           
=======================================
  Files          87       87           
  Lines       14376    14382    +6     
=======================================
+ Hits        11360    11366    +6     
  Misses       3016     3016           
Impacted Files Coverage Δ
pymc/distributions/multivariate.py 73.98% <100.00%> (+0.22%) ⬆️

@zoj613
Copy link
Contributor Author

zoj613 commented Nov 22, 2021

Not sure why the coverage tanked here. I did not make any changes to parallel_sampling.py.

@twiecki twiecki requested a review from ricardoV94 November 23, 2021 08:15
Comment on lines +842 to +957
tau = 2
alpha = 0.5
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are tau and alpha constrained to be scalars?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just noticed the class docstring allows them to be arrays. I just don't see it being tested anywhere in

@pytest.mark.parametrize(
"sparse, size",
[(False, ()), (False, (1,)), (False, (4,)), (False, (4, 4, 4)), (True, ()), (True, (4,))],
ids=str,
)
def test_car_logp(sparse, size):
"""
Tests the log probability function for the CAR distribution by checking
against Scipy's multivariate normal logpdf, up to an additive constant.
The formula used by the CAR logp implementation omits several additive terms.
"""
np.random.seed(1)
# d x d adjacency matrix for a square (d=4) of rook-adjacent sites
W = np.array(
[[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
)
tau = 2
alpha = 0.5
mu = np.zeros(4)
xs = np.random.randn(*(size + mu.shape))
# Compute CAR covariance matrix and resulting MVN logp
D = W.sum(axis=0)
prec = tau * (np.diag(D) - alpha * W)
cov = np.linalg.inv(prec)
scipy_logp = scipy.stats.multivariate_normal.logpdf(xs, mu, cov)
W = aesara.tensor.as_tensor_variable(W)
if sparse:
W = aesara.sparse.csr_from_dense(W)
car_dist = CAR.dist(mu, W, alpha, tau, size=size)
car_logp = logp(car_dist, xs).eval()
# Check to make sure that the CAR and MVN log PDFs are equivalent
# up to an additive constant which is independent of the CAR parameters
delta_logp = scipy_logp - car_logp
# Check to make sure all the delta values are identical.
tol = 1e-08
if aesara.config.floatX == "float32":
tol = 1e-5
assert np.allclose(delta_logp - delta_logp[0], 0.0, atol=tol)
@pytest.mark.parametrize(
"sparse",
[False, True],
ids=str,
)
def test_car_matrix_check(sparse):
"""
Tests the check of W matrix symmetry in CARRV.make_node.
"""
np.random.seed(1)
tau = 2
alpha = 0.5
mu = np.zeros(4)
xs = np.random.randn(*mu.shape)
# non-symmetric matrix
W = np.array(
[[0.0, 1.0, 2.0, 0.0], [1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
)
W = aesara.tensor.as_tensor_variable(W)
if sparse:
W = aesara.sparse.csr_from_dense(W)
car_dist = CAR.dist(mu, W, alpha, tau)
with pytest.raises(AssertionError, match="W must be a symmetric adjacency matrix"):
logp(car_dist, xs).eval()
# W.ndim != 2
if not sparse:
W = np.array([0.0, 1.0, 2.0, 0.0])
W = aesara.tensor.as_tensor_variable(W)
with pytest.raises(ValueError, match="W must be a matrix"):
car_dist = CAR.dist(mu, W, alpha, tau)

and

@pytest.mark.parametrize("sparse", [True, False])
def test_car_rng_fn(sparse):
delta = 0.05 # limit for KS p-value
n_fails = 20 # Allows the KS fails a certain number of times
size = (100,)
W = np.array(
[[0.0, 1.0, 1.0, 0.0], [1.0, 0.0, 0.0, 1.0], [1.0, 0.0, 0.0, 1.0], [0.0, 1.0, 1.0, 0.0]]
)
tau = 2
alpha = 0.5
mu = np.array([1, 1, 1, 1])
D = W.sum(axis=0)
prec = tau * (np.diag(D) - alpha * W)
cov = np.linalg.inv(prec)
W = aesara.tensor.as_tensor_variable(W)
if sparse:
W = aesara.sparse.csr_from_dense(W)
with pm.Model(rng_seeder=1):
car = pm.CAR("car", mu, W, alpha, tau, size=size)
mn = pm.MvNormal("mn", mu, cov, size=size)
check = pm.sample_prior_predictive(n_fails, return_inferencedata=False)
p, f = delta, n_fails
while p <= delta and f > 0:
car_smp, mn_smp = check["car"][f - 1, :, :], check["mn"][f - 1, :, :]
p = min(
st.ks_2samp(
np.atleast_1d(car_smp[..., idx]).flatten(),
np.atleast_1d(mn_smp[..., idx]).flatten(),
)[1]
for idx in range(car_smp.shape[-1])
)
f -= 1
assert p > delta

which is what this PR's tests are based on. I also don't remember any literature that treats tau is anything more than a scalar.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the random/logp methods work with non-scalar inputs for those parameters, we need get_moment to do the same. If they are not really supposed to work with those being non-scalar we don't need to, but we might have to open a new issue similar to #5214

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#5214 was fixed in #5241 - the changeset was rather simple. @zoj613 do you want to add the enforcement of scalar tau and alpha, or add them as vectors to the test cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to

ndims_params = [1, 2, 0, 0]

it looks like tau and alpha are both scalars. The docstring seems wrong about allowing array input for those parameters.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ndims_params indicates the minimum/base dimensions, not whether they can't be of higher dimensionality. Idelly we wouldn't impose any limitations, but the rng_fn/logp methods might not broadcast/handle higer values than the minimum case properly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ckrapu is the tau parameter meant to handle array input? I have not seen that case in literature. I also do not see that case being tested in the implementation unittests. I'm asking because it appears you submitted the PR for the distribution. Would you be able to clarify if we need to account for non-scalar tau?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@zoj613 I think it's safe to go ahead and add that explicit limitation for the time being. We can always lift it later (and add tests). Do you want to include that in this PR?

Copy link
Contributor Author

@zoj613 zoj613 Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ricardoV94 I think it's best that array tau be handled as a separate PR. I believe there is no good motivation to account for array tau given its lack of application in literature. IIRC the paper linked in the doc-string only mentions tau as an array in the case an MCAR model, which I believe is different from what is implemented by CAR.

Copy link
Member

@ricardoV94 ricardoV94 Jan 26, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lack of application in the literature shouldn't be a reason to restrict it, but instead whether we handling it properly (and have tests to give us confidence we are). I agree it can be done in a separate PR, would you be interested in doing that? I'll go ahead and merge this one after one last review

@mjhajharia
Copy link
Member

@zoj613 any updates on this?!

@zoj613
Copy link
Contributor Author

zoj613 commented Jan 3, 2022

@zoj613 any updates on this?!

The tests pass, however there is still ambiguity about allowing non-scalar input for the tau parameter. The docstring suggests array input as acceptable, but that case is not tested anywhere in the codebase.

@twiecki
Copy link
Member

twiecki commented Jan 7, 2022

@zoj613 Can you add a test for that case?

@ricardoV94 ricardoV94 modified the milestones: v4.0.0b2, v4.0.0b3 Jan 7, 2022
@ricardoV94 ricardoV94 merged commit e0592ec into pymc-devs:main Jan 26, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants