Skip to content

BUG: Posterior predictive sampling of Latent GPs runs into LinAlgError at the first iteration #7754

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
michaelosthege opened this issue Apr 13, 2025 · 5 comments · Fixed by #7755
Labels
bug GP Gaussian Process

Comments

@michaelosthege
Copy link
Member

Describe the issue:

I can no longer do posterior predictive sampling with any model involving a gp.Latent.
Every model I tried instantly runs into LinAlgError.

Reproduceable code example:

import numpy as np
import pymc as pm

X = np.linspace(0, 10, 10)
Xnew = np.linspace(0, 10, 100)
Y = np.random.RandomState(666).uniform(-1, 1, size=len(X))

with pm.Model():
    scaling = pm.HalfNormal("scaling")
    ls = pm.Uniform("ls", 0.5, 2)
    cov = scaling**2 * pm.gp.cov.ExpQuad(1, ls=ls)
    mean = pm.gp.mean.Constant(0)

    gp = pm.gp.Latent(mean_func=mean, cov_func=cov)
    ylatent = gp.prior("ylatent", X=X[:, None])

    pm.Normal("likelihood", mu=ylatent, sigma=0.2, observed=Y)

    idata = pm.sample(cores=4, chains=4)

    # Add predictions
    gp.conditional("ypred", Xnew=Xnew[:, None])

    idata.extend(pm.sample_posterior_predictive(idata, var_names=["ypred"]))

Error message:

Sampling: [ypred]
Sampling ... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:00
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\compile\function\types.py:1037, in Function.__call__(self, output_subset, *args, **kwargs)
   1036 try:
-> 1037     outputs = vm() if output_subset is None else vm(output_subset=output_subset)
   1038 except Exception:

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\graph\op.py:531, in Op.make_py_thunk.<locals>.rval(p, i, o, n, cm)
    523 @is_thunk_type
    524 def rval(
    525     p=p,
   (...)    529     cm=node_compute_map,
    530 ):
--> 531     r = p(n, [x[0] for x in i], o)
    532     for entry in cm:

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\tensor\random\op.py:402, in RandomVariable.perform(self, node, inputs, outputs)
    400 outputs[0][0] = rng
    401 outputs[1][0] = np.asarray(
--> 402     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    403     dtype=self.dtype,
    404 )

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\tensor\random\basic.py:911, in MvNormalRV.rng_fn(self, rng, mean, cov, size)
    910 if self.method == "cholesky":
--> 911     A = np_cholesky(cov)
    912 elif self.method == "svd":

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\numpy\linalg\_linalg.py:839, in cholesky(a, upper)
    837 with errstate(call=_raise_linalgerror_nonposdef, invalid='call',
    838               over='ignore', divide='ignore', under='ignore'):
--> 839     r = gufunc(a, signature=signature)
    840 return wrap(r.astype(result_t, copy=False))

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\numpy\linalg\_linalg.py:107, in _raise_linalgerror_nonposdef(err, flag)
    106 def _raise_linalgerror_nonposdef(err, flag):
--> 107     raise LinAlgError("Matrix is not positive definite")

LinAlgError: Matrix is not positive definite
Apply node that caused the error: MvNormalRV{name='multivariate_normal', signature='(n),(n,n)->(n)', dtype='float64', inplace=True, method='cholesky'}(RNG(<Generator(PCG64) at 0x28D13C02CE0>), NoneConst{None}, CGemv{inplace}.0, Gemm{inplace}.0)
Toposort index: 29
Inputs types: [RandomGeneratorType, <pytensor.tensor.type_other.NoneTypeT object at 0x0000028D022A8530>, TensorType(float64, shape=(100,)), TensorType(float64, shape=(100, 100))]
Inputs shapes: ['No shapes', 'No shapes', (100,), (100, 100)]
Inputs strides: ['No strides', 'No strides', (8,), (800, 8)]
Inputs values: [Generator(PCG64) at 0x28D13C02CE0, None, 'not shown', 'not shown']
Outputs clients: [[output[1](MvNormalRV{name='multivariate_normal', signature='(n),(n,n)->(n)', dtype='float64', inplace=True, method='cholesky'}.0)], [output[0](ypred)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\IPython\core\interactiveshell.py", line 3362, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\IPython\core\interactiveshell.py", line 3607, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\IPython\core\interactiveshell.py", line 3667, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\osthege\AppData\Local\Temp\ipykernel_2180\34222201.py", line 19, in <module>
    gp.conditional("ypred", Xnew=Xnew[:, None])
  File "C:\Users\osthege\Repos\pymc\pymc\gp\gp.py", line 284, in conditional
    f = pm.MvNormal(name, mu=mu, cov=cov, **kwargs)
  File "C:\Users\osthege\Repos\pymc\pymc\distributions\distribution.py", line 505, in __new__
    rv_out = cls.dist(*args, **kwargs)
  File "C:\Users\osthege\Repos\pymc\pymc\distributions\multivariate.py", line 272, in dist
    return super().dist([mu, cov], **kwargs)
  File "C:\Users\osthege\Repos\pymc\pymc\distributions\distribution.py", line 574, in dist
    rv_out = cls.rv_op(*dist_params, size=create_size, **kwargs)

PyMC version information:

  • PyMC main @ 30f3899
  • PyTensor v2.30.3

Context for the issue:

No response

@michaelosthege michaelosthege added bug GP Gaussian Process labels Apr 13, 2025
@ricardoV94
Copy link
Member

The MvNormal now uses cholesky decomp instead of svd by default which is faster but more brittle to extreme/degenerate covariances

@michaelosthege
Copy link
Member Author

What can we do about it? Because gp.Latent is 100 % broken in this state..

This is what the cov produced by Latent._build_conditional looks like:

Image

@michaelosthege
Copy link
Member Author

pymc-examples/.../GP-Latent.ipynb runs into the LinAlgError already in the 3rd cell, with a cov_func(X).eval().min() -> np.float64(3.0859997567422685e-21).

@ricardoV94
Copy link
Member

It's not about min/max it will fail wherever calling np.cholesky would fail. We can pass method="svd" to the MvNormal created by GP (and perhaps also let the user override whatever default we settle on)

michaelosthege added a commit to michaelosthege/pymc that referenced this issue Apr 13, 2025
MvNormal was recently switched to default to use
cholesky decomposition which is very fragile for the
higher-dimensional situations encountered in `Latent` GPs.

Closes pymc-devs#7754
@michaelosthege
Copy link
Member Author

Can confirm that method="svd" works. PR is #7755.

I will also do a PR to fix the GP-Latent example notebook.

michaelosthege added a commit to michaelosthege/pymc-examples that referenced this issue Apr 13, 2025
MvNormal was switched from SVD to Cholesky decomposition by default.
This is brittle for many GPs, therefore PyMC > 5.22.0 will defaults Latent GP
conditionals to `method="svd"`.
The example notebook includes manually created `MvNormal`s that also need
`method="svd"` to work.

I also switched from numpyro back to the default sampler,
because I couldn't install numpyro on my Windows.

See pymc-devs/pymc#7754.
michaelosthege added a commit to michaelosthege/pymc-examples that referenced this issue Apr 13, 2025
MvNormal was switched from SVD to Cholesky decomposition by default.
This is brittle for many GPs, therefore PyMC > 5.22.0 will defaults
Latent GP conditionals to `method="svd"`.
The example notebook includes manually created `MvNormal`s that
also need `method="svd"` to work.

I also switched from numpyro back to the default sampler,
because I couldn't install numpyro on my Windows.

See pymc-devs/pymc#7754.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug GP Gaussian Process
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants