BUG: Posterior predictive sampling of Latent GPs runs into `LinAlgError` at the first iteration #7754

michaelosthege · 2025-04-13T12:44:03Z

Describe the issue:

I can no longer do posterior predictive sampling with any model involving a gp.Latent.
Every model I tried instantly runs into LinAlgError.

Reproduceable code example:

import numpy as np
import pymc as pm

X = np.linspace(0, 10, 10)
Xnew = np.linspace(0, 10, 100)
Y = np.random.RandomState(666).uniform(-1, 1, size=len(X))

with pm.Model():
    scaling = pm.HalfNormal("scaling")
    ls = pm.Uniform("ls", 0.5, 2)
    cov = scaling**2 * pm.gp.cov.ExpQuad(1, ls=ls)
    mean = pm.gp.mean.Constant(0)

    gp = pm.gp.Latent(mean_func=mean, cov_func=cov)
    ylatent = gp.prior("ylatent", X=X[:, None])

    pm.Normal("likelihood", mu=ylatent, sigma=0.2, observed=Y)

    idata = pm.sample(cores=4, chains=4)

    # Add predictions
    gp.conditional("ypred", Xnew=Xnew[:, None])

    idata.extend(pm.sample_posterior_predictive(idata, var_names=["ypred"]))

Error message:

Sampling: [ypred]
Sampling ... ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━   0% -:--:-- / 0:00:00
---------------------------------------------------------------------------
LinAlgError                               Traceback (most recent call last)
File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\compile\function\types.py:1037, in Function.__call__(self, output_subset, *args, **kwargs)
   1036 try:
-> 1037     outputs = vm() if output_subset is None else vm(output_subset=output_subset)
   1038 except Exception:

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\graph\op.py:531, in Op.make_py_thunk.<locals>.rval(p, i, o, n, cm)
    523 @is_thunk_type
    524 def rval(
    525     p=p,
   (...)    529     cm=node_compute_map,
    530 ):
--> 531     r = p(n, [x[0] for x in i], o)
    532     for entry in cm:

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\tensor\random\op.py:402, in RandomVariable.perform(self, node, inputs, outputs)
    400 outputs[0][0] = rng
    401 outputs[1][0] = np.asarray(
--> 402     self.rng_fn(rng, *args, None if size is None else tuple(size)),
    403     dtype=self.dtype,
    404 )

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\pytensor\tensor\random\basic.py:911, in MvNormalRV.rng_fn(self, rng, mean, cov, size)
    910 if self.method == "cholesky":
--> 911     A = np_cholesky(cov)
    912 elif self.method == "svd":

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\numpy\linalg\_linalg.py:839, in cholesky(a, upper)
    837 with errstate(call=_raise_linalgerror_nonposdef, invalid='call',
    838               over='ignore', divide='ignore', under='ignore'):
--> 839     r = gufunc(a, signature=signature)
    840 return wrap(r.astype(result_t, copy=False))

File c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\numpy\linalg\_linalg.py:107, in _raise_linalgerror_nonposdef(err, flag)
    106 def _raise_linalgerror_nonposdef(err, flag):
--> 107     raise LinAlgError("Matrix is not positive definite")

LinAlgError: Matrix is not positive definite
Apply node that caused the error: MvNormalRV{name='multivariate_normal', signature='(n),(n,n)->(n)', dtype='float64', inplace=True, method='cholesky'}(RNG(<Generator(PCG64) at 0x28D13C02CE0>), NoneConst{None}, CGemv{inplace}.0, Gemm{inplace}.0)
Toposort index: 29
Inputs types: [RandomGeneratorType, <pytensor.tensor.type_other.NoneTypeT object at 0x0000028D022A8530>, TensorType(float64, shape=(100,)), TensorType(float64, shape=(100, 100))]
Inputs shapes: ['No shapes', 'No shapes', (100,), (100, 100)]
Inputs strides: ['No strides', 'No strides', (8,), (800, 8)]
Inputs values: [Generator(PCG64) at 0x28D13C02CE0, None, 'not shown', 'not shown']
Outputs clients: [[output[1](MvNormalRV{name='multivariate_normal', signature='(n),(n,n)->(n)', dtype='float64', inplace=True, method='cholesky'}.0)], [output[0](ypred)]]

Backtrace when the node is created (use PyTensor flag traceback__limit=N to make it longer):
  File "c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\IPython\core\interactiveshell.py", line 3362, in run_cell_async
    has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
  File "c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\IPython\core\interactiveshell.py", line 3607, in run_ast_nodes
    if await self.run_code(code, result, async_=asy):
  File "c:\Users\osthege\AppData\Local\miniforge3\envs\pmdev\Lib\site-packages\IPython\core\interactiveshell.py", line 3667, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "C:\Users\osthege\AppData\Local\Temp\ipykernel_2180\34222201.py", line 19, in <module>
    gp.conditional("ypred", Xnew=Xnew[:, None])
  File "C:\Users\osthege\Repos\pymc\pymc\gp\gp.py", line 284, in conditional
    f = pm.MvNormal(name, mu=mu, cov=cov, **kwargs)
  File "C:\Users\osthege\Repos\pymc\pymc\distributions\distribution.py", line 505, in __new__
    rv_out = cls.dist(*args, **kwargs)
  File "C:\Users\osthege\Repos\pymc\pymc\distributions\multivariate.py", line 272, in dist
    return super().dist([mu, cov], **kwargs)
  File "C:\Users\osthege\Repos\pymc\pymc\distributions\distribution.py", line 574, in dist
    rv_out = cls.rv_op(*dist_params, size=create_size, **kwargs)

PyMC version information:

PyMC main @ 30f3899
PyTensor v2.30.3

Context for the issue:

No response

The text was updated successfully, but these errors were encountered:

ricardoV94 · 2025-04-13T13:17:30Z

The MvNormal now uses cholesky decomp instead of svd by default which is faster but more brittle to extreme/degenerate covariances

michaelosthege · 2025-04-13T14:13:37Z

What can we do about it? Because gp.Latent is 100 % broken in this state..

This is what the cov produced by Latent._build_conditional looks like:

michaelosthege · 2025-04-13T14:21:05Z

pymc-examples/.../GP-Latent.ipynb runs into the LinAlgError already in the 3rd cell, with a cov_func(X).eval().min() -> np.float64(3.0859997567422685e-21).

ricardoV94 · 2025-04-13T14:22:51Z

It's not about min/max it will fail wherever calling np.cholesky would fail. We can pass method="svd" to the MvNormal created by GP (and perhaps also let the user override whatever default we settle on)

MvNormal was recently switched to default to use cholesky decomposition which is very fragile for the higher-dimensional situations encountered in `Latent` GPs. Closes pymc-devs#7754

michaelosthege · 2025-04-13T14:39:30Z

Can confirm that method="svd" works. PR is #7755.

I will also do a PR to fix the GP-Latent example notebook.

MvNormal was switched from SVD to Cholesky decomposition by default. This is brittle for many GPs, therefore PyMC > 5.22.0 will defaults Latent GP conditionals to `method="svd"`. The example notebook includes manually created `MvNormal`s that also need `method="svd"` to work. I also switched from numpyro back to the default sampler, because I couldn't install numpyro on my Windows. See pymc-devs/pymc#7754.

michaelosthege added bug GP Gaussian Process labels Apr 13, 2025

michaelosthege mentioned this issue Apr 13, 2025

Default to SVD for MvNormal in Latent GP conditionals #7755

Merged

10 tasks

michaelosthege closed this as completed in #7755 Apr 13, 2025

michaelosthege closed this as completed in 2842401 Apr 13, 2025

michaelosthege mentioned this issue Apr 13, 2025

Rerun Latent GP with method="svd" for MvNormals pymc-devs/pymc-examples#784

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

BUG: Posterior predictive sampling of Latent GPs runs into `LinAlgError` at the first iteration #7754

BUG: Posterior predictive sampling of Latent GPs runs into `LinAlgError` at the first iteration #7754

michaelosthege commented Apr 13, 2025

ricardoV94 commented Apr 13, 2025

michaelosthege commented Apr 13, 2025

michaelosthege commented Apr 13, 2025

ricardoV94 commented Apr 13, 2025

michaelosthege commented Apr 13, 2025

BUG: Posterior predictive sampling of Latent GPs runs into LinAlgError at the first iteration #7754

BUG: Posterior predictive sampling of Latent GPs runs into LinAlgError at the first iteration #7754

Comments

michaelosthege commented Apr 13, 2025

Describe the issue:

Reproduceable code example:

Error message:

PyMC version information:

Context for the issue:

ricardoV94 commented Apr 13, 2025

michaelosthege commented Apr 13, 2025

michaelosthege commented Apr 13, 2025

ricardoV94 commented Apr 13, 2025

michaelosthege commented Apr 13, 2025

BUG: Posterior predictive sampling of Latent GPs runs into `LinAlgError` at the first iteration #7754

BUG: Posterior predictive sampling of Latent GPs runs into `LinAlgError` at the first iteration #7754