Response Surfaces

Response surfaces are inexpensive surrogates for expensive computational models.

Polynomial Ridge Function

class psdr.PolynomialRidgeFunction(basis, coef, U)

A polynomial ridge function

derivative_roots()

Compute the roots of the derivative of the polynomial

profile_grad(X)

Compute the gradient of the profile function \(g\)

roots()

Compute the roots of the polynomial

set_scale(X, U=None)

Set the normalization map
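
A polynomial ridge function has the form \(f(\mathbf{x}) = g(\mathbf{U}^\top \mathbf{x})\), where \(g\) is a polynomial profile on a low-dimensional subspace. The following is an illustrative numpy-only sketch of this structure (the names here are not the psdr API): a one-dimensional ridge function whose gradient follows from the chain rule, \(\nabla f(\mathbf{x}) = \mathbf{U}\, g'(\mathbf{U}^\top \mathbf{x})\).

```python
import numpy as np

# Illustrative sketch only; these names are not the psdr API.
# A ridge function f(x) = g(U^T x) with a polynomial profile g.
U = np.array([[1.0], [2.0]]) / np.sqrt(5.0)   # orthonormal: U^T U = I
coef = np.array([0.5, -1.0, 2.0])             # g(y) = 0.5 - y + 2 y^2 (monomial basis)

def profile(y):
    # evaluate the profile polynomial g
    return np.polynomial.polynomial.polyval(y, coef)

def profile_grad(y):
    # derivative of the profile polynomial, g'(y)
    dcoef = np.polynomial.polynomial.polyder(coef)
    return np.polynomial.polynomial.polyval(y, dcoef)

def ridge(x):
    return profile(U.T @ x)

def ridge_grad(x):
    # chain rule: grad f(x) = U g'(U^T x)
    return U @ profile_grad(U.T @ x)
```

By construction the function is constant along directions orthogonal to the ridge subspace, which is what makes these surrogates cheap to fit and evaluate.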

Polynomial Ridge Approximation

class psdr.PolynomialRidgeApproximation(degree, subspace_dimension, basis='legendre', norm=2, n_init=1, scale=True, keep_data=True, domain=None, bound=None)

Constructs a ridge approximation using a total degree polynomial basis

Given a basis of total degree polynomials \(\lbrace \psi_j \rbrace_{j=1}^N\) on \(\mathbb{R}^n\), this class constructs a polynomial ridge function that minimizes the mismatch on a set of points \(\lbrace \mathbf{x}_i\rbrace_{i=1}^M \subset \mathbb{R}^m\) in a \(p\)-norm:

\[\min_{\mathbf{U} \in \mathbb{R}^{m\times n}, \ \mathbf{U}^\top \mathbf{U} = \mathbf I, \ \mathbf{c}\in \mathbb{R}^N } \sqrt[p]{ \sum_{i=1}^M \left|f(\mathbf{x}_i) - \sum_{j=1}^N c_j \psi_j(\mathbf{U}^\top \mathbf{x}_i) \right|^p}\]

The orthogonality constraint \(\mathbf{U}^\top \mathbf{U} = \mathbf{I}\) identifies \(\mathbf{U}\) with an element of the Grassmann manifold.

For the 2-norm (\(p=2\)), this implementation uses Variable Projection following [HC18] to eliminate the linear coefficients \(\mathbf{c}\), leaving an optimization problem posed over the Grassmann manifold alone.
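
The key observation behind Variable Projection is that for any fixed \(\mathbf{U}\), the optimal coefficients \(\mathbf{c}\) solve an ordinary linear least-squares problem, so the mismatch becomes a function of \(\mathbf{U}\) alone. A minimal numpy sketch of this idea for \(p=2\) (not the psdr implementation; a one-dimensional subspace and Legendre basis are assumed for illustration):

```python
import numpy as np

# Sketch of the Variable Projection idea for p = 2 (not the psdr implementation):
# for any fixed U, the optimal coefficients c solve a linear least-squares problem,
# so the objective can be evaluated as a function of U alone.
rng = np.random.default_rng(0)
M, m, degree = 50, 3, 2

U_true = np.array([1.0, 2.0, -1.0]).reshape(-1, 1)
U_true /= np.linalg.norm(U_true)
X = rng.standard_normal((M, m))
fX = ((X @ U_true).ravel()) ** 2                 # an exact degree-2 ridge function

def varpro_objective(U):
    y = (X @ U).ravel()                          # projected coordinates U^T x_i
    V = np.polynomial.legendre.legvander(y, degree)  # Legendre basis on the subspace
    c, *_ = np.linalg.lstsq(V, fX, rcond=None)   # eliminate c by least squares
    return np.linalg.norm(V @ c - fX)            # residual as a function of U only

U_rand = rng.standard_normal((m, 1))
U_rand /= np.linalg.norm(U_rand)
```

Evaluating `varpro_objective` at the true ridge direction drives the residual to (numerically) zero, while a random direction cannot be fit by any polynomial in its projected coordinate; the remaining optimization over \(\mathbf{U}\) is what psdr carries out on the Grassmann manifold.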

For both the 1-norm and the \(\infty\)-norm, this implementation uses a sequential linear program with a trust region, coupled with a nonlinear trajectory through the search space.

Parameters:
  • degree (int) – Degree of polynomial
  • subspace_dimension (int) – Dimension of the low-dimensional subspace associated with the ridge approximation.
  • basis (['legendre', 'monomial', 'chebyshev', 'laguerre', 'hermite']) – Basis for polynomial representation
  • norm ([1, 2, np.inf, 'inf']) – Norm in which to evaluate the mismatch between the ridge approximation and the data
  • scale (bool (default:True)) – Scale the coordinates along the ridge to ameliorate ill-conditioning
  • bound ([None, 'lower', 'upper']) – If ‘lower’ or ‘upper’ construct a lower or upper bound

References

[HC18] J. M. Hokanson and Paul G. Constantine. Data-driven Polynomial Ridge Approximation Using Variable Projection. SIAM J. Sci. Comput., Vol. 40, No. 3, pp. A1566–A1589, 2018. DOI:10.1137/17M1117690.
fit(X, fX, U0=None, **kwargs)

Given samples, fit the polynomial ridge approximation.

Parameters:
  • X (array-like (M, m)) – Input coordinates
  • fX (array-like (M,)) – Evaluations of the function at the samples

Polynomial Ridge Bound

Gaussian Process Model

class psdr.GaussianProcess(structure='const', degree=None, nugget=None, Lfixed=None, n_init=1)

Fits a Gaussian Process by maximizing the marginal likelihood

Given \(M\) pairs of \(\mathbf{x}_i \in \mathbb{R}^m\) and \(y_i \in \mathbb{R}\), this fits a model

\[g(\mathbf{x}) = \sum_{i=1}^M \alpha_i e^{-\| \mathbf{L}( \mathbf{x} - \mathbf{x}_i) \|_2^2} + \sum_{j} \beta_j \psi_j(x)\]

where \(\mathbf{L}\) is a lower triangular matrix, \(\lbrace \psi_j \rbrace_j\) is a polynomial basis (e.g., linear), and \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) are vectors of weights. These parameters are chosen to maximize the log-marginal likelihood (see [RW06], Sec. 2.7), or equivalently minimize

\[\begin{split}\min_{\mathbf{L}} & \ \frac12 \mathbf{y}^\top \boldsymbol{\alpha} + \frac12 \log \det (\mathbf{K} + \tau \mathbf{I}), & \quad [\mathbf{K}]_{i,j} &= e^{-\frac12 \| \mathbf{L}(\mathbf{x}_i -\mathbf{x}_j)\|_2^2} \\ \text{where} & \ \begin{bmatrix} \mathbf{K} + \tau \mathbf{I} & \mathbf{V} \\ \mathbf{V}^\top & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix}, & \quad [\mathbf{V}]_{i,j} &= \psi_j(\mathbf{x}_i).\end{split}\]

Here \(\tau\) denotes the nugget, a Tikhonov-like regularization term commonly used to combat the ill-conditioning of the kernel matrix \(\mathbf{K}\). The additional constraint for the polynomial basis is described by Jones ([Jon01], eq. (3) and (4)).
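
The bordered linear system above can be assembled and solved directly. The following numpy sketch (illustrative only; not the psdr implementation) uses a scalar \(\mathbf{L} = \ell \mathbf{I}\), a linear polynomial tail, and the kernel as written in the displayed system. Taking data that is exactly linear makes the outcome checkable: the unique solution is \(\boldsymbol{\alpha} = \mathbf{0}\) with \(\boldsymbol{\beta}\) recovering the linear coefficients, so the polynomial tail reproduces the function exactly.

```python
import numpy as np

# Solve the bordered system for alpha, beta (illustrative sketch with
# scalar L = ell * I and a linear polynomial tail; not the psdr implementation).
rng = np.random.default_rng(1)
M, m = 20, 2
ell, tau = 1.0, 1e-6

X = rng.standard_normal((M, m))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]          # exactly linear data

# kernel matrix [K]_{ij} = exp(-0.5 ||L (x_i - x_j)||^2), as in the system above
diff = X[:, None, :] - X[None, :, :]
K = np.exp(-0.5 * ell**2 * np.sum(diff**2, axis=2))
V = np.hstack([np.ones((M, 1)), X])              # linear basis psi_j: 1, x_1, x_2
N = V.shape[1]

# bordered system [[K + tau I, V], [V^T, 0]] [alpha; beta] = [y; 0]
A = np.block([[K + tau * np.eye(M), V], [V.T, np.zeros((N, N))]])
sol = np.linalg.solve(A, np.concatenate([y, np.zeros(N)]))
alpha, beta = sol[:M], sol[M:]

def predict(x):
    k = np.exp(-0.5 * ell**2 * np.sum((X - x)**2, axis=1))
    v = np.concatenate([[1.0], x])
    return k @ alpha + v @ beta
```

The constraint \(\mathbf{V}^\top \boldsymbol{\alpha} = \mathbf{0}\) in the second block row is exactly the condition from [Jon01] that keeps the kernel weights orthogonal to the polynomial component.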

This class allows for four kinds of distance matrices \(\mathbf{L}\):

  • scalar: \(\mathbf{L} = \ell \mathbf{I}\)
  • scalar multiple: \(\mathbf{L} = \ell \mathbf{L}_0\)
  • diagonal: \(\mathbf{L} = \text{diag}(\boldsymbol{\ell})\)
  • lower triangular: \(\mathbf{L}\) is lower triangular

Internally, we optimize with respect to the matrix logarithm; specifically, we parameterize \(\mathbf{L}\) in terms of a scalar or vector \(\boldsymbol{\ell}\); i.e.,

  • scalar: \(\mathbf{L}(\ell) = e^{\ell}\mathbf{I}\)
  • scalar multiple: \(\mathbf{L}(\ell) = e^{\ell} \mathbf{L}_0\)
  • diagonal: \(\mathbf{L}(\boldsymbol{\ell}) = \text{diag}(\lbrace e^{\ell_i} \rbrace_{i})\)
  • lower triangular: \(\mathbf{L}(\boldsymbol{\ell}) = e^{\text{tril}(\boldsymbol{\ell})}\)

In the scalar and diagonal cases, this corresponds to the standard practice of optimizing with respect to the (scalar) log of the scaling parameters. Our experience suggests that the matrix log similarly increases the accuracy when working with the lower triangular parameterization. For the first three cases we have simple expressions for the derivatives, but for the lower triangular parameterization we use the complex step approximation to compute the derivative [MN10]. With this derivative information, we solve the optimization problem using L-BFGS as implemented in scipy.
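
The complex step approximation referenced above exploits the fact that, for an analytic function \(f\) and a tiny step \(h\), \(f'(x) \approx \operatorname{Im} f(x + ih)/h\) with \(O(h^2)\) error and no subtractive cancellation, so \(h\) can be taken near machine precision. A minimal scalar sketch of the technique ([MN10] extends it to Fréchet derivatives of matrix functions such as the matrix exponential used here):

```python
import numpy as np

# Complex step approximation: for analytic f and tiny h,
#   f'(x) ~ Im(f(x + i h)) / h,
# accurate to O(h^2) with no subtractive cancellation, so h can be tiny.
def complex_step_derivative(f, x, h=1e-20):
    return np.imag(f(x + 1j * h)) / h

f = lambda x: np.exp(x) * np.sin(x)
x0 = 0.7
exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))   # analytic derivative
approx = complex_step_derivative(f, x0)
```

Unlike a finite difference, shrinking \(h\) here never degrades accuracy, which is why the method is attractive for the ill-conditioned derivative of the matrix exponential parameterization.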

Parameters:
  • structure (['tril', 'diag', 'const', 'scalar_mult']) – Structure of the matrix \(\mathbf{L}\), either
    • const: a scalar multiple of the identity
    • scalar_mult: a scalar multiple of the fixed matrix Lfixed
    • tril: lower triangular
    • diag: diagonal
Returns:
  • L (np.ndarray(m,m)) – Distance matrix
  • alpha (np.ndarray(M)) – Weights for the Gaussian process kernel
  • beta (np.ndarray(N)) – Weights for the polynomial component in the LegendreTensorBasis
  • obj (float) – Log-likelihood objective function value

References

[RW06] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[Jon01] Donald R. Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization, 21, pp. 345–383, 2001.
[MN10] Awad H. Al-Mohy and Nicholas J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numerical Algorithms, 53, pp. 133–148, 2010.
fit(X, y, L0=None)

Fit a Gaussian process model

Parameters:
  • X (array-like (M, m)) – M input coordinates of dimension m
  • y (array-like (M,)) – y[i] is the output at X[i]