Response Surfaces

Response surfaces are inexpensive surrogates for expensive computational models.

Polynomial Ridge Function

class psdr.PolynomialRidgeFunction(basis, coef, U)

A polynomial ridge function

derivative_roots()

Compute the roots of the derivative of the polynomial

profile_grad(X)

Compute the gradient of the profile function \(g\)

roots()

Compute the roots of the polynomial

set_scale(X, U=None)

Set the normalization map
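
A polynomial ridge function has the form \(f(\mathbf{x}) = g(\mathbf{U}^\top \mathbf{x})\), where \(g\) is a polynomial profile on a low-dimensional subspace. The following is an illustrative numpy-only sketch of this structure (the names here are not the psdr API): a one-dimensional ridge function whose gradient follows from the chain rule, \(\nabla f(\mathbf{x}) = \mathbf{U}\, g'(\mathbf{U}^\top \mathbf{x})\).

```python
import numpy as np

# Illustrative sketch only; these names are not the psdr API.
# A ridge function f(x) = g(U^T x) with a polynomial profile g.
U = np.array([[1.0], [2.0]]) / np.sqrt(5.0)   # orthonormal: U^T U = I
coef = np.array([0.5, -1.0, 2.0])             # g(y) = 0.5 - y + 2 y^2 (monomial basis)

def profile(y):
    # evaluate the profile polynomial g
    return np.polynomial.polynomial.polyval(y, coef)

def profile_grad(y):
    # derivative of the profile polynomial, g'(y)
    dcoef = np.polynomial.polynomial.polyder(coef)
    return np.polynomial.polynomial.polyval(y, dcoef)

def ridge(x):
    return profile(U.T @ x)

def ridge_grad(x):
    # chain rule: grad f(x) = U g'(U^T x)
    return U @ profile_grad(U.T @ x)
```

By construction the function is constant along directions orthogonal to the ridge subspace, which is what makes these surrogates cheap to fit and evaluate.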

Polynomial Ridge Approximation

class psdr.PolynomialRidgeApproximation(degree, subspace_dimension, basis='legendre', norm=2, n_init=1, scale=True, keep_data=True, domain=None, bound=None)

Constructs a ridge approximation using a total degree polynomial basis

Given a basis of total degree polynomials \(\lbrace \psi_j \rbrace_{j=1}^N\) on \(\mathbb{R}^n\), this class constructs a polynomial ridge function that minimizes the mismatch on a set of points \(\lbrace \mathbf{x}_i\rbrace_{i=1}^M \subset \mathbb{R}^m\) in a \(p\)-norm:

\[\min_{\mathbf{U} \in \mathbb{R}^{m\times n}, \ \mathbf{U}^\top \mathbf{U} = \mathbf I, \ \mathbf{c}\in \mathbb{R}^N } \sqrt[p]{ \sum_{i=1}^M \left|f(\mathbf{x}_i) - \sum_{j=1}^N c_j \psi_j(\mathbf{U}^\top \mathbf{x}_i) \right|^p}\]

The orthogonality constraint \(\mathbf{U}^\top \mathbf{U} = \mathbf{I}\) identifies \(\mathbf{U}\) with an element of the Grassmann manifold.

For the 2-norm (\(p=2\)), this implementation uses Variable Projection following [HC18] to eliminate the linear coefficients \(\mathbf{c}\), leaving an optimization problem posed over the Grassmann manifold alone.
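
The key observation behind Variable Projection is that for any fixed \(\mathbf{U}\), the optimal coefficients \(\mathbf{c}\) solve an ordinary linear least-squares problem, so the mismatch becomes a function of \(\mathbf{U}\) alone. A minimal numpy sketch of this idea for \(p=2\) (not the psdr implementation; a one-dimensional subspace and Legendre basis are assumed for illustration):

```python
import numpy as np

# Sketch of the Variable Projection idea for p = 2 (not the psdr implementation):
# for any fixed U, the optimal coefficients c solve a linear least-squares problem,
# so the objective can be evaluated as a function of U alone.
rng = np.random.default_rng(0)
M, m, degree = 50, 3, 2

U_true = np.array([1.0, 2.0, -1.0]).reshape(-1, 1)
U_true /= np.linalg.norm(U_true)
X = rng.standard_normal((M, m))
fX = ((X @ U_true).ravel()) ** 2                 # an exact degree-2 ridge function

def varpro_objective(U):
    y = (X @ U).ravel()                          # projected coordinates U^T x_i
    V = np.polynomial.legendre.legvander(y, degree)  # Legendre basis on the subspace
    c, *_ = np.linalg.lstsq(V, fX, rcond=None)   # eliminate c by least squares
    return np.linalg.norm(V @ c - fX)            # residual as a function of U only

U_rand = rng.standard_normal((m, 1))
U_rand /= np.linalg.norm(U_rand)
```

Evaluating `varpro_objective` at the true ridge direction drives the residual to (numerically) zero, while a random direction cannot be fit by any polynomial in its projected coordinate; the remaining optimization over \(\mathbf{U}\) is what psdr carries out on the Grassmann manifold.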

For both the 1-norm and the \(\infty\)-norm, this implementation uses a sequential linear program with a trust region, coupled with a nonlinear trajectory through the search space.

Parameters:
  • degree (int) – Degree of polynomial
  • subspace_dimension (int) – Dimension of the low-dimensional subspace associated with the ridge approximation.
  • basis (['legendre', 'monomial', 'chebyshev', 'laguerre', 'hermite']) – Basis for polynomial representation
  • norm ([1, 2, np.inf, 'inf']) – Norm in which to evaluate the mismatch between the ridge approximation and the data
  • scale (bool (default:True)) – Scale the coordinates along the ridge to ameliorate ill-conditioning
  • bound ([None, 'lower', 'upper']) – If ‘lower’ or ‘upper’ construct a lower or upper bound

References

[HC18] J. M. Hokanson and Paul G. Constantine. Data-driven Polynomial Ridge Approximation Using Variable Projection. SIAM J. Sci. Comput., Vol. 40, No. 3, pp. A1566–A1589, 2018. DOI:10.1137/17M1117690.
fit(X, fX, U0=None, **kwargs)

Given samples, fit the polynomial ridge approximation.

Parameters:
  • X (array-like (M, m)) – Input coordinates
  • fX (array-like (M,)) – Evaluations of the function at the samples

Polynomial Ridge Bound

Gaussian Process Model

class psdr.GaussianProcess(structure='const', degree=None, nugget=None, Lfixed=None, n_init=1)

Fits a Gaussian Process by maximizing the marginal likelihood

Given \(M\) pairs of \(\mathbf{x}_i \in \mathbb{R}^m\) and \(y_i \in \mathbb{R}\), this fits a model

\[g(\mathbf{x}) = \sum_{i=1}^M \alpha_i e^{-\| \mathbf{L}( \mathbf{x} - \mathbf{x}_i) \|_2^2} + \sum_{j} \beta_j \psi_j(x)\]

where \(\mathbf{L}\) is a lower triangular matrix, \(\lbrace \psi_j \rbrace_j\) is a polynomial basis (e.g., linear), and \(\boldsymbol{\alpha}, \boldsymbol{\beta}\) are vectors of weights. These parameters are chosen to maximize the log-marginal likelihood (see [RW06], Sec. 2.7), or equivalently minimize

\[\begin{split}\min_{\mathbf{L}} & \ \frac12 \mathbf{y}^\top \boldsymbol{\alpha} + \frac12 \log \det (\mathbf{K} + \tau \mathbf{I}), & \quad [\mathbf{K}]_{i,j} &= e^{-\frac12 \| \mathbf{L}(\mathbf{x}_i -\mathbf{x}_j)\|_2^2} \\ \text{where} & \ \begin{bmatrix} \mathbf{K} + \tau \mathbf{I} & \mathbf{V} \\ \mathbf{V}^\top & \mathbf{0} \end{bmatrix} \begin{bmatrix} \boldsymbol{\alpha} \\ \boldsymbol{\beta} \end{bmatrix} = \begin{bmatrix} \mathbf{y} \\ \mathbf{0} \end{bmatrix}, & \quad [\mathbf{V}]_{i,j} &= \psi_j(\mathbf{x}_i).\end{split}\]

Here \(\tau\) denotes the nugget, a Tikhonov-like regularization term commonly used to combat the ill-conditioning of the kernel matrix \(\mathbf{K}\). The additional constraint for the polynomial basis is described by Jones ([Jon01], eq. (3) and (4)).
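
The bordered linear system above can be assembled and solved directly. The following numpy sketch (illustrative only; not the psdr implementation) uses a scalar \(\mathbf{L} = \ell \mathbf{I}\), a linear polynomial tail, and the kernel as written in the displayed system. Taking data that is exactly linear makes the outcome checkable: the unique solution is \(\boldsymbol{\alpha} = \mathbf{0}\) with \(\boldsymbol{\beta}\) recovering the linear coefficients, so the polynomial tail reproduces the function exactly.

```python
import numpy as np

# Solve the bordered system for alpha, beta (illustrative sketch with
# scalar L = ell * I and a linear polynomial tail; not the psdr implementation).
rng = np.random.default_rng(1)
M, m = 20, 2
ell, tau = 1.0, 1e-6

X = rng.standard_normal((M, m))
y = 1.0 + 2.0 * X[:, 0] - 0.5 * X[:, 1]          # exactly linear data

# kernel matrix [K]_{ij} = exp(-0.5 ||L (x_i - x_j)||^2), as in the system above
diff = X[:, None, :] - X[None, :, :]
K = np.exp(-0.5 * ell**2 * np.sum(diff**2, axis=2))
V = np.hstack([np.ones((M, 1)), X])              # linear basis psi_j: 1, x_1, x_2
N = V.shape[1]

# bordered system [[K + tau I, V], [V^T, 0]] [alpha; beta] = [y; 0]
A = np.block([[K + tau * np.eye(M), V], [V.T, np.zeros((N, N))]])
sol = np.linalg.solve(A, np.concatenate([y, np.zeros(N)]))
alpha, beta = sol[:M], sol[M:]

def predict(x):
    k = np.exp(-0.5 * ell**2 * np.sum((X - x)**2, axis=1))
    v = np.concatenate([[1.0], x])
    return k @ alpha + v @ beta
```

The constraint \(\mathbf{V}^\top \boldsymbol{\alpha} = \mathbf{0}\) in the second block row is exactly the condition from [Jon01] that keeps the kernel weights orthogonal to the polynomial component.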

This class allows for four kinds of distance matrices \(\mathbf{L}\):

  • scalar: \(\mathbf{L} = \ell \mathbf{I}\)
  • scalar multiple: \(\mathbf{L} = \ell \mathbf{L}_0\)
  • diagonal: \(\mathbf{L} = \text{diag}(\boldsymbol{\ell})\)
  • lower triangular: \(\mathbf{L}\) is lower triangular

Internally, we optimize with respect to the matrix logarithm; specifically, we parameterize \(\mathbf{L}\) in terms of a scalar or vector \(\boldsymbol{\ell}\); i.e.,

  • scalar: \(\mathbf{L}(\ell) = e^{\ell}\mathbf{I}\)
  • scalar multiple: \(\mathbf{L}(\ell) = e^{\ell} \mathbf{L}_0\)
  • diagonal: \(\mathbf{L}(\boldsymbol{\ell}) = \text{diag}(\lbrace e^{\ell_i} \rbrace_{i})\)
  • lower triangular: \(\mathbf{L}(\boldsymbol{\ell}) = e^{\text{tril}(\boldsymbol{\ell})}\)

In the scalar and diagonal cases, this corresponds to the standard practice of optimizing with respect to the (scalar) log of the scaling parameters. Our experience suggests that the matrix log similarly increases the accuracy when working with the lower triangular parameterization. For the first three cases we have simple expressions for the derivatives, but for the lower triangular parameterization we use the complex step approximation to compute the derivative [MN10]. With this derivative information, we solve the optimization problem using L-BFGS as implemented in scipy.
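
The complex step approximation referenced above exploits the fact that, for an analytic function \(f\) and a tiny step \(h\), \(f'(x) \approx \operatorname{Im} f(x + ih)/h\) with \(O(h^2)\) error and no subtractive cancellation, so \(h\) can be taken near machine precision. A minimal scalar sketch of the technique ([MN10] extends it to Fréchet derivatives of matrix functions such as the matrix exponential used here):

```python
import numpy as np

# Complex step approximation: for analytic f and tiny h,
#   f'(x) ~ Im(f(x + i h)) / h,
# accurate to O(h^2) with no subtractive cancellation, so h can be tiny.
def complex_step_derivative(f, x, h=1e-20):
    return np.imag(f(x + 1j * h)) / h

f = lambda x: np.exp(x) * np.sin(x)
x0 = 0.7
exact = np.exp(x0) * (np.sin(x0) + np.cos(x0))   # analytic derivative
approx = complex_step_derivative(f, x0)
```

Unlike a finite difference, shrinking \(h\) here never degrades accuracy, which is why the method is attractive for the ill-conditioned derivative of the matrix exponential parameterization.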

Parameters:
  • structure (['tril', 'diag', 'const', 'scalar_mult']) – Structure of the matrix \(\mathbf{L}\), either
    • const: a scalar multiple of the identity
    • scalar_mult: a scalar multiple of the fixed matrix Lfixed
    • tril: lower triangular
    • diag: diagonal
Returns:
  • L (np.ndarray(m,m)) – Distance matrix
  • alpha (np.ndarray(M)) – Weights for the Gaussian process kernel
  • beta (np.ndarray(N)) – Weights for the polynomial component in the LegendreTensorBasis
  • obj (float) – Log-likelihood objective function value

References

[RW06] Carl Edward Rasmussen and Christopher K. I. Williams. Gaussian Processes for Machine Learning. MIT Press, 2006.
[Jon01] Donald R. Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization, 21, pp. 345–383, 2001.
[MN10] Awad H. Al-Mohy and Nicholas J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numerical Algorithms, 53, pp. 133–148, 2010.
fit(X, y, L0=None)

Fit a Gaussian process model

Parameters:
  • X (array-like (M, m)) – M input coordinates of dimension m
  • y (array-like (M,)) – y[i] is the output at X[i]