Subspace-based Dimension Reduction

Here our goal is to find a subspace along which our quantity of interest varies the most, for various definitions of 'most'.

class psdr.SubspaceBasedDimensionReduction[source]

Abstract base class for Subspace-Based Dimension Reduction

Given a function \(f : \mathcal{D} \to \mathbb{R}\), subspace-based dimension reduction identifies a subspace, described by a matrix \(\mathbf{U} \in \mathbb{R}^{m\times n}\) with orthonormal columns for some \(n \le m\).

U

A matrix defining the ‘important’ directions

Returns: Matrix with orthonormal columns defining the important directions in decreasing order of precedence.
Return type: np.ndarray (m, n)
approximate_lipschitz(X=None, fX=None, grads=None, dim=None)[source]

Approximate the Lipschitz matrix on the low-dimensional subspace

shadow_envelope(X, fX, ax=None, ngrid=None, pgfname=None, verbose=True, U=None, **kwargs)[source]

Draw a 1-d shadow plot of a large number of function samples

Returns:
  • y (np.ndarray) – Projected coordinates
  • lb (np.ndarray) – piecewise linear lower bound values
  • ub (np.ndarray) – piecewise linear upper bound values
shadow_plot(X=None, fX=None, dim=1, U=None, ax='auto', pgfname=None)[source]

Draw a shadow plot

Parameters:
  • X (array-like (N,m)) – Input coordinates for function samples
  • fX (array-like (N,)) – Values of function at sample points
  • dim (int, [1,2]) – Dimension of shadow plot
  • U (array-like (?,m); optional) – Subspace onto which to project the data; defaults to the subspace identified by this class
  • ax ('auto', matplotlib.pyplot.axis, or None) – Axis on which to draw the shadow plot
Returns: ax – Axis on which the plot is drawn
Return type: matplotlib.pyplot.axis
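
A minimal usage sketch (not part of the psdr API reference): assuming a concrete subclass such as psdr.ActiveSubspace (documented below) and a toy ridge function whose gradients we can write down analytically, a 1-d shadow plot can be produced as follows.

    import numpy as np
    import psdr

    # Illustrative ridge function f(x) = (a^T x)^2 with analytic gradient 2 (a^T x) a.
    m = 5
    a = np.ones(m) / np.sqrt(m)
    X = np.random.uniform(-1, 1, size=(200, m))
    fX = (X @ a) ** 2
    grads = 2 * (X @ a)[:, None] * a[None, :]

    asub = psdr.ActiveSubspace()
    asub.fit(grads)                      # estimate the important directions U
    ax = asub.shadow_plot(X, fX, dim=1)  # project onto asub.U[:, 0] and plot
    ax.figure.savefig('shadow.png')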

Active Subspace

class psdr.ActiveSubspace[source]

Computes the active subspace from gradient samples

Given the function \(f:\mathcal{D} \to \mathbb{R}\), the active subspace is defined as the eigenvectors corresponding to the largest eigenvalues of the average outer-product of gradients:

\[\mathbf{C} := \int_{\mathbf{x}\in \mathcal{D}} \nabla f(\mathbf{x}) \nabla f(\mathbf{x})^\top \ \mathrm{d}\mathbf{x} \in \mathbb{R}^{m\times m}.\]

By default, this class assumes that we are provided with gradients evaluated at random samples over the domain and estimates the matrix \(\mathbf{C}\) using Monte-Carlo integration. However, if weights corresponding to a quadrature rule are provided, these are used instead to approximate this matrix; i.e.,

\[\mathbf{C} \approx \sum_{i=1}^N w_i \nabla f(\mathbf{x}_i) \nabla f(\mathbf{x}_i)^\top.\]
fit(grads, weights=None)[source]

Find the active subspace

Parameters:
  • grads (array-like (N,m)) – Gradient samples of function (tacitly assumed to be uniform on the domain or from a quadrature rule with corresponding weight).
  • weights (array-like (N,), optional) – Weights corresponding to a quadrature rule associated with the samples of the gradient.
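
As a sketch of the quadrature-weighted form (the function f(x) = x_1^2 + x_1 x_2 and the tensor-product Gauss-Legendre rule below are illustrative assumptions, not part of psdr):

    import numpy as np
    import psdr

    # Tensor-product Gauss-Legendre rule on [-1, 1]^2, normalized so the weights
    # sum to one (approximating the average of the gradient outer product).
    pts, w = np.polynomial.legendre.leggauss(10)
    X1, X2 = np.meshgrid(pts, pts)
    X = np.column_stack([X1.ravel(), X2.ravel()])
    weights = np.outer(w, w).ravel()
    weights /= weights.sum()

    # Gradient of the illustrative function f(x) = x_1^2 + x_1 x_2.
    grads = np.column_stack([2 * X[:, 0] + X[:, 1], X[:, 0]])

    asub = psdr.ActiveSubspace()
    asub.fit(grads, weights=weights)
    print(asub.U)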

Outer Product Gradient

class psdr.OuterProductGradient(standardize=True, perplexity=None, bandwidth=None)[source]

The Outer Product Gradient approach of Hardle and Stoker [HS89]

Given pairs of inputs \(\mathbf{x}_j\) and outputs \(f(\mathbf{x}_j)\), this method estimates the active subspace by (1) estimating gradients from these samples and then (2) taking the eigendecomposition of the outer product of these approximate gradients. The gradients are estimated by fitting a linear model at each input point \(\mathbf{x}_j\), weighting the other points by a kernel \(k\) that decreases with increasing distance; i.e., at \(\mathbf{x}_j\) we construct the linear model \(\mathbf{a}^\top \mathbf{x} + \beta\) by solving the weighted least-squares problem:

\[\begin{split}\min_{\mathbf{a}, \beta} \left\| \begin{bmatrix} \sqrt{k(\| \mathbf{x}_1 - \mathbf{x}_j\|)} & & & \\ & \sqrt{k(\| \mathbf{x}_2 - \mathbf{x}_j\|)} & & \\ & & \ddots & \\ & & & \sqrt{k(\|\mathbf{x}_M - \mathbf{x}_j\|)} \end{bmatrix} \left[ \begin{bmatrix} \mathbf{x}_1^\top - \mathbf{x}_j^\top & 1 \\ \mathbf{x}_2^\top - \mathbf{x}_j^\top & 1 \\ \vdots & \vdots \\ \mathbf{x}_M^\top - \mathbf{x}_j^\top & 1 \\ \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \beta \end{bmatrix} - \begin{bmatrix} f(\mathbf{x}_1) \\ f(\mathbf{x}_2) \\ \vdots \\ f(\mathbf{x}_M) \end{bmatrix} \right] \right\|_2.\end{split}\]

Our implementation is a modification of the algorithm presented by Li [Li18]. Gradients are estimated by calling psdr.local_linear_grad which uses a more numerically stable approach for estimating gradients than that above. Additionally, by default we use a per-location bandwidth based on perplexity.

Parameters:
  • standardize (bool) – If True, standardize input X to have zero mean and identity covariance.
  • perplexity (None or float) – Entropy target for choosing bandwidth
  • bandwidth (None, 'xia' or positive float) – If specified, set a global bandwidth. ‘xia’ uses the heuristic suggested in [Li18].

References

[HS89] W. Hardle and T. M. Stoker. Investigating Smooth Multiple Regression by the Method of Average Derivatives. J. Am. Stat. Assoc., Vol. 84, No. 408 (1989), pp. 986–995. DOI:10.1080/01621459.1989.10478863
[Li18] Bing Li. Sufficient Dimension Reduction: Methods and Applications with R. CRC Press, 2018.
fit(X, fX)[source]
Parameters:
  • X (array-like (M,m)) – Location of function samples
  • fX (array-like (M,)) – Function samples where fX[j] = f(X[j])
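
A hedged sketch of fitting this estimator from samples alone, without gradients (the ridge function below is an illustrative assumption):

    import numpy as np
    import psdr

    # Recover a one-dimensional ridge direction from input/output samples only.
    np.random.seed(0)
    m = 4
    a = np.array([1.0, 0.5, 0.0, -0.5])
    a /= np.linalg.norm(a)
    X = np.random.uniform(-1, 1, size=(500, m))
    fX = np.sin(2 * X @ a)

    opg = psdr.OuterProductGradient()
    opg.fit(X, fX)
    print(opg.U[:, 0])  # should align, up to sign, with the ridge direction a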

Lipschitz Matrix

class psdr.LipschitzMatrix(epsilon=None, method='cvxopt', L=None, **kwargs)[source]

Constructs the subspace-based dimension reduction from the Lipschitz Matrix.

The Lipschitz matrix \(\mathbf{L} \in \mathbb{R}^{m \times m}\) is a matrix that acts analogously to the Lipschitz constant, defining the function class

\[\lbrace f: \mathbb{R}^m\to \mathbb{R}: |f(\mathbf{x}_1) - f(\mathbf{x}_2)| \le \|\mathbf{L}(\mathbf{x}_1 - \mathbf{x}_2)\|_2 \rbrace.\]

In general we cannot determine the Lipschitz matrix analytically. Instead we seek to estimate it via the lower bound based on samples \(\lbrace \mathbf{x}_i, f(\mathbf{x}_i)\rbrace_i\) and/or gradients \(\lbrace \nabla f(\mathbf{x}_i)\rbrace_i\). Here we do so by solving a semidefinite program for the symmetric positive definite matrix \(\mathbf{H} = \mathbf{L}^\top \mathbf{L} \in \mathbb{R}^{m\times m}\):

\[\begin{split}\min_{\mathbf{H} \in \mathbb{S}^{m}_+} & \ \text{Trace } \mathbf{H} \\ \text{such that} & \ |f(\mathbf{x}_i) - f(\mathbf{x}_j)|^2 \le (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{H} (\mathbf{x}_i - \mathbf{x}_j) \\ & \ \nabla f(\mathbf{x}_k) \nabla f(\mathbf{x}_k)^\top \preceq \mathbf{H}\end{split}\]

If the parameter epsilon is passed, the \(\epsilon\)-Lipschitz matrix is computed instead by solving the semidefinite program

\[\begin{split}\min_{\mathbf{H} \in \mathbb{S}^{m}_+} & \ \text{Trace } \mathbf{H} \\ \text{such that} & \ (|f(\mathbf{x}_i) - f(\mathbf{x}_j)| -\epsilon)^2 \le (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{H} (\mathbf{x}_i - \mathbf{x}_j).\end{split}\]

Note that when computing the \(\epsilon\)-Lipschitz matrix, gradient information cannot be used because gradients no longer constrain the Lipschitz matrix.

Parameters:
  • epsilon (float or None) – Tolerance to use when computing the \(\epsilon\)-Lipschitz matrix
  • **kwargs (dict (optional)) – Additional parameters to pass to cvxpy
H

The symmetric positive definite solution to the semidefinite program

L

The Lipschitz matrix estimate based on samples

fit(X=None, fX=None, grads=None)[source]

Estimate the Lipschitz matrix from data

Parameters:
  • X (array-like (N, m), optional) – Input coordinates for function samples
  • fX (array-like (N,), optional) – Values of the function at X[i]
  • grads (array-like (N,m), optional) – Gradients of the function, which may be evaluated at arbitrary points in the domain
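
A sketch of fitting the Lipschitz matrix from both samples and gradients (the function and sample sizes below are illustrative; solving the semidefinite program requires the configured SDP backend):

    import numpy as np
    import psdr

    # Illustrative ridge function f(x) = sin(a^T x); its Lipschitz matrix is
    # (numerically) rank one, so the leading direction U[:, 0] should align with a.
    np.random.seed(0)
    m = 3
    a = np.array([1.0, 0.5, 0.25])
    X = np.random.uniform(-1, 1, size=(30, m))
    fX = np.sin(X @ a)
    grads = np.cos(X @ a)[:, None] * a[None, :]

    lip = psdr.LipschitzMatrix()
    lip.fit(X=X, fX=fX, grads=grads)
    print(lip.L)        # Lipschitz matrix estimate
    print(lip.U[:, 0])  # dominant direction

Constructing the class with a tolerance instead, e.g. psdr.LipschitzMatrix(epsilon=0.1), would compute the \(\epsilon\)-Lipschitz matrix from X and fX alone, per the discussion above.
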
shadow_uncertainty(domain, X, fX, ax=None, ngrid=50, dim=1, U=None, pgfname=None, plot_kwargs={}, progress=False, **kwargs)[source]

Plot the uncertainty intervals given data

uncertainty(X, fX, Xtest)[source]

Compute range of possible values at test points.

Given pairs of inputs \(\widehat{\mathbf{x}}_j\) and outputs \(y_j := f(\widehat{\mathbf{x}}_j)\) as well as the Lipschitz matrix \(\mathbf{L}\), compute the uncertainty interval at each test point \(\mathbf{x}\); namely,

\[\mathcal{U}(\mathbf{x}) = \left[ \max_{j=1,\ldots,M} \left( y_j - \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_j)\|_2 \right), \ \min_{j=1,\ldots,M} \left( y_j + \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_j)\|_2 \right) \right].\]
Parameters:
  • X (array-like (M, m)) – Array of input coordinates
  • fX (array-like (M,)) – Array of function values
  • Xtest (array-like (N, m)) – Array of places at which to determine uncertainty
Returns:

  • lb (array-like (N,)) – Lower bounds
  • ub (array-like (N,)) – Upper bounds
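
Continuing the Lipschitz matrix sketch above (with lip, X, and fX as defined there), the interval bounds at new test points can be evaluated as follows:

    # Bound f at a handful of new points using the fitted Lipschitz matrix.
    Xtest = np.random.uniform(-1, 1, size=(5, 3))
    lb, ub = lip.uncertainty(X, fX, Xtest)
    for x, l, u in zip(Xtest, lb, ub):
        print(f"f({x}) in [{l:.3f}, {u:.3f}]")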

uncertainty_domain(X, fX, domain, Nsamp=100, verbose=False, progress=False, tqdm_kwargs={}, **kwargs)[source]

Compute the uncertainty associated with a set inside the domain

This estimates the uncertainty associated with a subset \(\mathcal{S}\) of the domain, essentially finding the points \(\mathbf{x} \in \mathcal{S}\) that attain the extrema of the uncertainty interval

\[\mathcal{U}(\mathcal{S}) = \left[ \min_{\mathbf{x} \in \mathcal{S}} \max_{j=1,\ldots,M} \left( y_j - \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_j)\|_2 \right), \ \max_{\mathbf{x} \in \mathcal{S}} \min_{j=1,\ldots,M} \left( y_j + \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_j)\|_2 \right) \right].\]
Parameters:
  • X (array-like (M, m)) – Array of input coordinates
  • fX (array-like (M,)) – Array of function values
  • domain (Domain) – Domain on which to find the extrema of the limits
  • Nsamp (int, optional) – Number of Voronoi vertices to sample to find the extrema
  • verbose (bool (default: False)) – If True, print the iteration history for the optimization program for each initialization
  • progress (bool (default: False)) – If True, show a progress bar for trying different initializations
  • tqdm_kwargs (dict, optional) – Additional arguments to pass to tqdm for progress plotting
Returns:

  • lb (float) – Lower bound
  • ub (float) – Upper bound
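
Continuing the same sketch, and assuming psdr.BoxDomain(lb, ub) constructs a box domain as documented elsewhere in psdr, the uncertainty over the whole domain can be bounded as:

    # Bound f over the entire box [-1, 1]^3 (BoxDomain signature assumed).
    domain = psdr.BoxDomain(-np.ones(3), np.ones(3))
    lb, ub = lip.uncertainty_domain(X, fX, domain, Nsamp=50)
    print(f"f over the domain lies in [{lb:.3f}, {ub:.3f}]")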