Subspace-based Dimension Reduction¶
Here our goal is to find a subspace along which our quantity of interest varies the most, for various definitions of most.
-
class
psdr.
SubspaceBasedDimensionReduction
[source]¶ Abstract base class for Subspace-Based Dimension Reduction
Given a function \(f : \mathcal{D} \to \mathbb{R}\), subspace-based dimension reduction identifies a subspace, described by a matrix \(\mathbf{U} \in \mathbb{R}^{m\times n}\) with orthonormal columns for some \(n \le m\).
-
U
¶ A matrix defining the ‘important’ directions
Returns: Matrix with orthonormal columns defining the important directions in decreasing order of precidence. Return type: np.ndarray (m, n)
-
approximate_lipschitz
(X=None, fX=None, grads=None, dim=None)[source]¶ Approximate the Lipschitz matrix on the low-dimensional subspace
-
shadow_envelope
(X, fX, ax=None, ngrid=None, pgfname=None, verbose=True, U=None, **kwargs)[source]¶ Draw a 1-d shadow plot of a large number of function samples
Returns: - y (np.ndarray) – Projected coordinates
- lb (np.ndarray) – piecewise linear lower bound values
- ub (np.ndarray) – piecewise linear upper bound values
-
shadow_plot
(X=None, fX=None, dim=1, U=None, ax='auto', pgfname=None)[source]¶ Draw a shadow plot
Parameters: - X (array-like (N,m)) – Input coordinates for function samples
- fX (array-like (N,)) – Values of function at sample points
- dim (int, [1,2]) – Dimension of shadow plot
- U (array-like (?,m); optional) – Subspace onto which to project the data; defaults to the subspace identifed by this class
- ax ('auto', matplotlib.pyplot.axis, or None) – Axis on which to draw the shadow plot
Returns: ax – Axis on which the plot is drawn
Return type: matplotlib.pyplot.axis
-
Active Subspace¶
-
class
psdr.
ActiveSubspace
[source]¶ Computes the active subspace gradient samples
Given the function \(f:\mathcal{D} \to \mathbb{R}\), the active subspace is defined as the eigenvectors corresponding to the largest eigenvalues of the average outer-product of gradients:
\[\mathbf{C} := \int_{\mathbf{x}\in \mathcal{D}} \nabla f(\mathbf{x}) \nabla f(\mathbf{x})^\top \ \mathrm{d}\mathbf{x} \in \mathbb{R}^{m\times m}.\]By default, this class assumes that we are provided with gradients evaluated at random samples over the domain and estimates the matrix \(\mathbf{C}\) using Monte-Carlo integration. However, if provided a weight corresponding to a quadrature rule, this will be used instead to approximate this matrix; i.e.,
\[\mathbf{C} \approx \sum_{i=1}^N w_i \nabla f(\mathbf{x}_i) \nabla f(\mathbf{x}_i)^\top.\]-
fit
(grads, weights=None)[source]¶ Find the active subspace
Parameters: - grads (array-like (N,m)) – Gradient samples of function (tacitly assumed to be uniform on the domain or from a quadrature rule with corresponding weight).
- weights (array-like (N,), optional) – Weights corresponding to a quadrature rule associated with the samples of the gradient.
-
Outer Product Gradient¶
-
class
psdr.
OuterProductGradient
(standardize=True, perplexity=None, bandwidth=None)[source]¶ The Outer Product Gradient approach of Hardle and Stoker
Given pairs of inputs \(\mathbf{x}_j\) and outputs \(f(\mathbf{x}_j)\), this method estimates the active subspace by (1) estimating gradients from these samples and then (2) using the eigendecomposition of the outer product of these approximate gradients The gradients are estimated by fitting a linear model at each input point \(\mathbf{x}_j\) weighting the other points based on a kernel \(k\) that decreases with increasing distance; i.e., at \(\mathbf{x}_j\) we construct the linear model \(\mathbf{a}^\top \mathbf{x} + \beta\): by solving a linear system:
\[\begin{split}\min_{\mathbf{a}, \beta} \left\| \begin{bmatrix} \sqrt{k(\| \mathbf{x}_1 - \mathbf{x}_j\|)} & & & \\ & \sqrt{k(\| \mathbf{x}_2 - \mathbf{x}_j\|)} & & \\ & & \ddots & \\ & & & \sqrt{k(\|\mathbf{x}_M - \mathbf{x}_j\|)} \end{bmatrix} \left[ \begin{bmatrix} \mathbf{x}_1^\top - \mathbf{x}_j^\top & 1 \\ \mathbf{x}_2^\top - \mathbf{x}_j^\top & 1 \\ \vdots & \vdots \\ \mathbf{x}_M^\top - \mathbf{x}_j^\top & 1 \\ \end{bmatrix} \begin{bmatrix} \mathbf{a} \\ \beta \end{bmatrix} - \begin{bmatrix} f(\mathbf{x}_1) \\ f(\mathbf{x}_2) \\ \vdots \\ f(\mathbf{x}_M) \end{bmatrix} \right] \right\|_2.\end{split}\]Our implementation is a modification of the algorithm presented by Li [Li18]. Gradients are estimated by calling
psdr.local_linear_grad
which uses a more numerically stable approach for estimating gradients than that above. Additionally, by default we use a per-location bandwidth based on perplexity.Parameters: - standardize (bool) – If True, standardize input X to have zero mean and identity covariance.
- perplexity (None or float) – Entropy target for choosing bandwidth
- bandwidth (None, 'xia' or positive float) – If specified, set a global bandwidth. ‘xia’ uses the heuristic suggested in [Li18].
References
[HS89] W. Hardle and T. M. Stoker. Investigating Smooth Multiple Regression by the Method of Average Derivatives J. Am. Stat. Ass. Vol 84, No 408, (1989). pp 986–995, DOI:0.1080/01621459.1989.10478863 [Li18] (1, 2) Bing Li. Sufficient Dimension Reduction: Methods and Applications with R. CRC Press, 2018.
Lipschitz Matrix¶
-
class
psdr.
LipschitzMatrix
(epsilon=None, method='cvxopt', L=None, **kwargs)[source]¶ Constructs the subspace-based dimension reduction from the Lipschitz Matrix.
The Lipschitz matrix \(\mathbf{L} \in \mathbb{R}^{m \times m}\) a matrix that acts analogously to the Lipschitz constant, defining a function class where
\[\lbrace f: \mathbb{R}^m\to \mathbb{R}: |f(\mathbf{x}_1) - f(\mathbf{x}_2)| \le \|\mathbf{L}(\mathbf{x}_1 - \mathbf{x}_2\|_2 \rbrace.\]In general we cannot determine the Lipschitz matrix analytically. Instead we seek to estimate it via the lower bound based on samples \(\lbrace \mathbf{x}_i, f(\mathbf{x}_i)\rbrace_i\) and/or gradients \(\lbrace \nabla f(\mathbf{x}_i)\rbrace_i\). Here we do so by solving a semidefinite program for the symmetric positive definite matrix \(\mathbf{M} \in \mathbb{R}^{m\times m}\):
\[\begin{split}\min_{\mathbf{H} \in \mathbb{S}^{m}_+} & \ \text{Trace } \mathbf{H} \\ \text{such that} & \ |f(\mathbf{x}_i) - f(\mathbf{x}_j)|^2 \le (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{H} (\mathbf{x}_i - \mathbf{x}_j) \\ & \ \nabla f(\mathbf{x}_k) \nabla f(\mathbf{x}_k)^\top \preceq \mathbf{H}\end{split}\]If the parameter
epsilon
is passed, the \(\epsilon\)-Lipschitz matrix is computed instead by solving the semidefinite program\[\begin{split}\min_{\mathbf{H} \in \mathbb{S}^{m}_+} & \ \text{Trace } \mathbf{H} \\ \text{such that} & \ (|f(\mathbf{x}_i) - f(\mathbf{x}_j)| -\epsilon)^2 \le (\mathbf{x}_i - \mathbf{x}_j)^\top \mathbf{H} (\mathbf{x}_i - \mathbf{x}_j).\end{split}\]Note that when computing the \(\epsilon\)-Lipschitz matrix, gradient information cannot be used because gradients no longer constrain the Lipschitz matrix.
Parameters: - epsilon (float or None) – Tolerance to be used if find the \(\epsilon\)-Lipschitz matrix
- **kwargs (dict (optional)) – Additional parameters to pass to cvxpy
-
H
¶ The symmetric positive definite solution to the semidefinite program
-
L
¶ The Lipschitz matrix estimate based on samples
-
fit
(X=None, fX=None, grads=None)[source]¶ Estimate the Lipschitz matrix from data
Parameters: - X (array-like (N, m), optional) – Input coordinates for function samples
- fX (array-like (N,), optional) – Values of the function at X[i]
- grads (array-like (N,m), optional) – Gradients of the function evaluated anywhere
-
shadow_uncertainty
(domain, X, fX, ax=None, ngrid=50, dim=1, U=None, pgfname=None, plot_kwargs={}, progress=False, **kwargs)[source]¶ Plot the uncertainty intervals given data
-
uncertainty
(X, fX, Xtest)[source]¶ Compute range of possible values at test points.
Given pairs of inputs \(\widehat{\mathbf{x}}_i\) and outputs \(y_i := f(\widehat{\mathbf{x}}_i)\) as well as the Lipschitz matrix \(\mathbf{L}\), compute the uncertainty interval associated with the provided points \(\mathbf{x}_i\); namely,
\[\mathcal{U}(\mathbf{x}) = \left[ \max_{j=1,\ldots,M} y_j - \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_i)\|_2, \ \min_{j=1,\ldots,M} y_j + \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_i)\|_2 \right].\]Parameters: - X (array-like (M, m)) – Array of input coordinates
- fX (array-like (M,)) – Array of function values
- Xtest (array-like (N, m)) – Array of places at which to determine uncertainty
Returns: - lb (array-like (N,)) – Lower bounds
- ub (array-like (N,)) – Upper bounds
-
uncertainty_domain
(X, fX, domain, Nsamp=100, verbose=False, progress=False, tqdm_kwargs={}, **kwargs)[source]¶ Compute the uncertainty associated with a set inside the domain
This estimates the uncertainty associated with a subset of the domain, essentially finding points \(\mathbf{x} \in \mathcal{S}\) corresponding to extrema of the uncertainty interval
\[\mathcal{U}(\mathcal{S} ) = \left[ \min_{\mathbf{x} \in \mathcal{S}} \max_{j=1,\ldots,M} y_j - \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_i)\|_2, \ \max_{\mathbf{x} \in \mathcal{S}} \min_{j=1,\ldots,M} y_j + \|\mathbf{L} (\mathbf{x} - \widehat{\mathbf{x}}_i)\|_2 \right].\]Parameters: - X (array-like (M, m)) – Array of input coordinates
- fX (array-like (M,)) – Array of function values
- domain (Domain) – Domain on which to find exterma of limits
- Nsamp (int, optional) – Number of Voronoi vertices to sample to find exterma
- verbose (bool (default: False)) – If True, print the iteration history for the optimization program for each initalization
- progress (bool (default: False)) – If True, show a progress bar for trying different initializations
- tqdm_kwargs (dict, optional) – Additional arguments to pass to tqdm for progress plotting
Returns: - lb (float) – Lower bound
- ub (float) – Upper bound