Class Interface
Full Reference Metrics
Structural Similarity (SSIM)
- class piq.SSIMLoss(kernel_size: int = 11, kernel_sigma: float = 1.5, k1: float = 0.01, k2: float = 0.03, downsample: bool = True, reduction: str = 'mean', data_range: Union[int, float] = 1.0)
Creates a criterion that measures the structural similarity index error between each element in the input \(x\) and target \(y\).
To match performance with skimage and tensorflow, set downsample = True.
The unreduced loss (i.e. with reduction set to 'none') can be described as:
\[\begin{split}SSIM = \{ssim_1,\dots,ssim_{N \times C}\}\\ ssim_{l}(x, y) = \frac{(2 \mu_x \mu_y + c_1) (2 \sigma_{xy} + c_2)} {(\mu_x^2 +\mu_y^2 + c_1)(\sigma_x^2 +\sigma_y^2 + c_2)},\end{split}\]
where \(N\) is the batch size and \(C\) is the number of channels. If reduction is not 'none' (default 'mean'), then:
\[\begin{split}SSIMLoss(x, y) = \begin{cases} \operatorname{mean}(1 - SSIM), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(1 - SSIM), & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
\(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each.
With reduction = 'mean', the sum is taken over all the elements and divided by \(n\). The division by \(n\) can be avoided by setting reduction = 'sum'. For 5D input tensors, the complex value is returned as a tensor of size 2.
- Parameters:
kernel_size – By default, the mean and covariance of a pixel are obtained by convolution with a Gaussian kernel of size kernel_size.
kernel_sigma – Standard deviation for Gaussian kernel.
k1 – Coefficient related to c1 in the above equation.
k2 – Coefficient related to c2 in the above equation.
downsample – Perform average pool before SSIM computation. Default: True
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
Examples
>>> import torch
>>> from piq import SSIMLoss
>>> loss = SSIMLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13, 600-612. https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf, DOI:10.1109/TIP.2003.819861
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Structural Similarity (SSIM) index as a loss function.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).
y – A target tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).
- Returns:
Value of SSIM loss to be minimized, i.e. 1 - ssim, in [0, 1] range. For 5D input tensors, the complex value is returned as a tensor of size 2.
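As a rough illustration of the \(ssim_l\) formula above, the following sketch evaluates SSIM from whole-signal statistics in plain Python. It is not how piq computes the index: the library convolves with a Gaussian window, while this sketch treats the entire signal as one window. The helper name `global_ssim` is hypothetical, and the constants are assumed to follow the reference paper, \(c_1 = (k_1 L)^2\) and \(c_2 = (k_2 L)^2\) with \(L\) equal to data_range.

```python
from statistics import fmean


def global_ssim(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """SSIM of two equally long pixel lists using whole-signal statistics."""
    # Assumption (Wang et al., 2004): c = (k * data_range) ** 2.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    mu_x, mu_y = fmean(x), fmean(y)
    var_x = fmean([(a - mu_x) ** 2 for a in x])
    var_y = fmean([(b - mu_y) ** 2 for b in y])
    cov_xy = fmean([(a - mu_x) * (b - mu_y) for a, b in zip(x, y)])

    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


reference = [0.1, 0.4, 0.5, 0.9]
distorted = [0.2, 0.3, 0.6, 0.7]

ssim_same = global_ssim(reference, reference)  # identical signals
ssim_diff = global_ssim(reference, distorted)
loss_diff = 1.0 - ssim_diff  # the loss is 1 - SSIM
```

Identical inputs give an index of 1 and therefore a loss of 0; any distortion lowers the index and raises the loss.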
Multi-Scale Structural Similarity (MS-SSIM)
- class piq.MultiScaleSSIMLoss(kernel_size: int = 11, kernel_sigma: float = 1.5, k1: float = 0.01, k2: float = 0.03, scale_weights: Optional[Tensor] = None, reduction: str = 'mean', data_range: Union[int, float] = 1.0)
Creates a criterion that measures the multi-scale structural similarity index error between each element in the input \(x\) and target \(y\). The unreduced loss (i.e. with reduction set to 'none') can be described as:
\[\begin{split}MSSIM = \{mssim_1,\dots,mssim_{N \times C}\}, \\ mssim_{l}(x, y) = \frac{(2 \mu_{x,m} \mu_{y,m} + c_1) } {(\mu_{x,m}^2 +\mu_{y,m}^2 + c_1)} \prod_{j=1}^{m - 1} \frac{(2 \sigma_{xy,j} + c_2)}{(\sigma_{x,j}^2 +\sigma_{y,j}^2 + c_2)}\end{split}\]
where \(N\) is the batch size, \(C\) is the number of channels and \(m\) is the scale level (default: 5). If reduction is not 'none' (default 'mean'), then:
\[\begin{split}MultiscaleSSIMLoss(x, y) = \begin{cases} \operatorname{mean}(1 - MSSIM), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(1 - MSSIM), & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
For colour images the channel order is RGB. For 5D input tensors, the complex value is returned as a tensor of size 2.
- Parameters:
kernel_size – By default, the mean and covariance of a pixel are obtained by convolution with a Gaussian kernel of size kernel_size. Must be an odd value.
kernel_sigma – Standard deviation for Gaussian kernel.
k1 – Coefficient related to c1 in the above equation.
k2 – Coefficient related to c2 in the above equation.
scale_weights – Weights for different scales. If None, default weights from the paper will be used. Default weights: (0.0448, 0.2856, 0.3001, 0.2363, 0.1333).
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
Examples
>>> import torch
>>> from piq import MultiScaleSSIMLoss
>>> loss = MultiScaleSSIMLoss()
>>> input = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(input, target)
>>> output.backward()
References
Wang, Z., Simoncelli, E. P., Bovik, A. C. (2003). Multi-scale Structural Similarity for Image Quality Assessment. IEEE Asilomar Conference on Signals, Systems and Computers, 37, https://ieeexplore.ieee.org/document/1292216 DOI:10.1109/ACSSC.2003.1292216
Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13, 600-612. https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf, DOI:10.1109/TIP.2003.819861
Note
The size of the image should be at least (kernel_size - 1) * 2 ** (levels - 1) + 1.
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Multi-scale Structural Similarity (MS-SSIM) index as a loss function. For colour images channel order is RGB.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).
y – A target tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).
- Returns:
Value of MS-SSIM loss to be minimized, i.e. 1 - ms_ssim, in [0, 1] range. For 5D input tensors, the complex value is returned as a tensor of size 2.
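The minimum image size stated in the note can be checked before calling the loss. The sketch below only evaluates that expression; the helper names `min_image_side` and `is_large_enough` are hypothetical, and `levels` is assumed to equal the number of scale weights (5 by default).

```python
def min_image_side(kernel_size=11, levels=5):
    """Smallest height/width allowed by the note above."""
    return (kernel_size - 1) * 2 ** (levels - 1) + 1


def is_large_enough(height, width, kernel_size=11, levels=5):
    """True if both spatial dimensions reach the minimum side length."""
    side = min_image_side(kernel_size, levels)
    return height >= side and width >= side


default_min = min_image_side()      # (11 - 1) * 2 ** 4 + 1 = 161
ok_256 = is_large_enough(256, 256)  # True: 256 >= 161
ok_128 = is_large_enough(128, 128)  # False: 128 < 161
```

With the default settings, 256 x 256 inputs such as those in the example are large enough, while 128 x 128 inputs are not.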
Information Content Weighted Structural Similarity (IW-SSIM)
- class piq.InformationWeightedSSIMLoss(data_range: Union[int, float] = 1.0, kernel_size: int = 11, kernel_sigma: float = 1.5, k1: float = 0.01, k2: float = 0.03, parent: bool = True, blk_size: int = 3, sigma_nsq: float = 0.4, scale_weights: Optional[Tensor] = None, reduction: str = 'mean')
Creates a criterion that measures the Information Content Weighted Structural Similarity (IW-SSIM) index error between each element in the input \(x\) and target \(y\).
Inputs are expected to be in the range [0, data_range].
If reduction is not 'none' (default 'mean'), then:
\[\begin{split}InformationWeightedSSIMLoss(x, y) = \begin{cases} \operatorname{mean}(1 - IWSSIM), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(1 - IWSSIM), & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
- Parameters:
data_range – Maximum value range of images (usually 1.0 or 255).
kernel_size – The side-length of the sliding window used in comparison. Must be an odd value.
kernel_sigma – Sigma of normal distribution for sliding window used in comparison.
k1 – Algorithm parameter, K1 (small constant).
k2 – Algorithm parameter, K2 (small constant). Try a larger K2 constant (e.g. 0.4) if you get negative or NaN results.
parent – Flag to control dependency on previous layer of pyramid.
blk_size – The side-length of the sliding window used in comparison for information content.
sigma_nsq – Sigma of normal distribution for sliding window used in comparison for information content.
scale_weights – Weights for scaling.
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
Examples
>>> import torch
>>> from piq import InformationWeightedSSIMLoss
>>> loss = InformationWeightedSSIMLoss()
>>> input = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(input, target)
>>> output.backward()
References
Wang, Zhou, and Qiang Li. Information content weighting for perceptual image quality assessment. IEEE Transactions on Image Processing 20.5 (2011): 1185-1198. https://ece.uwaterloo.ca/~z70wang/publications/IWSSIM.pdf DOI:10.1109/TIP.2010.2092435
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Information Content Weighted Structural Similarity (IW-SSIM) index as a loss function. For colour images channel order is RGB.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of IW-SSIM loss to be minimized, i.e. 1 - information_weighted_ssim, in [0, 1] range.
Visual Information Fidelity (VIFp)
- class piq.VIFLoss(sigma_n_sq: float = 2.0, data_range: Union[int, float] = 1.0, reduction: str = 'mean')
Creates a criterion that measures the Visual Information Fidelity loss between predicted (x) and target (y) images. In order to be considered as a loss, the value 1 - clip(VIF, min=0, max=1) is returned.
- Parameters:
sigma_n_sq – HVS model parameter (variance of the visual noise).
data_range – Maximum value range of images (usually 1.0 or 255).
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
Examples
>>> import torch
>>> from piq import VIFLoss
>>> loss = VIFLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, Feb. 2006 https://ieeexplore.ieee.org/abstract/document/1576816/ DOI: 10.1109/TIP.2005.859378.
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Visual Information Fidelity (VIF) index as a loss function. Colour images are expected to have RGB channel order. Order of inputs is important! First tensor must contain distorted images, second reference images.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of VIF loss to be minimized in [0, 1] range.
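The conversion 1 - clip(VIF, min=0, max=1) described for this class can be sketched on plain floats as follows. The helper names `clip` and `score_to_loss` are hypothetical and the input scores are made-up values; the point is only that scores outside [0, 1] are clamped before being turned into a loss.

```python
def clip(value, lower=0.0, upper=1.0):
    """Clamp a scalar to the closed interval [lower, upper]."""
    return max(lower, min(upper, value))


def score_to_loss(score):
    """Turn a similarity score (higher is better) into a loss in [0, 1]."""
    return 1.0 - clip(score)


# Illustrative scores, including values below 0 and above 1.
scores = [-0.2, 0.25, 1.0, 1.3]
losses = [score_to_loss(s) for s in scores]
```

After clipping, a score above 1 yields the same loss (0) as a score of exactly 1, and a negative score yields the maximum loss of 1.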
Feature Similarity Index Measure (FSIM)
- class piq.FSIMLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, chromatic: bool = True, scales: int = 4, orientations: int = 4, min_length: int = 6, mult: int = 2, sigma_f: float = 0.55, delta_theta: float = 1.2, k: float = 2.0)
Creates a criterion that measures the FSIM or FSIMc for input \(x\) and target \(y\).
In order to be considered as a loss, the value 1 - clip(FSIM, min=0, max=1) is returned. If you need the FSIM value, use the function fsim instead. Supports greyscale and colour images with RGB channel order.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
chromatic – Flag to compute FSIMc, which also takes into account chromatic components
scales – Number of wavelets used for computation of phase congruency maps
orientations – Number of filter orientations used for computation of phase congruency maps
min_length – Wavelength of smallest scale filter
mult – Scaling factor between successive filters
sigma_f – Ratio of the standard deviation of the Gaussian describing the log Gabor filter’s transfer function in the frequency domain to the filter center frequency.
delta_theta – Ratio of angular interval between filter orientations and the standard deviation of the angular Gaussian function used to construct filters in the frequency plane.
k – Number of standard deviations of the noise energy beyond the mean at which we set the noise threshold point, below which phase congruency values get penalized.
Examples
>>> import torch
>>> from piq import FSIMLoss
>>> loss = FSIMLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
L. Zhang, L. Zhang, X. Mou and D. Zhang, “FSIM: A Feature Similarity Index for Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378-2386, Aug. 2011, doi: 10.1109/TIP.2011.2109730. https://ieeexplore.ieee.org/document/5705575
- forward(x: Tensor, y: Tensor) → Tensor
Computation of FSIM as a loss function.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of FSIM loss to be minimized in [0, 1] range.
Spectral Residual based Similarity Measure (SR-SIM)
- class piq.SRSIMLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, chromatic: bool = False, scale: float = 0.25, kernel_size: int = 3, sigma: float = 3.8, gaussian_size: int = 10)
Creates a criterion that measures the SR-SIM or SR-SIMc for input \(x\) and target \(y\).
In order to be considered as a loss, value 1 - clip(SR-SIM, min=0, max=1) is returned. If you need SR-SIM value, use function srsim instead.
- Parameters:
reduction – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'
data_range – The difference between the maximum and minimum of the pixel value, i.e., if for image x it holds min(x) = 0 and max(x) = 1, then data_range = 1. The pixel value interval of both input and output should remain the same.
chromatic – Flag to compute SRSIMc, which also takes into account chromatic components
scale – Resizing factor used in saliency map computation
kernel_size – Kernel size of average blur filter used in saliency map computation
sigma – Sigma of gaussian filter applied on saliency map
gaussian_size – Size of gaussian filter applied on saliency map
- Shape:
Input: Required to be 2D (H, W), 3D (C, H, W) or 4D (N, C, H, W). RGB channel order for colour images.
Target: Required to be 2D (H, W), 3D (C, H, W) or 4D (N, C, H, W). RGB channel order for colour images.
Examples
>>> import torch
>>> from piq import SRSIMLoss
>>> loss = SRSIMLoss()
>>> prediction = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(prediction, target)
>>> output.backward()
References
https://sse.tongji.edu.cn/linzhang/ICIP12/ICIP-SR-SIM.pdf
- forward(prediction: Tensor, target: Tensor) → Tensor
Computation of SR-SIM as a loss function.
- Parameters:
prediction – Tensor of prediction of the network.
target – Reference tensor.
- Returns:
Value of SR-SIM loss to be minimized, in [0, 1] range.
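The three reduction modes described in the parameter list behave as in the sketch below, written for a plain list of per-image losses. The function name `apply_reduction` is hypothetical; piq applies the same idea to tensors.

```python
def apply_reduction(losses, reduction="mean"):
    """Reduce a list of per-image losses according to the chosen mode."""
    if reduction == "none":
        return list(losses)               # no reduction, one value per image
    if reduction == "sum":
        return sum(losses)                # the output is summed
    if reduction == "mean":
        return sum(losses) / len(losses)  # sum divided by number of elements
    raise ValueError(f"Unknown reduction: {reduction!r}")


per_image = [0.5, 0.25, 0.75]  # illustrative per-image loss values

unreduced = apply_reduction(per_image, "none")
total = apply_reduction(per_image, "sum")
average = apply_reduction(per_image, "mean")
```

Use 'none' when per-image values are needed (for example, to rank images) and 'mean' or 'sum' when a scalar is required for backpropagation.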
Gradient Magnitude Similarity Deviation (GMSD)
- class piq.GMSDLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, t: float = 0.00261437908496732)
Creates a criterion that measures Gradient Magnitude Similarity Deviation between each element in the input and target.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
t – Constant from the reference paper, used for numerical stability of the similarity map.
Examples
>>> import torch
>>> from piq import GMSDLoss
>>> loss = GMSDLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Wufeng Xue et al. Gradient Magnitude Similarity Deviation (2013) https://arxiv.org/pdf/1308.3052.pdf
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Gradient Magnitude Similarity Deviation (GMSD) as a loss function. Supports greyscale and colour images with RGB channel order.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of GMSD loss to be minimized in [0, 1] range.
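To show where the constant t enters, here is a minimal sketch of the GMSD definition from the reference paper: a per-pixel gradient magnitude similarity map followed by its standard deviation. It assumes the gradient magnitudes have already been computed (the paper uses Prewitt filters) and uses the population standard deviation; both the helper name `gmsd_from_magnitudes` and these choices are assumptions, not a description of piq's internals.

```python
from statistics import pstdev


def gmsd_from_magnitudes(mag_x, mag_y, t=0.00261437908496732):
    """GMSD from two flat lists of precomputed gradient magnitudes."""
    # Similarity map: t keeps the ratio stable where both magnitudes are ~0.
    gms = [(2 * a * b + t) / (a * a + b * b + t) for a, b in zip(mag_x, mag_y)]
    # Deviation pooling: spread of the similarity map.
    return pstdev(gms)


magnitudes = [0.1, 0.5, 0.9]  # illustrative values

gmsd_identical = gmsd_from_magnitudes(magnitudes, magnitudes)
gmsd_distorted = gmsd_from_magnitudes(magnitudes, [0.1, 0.2, 0.9])
```

Identical gradient maps give a constant similarity map and therefore zero deviation; a local change in gradient magnitude makes the map non-uniform and the deviation positive.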
Multi-Scale Gradient Magnitude Similarity Deviation (MS-GMSD)
- class piq.MultiScaleGMSDLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, scale_weights: Optional[Tensor] = None, chromatic: bool = False, alpha: float = 0.5, beta1: float = 0.01, beta2: float = 0.32, beta3: float = 15.0, t: float = 170)
Creates a criterion that measures multi scale Gradient Magnitude Similarity Deviation between each element in the input \(x\) and target \(y\).
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
scale_weights – Weights for different scales. Can contain any number of floating point values. By default weights are initialized with values from the paper.
chromatic – Flag to use the MS-GMSDc algorithm from the paper. It also evaluates chromatic components of the image. Default: False
beta1 – Algorithm parameter. Weight of chromatic component in the loss.
beta2 – Algorithm parameter. Small constant; see references.
beta3 – Algorithm parameter. Small constant; see references.
t – Constant from the reference paper, used for numerical stability of the similarity map.
Examples
>>> import torch
>>> from piq import MultiScaleGMSDLoss
>>> loss = MultiScaleGMSDLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Bo Zhang et al. Gradient Magnitude Similarity Deviation on Multiple Scales (2017). http://www.cse.ust.hk/~psander/docs/gradsim.pdf
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Multi Scale GMSD index as a loss function. Supports greyscale and colour images with RGB channel order. The height and width should be at least 2 ** scales + 1.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of MS-GMSD loss to be minimized in [0, 1] range.
Visual Saliency-induced Index (VSI)
- class piq.VSILoss(reduction: str = 'mean', c1: float = 1.27, c2: float = 386.0, c3: float = 130.0, alpha: float = 0.4, beta: float = 0.02, data_range: Union[int, float] = 1.0, omega_0: float = 0.021, sigma_f: float = 1.34, sigma_d: float = 145.0, sigma_c: float = 0.001)
Creates a criterion that measures Visual Saliency-induced Index error between each element in the input and target.
With reduction = 'mean', the sum is taken over all the elements and divided by the number of elements \(n\). The division by \(n\) can be avoided by setting reduction = 'sum'.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
c1 – coefficient to calculate saliency component of VSI
c2 – coefficient to calculate gradient component of VSI
c3 – coefficient to calculate color component of VSI
alpha – power for gradient component of VSI
beta – power for color component of VSI
omega_0 – coefficient to get log Gabor filter at SDSP
sigma_f – coefficient to get log Gabor filter at SDSP
sigma_d – coefficient to get SDSP
sigma_c – coefficient to get SDSP
Examples
>>> import torch
>>> from piq import VSILoss
>>> loss = VSILoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
L. Zhang, Y. Shen and H. Li, “VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 23, no. 10, pp. 4270-4281, Oct. 2014, doi: 10.1109/TIP.2014.2346028 https://ieeexplore.ieee.org/document/6873260
- forward(x, y)
Computation of VSI as a loss function.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of VSI loss to be minimized in [0, 1] range.
Note
Both inputs are supposed to have RGB channel order in accordance with the original approach. Nevertheless, the method supports greyscale images, which are converted to RGB by copying the grey channel 3 times.
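The greyscale handling described in the note amounts to repeating the single channel three times. A minimal sketch on nested lists in (C, H, W) layout follows; the helper name `grey_to_rgb` is hypothetical, and piq performs the equivalent operation on tensors.

```python
def grey_to_rgb(image):
    """Return a 3-channel image; a 1-channel input is copied 3 times.

    `image` is a list of channels, each channel a list of rows (C, H, W).
    """
    if len(image) == 1:
        # Copy rows so the three channels do not share the same objects.
        return [[row[:] for row in image[0]] for _ in range(3)]
    return image


grey = [[[0.1, 0.2],
         [0.3, 0.4]]]        # shape (1, 2, 2)

rgb = grey_to_rgb(grey)      # shape (3, 2, 2)
```

Because all three channels are identical, the converted image carries no chromatic information, so only the luminance-related components can differ between the two inputs.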
DCT Subband Similarity Index (DSS)
- class piq.DSSLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, dct_size: int = 8, sigma_weight: float = 1.55, kernel_size: int = 3, sigma_similarity: float = 1.5, percentile: float = 0.05)
Creates a criterion that measures the DSS for input \(x\) and target \(y\).
In order to be considered as a loss, value 1 - clip(DSS, min=0, max=1) is returned. If you need DSS value, use function dss instead.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
dct_size – Size of blocks in 2D Discrete Cosine Transform. DCT sizes must be in (0, input size].
sigma_weight – STD of gaussian that determines the proportion of weight given to low freq and high freq. Default: 1.55
kernel_size – Size of gaussian kernel for computing subband similarity. Kernel size must be in (0, input size]. Default: 3
sigma_similarity – STD of gaussian kernel for computing subband similarity. Default: 1.5
percentile – % in (0,1] of worst similarity scores which should be kept. Default: 0.05
- Shape:
Input: Required to be 4D (N, C, H, W). RGB channel order for colour images.
Target: Required to be 4D (N, C, H, W). RGB channel order for colour images.
Examples
>>> import torch
>>> from piq import DSSLoss
>>> loss = DSSLoss()
>>> prediction = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(prediction, target)
>>> output.backward()
References
https://sse.tongji.edu.cn/linzhang/ICIP12/ICIP-SR-SIM.pdf
- forward(prediction: Tensor, target: Tensor) → Tensor
Computation of DSS as a loss function.
- Parameters:
prediction – Tensor of prediction of the network.
target – Reference tensor.
- Returns:
Value of DSS loss to be minimized, in [0, 1] range.
Haar Perceptual Similarity Index (HaarPSI)
- class piq.HaarPSILoss(reduction: Optional[str] = 'mean', data_range: Union[int, float] = 1.0, scales: int = 3, subsample: bool = True, c: float = 30.0, alpha: float = 4.2)
Creates a criterion that measures Haar Wavelet-Based Perceptual Similarity loss between each element in the input and target.
With reduction = 'mean', the sum is taken over all the elements and divided by the number of elements \(n\). The division by \(n\) can be avoided by setting reduction = 'sum'.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
scales – Number of Haar wavelets used for image decomposition.
subsample – Flag to apply average pooling before HaarPSI computation. See references for details.
c – Constant from the paper. See references for details
alpha – Exponent used for similarity maps weighting. See references for details
Examples
>>> import torch
>>> from piq import HaarPSILoss
>>> loss = HaarPSILoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
R. Reisenhofer, S. Bosse, G. Kutyniok & T. Wiegand (2017) ‘A Haar Wavelet-Based Perceptual Similarity Index for Image Quality Assessment’ http://www.math.uni-bremen.de/cda/HaarPSI/publications/HaarPSI_preprint_v4.pdf
- forward(x: Tensor, y: Tensor) → Tensor
Computation of HaarPSI as a loss function.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of HaarPSI loss to be minimized in [0, 1] range.
Mean Deviation Similarity Index (MDSI)
- class piq.MDSILoss(data_range: Union[int, float] = 1.0, reduction: str = 'mean', c1: float = 140.0, c2: float = 55.0, c3: float = 550.0, alpha: float = 0.6, rho: float = 1.0, q: float = 0.25, o: float = 0.25, combination: str = 'sum', beta: float = 0.1, gamma: float = 0.2)
Creates a criterion that measures Mean Deviation Similarity Index (MDSI) error between the prediction \(x\) and target \(y\). Supports greyscale and colour images with RGB channel order.
- Parameters:
data_range – Maximum value range of images (usually 1.0 or 255).
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
c1 – coefficient to calculate gradient similarity. Default: 140.
c2 – coefficient to calculate gradient similarity. Default: 55.
c3 – coefficient to calculate chromaticity similarity. Default: 550.
combination – mode to combine gradient similarity and chromaticity similarity: 'sum' | 'mult'.
alpha – coefficient to combine gradient similarity and chromaticity similarity using summation.
beta – power to combine gradient similarity with chromaticity similarity using multiplication.
gamma – power to combine gradient similarity and chromaticity similarity using multiplication.
rho – order of the Minkowski distance
q – coefficient that adjusts the emphasis of the values in image and MCT
o – the power pooling applied on the final value of the deviation
Examples
>>> import torch
>>> from piq import MDSILoss
>>> loss = MDSILoss(data_range=1.)
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Nafchi, Hossein Ziaei and Shahkolaei, Atena and Hedjam, Rachid and Cheriet, Mohamed (2016). Mean deviation similarity index: Efficient and reliable full-reference image quality evaluator. IEEE Access, 4, 5579–5590. https://arxiv.org/pdf/1608.07433.pdf DOI:10.1109/ACCESS.2016.2604042
Note
The ratio between the constants is usually \(c_3 = 4c_1 = 10c_2\).
- forward(x: Tensor, y: Tensor) → Tensor
Computation of Mean Deviation Similarity Index (MDSI) as a loss function.
Both inputs are supposed to have RGB channel order. Greyscale images are converted to RGB by copying the grey channel 3 times.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Value of MDSI loss to be minimized in [0, 1] range.
Note
Both inputs are supposed to have RGB channels order in accordance with the original approach. Nevertheless, the method supports greyscale images, which are converted to RGB by copying the grey channel 3 times.
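The default constants and the two combination modes can be illustrated with scalars. The first part below only checks the stated ratio against the defaults from the class signature. The second part is a sketch of the 'sum' and 'mult' combinations as assumed from the reference paper (weighted sum with alpha; product of powers with gamma on the gradient term and beta on the chromaticity term). The function name `combine` is hypothetical, and piq itself operates on tensors rather than scalars.

```python
C1, C2, C3 = 140.0, 55.0, 550.0  # defaults from the class signature

ratio_c3_c1 = C3 / C1  # about 3.93, close to the nominal factor of 4
ratio_c3_c2 = C3 / C2  # the nominal factor of 10


def combine(gs, cs, combination="sum", alpha=0.6, beta=0.1, gamma=0.2):
    """Combine gradient (gs) and chromaticity (cs) similarity scalars.

    Assumption: formulas follow the reference paper; real inputs only.
    """
    if combination == "sum":
        return alpha * gs + (1 - alpha) * cs
    if combination == "mult":
        return (gs ** gamma) * (cs ** beta)
    raise ValueError(f"Unknown combination: {combination!r}")


perfect_sum = combine(1.0, 1.0, "sum")    # both similarities at their maximum
perfect_mult = combine(1.0, 1.0, "mult")
mixed_sum = combine(0.5, 1.0, "sum")      # 0.6 * 0.5 + 0.4 * 1.0
```

With the defaults, the relation \(c_3 = 10c_2\) holds exactly while \(c_3 = 4c_1\) holds only approximately.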
Learned Perceptual Image Patch Similarity (LPIPS)
- class piq.LPIPS(replace_pooling: bool = False, distance: str = 'mse', reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])
Learned Perceptual Image Patch Similarity metric. Only VGG16 learned weights are supported.
By default expects input to be in range [0, 1], which is then standardized with ImageNet mean and std. If no normalisation is required, change mean and std values accordingly.
- Parameters:
replace_pooling – Flag to replace MaxPooling layer with AveragePooling. See references for details.
distance – Method to compute distance between features: 'mse' | 'mae'.
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].
std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].
Examples
>>> import torch
>>> from piq import LPIPS
>>> loss = LPIPS()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Gatys, Leon and Ecker, Alexander and Bethge, Matthias (2016). A Neural Algorithm of Artistic Style. Association for Research in Vision and Ophthalmology (ARVO). https://arxiv.org/abs/1508.06576
Zhang, Richard and Isola, Phillip and Efros, et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1801.03924 https://github.com/richzhang/PerceptualSimilarity
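The effect of the mean and std parameters can be sketched for a single RGB pixel. This assumes the usual per-channel standardization, (value - mean) / std, which is what "data standardization" in the parameter descriptions suggests; the helper name `standardize` is hypothetical.

```python
IMAGENET_MEAN = [0.485, 0.456, 0.406]  # defaults from the class signature
IMAGENET_STD = [0.229, 0.224, 0.225]


def standardize(pixel, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Standardize one RGB triple channel by channel."""
    return [(v - m) / s for v, m, s in zip(pixel, mean, std)]


# A pixel equal to the channel means maps to zero in every channel.
centred = standardize(IMAGENET_MEAN)

# mean = [0, 0, 0] and std = [1, 1, 1] leave the input untouched,
# which is how normalization is switched off.
unchanged = standardize([0.2, 0.5, 0.8], [0.0] * 3, [1.0] * 3)
```

Passing [0., 0., 0.] and [1., 1., 1.] is therefore the right choice when the inputs have already been standardized upstream.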
Perceptual Image-Error Assessment through Pairwise Preference (PieAPP)
- class piq.PieAPP(reduction: str = 'mean', data_range: Union[int, float] = 1.0, stride: int = 27, enable_grad: bool = False)
Implementation of Perceptual Image-Error Assessment through Pairwise Preference.
Expects input to be in range [0, data_range] with no normalization and RGB channel order. Input images are cropped into smaller patches. The score for each individual image is the mean of its patch scores.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
data_range – Maximum value range of images (usually 1.0 or 255).
stride – Step between cropped patches. Smaller values lead to better quality, but cause higher memory consumption. Default: 27 (sparse sampling in original implementation)
enable_grad – Flag to compute gradients. Useful when PieAPP used as a loss. Default: False.
Examples
>>> import torch
>>> from piq import PieAPP
>>> loss = PieAPP(enable_grad=True)  # gradients are needed for backward()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Ekta Prashnani, Hong Cai, Yasamin Mostofi, Pradeep Sen (2018). PieAPP: Perceptual Image-Error Assessment through Pairwise Preference https://arxiv.org/abs/1806.02067
https://github.com/prashnani/PerceptualImageError
- forward(x: Tensor, y: Tensor) → Tensor
Computation of PieAPP between feature representations of prediction \(x\) and target \(y\) tensors.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Perceptual Image-Error Assessment through Pairwise Preference
- get_features(x: Tensor) → Tuple[Tensor, Tensor]
- Parameters:
x – Tensor. Shape \((N, C, H, W)\).
- Returns:
List of features extracted from intermediate layers weights
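The trade-off controlled by stride can be made concrete by counting patches. The sketch below assumes square 64 x 64 patches (the patch size is not stated above and is an assumption taken from the original PieAPP implementation) taken on a regular grid; the helper names `patch_starts` and `image_score` are hypothetical.

```python
def patch_starts(length, patch_size=64, stride=27):
    """Top-left offsets of patches along one spatial dimension."""
    # Assumption: patch_size=64, as in the original PieAPP implementation.
    return list(range(0, length - patch_size + 1, stride))


def image_score(patch_scores):
    """Score of one image: the mean of its patch scores."""
    return sum(patch_scores) / len(patch_scores)


sparse = patch_starts(256)             # default stride of 27
dense = patch_starts(256, stride=6)    # smaller stride, more patches

sparse_patches = len(sparse) ** 2      # patches per 256 x 256 image
dense_patches = len(dense) ** 2

example_score = image_score([1.0, 2.0, 3.0])  # made-up patch scores
```

Under these assumptions the default stride yields 64 patches for a 256 x 256 image, whereas a stride of 6 yields 1089, which explains the higher memory consumption mentioned for smaller strides.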
Deep Image Structure and Texture Similarity (DISTS)
- class piq.DISTS(reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])
Deep Image Structure and Texture Similarity metric.
By default expects input to be in range [0, 1], which is then standardized with ImageNet mean and std. If no normalisation is required, change mean and std values accordingly.
- Parameters:
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].
std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].
Examples
>>> import torch
>>> from piq import DISTS
>>> loss = DISTS()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Keyan Ding, Kede Ma, Shiqi Wang, Eero P. Simoncelli (2020). Image Quality Assessment: Unifying Structure and Texture Similarity. https://arxiv.org/abs/2004.07728 https://github.com/dingkeyan93/DISTS
- compute_distance(x_features: Tensor, y_features: Tensor) → List[Tensor]
Compute structure similarity between feature maps
- Parameters:
x_features – Features of the input tensor.
y_features – Features of the target tensor.
- Returns:
Structural similarity distance between feature maps
- forward(x: Tensor, y: Tensor) → Tensor
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Deep Image Structure and Texture Similarity loss, i.e. 1 - DISTS, in range [0, 1].
- get_features(x: Tensor) → List[Tensor]
- Parameters:
x – Input tensor
- Returns:
List of features extracted from input tensor
- replace_pooling(module: Module) → Module
Turn all MaxPool layers into L2Pool.
- Parameters:
module – Module to change MaxPool into L2Pool
- Returns:
Module with L2Pool instead of MaxPool
Style Score
- class piq.StyleLoss(feature_extractor: Union[str, Module] = 'vgg16', layers: Collection[str] = ('relu3_3',), weights: List[Union[float, Tensor]] = [1.0], replace_pooling: bool = False, distance: str = 'mse', reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225], normalize_features: bool = False, allow_layers_weights_mismatch: bool = False)
Creates Style loss that can be used for image style transfer or as a measure in image to image tasks. Computes distance between Gram matrices of feature maps. Uses pretrained VGG models from torchvision.
By default expects input to be in range [0, 1], which is then standardized with ImageNet mean and std. If no normalisation is required, change mean and std values accordingly.
- Parameters:
feature_extractor – Model to extract features or model name: 'vgg16' | 'vgg19'.
layers – List of strings with layer names. Default: 'relu3_3'
weights – List of float weights to balance different layers
replace_pooling – Flag to replace MaxPooling layer with AveragePooling. See references for details.
distance – Method to compute distance between features: 'mse' | 'mae'.
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].
std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].
normalize_features – If true, unit-normalize each feature in channel dimension before scaling and computing distance. See references for details.
Examples
>>> import torch
>>> from piq import StyleLoss
>>> loss = StyleLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Gatys, Leon and Ecker, Alexander and Bethge, Matthias (2016). A Neural Algorithm of Artistic Style. Association for Research in Vision and Ophthalmology (ARVO). https://arxiv.org/abs/1508.06576
Zhang, Richard and Isola, Phillip and Efros, et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1801.03924
- compute_distance(x_features: Tensor, y_features: Tensor)
Take L2 or L1 distance between Gram matrices of feature maps depending on distance.
- Parameters:
x_features – Features of the input tensor.
y_features – Features of the target tensor.
- Returns:
Distance between Gram matrices
- static gram_matrix(x: Tensor) Tensor
Compute Gram matrix for batch of features.
- Parameters:
x – Tensor. Shape \((N, C, H, W)\).
- Returns:
Gram matrix for given input
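The Gram matrix of a feature batch can be sketched as below. This is an illustrative NumPy version, not the library's code: the function name mirrors the method above, and the absence of any rescaling by the number of elements is an assumption, since the text does not say whether the result is normalized.

```python
import numpy as np


def gram_matrix(x: np.ndarray) -> np.ndarray:
    """Per-sample Gram matrix for features of shape (N, C, H, W).

    Returns an array of shape (N, C, C). No rescaling is applied
    (assumption: the reference text does not specify one).
    """
    n, c, h, w = x.shape
    # Flatten the spatial dimensions: (N, C, H*W)
    flat = x.reshape(n, c, h * w)
    # Batched inner products between channels: (N, C, HW) @ (N, HW, C)
    return flat @ flat.transpose(0, 2, 1)
```

Each entry (i, j) of the result is the inner product between the flattened feature maps of channels i and j, which is why the matrix captures channel co-activation rather than spatial layout.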
Content Score
- class piq.ContentLoss(feature_extractor: Union[str, Module] = 'vgg16', layers: Collection[str] = ('relu3_3',), weights: List[Union[float, Tensor]] = [1.0], replace_pooling: bool = False, distance: str = 'mse', reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225], normalize_features: bool = False, allow_layers_weights_mismatch: bool = False)
Creates Content loss that can be used for image style transfer or as a measure for image to image tasks. Uses pretrained VGG models from torchvision. Expects input to be in range [0, 1] or normalized with ImageNet statistics into range [-1, 1].
- Parameters:
feature_extractor – Model to extract features or model name: 'vgg16' | 'vgg19'.
layers – List of strings with layer names. Default: 'relu3_3'
weights – List of float weights to balance different layers.
replace_pooling – Flag to replace MaxPooling layer with AveragePooling. See references for details.
distance – Method to compute distance between features: 'mse' | 'mae'.
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].
std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].
normalize_features – If true, unit-normalize each feature in channel dimension before scaling and computing distance. See references for details.
Examples
>>> import torch
>>> from piq import ContentLoss
>>> loss = ContentLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
References
Gatys, Leon and Ecker, Alexander and Bethge, Matthias (2016). A Neural Algorithm of Artistic Style. Association for Research in Vision and Ophthalmology (ARVO). https://arxiv.org/abs/1508.06576
Zhang, Richard and Isola, Phillip and Efros, et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1801.03924
- compute_distance(x_features: List[Tensor], y_features: List[Tensor]) List[Tensor]
Take L2 or L1 distance between feature maps depending on distance.
- Parameters:
x_features – Features of the input tensor.
y_features – Features of the target tensor.
- Returns:
Distance between feature maps
- forward(x: Tensor, y: Tensor) Tensor
Computation of Content loss between feature representations of prediction \(x\) and target \(y\) tensors.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
y – A target tensor. Shape \((N, C, H, W)\).
- Returns:
Content loss between feature representations
- get_features(x: Tensor) List[Tensor]
- Parameters:
x – Tensor. Shape \((N, C, H, W)\).
- Returns:
List of features extracted from intermediate layers
- static normalize(x: Tensor) Tensor
Normalize feature maps in channel direction to unit length.
- Parameters:
x – Tensor. Shape \((N, C, H, W)\).
- Returns:
Normalized input
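Unit-normalizing feature maps along the channel direction can be sketched as follows. This is a NumPy illustration under stated assumptions: the function name `normalize_channels` and the small `eps` guard against division by zero are hypothetical and not taken from the reference above.

```python
import numpy as np


def normalize_channels(x: np.ndarray, eps: float = 1e-10) -> np.ndarray:
    """Scale each spatial position of (N, C, H, W) features to unit L2 norm
    across the channel axis. `eps` is an assumed numerical safeguard."""
    # L2 norm over channels, kept as (N, 1, H, W) so it broadcasts back
    norm = np.sqrt(np.sum(x ** 2, axis=1, keepdims=True))
    return x / (norm + eps)
```

After this step every pixel's feature vector has length one, so a later distance depends on the direction of the features rather than their magnitude.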
- replace_pooling(module: Module) Module
Turn all MaxPool layers into AveragePool.
- Parameters:
module – Module in which to change MaxPool into AveragePool
- Returns:
Module with AveragePool instead of MaxPool
No Reference Metrics
Total Variation
- class piq.TVLoss(norm_type: str = 'l2', reduction: str = 'mean')
Creates a criterion that measures the total variation of the given input \(x\).
If norm_type is set to 'l2', the loss can be described as:
\[TV(x) = \sum_{N}\sqrt{\sum_{H, W, C}(|x_{:, :, i+1, j} - x_{:, :, i, j}|^2 + |x_{:, :, i, j+1} - x_{:, :, i, j}|^2)}\]
Else if norm_type is set to 'l1':
\[TV(x) = \sum_{N}\sum_{H, W, C}(|x_{:, :, i+1, j} - x_{:, :, i, j}| + |x_{:, :, i, j+1} - x_{:, :, i, j}|)\]
where \(N\) is the batch size, \(C\) is the channel size.
- Parameters:
norm_type – One of 'l1' | 'l2' | 'l2_squared'.
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
Examples
>>> import torch
>>> from piq import TVLoss
>>> loss = TVLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> output = loss(x)
>>> output.backward()
References
https://www.wikiwand.com/en/Total_variation_denoising
https://remi.flamary.com/demos/proxtv.html
- forward(x: Tensor) Tensor
Computation of Total Variation (TV) index as a loss function.
- Parameters:
x – An input tensor. Shape \((N, C, H, W)\).
- Returns:
Value of TV loss to be minimized.
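The total variation formulas for the 'l1' and 'l2' norm types can be sketched per image as below. This is a NumPy illustration, not the library's implementation: the function name is hypothetical, no reduction over the batch is applied, and treating 'l2_squared' as the 'l2' expression without the square root is an assumption, since no formula is given for it.

```python
import numpy as np


def total_variation(x: np.ndarray, norm_type: str = "l2") -> np.ndarray:
    """Per-image total variation for x of shape (N, C, H, W). Returns shape (N,)."""
    # Differences between vertically and horizontally adjacent pixels
    dh = x[:, :, 1:, :] - x[:, :, :-1, :]
    dw = x[:, :, :, 1:] - x[:, :, :, :-1]
    if norm_type == "l1":
        return np.abs(dh).sum(axis=(1, 2, 3)) + np.abs(dw).sum(axis=(1, 2, 3))
    squared = (dh ** 2).sum(axis=(1, 2, 3)) + (dw ** 2).sum(axis=(1, 2, 3))
    if norm_type == "l2":
        return np.sqrt(squared)
    if norm_type == "l2_squared":
        # Assumption: squared 'l2' value, i.e. the same sum without the root
        return squared
    raise ValueError(f"Unknown norm_type: {norm_type}")
```

A constant image has zero total variation under every norm type, while noise or sharp edges increase it, which is why the value is often used as a smoothness regularizer.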
Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE)
- class piq.BRISQUELoss(kernel_size: int = 7, kernel_sigma: float = 1.1666666666666667, data_range: Union[int, float] = 1.0, reduction: str = 'mean', interpolation: str = 'nearest')
Creates a criterion that measures the BRISQUE score for input \(x\). \(x\) is a 4D tensor (N, C, H, W). With reduction = 'mean' (the default), the scores are summed and divided by their number \(n\). The division by \(n\) can be avoided by setting reduction = 'sum'.
- Parameters:
kernel_size – Size of the filter used in the convolution that yields the local mean and covariance of each pixel. Must be an odd value.
kernel_sigma – Standard deviation for Gaussian kernel.
data_range – Maximum value range of images (usually 1.0 or 255).
reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'
Examples
>>> import torch
>>> from piq import BRISQUELoss
>>> loss = BRISQUELoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> output = loss(x)
>>> output.backward()
References
Anish Mittal et al. “No-Reference Image Quality Assessment in the Spatial Domain”, https://live.ece.utexas.edu/publications/2012/TIP%20BRISQUE.pdf
Note
Backpropagation is not available with torch=1.5.0 due to a bug in argmin and argmax backpropagation. Update torch and torchvision to the latest versions.
- forward(x: Tensor) Tensor
Computation of BRISQUE score as a loss function.
- Parameters:
x – An input tensor with (N, C, H, W) shape. RGB channel order for colour images.
- Returns:
Value of BRISQUE loss to be minimized.
CLIP-IQA
- class piq.CLIPIQA(data_range: Union[float, int] = 1.0)
Creates a criterion that measures image quality based on a general notion of text-to-image similarity learned by the CLIP model (Radford et al., 2021) during its large-scale pre-training on a large dataset with paired texts and images.
The method is based on the idea that two antonyms (“Good photo” and “Bad photo”) can be used as anchors in the text embedding space representing good and bad images in terms of their image quality.
After the anchors are defined, one can use them to determine the quality of a given image in the following way:
1. Compute the image embedding of the image of interest using the pre-trained CLIP model;
2. Compute the text embeddings of the selected anchor antonyms;
3. Compute the angle (cosine similarity) between the image embedding (1) and both text embeddings (2);
4. Compute the Softmax of cosine similarities (3) -> CLIP-IQA score (Wang et al., 2022).
This method is proposed to eliminate the linguistic ambiguity of the naive approach (using a single prompt, e.g., “Good photo”).
This method has an extension called CLIP-IQA+ proposed in the same research paper. It uses the same approach but also fine-tunes the CLIP weights using the CoOp fine-tuning algorithm (Zhou et al., 2022).
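The scoring steps described above (cosine similarity to both anchors, then Softmax) can be sketched with plain vectors. This is only an illustration: the embeddings are stand-in arrays rather than real CLIP outputs, the function name is hypothetical, and any scaling of the similarities before the Softmax that an actual implementation may apply is deliberately left out.

```python
import numpy as np


def clip_iqa_score(image_emb: np.ndarray, good_emb: np.ndarray, bad_emb: np.ndarray) -> float:
    """Softmax over the cosine similarities to the 'good' and 'bad' anchors.

    Returns the probability assigned to the 'good' anchor, a value in [0, 1].
    """
    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    sims = np.array([cosine(image_emb, good_emb), cosine(image_emb, bad_emb)])
    # Numerically stable Softmax over the two similarities
    exp = np.exp(sims - sims.max())
    probs = exp / exp.sum()
    return float(probs[0])
```

Because the two anchors compete inside one Softmax, the score reflects how much closer the image is to the "good" prompt than to the "bad" one, rather than its absolute similarity to a single prompt.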
Note
The initial computation of the metric is performed in float32 and other dtypes (i.e. float16, float64) are not supported. We preserve this behaviour for reproducibility purposes. Also, at the time of writing conv2d is not supported for float16 tensors on CPU.
Warning
In order to avoid implicit dtype conversion and normalization of input tensors, they are copied. Note that it may consume extra memory, which might be noticeable on large batch sizes.
- Parameters:
data_range – Maximum value range of images (usually 1.0 or 255).
Examples
>>> import torch
>>> from piq import CLIPIQA
>>> clipiqa = CLIPIQA()
>>> x = torch.rand(1, 3, 224, 224)
>>> score = clipiqa(x)
References
Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” International conference on machine learning. PMLR, 2021.
Wang, Jianyi, Kelvin CK Chan, and Chen Change Loy. “Exploring CLIP for Assessing the Look and Feel of Images.” arXiv preprint arXiv:2207.12396 (2022).
Zhou, Kaiyang, et al. “Learning to prompt for vision-language models.” International Journal of Computer Vision 130.9 (2022): 2337-2348.
- forward(x_input: Tensor) Tensor
Computation of CLIP-IQA metric for a given image \(x\).
- Parameters:
x_input – An input tensor \(x\). Shape \((N, C, H, W)\). The metric is designed in such a way that it expects:
- A 4D PyTorch tensor;
- The tensor might have flexible data ranges depending on data_range value;
- The tensor must have channels first format.
- Returns:
The value of CLIP-IQA score in [0, 1] range.
Feature Metrics
Inception Score (IS)
- class piq.IS(num_splits: int = 10, distance: str = 'l1')
Creates a criterion that measures difference of Inception Score between two datasets.
IS is computed separately for predicted \(x\) and target \(y\) features and expects raw InceptionV3 model logits as inputs.
- Parameters:
num_splits – Number of parts to divide features. IS is computed for them separately and results are then averaged.
distance – How to measure distance between scores: 'l1' | 'l2'. Default: 'l1'.
Examples
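>>> import torch
>>> from piq import IS
>>> is_metric = IS()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> is_score: torch.Tensor = is_metric(x_feats, y_feats)  # `is` is a reserved keyword, so a different name is used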
References
“A Note on the Inception Score” https://arxiv.org/pdf/1801.01973.pdf
- compute_metric(x_features: Tensor, y_features: Tensor) Tensor
Compute IS.
Both features should have shape (N_samples, encoder_dim).
- Parameters:
x_features – Samples from data distribution. Shape \((N_x, D)\)
y_features – Samples from data distribution. Shape \((N_y, D)\)
- Returns:
L1 or L2 distance between scores for datasets \(x\) and \(y\).
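How a single Inception Score is obtained from raw logits can be sketched as below, following the usual definition exp(E[KL(p(y|x) || p(y))]) with per-split averaging as described for num_splits. This is a NumPy illustration under assumptions: the function name and the `eps` guard inside the logarithms are hypothetical, and the library's exact splitting strategy is not confirmed by the text above.

```python
import numpy as np


def inception_score(logits: np.ndarray, num_splits: int = 10, eps: float = 1e-12) -> float:
    """Inception Score from raw classifier logits of shape (N, num_classes)."""
    # Softmax over classes gives the conditional distribution p(y|x)
    shifted = logits - logits.max(axis=1, keepdims=True)
    p_yx = np.exp(shifted)
    p_yx /= p_yx.sum(axis=1, keepdims=True)

    scores = []
    for part in np.array_split(p_yx, num_splits):
        # Marginal p(y) estimated on this split only
        p_y = part.mean(axis=0, keepdims=True)
        kl = (part * (np.log(part + eps) - np.log(p_y + eps))).sum(axis=1)
        scores.append(np.exp(kl.mean()))
    # Scores of the individual splits are averaged
    return float(np.mean(scores))
```

With the 'l1' distance, the value reported for two datasets would then be the absolute difference between the two scores computed this way.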
Frechet Inception Distance (FID)
- class piq.FID
Interface of Frechet Inception Distance. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. FID can compare two data distributions with different numbers of samples, but their dimensionalities must match; otherwise the statistics cannot be computed correctly.
Examples
>>> import torch
>>> from piq import FID
>>> fid_metric = FID()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> fid: torch.Tensor = fid_metric(x_feats, y_feats)
References
Heusel M. et al. (2017). GANs trained by a two time-scale update rule converge to a local Nash equilibrium. Advances in Neural Information Processing Systems, https://arxiv.org/abs/1706.08500
- compute_metric(x_features: Tensor, y_features: Tensor) Tensor
Fits multivariate Gaussians: \(X \sim \mathcal{N}(\mu_x, \sigma_x)\) and \(Y \sim \mathcal{N}(\mu_y, \sigma_y)\) to image stacks. Then computes FID as \(d^2 = ||\mu_x - \mu_y||^2 + Tr(\sigma_x + \sigma_y - 2\sqrt{\sigma_x \sigma_y})\).
- Parameters:
x_features – Samples from data distribution. Shape \((N_x, D)\)
y_features – Samples from data distribution. Shape \((N_y, D)\)
- Returns:
The Frechet Distance.
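The formula above can be evaluated directly from two feature matrices, as in this NumPy sketch. It is not the library's implementation: the function name is hypothetical, and the trace of the matrix square root is obtained from the eigenvalues of \(\sigma_x \sigma_y\) instead of an explicit matrix square root, which is a substitution made here for brevity.

```python
import numpy as np


def frechet_distance(x: np.ndarray, y: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to x (N_x, D) and y (N_y, D), D >= 2."""
    mu_x, mu_y = x.mean(axis=0), y.mean(axis=0)
    sigma_x = np.cov(x, rowvar=False)
    sigma_y = np.cov(y, rowvar=False)
    # For positive semi-definite covariances, Tr(sqrt(S_x S_y)) equals the sum of
    # square roots of the eigenvalues of S_x S_y; tiny negative or imaginary parts
    # caused by rounding are discarded.
    eigvals = np.linalg.eigvals(sigma_x @ sigma_y)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    mean_term = ((mu_x - mu_y) ** 2).sum()
    return float(mean_term + np.trace(sigma_x) + np.trace(sigma_y) - 2.0 * trace_sqrt)
```

Shifting every feature by a constant leaves the covariances unchanged, so only the mean term grows; this makes a convenient sanity check for any implementation.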
Geometry Score (GS)
- class piq.GS(sample_size: int = 64, num_iters: int = 1000, gamma: Optional[float] = None, i_max: int = 100, num_workers: int = 4)
Interface of Geometry Score. It is computed for a whole set of data and can use features from an encoder instead of the images themselves to decrease computation cost. GS can compare two data distributions with different numbers of samples. Dimensionalities of features must match; otherwise the statistics cannot be computed correctly.
- Parameters:
sample_size – Number of landmarks to use on each iteration. Higher values can give better accuracy, but increase computation cost.
num_iters – Number of iterations. Higher values can reduce variance, but increase computation cost.
gamma – Parameter determining maximum persistence value. Default is 1.0 / 128 * N_imgs / 5000
i_max – Upper bound on i in RLT(i, 1, X, L)
num_workers – Number of processes used for GS computation.
Examples
>>> import torch
>>> from piq import GS
>>> gs_metric = GS()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> gs: torch.Tensor = gs_metric(x_feats, y_feats)
References
Khrulkov V., Oseledets I. (2018). Geometry score: A method for comparing generative adversarial networks. arXiv preprint, 2018. https://arxiv.org/abs/1802.02664
Note
Computation is heavily CPU dependent; adjust the num_workers parameter according to your system configuration. The GS metric requires the gudhi library, which is not installed by default. For conda, write: conda install -c conda-forge gudhi, otherwise follow the installation guide: http://gudhi.gforge.inria.fr/python/latest/installation.html
- compute_metric(x_features: Tensor, y_features: Tensor) Tensor
Implements Algorithm 2 from the paper.
- Parameters:
x_features – Samples from data distribution. Shape \((N_x, D)\)
y_features – Samples from data distribution. Shape \((N_y, D)\)
- Returns:
Scalar value of the distance between distributions.
Kernel Inception Distance (KID)
- class piq.KID(degree: int = 3, gamma: Optional[float] = None, coef0: int = 1, var_at_m: Optional[int] = None, average: bool = False, n_subsets: int = 50, subset_size: Optional[int] = 1000, ret_var: bool = False)
Interface of Kernel Inception Distance. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. KID can compare two data distributions with different numbers of samples, but their dimensionalities must match; otherwise the statistics cannot be computed correctly.
- Parameters:
degree – Degree of the polynomial function used in kernels. Default: 3
gamma – Kernel parameter. See paper for details
coef0 – Kernel parameter. See paper for details
var_at_m – Kernel variance. Default is None
average – If True recomputes metric n_subsets times using subset_size elements.
n_subsets – Number of repeats. Ignored if average is False
subset_size – Size of each subset for repeat. Ignored if average is False
ret_var – Whether to return variance after the distance is computed. This function will return Tuple[torch.Tensor, torch.Tensor] in this case. Default: False
Examples
>>> import torch
>>> from piq import KID
>>> kid_metric = KID()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> kid: torch.Tensor = kid_metric(x_feats, y_feats)
References
Demystifying MMD GANs https://arxiv.org/abs/1801.01401
- compute_metric(x_features: Tensor, y_features: Tensor) Union[Tensor, Tuple[Tensor, Tensor]]
Computes KID (polynomial MMD) for given sets of features, obtained from Inception net or any other feature extractor. Samples must be in range [0, 1].
- Parameters:
x_features – Samples from data distribution. Shape \((N_x, D)\)
y_features – Samples from data distribution. Shape \((N_y, D)\)
- Returns:
KID score and variance (optional).
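The polynomial MMD behind KID can be sketched as below. This is a NumPy illustration with assumptions stated up front: the function names are hypothetical, the fallback gamma = 1 / D follows the kernel used in the referenced paper rather than anything stated above, the unbiased estimator is used, and the subset averaging and variance output controlled by average, n_subsets, subset_size and ret_var are omitted.

```python
import numpy as np


def polynomial_kernel(a, b, degree=3, gamma=None, coef0=1.0):
    """k(a, b) = (gamma * <a, b> + coef0) ** degree for all row pairs."""
    if gamma is None:
        # Assumption: default taken from the paper's kernel, 1 / feature dimension
        gamma = 1.0 / a.shape[1]
    return (gamma * (a @ b.T) + coef0) ** degree


def kid(x, y, degree=3, gamma=None, coef0=1.0):
    """Unbiased squared MMD estimate between x (N_x, D) and y (N_y, D)."""
    m, n = len(x), len(y)
    k_xx = polynomial_kernel(x, x, degree, gamma, coef0)
    k_yy = polynomial_kernel(y, y, degree, gamma, coef0)
    k_xy = polynomial_kernel(x, y, degree, gamma, coef0)
    # Within-set terms exclude the diagonal (a sample paired with itself)
    term_xx = (k_xx.sum() - np.trace(k_xx)) / (m * (m - 1))
    term_yy = (k_yy.sum() - np.trace(k_yy)) / (n * (n - 1))
    return float(term_xx + term_yy - 2.0 * k_xy.mean())
```

Because the estimator is unbiased, values for two samples drawn from the same distribution fluctuate around zero and can be slightly negative.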
Multi-Scale Intrinsic Distance (MSID)
- class piq.MSID(ts: Optional[Tensor] = None, k: int = 5, m: int = 10, niters: int = 100, rademacher: bool = False, normalized_laplacian: bool = True, normalize: str = 'empty', msid_mode: str = 'max')
Creates a criterion that measures MSID score for two batches of images. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. MSID can compare two data distributions with different numbers of samples or different dimensionalities.
- Parameters:
ts – Temperature values. If None, the default value torch.logspace(-1, 1, 256) is used.
k – Number of neighbours for graph construction.
m – Lanczos steps in SLQ.
niters – Number of starting random vectors for SLQ.
rademacher – True to use Rademacher distribution, False - standard normal for random vectors in Hutchinson.
normalized_laplacian – If True, use normalized Laplacian.
normalize – 'empty' for average heat kernel (corresponds to the empty graph normalization of NetLSD), 'complete' for the complete, 'er' for Erdos-Renyi normalization, 'none' for no normalization
msid_mode – 'l2' to compute the L2 norm of the distance between msid1 and msid2; 'max' to find the maximum absolute difference between two descriptors over temperature
Examples
>>> import torch
>>> from piq import MSID
>>> msid_metric = MSID()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> msid: torch.Tensor = msid_metric(x_feats, y_feats)
References
Tsitsulin, A., Munkhoeva, M., Mottin, D., Karras, P., Bronstein, A., Oseledets, I., & Müller, E. (2019). The shape of data: Intrinsic distance for data distributions. https://arxiv.org/abs/1905.11141
- compute_metric(x_features: Tensor, y_features: Tensor) Tensor
Compute MSID score between two sets of samples.
- Parameters:
x_features – Samples from data distribution. Shape \((N_x, D_x)\)
y_features – Samples from data distribution. Shape \((N_y, D_y)\)
- Returns:
Scalar value of the distance between distributions.
Improved Precision and Recall (P&R)
- class piq.PR(nearest_k: int = 5)
Interface of Improved Precision and Recall. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. Precision and Recall can compare two data distributions with different numbers of samples, but their dimensionalities must match; otherwise the statistics cannot be computed correctly.
- Parameters:
nearest_k – Nearest neighbor to compute the non-parametric representation. Shape \(1\)
Examples
>>> import torch
>>> from piq import PR
>>> pr_metric = PR()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> precision, recall = pr_metric(x_feats, y_feats)
References
Kynkäänniemi T. et al. (2019). Improved Precision and Recall Metric for Assessing Generative Models. Advances in Neural Information Processing Systems, https://arxiv.org/abs/1904.06991
- compute_metric(real_features: Tensor, fake_features: Tensor) Tuple[Tensor, Tensor]
Creates non-parametric representations of the manifolds of real and generated data and computes the precision and recall between them.
- Parameters:
real_features – Samples from data distribution. Shape \((N_x, D)\)
fake_features – Samples from fake distribution. Shape \((N_y, D)\)
- Returns:
Scalar value of the precision of the generated images.
Scalar value of the recall of the generated images.
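The non-parametric manifold idea can be sketched as below: each sample gets a ball whose radius is the distance to its k-th nearest neighbour in its own set, precision is the share of fake samples that fall into at least one real ball, and recall is the share of real samples that fall into at least one fake ball. This is a NumPy illustration following the referenced paper; the function names are hypothetical and the dense pairwise distance matrix is only practical for small sets.

```python
import numpy as np


def pairwise_distances(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Euclidean distances between all rows of a (N_a, D) and b (N_b, D)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def knn_radii(features: np.ndarray, k: int) -> np.ndarray:
    """Distance from each sample to its k-th nearest neighbour in the same set."""
    dists = pairwise_distances(features, features)
    # After sorting, column 0 is the zero self-distance, so column k is the k-th neighbour
    return np.sort(dists, axis=1)[:, k]


def precision_recall(real: np.ndarray, fake: np.ndarray, nearest_k: int = 5):
    real_radii = knn_radii(real, nearest_k)
    fake_radii = knn_radii(fake, nearest_k)
    dists = pairwise_distances(real, fake)  # shape (N_real, N_fake)
    # A fake sample is "precise" if it lies inside the ball of any real sample
    precision = (dists <= real_radii[:, None]).any(axis=0).mean()
    # A real sample is "recalled" if it lies inside the ball of any fake sample
    recall = (dists <= fake_radii[None, :]).any(axis=1).mean()
    return float(precision), float(recall)
```

Identical sets yield precision and recall of one, while two sets that lie far apart yield zero for both.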