Class Interface

Full Reference Metrics

Structural Similarity (SSIM)

class piq.SSIMLoss(kernel_size: int = 11, kernel_sigma: float = 1.5, k1: float = 0.01, k2: float = 0.03, downsample: bool = True, reduction: str = 'mean', data_range: Union[int, float] = 1.0)

Creates a criterion that measures the structural similarity index error between each element in the input \(x\) and target \(y\).

To match the results of the skimage and tensorflow implementations, set downsample = True.

The unreduced (i.e. with reduction set to 'none') loss can be described as:

\[\begin{split}SSIM = \{ssim_1,\dots,ssim_{N \times C}\}\\ ssim_{l}(x, y) = \frac{(2 \mu_x \mu_y + c_1) (2 \sigma_{xy} + c_2)} {(\mu_x^2 +\mu_y^2 + c_1)(\sigma_x^2 +\sigma_y^2 + c_2)},\end{split}\]

where \(N\) is the batch size and \(C\) is the number of channels. If reduction is not 'none' (default 'mean'), then:

\[\begin{split}SSIMLoss(x, y) = \begin{cases} \operatorname{mean}(1 - SSIM), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(1 - SSIM), & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]

\(x\) and \(y\) are tensors of arbitrary shapes with a total of \(n\) elements each.

The sum operation still operates over all the elements, and divides by \(n\). The division by \(n\) can be avoided if one sets reduction = 'sum'. In case of 5D input tensors, complex value is returned as a tensor of size 2.

Parameters:
  • kernel_size – By default, the mean and covariance of a pixel are obtained by convolution with a Gaussian kernel of the given size.

  • kernel_sigma – Standard deviation for Gaussian kernel.

  • k1 – Coefficient related to c1 in the above equation.

  • k2 – Coefficient related to c2 in the above equation.

  • downsample – Perform average pool before SSIM computation. Default: True

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

Examples

>>> loss = SSIMLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
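
A minimal additional sketch (using only parameters documented above) of obtaining unreduced losses and a 255 data range; the exact output shape follows the unreduced definition of SSIM given earlier:

>>> import torch
>>> from piq import SSIMLoss
>>> loss = SSIMLoss(reduction='none', data_range=255.)
>>> x = torch.randint(0, 256, (4, 3, 96, 96)).float()
>>> y = torch.randint(0, 256, (4, 3, 96, 96)).float()
>>> unreduced = loss(x, y)  # one value of 1 - SSIM per element of the unreduced SSIM vector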

References

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13, 600-612. https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf, DOI:10.1109/TIP.2003.819861

forward(x: Tensor, y: Tensor) Tensor

Computation of Structural Similarity (SSIM) index as a loss function.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).

  • y – A target tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).

Returns:

Value of SSIM loss to be minimized, i.e. 1 - ssim in [0, 1] range. In case of 5D input tensors, complex value is returned as a tensor of size 2.

Multi-Scale Structural Similarity (MS-SSIM)

class piq.MultiScaleSSIMLoss(kernel_size: int = 11, kernel_sigma: float = 1.5, k1: float = 0.01, k2: float = 0.03, scale_weights: Optional[Tensor] = None, reduction: str = 'mean', data_range: Union[int, float] = 1.0)

Creates a criterion that measures the multi-scale structural similarity index error between each element in the input \(x\) and target \(y\). The unreduced (i.e. with reduction set to 'none') loss can be described as:

\[\begin{split}MSSIM = \{mssim_1,\dots,mssim_{N \times C}\}, \\ mssim_{l}(x, y) = \frac{(2 \mu_{x,m} \mu_{y,m} + c_1) } {(\mu_{x,m}^2 +\mu_{y,m}^2 + c_1)} \prod_{j=1}^{m - 1} \frac{(2 \sigma_{xy,j} + c_2)}{(\sigma_{x,j}^2 +\sigma_{y,j}^2 + c_2)}\end{split}\]

where \(N\) is the batch size, \(C\) is the number of channels, and \(m\) is the scale level (default: 5). If reduction is not 'none' (default 'mean'), then:

\[\begin{split}MultiscaleSSIMLoss(x, y) = \begin{cases} \operatorname{mean}(1 - MSSIM), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(1 - MSSIM), & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]

For colour images channel order is RGB. In case of 5D input tensors, complex value is returned as a tensor of size 2.

Parameters:
  • kernel_size – By default, the mean and covariance of a pixel are obtained by convolution with a Gaussian kernel of the given size. Must be an odd value.

  • kernel_sigma – Standard deviation for Gaussian kernel.

  • k1 – Coefficient related to c1 in the above equation.

  • k2 – Coefficient related to c2 in the above equation.

  • scale_weights – Weights for different scales. If None, default weights from the paper will be used. Default weights: (0.0448, 0.2856, 0.3001, 0.2363, 0.1333).

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

Examples

>>> loss = MultiScaleSSIMLoss()
>>> input = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(input, target)
>>> output.backward()
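
A hedged sketch of overriding the default scale weights (listed in the Parameters section above) with an explicit tensor:

>>> import torch
>>> from piq import MultiScaleSSIMLoss
>>> weights = torch.tensor([0.0448, 0.2856, 0.3001, 0.2363, 0.1333])
>>> loss = MultiScaleSSIMLoss(scale_weights=weights, data_range=1.)
>>> output = loss(torch.rand(3, 3, 256, 256), torch.rand(3, 3, 256, 256))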

References

Wang, Z., Simoncelli, E. P., Bovik, A. C. (2003). Multi-scale Structural Similarity for Image Quality Assessment. IEEE Asilomar Conference on Signals, Systems and Computers, 37, https://ieeexplore.ieee.org/document/1292216 DOI:10.1109/ACSSC.2003.1292216

Wang, Z., Bovik, A. C., Sheikh, H. R., & Simoncelli, E. P. (2004). Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 13, 600-612. https://ece.uwaterloo.ca/~z70wang/publications/ssim.pdf, DOI:10.1109/TIP.2003.819861

Note

The size of the image should be at least (kernel_size - 1) * 2 ** (levels - 1) + 1. For example, with the default kernel_size = 11 and 5 scale levels, images must be at least 161 x 161 pixels.

forward(x: Tensor, y: Tensor) Tensor

Computation of Multi-scale Structural Similarity (MS-SSIM) index as a loss function. For colour images channel order is RGB.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).

  • y – A target tensor. Shape \((N, C, H, W)\) or \((N, C, H, W, 2)\).

Returns:

Value of MS-SSIM loss to be minimized, i.e. 1 - ms_ssim in [0, 1] range. In case of 5D tensor, complex value is returned as a tensor of size 2.

Information Content Weighted Structural Similarity (IW-SSIM)

class piq.InformationWeightedSSIMLoss(data_range: Union[int, float] = 1.0, kernel_size: int = 11, kernel_sigma: float = 1.5, k1: float = 0.01, k2: float = 0.03, parent: bool = True, blk_size: int = 3, sigma_nsq: float = 0.4, scale_weights: Optional[Tensor] = None, reduction: str = 'mean')

Creates a criterion that measures the Information Content Weighted Structural Similarity (IW-SSIM) index error between each element in the input \(x\) and target \(y\).

Inputs are expected to be in the range [0, data_range].

If reduction is not 'none' (default 'mean'), then:

\[\begin{split}InformationWeightedSSIMLoss(x, y) = \begin{cases} \operatorname{mean}(1 - IWSSIM), & \text{if reduction} = \text{'mean';}\\ \operatorname{sum}(1 - IWSSIM), & \text{if reduction} = \text{'sum'.} \end{cases}\end{split}\]
Parameters:
  • data_range – Maximum value range of images (usually 1.0 or 255).

  • kernel_size – The side-length of the sliding window used in comparison. Must be an odd value.

  • kernel_sigma – Sigma of normal distribution for sliding window used in comparison.

  • k1 – Algorithm parameter, K1 (small constant).

  • k2 – Algorithm parameter, K2 (small constant). Try a larger K2 constant (e.g. 0.4) if you get negative or NaN results.

  • parent – Flag to control dependency on previous layer of pyramid.

  • blk_size – The side-length of the sliding window used in comparison for information content.

  • sigma_nsq – Sigma of normal distribution for sliding window used in comparison for information content.

  • scale_weights – Weights for scaling.

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

Examples

>>> loss = InformationWeightedSSIMLoss()
>>> input = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(input, target)
>>> output.backward()
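
A minimal sketch of the workaround suggested in the Parameters section above, assuming negative or NaN values are observed with the default k2:

>>> import torch
>>> from piq import InformationWeightedSSIMLoss
>>> robust_loss = InformationWeightedSSIMLoss(k2=0.4)
>>> output = robust_loss(torch.rand(3, 3, 256, 256), torch.rand(3, 3, 256, 256))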

References

Wang, Zhou, and Qiang Li.. Information content weighting for perceptual image quality assessment. IEEE Transactions on image processing 20.5 (2011): 1185-1198. https://ece.uwaterloo.ca/~z70wang/publications/IWSSIM.pdf DOI:10.1109/TIP.2010.2092435

forward(x: Tensor, y: Tensor) Tensor

Computation of Information Content Weighted Structural Similarity (IW-SSIM) index as a loss function. For colour images channel order is RGB.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of IW-SSIM loss to be minimized, i.e. 1 - information_weighted_ssim in [0, 1] range.

Visual Information Fidelity (VIFp)

class piq.VIFLoss(sigma_n_sq: float = 2.0, data_range: Union[int, float] = 1.0, reduction: str = 'mean')

Creates a criterion that measures the Visual Information Fidelity loss between predicted (x) and target (y) image. In order to be considered as a loss, value 1 - clip(VIF, min=0, max=1) is returned.

Parameters:
  • sigma_n_sq – HVS model parameter (variance of the visual noise).

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

Examples

>>> loss = VIFLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()

References

H. R. Sheikh and A. C. Bovik, “Image information and visual quality,” IEEE Transactions on Image Processing, vol. 15, no. 2, pp. 430-444, Feb. 2006 https://ieeexplore.ieee.org/abstract/document/1576816/ DOI: 10.1109/TIP.2005.859378.

forward(x: Tensor, y: Tensor) Tensor

Computation of Visual Information Fidelity (VIF) index as a loss function. Colour images are expected to have RGB channel order. Order of inputs is important! First tensor must contain distorted images, second reference images.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of VIF loss to be minimized in [0, 1] range.

Feature Similarity Index Measure (FSIM)

class piq.FSIMLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, chromatic: bool = True, scales: int = 4, orientations: int = 4, min_length: int = 6, mult: int = 2, sigma_f: float = 0.55, delta_theta: float = 1.2, k: float = 2.0)

Creates a criterion that measures the FSIM or FSIMc for input \(x\) and target \(y\).

In order to be considered as a loss, value 1 - clip(FSIM, min=0, max=1) is returned. If you need FSIM value, use function fsim instead. Supports greyscale and colour images with RGB channel order.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • chromatic – Flag to compute FSIMc, which also takes into account chromatic components

  • scales – Number of wavelets used for computation of phase congruency maps

  • orientations – Number of filter orientations used for computation of phase congruency maps

  • min_length – Wavelength of smallest scale filter

  • mult – Scaling factor between successive filters

  • sigma_f – Ratio of the standard deviation of the Gaussian describing the log Gabor filter’s transfer function in the frequency domain to the filter center frequency.

  • delta_theta – Ratio of angular interval between filter orientations and the standard deviation of the angular Gaussian function used to construct filters in the frequency plane.

  • k – Number of standard deviations of the noise energy beyond the mean at which the noise threshold point is set, below which phase congruency values get penalized.

Examples

>>> loss = FSIMLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
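
A hedged sketch of computing plain FSIM (no chromatic component) on single-channel inputs, which the description above says are supported:

>>> import torch
>>> from piq import FSIMLoss
>>> grey_loss = FSIMLoss(chromatic=False)
>>> x_grey = torch.rand(3, 1, 256, 256, requires_grad=True)
>>> y_grey = torch.rand(3, 1, 256, 256)
>>> output = grey_loss(x_grey, y_grey)
>>> output.backward()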

References

L. Zhang, L. Zhang, X. Mou and D. Zhang, “FSIM: A Feature Similarity Index for Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 20, no. 8, pp. 2378-2386, Aug. 2011, doi: 10.1109/TIP.2011.2109730. https://ieeexplore.ieee.org/document/5705575

forward(x: Tensor, y: Tensor) Tensor

Computation of FSIM as a loss function.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of FSIM loss to be minimized in [0, 1] range.

Spectral Residual based Similarity Measure (SR-SIM)

class piq.SRSIMLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, chromatic: bool = False, scale: float = 0.25, kernel_size: int = 3, sigma: float = 3.8, gaussian_size: int = 10)

Creates a criterion that measures the SR-SIM or SR-SIMc for input \(x\) and target \(y\).

In order to be considered as a loss, value 1 - clip(SR-SIM, min=0, max=1) is returned. If you need SR-SIM value, use function srsim instead.

Parameters:
  • reduction – Specifies the reduction to apply to the output: 'none' | 'mean' | 'sum'. 'none': no reduction will be applied, 'mean': the sum of the output will be divided by the number of elements in the output, 'sum': the output will be summed. Default: 'mean'

  • data_range – The difference between the maximum and minimum of the pixel value, i.e., if for image x it holds min(x) = 0 and max(x) = 1, then data_range = 1. The pixel value interval of both input and output should remain the same.

  • chromatic – Flag to compute SRSIMc, which also takes into account chromatic components

  • scale – Resizing factor used in saliency map computation

  • kernel_size – Kernel size of average blur filter used in saliency map computation

  • sigma – Sigma of gaussian filter applied on saliency map

  • gaussian_size – Size of gaussian filter applied on saliency map

Shape:
  • Input: Required to be 2D (H, W), 3D (C, H, W) or 4D (N, C, H, W). RGB channel order for colour images.

  • Target: Required to be 2D (H, W), 3D (C, H, W) or 4D (N, C, H, W). RGB channel order for colour images.

Examples:

>>> loss = SRSIMLoss()
>>> prediction = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(prediction, target)
>>> output.backward()

References

https://sse.tongji.edu.cn/linzhang/ICIP12/ICIP-SR-SIM.pdf

forward(prediction: Tensor, target: Tensor) Tensor

Computation of SR-SIM as a loss function.

Parameters:
  • prediction – Tensor of prediction of the network.

  • target – Reference tensor.

Returns:

Value of SR-SIM loss to be minimized. 0 <= SR-SIM <= 1.

Gradient Magnitude Similarity Deviation (GMSD)

class piq.GMSDLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, t: float = 0.00261437908496732)

Creates a criterion that measures Gradient Magnitude Similarity Deviation between each element in the input and target.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • t – Constant from the reference paper that ensures numerical stability of the similarity map

Examples

>>> loss = GMSDLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()

References

Wufeng Xue et al. Gradient Magnitude Similarity Deviation (2013) https://arxiv.org/pdf/1308.3052.pdf

forward(x: Tensor, y: Tensor) Tensor

Computation of Gradient Magnitude Similarity Deviation (GMSD) as a loss function. Supports greyscale and colour images with RGB channel order.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of GMSD loss to be minimized in [0, 1] range.

Multi-Scale Gradient Magnitude Similarity Deviation (MS-GMSD)

class piq.MultiScaleGMSDLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, scale_weights: Optional[Tensor] = None, chromatic: bool = False, alpha: float = 0.5, beta1: float = 0.01, beta2: float = 0.32, beta3: float = 15.0, t: float = 170)

Creates a criterion that measures multi scale Gradient Magnitude Similarity Deviation between each element in the input \(x\) and target \(y\).

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • scale_weights – Weights for different scales. Can contain any number of floating point values. By default weights are initialized with values from the paper.

  • chromatic – Flag to use the MS-GMSDc algorithm from the paper, which also evaluates chromatic components of the image. Default: False

  • beta1 – Algorithm parameter. Weight of chromatic component in the loss.

  • beta2 – Algorithm parameter. Small constant; see references for details.

  • beta3 – Algorithm parameter. Small constant; see references for details.

  • t – Constant from the reference paper that ensures numerical stability of the similarity map

Examples

>>> loss = MultiScaleGMSDLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
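
A minimal sketch of enabling the chromatic variant (MS-GMSDc) described in the Parameters section above:

>>> import torch
>>> from piq import MultiScaleGMSDLoss
>>> loss_c = MultiScaleGMSDLoss(chromatic=True)
>>> output = loss_c(torch.rand(3, 3, 256, 256), torch.rand(3, 3, 256, 256))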

References

Bo Zhang et al. Gradient Magnitude Similarity Deviation on Multiple Scales (2017). http://www.cse.ust.hk/~psander/docs/gradsim.pdf

forward(x: Tensor, y: Tensor) Tensor

Computation of Multi Scale GMSD index as a loss function. Supports greyscale and colour images with RGB channel order. The height and width should be at least 2 ** scales + 1.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of MS-GMSD loss to be minimized in [0, 1] range.

Visual Saliency-induced Index (VSI)

class piq.VSILoss(reduction: str = 'mean', c1: float = 1.27, c2: float = 386.0, c3: float = 130.0, alpha: float = 0.4, beta: float = 0.02, data_range: Union[int, float] = 1.0, omega_0: float = 0.021, sigma_f: float = 1.34, sigma_d: float = 145.0, sigma_c: float = 0.001)

Creates a criterion that measures Visual Saliency-induced Index error between each element in the input and target.

The sum operation still operates over all the elements, and divides by \(n\).

The division by \(n\) can be avoided if one sets reduction = 'sum'.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • c1 – coefficient to calculate saliency component of VSI

  • c2 – coefficient to calculate gradient component of VSI

  • c3 – coefficient to calculate color component of VSI

  • alpha – power for gradient component of VSI

  • beta – power for color component of VSI

  • omega_0 – coefficient to get log Gabor filter at SDSP

  • sigma_f – coefficient to get log Gabor filter at SDSP

  • sigma_d – coefficient to get SDSP

  • sigma_c – coefficient to get SDSP

Examples

>>> loss = VSILoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()

References

L. Zhang, Y. Shen and H. Li, “VSI: A Visual Saliency-Induced Index for Perceptual Image Quality Assessment,” IEEE Transactions on Image Processing, vol. 23, no. 10, pp. 4270-4281, Oct. 2014, doi: 10.1109/TIP.2014.2346028 https://ieeexplore.ieee.org/document/6873260

forward(x, y)

Computation of VSI as a loss function.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of VSI loss to be minimized in [0, 1] range.

Note

Both inputs are supposed to have RGB channel order in accordance with the original approach. Nevertheless, the method supports greyscale images, which are converted to RGB by copying the grey channel 3 times.

DCT Subband Similarity Index (DSS)

class piq.DSSLoss(reduction: str = 'mean', data_range: Union[int, float] = 1.0, dct_size: int = 8, sigma_weight: float = 1.55, kernel_size: int = 3, sigma_similarity: float = 1.5, percentile: float = 0.05)

Creates a criterion that measures the DSS for input \(x\) and target \(y\).

In order to be considered as a loss, value 1 - clip(DSS, min=0, max=1) is returned. If you need DSS value, use function dss instead.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • dct_size – Size of blocks in 2D Discrete Cosine Transform. DCT sizes must be in (0, input size].

  • sigma_weight – STD of gaussian that determines the proportion of weight given to low freq and high freq. Default: 1.55

  • kernel_size – Size of gaussian kernel for computing subband similarity. Kernels size must be in (0, input size]. Default: 3

  • sigma_similarity – STD of gaussian kernel for computing subband similarity. Default: 1.5

  • percentile – Fraction in (0, 1] of the worst similarity scores which should be kept. Default: 0.05

Shape:
  • Input: Required to be 4D (N, C, H, W). RGB channel order for colour images.

  • Target: Required to be 4D (N, C, H, W). RGB channel order for colour images.

Examples

>>> loss = DSSLoss()
>>> prediction = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> target = torch.rand(3, 3, 256, 256)
>>> output = loss(prediction, target)
>>> output.backward()

References

https://sse.tongji.edu.cn/linzhang/ICIP12/ICIP-SR-SIM.pdf

forward(prediction: Tensor, target: Tensor) Tensor

Computation of DSS as a loss function.

Parameters:
  • prediction – Tensor of prediction of the network.

  • target – Reference tensor.

Returns:

Value of DSS loss to be minimized. 0 <= DSS <= 1.

Haar Perceptual Similarity Index (HaarPSI)

class piq.HaarPSILoss(reduction: Optional[str] = 'mean', data_range: Union[int, float] = 1.0, scales: int = 3, subsample: bool = True, c: float = 30.0, alpha: float = 4.2)

Creates a criterion that measures Haar Wavelet-Based Perceptual Similarity loss between each element in the input and target.

The sum operation still operates over all the elements, and divides by \(n\). The division by \(n\) can be avoided if one sets reduction = 'sum'.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • scales – Number of Haar wavelets used for image decomposition.

  • subsample – Flag to apply average pooling before HaarPSI computation. See references for details.

  • c – Constant from the paper. See references for details

  • alpha – Exponent used for similarity maps weighting. See references for details.

Examples

>>> loss = HaarPSILoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()

References

R. Reisenhofer, S. Bosse, G. Kutyniok & T. Wiegand (2017) ‘A Haar Wavelet-Based Perceptual Similarity Index for Image Quality Assessment’ http://www.math.uni-bremen.de/cda/HaarPSI/publications/HaarPSI_preprint_v4.pdf

forward(x: Tensor, y: Tensor) Tensor

Computation of HaarPSI as a loss function.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of HaarPSI loss to be minimized in [0, 1] range.

Mean Deviation Similarity Index (MDSI)

class piq.MDSILoss(data_range: Union[int, float] = 1.0, reduction: str = 'mean', c1: float = 140.0, c2: float = 55.0, c3: float = 550.0, alpha: float = 0.6, rho: float = 1.0, q: float = 0.25, o: float = 0.25, combination: str = 'sum', beta: float = 0.1, gamma: float = 0.2)

Creates a criterion that measures Mean Deviation Similarity Index (MDSI) error between the prediction \(x\) and target \(y\). Supports greyscale and colour images with RGB channel order.

Parameters:
  • data_range – Maximum value range of images (usually 1.0 or 255).

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • c1 – coefficient to calculate gradient similarity. Default: 140.

  • c2 – coefficient to calculate gradient similarity. Default: 55.

  • c3 – coefficient to calculate chromaticity similarity. Default: 550.

  • combination – mode to combine gradient similarity and chromaticity similarity: 'sum' | 'mult'.

  • alpha – coefficient to combine gradient similarity and chromaticity similarity using summation.

  • beta – power to combine gradient similarity with chromaticity similarity using multiplication.

  • gamma – power to combine gradient similarity and chromaticity similarity using multiplication.

  • rho – order of the Minkowski distance

  • q – coefficient that adjusts the emphasis of the values in image and MCT

  • o – the power pooling applied on the final value of the deviation

Examples

>>> loss = MDSILoss(data_range=1.)
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
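
A hedged sketch of the multiplicative combination mode mentioned in the Parameters section above:

>>> import torch
>>> from piq import MDSILoss
>>> loss_mult = MDSILoss(combination='mult')
>>> output = loss_mult(torch.rand(3, 3, 256, 256), torch.rand(3, 3, 256, 256))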

References

Nafchi, Hossein Ziaei and Shahkolaei, Atena and Hedjam, Rachid and Cheriet, Mohamed (2016). Mean deviation similarity index: Efficient and reliable full-reference image quality evaluator. IEEE Ieee Access, 4, 5579–5590. https://arxiv.org/pdf/1608.07433.pdf DOI:10.1109/ACCESS.2016.2604042

Note

The constants are usually chosen so that \(c_3 = 4c_1 = 10c_2\).

forward(x: Tensor, y: Tensor) Tensor

Computation of Mean Deviation Similarity Index (MDSI) as a loss function.

Both inputs are supposed to have RGB channel order. Greyscale images are converted to RGB by copying the grey channel 3 times.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Value of MDSI loss to be minimized in [0, 1] range.

Note

Both inputs are supposed to have RGB channels order in accordance with the original approach. Nevertheless, the method supports greyscale images, which are converted to RGB by copying the grey channel 3 times.

Learned Perceptual Image Patch Similarity (LPIPS)

class piq.LPIPS(replace_pooling: bool = False, distance: str = 'mse', reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])

Learned Perceptual Image Patch Similarity metric. Only VGG16 learned weights are supported.

By default expects input to be in range [0, 1], which is then normalized by ImageNet statistics into range [-1, 1]. If no normalisation is required, change mean and std values accordingly.

Parameters:
  • replace_pooling – Flag to replace MaxPooling layer with AveragePooling. See references for details.

  • distance – Method to compute distance between features: 'mse' | 'mae'.

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].

  • std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].

Examples

>>> loss = LPIPS()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
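
A hedged sketch of switching to the L1 feature distance and disabling the built-in ImageNet standardisation, as the Parameters above suggest when no normalisation is required; the random tensors are placeholders:

>>> import torch
>>> from piq import LPIPS
>>> loss = LPIPS(distance='mae', mean=[0., 0., 0.], std=[1., 1., 1.])
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()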

References

Gatys, Leon and Ecker, Alexander and Bethge, Matthias (2016). A Neural Algorithm of Artistic Style. Association for Research in Vision and Ophthalmology (ARVO). https://arxiv.org/abs/1508.06576

Zhang, Richard and Isola, Phillip and Efros, et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1801.03924 https://github.com/richzhang/PerceptualSimilarity

Perceptual Image-Error Assessment through Pairwise Preference (PieAPP)

class piq.PieAPP(reduction: str = 'mean', data_range: Union[int, float] = 1.0, stride: int = 27, enable_grad: bool = False)

Implementation of Perceptual Image-Error Assessment through Pairwise Preference.

Expects input to be in range [0, data_range] with no normalization and RGB channel order. Input images are cropped into smaller patches. The score for each individual image is the mean of its patch scores.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • stride – Step between cropped patches. Smaller values lead to better quality, but cause higher memory consumption. Default: 27 (sparse sampling in original implementation)

  • enable_grad – Flag to compute gradients. Useful when PieAPP used as a loss. Default: False.

Examples

>>> loss = PieAPP()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
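
A hedged sketch of using PieAPP as a training loss, with gradients enabled and a denser (hypothetical) patch stride, following the parameter notes above that smaller strides improve quality at higher memory cost:

>>> import torch
>>> from piq import PieAPP
>>> loss = PieAPP(enable_grad=True, stride=16)
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()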

References

Ekta Prashnani, Hong Cai, Yasamin Mostofi, Pradeep Sen (2018). PieAPP: Perceptual Image-Error Assessment through Pairwise Preference https://arxiv.org/abs/1806.02067

https://github.com/prashnani/PerceptualImageError

forward(x: Tensor, y: Tensor) Tensor

Computation of PieAPP between feature representations of prediction \(x\) and target \(y\) tensors.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Perceptual Image-Error Assessment through Pairwise Preference

get_features(x: Tensor) Tuple[Tensor, Tensor]
Parameters:

x – Tensor. Shape \((N, C, H, W)\).

Returns:

List of features extracted from intermediate layers

Deep Image Structure and Texture Similarity (DISTS)

class piq.DISTS(reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225])

Deep Image Structure and Texture Similarity metric.

By default expects input to be in range [0, 1], which is then normalized by ImageNet statistics into range [-1, 1]. If no normalisation is required, change mean and std values accordingly.

Parameters:
  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].

  • std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].

Examples

>>> loss = DISTS()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()

References

Keyan Ding, Kede Ma, Shiqi Wang, Eero P. Simoncelli (2020). Image Quality Assessment: Unifying Structure and Texture Similarity. https://arxiv.org/abs/2004.07728 https://github.com/dingkeyan93/DISTS

compute_distance(x_features: Tensor, y_features: Tensor) List[Tensor]

Compute structure similarity between feature maps

Parameters:
  • x_features – Features of the input tensor.

  • y_features – Features of the target tensor.

Returns:

Structural similarity distance between feature maps

forward(x: Tensor, y: Tensor) Tensor
Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Deep Image Structure and Texture Similarity loss, i.e. 1-DISTS in range [0, 1].

get_features(x: Tensor) List[Tensor]
Parameters:

x – Input tensor

Returns:

List of features extracted from input tensor

replace_pooling(module: Module) Module

Turn All MaxPool layers into L2Pool

Parameters:

module – Module to change MaxPool into L2Pool

Returns:

Module with L2Pool instead of MaxPool

Style Score

class piq.StyleLoss(feature_extractor: Union[str, Module] = 'vgg16', layers: Collection[str] = ('relu3_3',), weights: List[Union[float, Tensor]] = [1.0], replace_pooling: bool = False, distance: str = 'mse', reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225], normalize_features: bool = False, allow_layers_weights_mismatch: bool = False)

Creates Style loss that can be used for image style transfer or as a measure in image to image tasks. Computes distance between Gram matrices of feature maps. Uses pretrained VGG models from torchvision.

By default expects input to be in range [0, 1], which is then normalized by ImageNet statistics into range [-1, 1]. If no normalisation is required, change mean and std values accordingly.

Parameters:
  • feature_extractor – Model to extract features or model name: 'vgg16' | 'vgg19'.

  • layers – List of strings with layer names. Default: 'relu3_3'

  • weights – List of float weights to balance different layers

  • replace_pooling – Flag to replace MaxPooling layer with AveragePooling. See references for details.

  • distance – Method to compute distance between features: 'mse' | 'mae'.

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].

  • std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].

  • normalize_features – If true, unit-normalize each feature in channel dimension before scaling and computing distance. See references for details.

Examples

>>> loss = StyleLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()
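
A minimal configuration sketch using only parameters documented above (VGG19 backbone, average pooling instead of max pooling, and unit-normalised features):

>>> import torch
>>> from piq import StyleLoss
>>> loss = StyleLoss(feature_extractor='vgg19', replace_pooling=True, normalize_features=True)
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()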

References

Gatys, Leon and Ecker, Alexander and Bethge, Matthias (2016). A Neural Algorithm of Artistic Style. Association for Research in Vision and Ophthalmology (ARVO). https://arxiv.org/abs/1508.06576

Zhang, Richard and Isola, Phillip and Efros, et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1801.03924

compute_distance(x_features: Tensor, y_features: Tensor)

Take L2 or L1 distance between Gram matrices of feature maps depending on distance.

Parameters:
  • x_features – Features of the input tensor.

  • y_features – Features of the target tensor.

Returns:

Distance between Gram matrices

static gram_matrix(x: Tensor) Tensor

Compute Gram matrix for batch of features.

Parameters:

x – Tensor. Shape \((N, C, H, W)\).

Returns:

Gram matrix for given input

Content Score

class piq.ContentLoss(feature_extractor: Union[str, Module] = 'vgg16', layers: Collection[str] = ('relu3_3',), weights: List[Union[float, Tensor]] = [1.0], replace_pooling: bool = False, distance: str = 'mse', reduction: str = 'mean', mean: List[float] = [0.485, 0.456, 0.406], std: List[float] = [0.229, 0.224, 0.225], normalize_features: bool = False, allow_layers_weights_mismatch: bool = False)

Creates Content loss that can be used for image style transfer or as a measure for image to image tasks. Uses pretrained VGG models from torchvision. Expects input to be in range [0, 1] or normalized with ImageNet statistics into range [-1, 1]

Parameters:
  • feature_extractor – Model to extract features or model name: 'vgg16' | 'vgg19'.

  • layers – List of strings with layer names. Default: 'relu3_3'

  • weights – List of float weights to balance different layers

  • replace_pooling – Flag to replace MaxPooling layer with AveragePooling. See references for details.

  • distance – Method to compute distance between features: 'mse' | 'mae'.

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

  • mean – List of float values used for data standardization. Default: ImageNet mean. If there is no need to normalize data, use [0., 0., 0.].

  • std – List of float values used for data standardization. Default: ImageNet std. If there is no need to normalize data, use [1., 1., 1.].

  • normalize_features – If true, unit-normalize each feature in channel dimension before scaling and computing distance. See references for details.

Examples

>>> loss = ContentLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> y = torch.rand(3, 3, 256, 256)
>>> output = loss(x, y)
>>> output.backward()

References

Gatys, Leon and Ecker, Alexander and Bethge, Matthias (2016). A Neural Algorithm of Artistic Style. Association for Research in Vision and Ophthalmology (ARVO). https://arxiv.org/abs/1508.06576

Zhang, Richard and Isola, Phillip and Efros, et al. (2018). The Unreasonable Effectiveness of Deep Features as a Perceptual Metric. IEEE/CVF Conference on Computer Vision and Pattern Recognition. https://arxiv.org/abs/1801.03924

compute_distance(x_features: List[Tensor], y_features: List[Tensor]) List[Tensor]

Take L2 or L1 distance between feature maps depending on distance.

Parameters:
  • x_features – Features of the input tensor.

  • y_features – Features of the target tensor.

Returns:

Distance between feature maps

forward(x: Tensor, y: Tensor) Tensor

Computation of Content loss between feature representations of prediction \(x\) and target \(y\) tensors.

Parameters:
  • x – An input tensor. Shape \((N, C, H, W)\).

  • y – A target tensor. Shape \((N, C, H, W)\).

Returns:

Content loss between feature representations

get_features(x: Tensor) List[Tensor]
Parameters:

x – Tensor. Shape \((N, C, H, W)\).

Returns:

List of features extracted from intermediate layers

static normalize(x: Tensor) Tensor

Normalize feature maps in channel direction to unit length.

Parameters:

x – Tensor. Shape \((N, C, H, W)\).

Returns:

Normalized input

replace_pooling(module: Module) Module

Turn all MaxPool layers into AveragePool

Parameters:

module – Module in which to change MaxPool into AveragePool

Returns:

Module with AveragePool instead of MaxPool

No Reference Metrics

Total Variation

class piq.TVLoss(norm_type: str = 'l2', reduction: str = 'mean')

Creates a criterion that measures the total variation of the given input \(x\).

If norm_type set to 'l2' the loss can be described as:

\[TV(x) = \sum_{N}\sqrt{\sum_{H, W, C}(|x_{:, :, i+1, j} - x_{:, :, i, j}|^2 + |x_{:, :, i, j+1} - x_{:, :, i, j}|^2)}\]

Else if norm_type set to 'l1':

\[TV(x) = \sum_{N}\sum_{H, W, C}(|x_{:, :, i+1, j} - x_{:, :, i, j}| + |x_{:, :, i, j+1} - x_{:, :, i, j}|)\]

where \(N\) is the batch size and \(C\) is the number of channels.

Parameters:
  • norm_type – one of 'l1' | 'l2' | 'l2_squared'

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

Examples

>>> loss = TVLoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> output = loss(x)
>>> output.backward()
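
A minimal sketch of the 'l1' norm variant defined above:

>>> import torch
>>> from piq import TVLoss
>>> loss_l1 = TVLoss(norm_type='l1')
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> output = loss_l1(x)
>>> output.backward()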

References

https://www.wikiwand.com/en/Total_variation_denoising

https://remi.flamary.com/demos/proxtv.html

forward(x: Tensor) Tensor

Computation of Total Variation (TV) index as a loss function.

Parameters:

x – An input tensor. Shape \((N, C, H, W)\).

Returns:

Value of TV loss to be minimized.

Blind/Referenceless Image Spatial Quality Evaluator (BRISQUE)

class piq.BRISQUELoss(kernel_size: int = 7, kernel_sigma: float = 1.1666666666666667, data_range: Union[int, float] = 1.0, reduction: str = 'mean', interpolation: str = 'nearest')

Creates a criterion that measures the BRISQUE score for input \(x\). \(x\) is 4D tensor (N, C, H, W). The sum operation still operates over all the elements, and divides by \(n\). The division by \(n\) can be avoided by setting reduction = 'sum'.

Parameters:
  • kernel_size – By default, the mean and covariance of a pixel are obtained by convolution with a Gaussian kernel of the given size. Must be an odd value.

  • kernel_sigma – Standard deviation for Gaussian kernel.

  • data_range – Maximum value range of images (usually 1.0 or 255).

  • reduction – Specifies the reduction type: 'none' | 'mean' | 'sum'. Default: 'mean'

Examples

>>> loss = BRISQUELoss()
>>> x = torch.rand(3, 3, 256, 256, requires_grad=True)
>>> output = loss(x)
>>> output.backward()

References

Anish Mittal et al. “No-Reference Image Quality Assessment in the Spatial Domain”, https://live.ece.utexas.edu/publications/2012/TIP%20BRISQUE.pdf

Note

Backpropagation is not available when using torch==1.5.0 due to a bug in argmin and argmax backpropagation. Update torch and torchvision to the latest versions.

forward(x: Tensor) Tensor

Computation of BRISQUE score as a loss function.

Parameters:

x – An input tensor with (N, C, H, W) shape. RGB channel order for colour images.

Returns:

Value of BRISQUE loss to be minimized.

CLIP-IQA

class piq.CLIPIQA(data_range: Union[float, int] = 1.0)

Creates a criterion that measures image quality based on a general notion of text-to-image similarity learned by the CLIP model (Radford et al., 2021) during its large-scale pre-training on a dataset of paired texts and images.

The method is based on the idea that two antonyms (“Good photo” and “Bad photo”) can be used as anchors in the text embedding space representing good and bad images in terms of their image quality.

After the anchors are defined, one can use them to determine the quality of a given image in the following way:

  1. Compute the image embedding of the image of interest using the pre-trained CLIP model;

  2. Compute the text embeddings of the selected anchor antonyms;

  3. Compute the angle (cosine similarity) between the image embedding (1) and both text embeddings (2);

  4. Compute the Softmax of the cosine similarities (3) to obtain the CLIP-IQA score (Wang et al., 2022).

This method is proposed to eliminate the linguistic ambiguity of the naive approach (using a single prompt, e.g., “Good photo”).

This method has an extension called CLIP-IQA+ proposed in the same research paper. It uses the same approach but also fine-tunes the CLIP weights using the CoOp fine-tuning algorithm (Zhou et al., 2022).

Note

The initial computation of the metric is performed in float32; other dtypes (e.g. float16, float64) are not supported. We preserve this behaviour for reproducibility purposes. Also, at the time of writing conv2d is not supported for float16 tensors on CPU.

Warning

In order to avoid implicit dtype conversion and normalization of input tensors, they are copied. Note that it may consume extra memory, which might be noticeable on large batch sizes.

Parameters:

data_range – Maximum value range of images (usually 1.0 or 255).

Examples

>>> from piq import CLIPIQA
>>> clipiqa = CLIPIQA()
>>> x = torch.rand(1, 3, 224, 224)
>>> score = clipiqa(x)
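
A hedged sketch for images in the [0, 255] range, relying only on the data_range parameter documented above:

>>> import torch
>>> from piq import CLIPIQA
>>> clipiqa_255 = CLIPIQA(data_range=255.)
>>> x_255 = (torch.rand(1, 3, 224, 224) * 255).round()
>>> score = clipiqa_255(x_255)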

References

Radford, Alec, et al. “Learning transferable visual models from natural language supervision.” International conference on machine learning. PMLR, 2021.

Wang, Jianyi, Kelvin CK Chan, and Chen Change Loy. “Exploring CLIP for Assessing the Look and Feel of Images.” arXiv preprint arXiv:2207.12396 (2022).

Zhou, Kaiyang, et al. “Learning to prompt for vision-language models.” International Journal of Computer Vision 130.9 (2022): 2337-2348.

forward(x_input: Tensor) Tensor

Computation of CLIP-IQA metric for a given image \(x\).

Parameters:

x – An input tensor. Shape \((N, C, H, W)\). The metric expects a 4D PyTorch tensor in channels-first format; the data range may vary depending on the data_range value.

Returns:

The value of CLIP-IQA score in [0, 1] range.

Feature Metrics

Inception Score (IS)

class piq.IS(num_splits: int = 10, distance: str = 'l1')

Creates a criterion that measures the difference in Inception Score between two datasets.

IS is computed separately for predicted \(x\) and target \(y\) features and expects raw InceptionV3 model logits as inputs.

Parameters:
  • num_splits – Number of parts to divide features. IS is computed for them separately and results are then averaged.

  • distance – How to measure distance between scores: 'l1' | 'l2'. Default: 'l1'.

Examples

>>> is_metric = IS()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> is_score: torch.Tensor = is_metric(x_feats, y_feats)
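
A minimal sketch of the L2 distance mode documented above; the random features are placeholders for real InceptionV3 logits:

>>> import torch
>>> from piq import IS
>>> is_metric_l2 = IS(num_splits=10, distance='l2')
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> is_score_l2 = is_metric_l2(x_feats, y_feats)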

References

“A Note on the Inception Score” https://arxiv.org/pdf/1801.01973.pdf

compute_metric(x_features: Tensor, y_features: Tensor) Tensor

Compute IS.

Both features should have shape (N_samples, encoder_dim).

Parameters:
  • x_features – Samples from data distribution. Shape \((N_x, D)\)

  • y_features – Samples from data distribution. Shape \((N_y, D)\)

Returns:

L1 or L2 distance between scores for datasets \(x\) and \(y\).

Frechet Inception Distance (FID)

class piq.FID

Interface of Frechet Inception Distance. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. FID can compare two data distributions with different numbers of samples, but their dimensionalities should match, otherwise it won't be possible to correctly compute statistics.

Examples

>>> fid_metric = FID()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> fid: torch.Tensor = fid_metric(x_feats, y_feats)

References

Heusel M. et al. (2017). Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems, https://arxiv.org/abs/1706.08500

compute_metric(x_features: Tensor, y_features: Tensor) Tensor

Fits multivariate Gaussians: \(X \sim \mathcal{N}(\mu_x, \sigma_x)\) and \(Y \sim \mathcal{N}(\mu_y, \sigma_y)\) to image stacks. Then computes FID as \(d^2 = ||\mu_x - \mu_y||^2 + Tr(\sigma_x + \sigma_y - 2\sqrt{\sigma_x \sigma_y})\).

Parameters:
  • x_features – Samples from data distribution. Shape \((N_x, D)\)

  • y_features – Samples from data distribution. Shape \((N_y, D)\)

Returns:

The Frechet Distance.

Geometry Score (GS)

class piq.GS(sample_size: int = 64, num_iters: int = 1000, gamma: Optional[float] = None, i_max: int = 100, num_workers: int = 4)

Interface of Geometry Score. It is computed for a whole set of data and can use features from an encoder instead of the images themselves to decrease computation cost. GS can compare two data distributions with different numbers of samples, but the dimensionalities of features should match, otherwise it won't be possible to correctly compute statistics.

Parameters:
  • sample_size – Number of landmarks to use on each iteration. Higher values can give better accuracy, but increase computation cost.

  • num_iters – Number of iterations. Higher values can reduce variance, but increase computation cost.

  • gamma – Parameter determining maximum persistence value. Default is 1.0 / 128 * N_imgs / 5000

  • i_max – Upper bound on i in RLT(i, 1, X, L)

  • num_workers – Number of processes used for GS computation.

Examples

>>> gs_metric = GS()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> gs: torch.Tensor = gs_metric(x_feats, y_feats)

References

Khrulkov V., Oseledets I. (2018). Geometry score: A method for comparing generative adversarial networks. arXiv preprint, 2018. https://arxiv.org/abs/1802.02664

Note

Computation is heavily CPU dependent; adjust the num_workers parameter according to your system configuration. The GS metric requires the gudhi library, which is not installed by default. With conda, run conda install -c conda-forge gudhi; otherwise follow the installation guide: http://gudhi.gforge.inria.fr/python/latest/installation.html

compute_metric(x_features: Tensor, y_features: Tensor) Tensor

Implements Algorithm 2 from the paper.

Parameters:
  • x_features – Samples from data distribution. Shape \((N_x, D)\)

  • y_features – Samples from data distribution. Shape \((N_y, D)\)

Returns:

Scalar value of the distance between distributions.

Kernel Inception Distance (KID)

class piq.KID(degree: int = 3, gamma: Optional[float] = None, coef0: int = 1, var_at_m: Optional[int] = None, average: bool = False, n_subsets: int = 50, subset_size: Optional[int] = 1000, ret_var: bool = False)

Interface of Kernel Inception Distance. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. KID can compare two data distributions with different numbers of samples, but their dimensionalities should match, otherwise it won't be possible to correctly compute statistics.

Parameters:
  • degree – Degree of a polynomial functions used in kernels. Default: 3

  • gamma – Kernel parameter. See paper for details

  • coef0 – Kernel parameter. See paper for details

  • var_at_m – Kernel variance. Default is None

  • average – If True recomputes metric n_subsets times using subset_size elements.

  • n_subsets – Number of repeats. Ignored if average is False

  • subset_size – Size of each subset for repeat. Ignored if average is False

  • ret_var – Whether to return variance after the distance is computed. This function will return Tuple[torch.Tensor, torch.Tensor] in this case. Default: False

Examples

>>> kid_metric = KID()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> kid: torch.Tensor = kid_metric(x_feats, y_feats)
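
A hedged sketch of requesting the variance along with the score, as described for ret_var above:

>>> import torch
>>> from piq import KID
>>> kid_metric = KID(ret_var=True)
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> kid_score, kid_variance = kid_metric(x_feats, y_feats)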

References

Demystifying MMD GANs https://arxiv.org/abs/1801.01401

compute_metric(x_features: Tensor, y_features: Tensor) Union[Tensor, Tuple[Tensor, Tensor]]

Computes KID (polynomial MMD) for given sets of features, obtained from Inception net or any other feature extractor. Samples must be in range [0, 1].

Parameters:
  • x_features – Samples from data distribution. Shape \((N_x, D)\)

  • y_features – Samples from data distribution. Shape \((N_y, D)\)

Returns:

KID score and variance (optional).

Multi-Scale Intrinsic Distance (MSID)

class piq.MSID(ts: Optional[Tensor] = None, k: int = 5, m: int = 10, niters: int = 100, rademacher: bool = False, normalized_laplacian: bool = True, normalize: str = 'empty', msid_mode: str = 'max')

Creates a criterion that measures the MSID score for two batches of images. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. MSID can compare two data distributions with different numbers of samples or different dimensionalities.

Parameters:
  • ts – Temperature values. If None, the default value torch.logspace(-1, 1, 256) is used.

  • k – Number of neighbours for graph construction.

  • m – Lanczos steps in SLQ.

  • niters – Number of starting random vectors for SLQ.

  • rademacher – True to use the Rademacher distribution, False to use the standard normal distribution for random vectors in the Hutchinson estimator.

  • normalized_laplacian – if True, use normalized Laplacian.

  • normalize'empty' for average heat kernel (corresponds to the empty graph normalization of NetLSD), 'complete' for the complete, 'er' for Erdos-Renyi normalization, 'none' for no normalization

  • msid_mode'l2' to compute the L2 norm of the distance between msid1 and msid2; 'max' to find the maximum absolute difference between two descriptors over temperature

Examples

>>> msid_metric = MSID()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> msid: torch.Tensor = msid_metric(x_feats, y_feats)
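
A minimal sketch of the 'l2' mode and Erdos-Renyi normalization options documented above:

>>> import torch
>>> from piq import MSID
>>> msid_l2 = MSID(msid_mode='l2', normalize='er')
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> msid_score = msid_l2(x_feats, y_feats)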

References

Tsitsulin, A., Munkhoeva, M., Mottin, D., Karras, P., Bronstein, A., Oseledets, I., & Müller, E. (2019). The shape of data: Intrinsic distance for data distributions. https://arxiv.org/abs/1905.11141

compute_metric(x_features: Tensor, y_features: Tensor) Tensor

Compute MSID score between two sets of samples.

Parameters:
  • x_features – Samples from data distribution. Shape \((N_x, D_x)\)

  • y_features – Samples from data distribution. Shape \((N_y, D_y)\)

Returns:

Scalar value of the distance between distributions.

Improved Precision and Recall (P&R)

class piq.PR(nearest_k: int = 5)

Interface of Improved Precision and Recall. It is computed for a whole set of data and uses features from an encoder instead of the images themselves to decrease computation cost. Precision and Recall can compare two data distributions with different numbers of samples, but their dimensionalities should match, otherwise it won't be possible to correctly compute statistics.

Parameters:

nearest_k – Number of nearest neighbours used to compute the non-parametric representation.

Examples

>>> pr_metric = PR()
>>> x_feats = torch.rand(10000, 1024)
>>> y_feats = torch.rand(10000, 1024)
>>> precision, recall = pr_metric(x_feats, y_feats)

References

Kynkäänniemi T. et al. (2019). Improved Precision and Recall Metric for Assessing Generative Models. Advances in Neural Information Processing Systems, https://arxiv.org/abs/1904.06991

compute_metric(real_features: Tensor, fake_features: Tensor) Tuple[Tensor, Tensor]

Creates non-parametric representations of the manifolds of real and generated data and computes the precision and recall between them.

Parameters:
  • real_features – Samples from data distribution. Shape \((N_x, D)\)

  • fake_features – Samples from generated distribution. Shape \((N_y, D)\)

Returns:

Scalar value of the precision of the generated images.

Scalar value of the recall of the generated images.