Grayscale to Hyperspectral at Any Resolution Using a Phase-Only Lens
Abstract
We consider the problem of reconstructing an HxWx31 hyperspectral image from an HxW grayscale snapshot measurement that is captured using a single diffractive optic and a filterless panchromatic photosensor. This problem is severely ill-posed, and we present the first model that is able to produce high-quality results. We train a conditional denoising diffusion model that maps a small grayscale measurement patch to a hyperspectral patch. We then deploy the model to many patches in parallel, using global physics-based guidance to synchronize the patch predictions. Our model can be trained on small hyperspectral datasets and then deployed to reconstruct hyperspectral images of arbitrary size. Moreover, by drawing multiple samples with different seeds, our model produces useful uncertainty maps. We show that our model achieves state-of-the-art performance on previous snapshot hyperspectral benchmarks, where reconstruction is better conditioned. Our work lays the foundation for a new class of high-resolution hyperspectral imagers that are compact and light-efficient.
Hyperspectral Imaging via Camera Guided Diffusion
A diffractive metasurface lens introduces purposeful chromatic aberration, smearing the spectral information from a point in the scene across many pixels at the photosensor. This produces a simple but useful optical encoding of the high-dimensional hyperspectral cube (right) into a grayscale measurement (left). We show for the first time that the inverse, grayscale-to-hyperspectral reconstruction problem can be approximately solved without the need for complex multi-component optics or multiple measurements with spectral filters. The use of a simple lens and filterless sensor results in ambiguity, where the grayscale measurement could map to many distinct but equally plausible hyperspectral cubes. Nonetheless, we obtain high-quality reconstructions by leveraging a denoising diffusion model to learn the distribution of solutions.
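To make the encoding concrete, the sketch below renders a grayscale measurement from a hyperspectral cube by convolving each wavelength band with its own point-spread function and summing over the spectrum. This is a minimal illustration under stated assumptions (31-band sampling, a known odd-sized PSF stack); `render_measurement` and its arguments are illustrative names, not the paper's code:

```python
import torch
import torch.nn.functional as F

def render_measurement(hsi, psfs):
    """Render a grayscale measurement from a hyperspectral cube (a sketch).

    hsi:  (31, H, W) scene radiance, one channel per wavelength band
    psfs: (31, k, k) wavelength-dependent point-spread functions (k odd)
    """
    measurement = torch.zeros_like(hsi[0])
    for c in range(hsi.shape[0]):
        # Each band is blurred by its own PSF (the chromatic aberration) ...
        blurred = F.conv2d(hsi[c][None, None], psfs[c][None, None],
                           padding=psfs.shape[-1] // 2)
        # ... and the filterless panchromatic sensor sums over wavelength.
        measurement += blurred[0, 0]
    return measurement  # (H, W)
```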
We frame the reconstruction problem as a patch-to-patch translation task. We train our diffusion model to generate small 64x64x31 hyperspectral image patches, conditioned on 64x64 pixel measurement patches. This allows us to train with limited, real HSI datasets (900 images). Once trained, our model is applied to reconstruct measurements of any size. This is done by splitting the measurement into patches, processing all patches in parallel, and then enforcing consistency across patch predictions at each denoising step using diffusion guidance. Our guidance enforces that the stitched, full-size hyperspectral prediction projects back to the captured measurement when rendered with the camera's known optical response.
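Such a guidance step can be sketched as a standard data-consistency gradient, computed on the stitched full-size estimate and applied back to every patch. The snippet below (reusing the imports above) is a sketch rather than the paper's exact update rule; `stitch` is a hypothetical helper that assembles patches into the full cube, and `render_measurement` is the forward-model sketch from earlier:

```python
def guided_update(x0_patches, measurement, psfs, stitch, step_size=1.0):
    """One physics-based guidance step (a sketch, not the exact paper rule).

    x0_patches:  (P, 31, 64, 64) per-patch estimates of the clean HSI cube
    measurement: (H, W) captured grayscale image
    stitch:      assembles the P patches into the full (31, H, W) cube
    """
    x0_patches = x0_patches.detach().requires_grad_(True)
    full_hsi = stitch(x0_patches)                  # (31, H, W)
    rendered = render_measurement(full_hsi, psfs)  # re-render via the optics
    loss = F.mse_loss(rendered, measurement)       # data-consistency error
    grad, = torch.autograd.grad(loss, x0_patches)
    # Nudge every patch toward agreement with the global measurement.
    return (x0_patches - step_size * grad).detach()
```

In a DDPM-style sampler, each denoising step would produce the per-patch clean estimates, apply this update, and then continue sampling from the nudged patches.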
The success of reconstructing hyperspectral images by processing the measurement in patches is initially surprising. The measurement is formed by a convolution with a spatially-extended point-spread function kernel. Consequently, relevant signal about a target HSI patch is partially scattered outside of the co-aligned measurement patch. In addition, extraneous signal from neighboring HSI patches is scattered into the measurement patch. This makes patch-based reconstruction highly ill-posed. Despite this fact, we find that our patch model outperforms existing deep models that process full measurements directly. We show that there is value in concentrating neural capacity on smaller regions using a diffusion model, and in synthesizing full-size predictions by tying the patch predictions together using guidance.
In contrast to RGB diffusion models that denoise only three image channels, our hyperspectral diffusion model is trained to denoise 31 channels corresponding to the scene radiance at narrow wavelength bands from 400 to 700 nm. Below we show the generated HSI reconstructions from our trained model, displayed for several wavelength channels as the sampling algorithm iterates through time. Since the predictions change with different initial noise seeds, we also generate an uncertainty map by computing the per-pixel total spectral variation across repeated draws. We find that our hyperspectral diffusion model produces accurate spectral reconstructions. The spectral channels highlight different features and objects in the scene and provide a deeper representation than RGB images.
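Since the uncertainty map is just the spread of the sampled reconstructions, it reduces to a short computation; a minimal sketch, assuming S draws stacked along a leading seed axis:

```python
def uncertainty_map(samples):
    """Per-pixel uncertainty from repeated diffusion draws (a sketch).

    samples: (S, 31, H, W) reconstructions drawn with S different seeds
    Returns an (H, W) map of total spectral variation across draws.
    """
    # Variance over seeds for every band, summed across the spectrum.
    return samples.var(dim=0).sum(dim=0)
```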
HSI Reconstructions on the ARAD1K Dataset
We benchmark our algorithm against existing grayscale-to-hyperspectral reconstruction networks using the ARAD1K dataset. Although our patch-based model can handle measurements of any size, we first train/test on 256x256 pixel measurements to enable comparison with previous models that are limited to small images. In the paper, we demonstrate that our model can directly reconstruct larger 1280x1280 and 1280x1536 pixel measurements. Additional studies are discussed in the paper, considering RGB filters and different optics used to capture measurements.
In the sliders below, we show the rendered grayscale measurements that are used as conditioning for our diffusion model. Each measurement is simulated using the metasurface lens and has significant chromatic aberration. To reconstruct the underlying hyperspectral image, each measurement is split into 16 64x64 pixel patches at inference time and processed in parallel. The reconstructed hyperspectral image is visualized by projecting it to RGB colorspace. We are able to restore fine spatial features and dense spectral information for every pixel, using only simple optical cues.
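The RGB visualizations are linear projections of the 31-band cube. A sketch of this projection, assuming per-band color weights `cmf` are supplied (for example, CIE 1931 color-matching functions resampled to the 31 bands):

```python
def hsi_to_rgb(hsi, cmf):
    """Project a 31-band HSI cube to RGB for display (a sketch).

    hsi: (31, H, W) radiance over 400-700 nm
    cmf: (31, 3) per-band color weights, e.g. resampled CIE 1931 curves
    """
    rgb = torch.einsum('chw,ck->khw', hsi, cmf)  # weighted sum over bands
    return (rgb / rgb.amax()).clamp(0, 1)        # normalize for display
```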
Our work is the first to demonstrate that hyperspectral images can be restored from a grayscale measurement captured using a single, simple lens. Moreover, we found that no existing network can successfully solve this problem. We qualitatively display our findings below, showing the hyperspectral images (viewed in RGB colorspace) reconstructed by our model versus previous networks.
1280x1280 HSI Reconstruction on the ICVL Dataset
Here we highlight the unique ability of our model to process measurements of any size by splitting the measurement into smaller chunks. Our diffusion model only requires the memory and compute to process one 64x64 patch at any given moment, which enables high-resolution reconstructions on a commodity GPU. In contrast, most existing methods attempt to reconstruct the full HSI at once, requiring significantly more memory and compute.
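A sketch of this chunked processing, assuming a measurement whose side lengths are multiples of the 64-pixel patch size; `denoise_step` is a hypothetical stand-in for one pass of the patch diffusion model:

```python
def split_patches(measurement, p=64):
    """Split an (H, W) measurement into non-overlapping p x p patches."""
    H, W = measurement.shape
    patches = measurement.reshape(H // p, p, W // p, p)
    return patches.permute(0, 2, 1, 3).reshape(-1, p, p)  # (P, p, p)

def reconstruct_in_chunks(patches, denoise_step, chunk=16):
    """Run the patch model over fixed-size chunks to cap peak memory."""
    outputs = [denoise_step(patches[i:i + chunk])
               for i in range(0, patches.shape[0], chunk)]
    return torch.cat(outputs, dim=0)
```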
Below, we show two 1280x1280 resolution scenes from the ICVL dataset. The grayscale measurement is displayed in the left images (the red grid denotes patches). The reconstructed HSI is viewed in RGB space in the middle, and the line plot on the right shows the dense spectral curve for each highlighted pixel. We generally observe accurate spectral restorations for natural scenes without sacrificing FOV or resolution.