Internally Referenced Low-Light Enhancement

Peiyuan He, Hainuo Wang, Hengxing Liu, Mingjia Li, Xiaojie Guo*
Tianjin University, Tianjin, China
*Corresponding Author

Teaser Figure

Figure 1: Internal physical and structural references. Top: Internal Physical Reference. A local exposure-simulated pseudo-GT provides an internal physical reference for global color and brightness restoration. Bottom: Internal Structural References. Our spectral-domain design provides internal structural references that preserve structures and suppress noise without the blur artifacts caused by spatial misalignment.

Abstract

Self-supervised low-light image enhancement (LLIE) is highly appealing as it eliminates the reliance on external paired data. However, the lack of external references causes networks to struggle with decoupling entangled illumination, delicate textures, and amplified noise. To resolve this challenge, we propose an Internally Referenced LLIE framework that extracts reliable physical and structural references from the degraded input image itself.

First, we introduce a local exposure-simulated scheme to extract a low-frequency pseudo ground-truth. This serves as an internal physical reference to guide global illumination estimation and correct color casts. Second, we propose a dual-domain preservation strategy with spatial and spectral constraints to construct internal structural references. Specifically, an Illumination-Aligned Perceptual loss preserves global structures under illumination shifts, while a Shift-Invariant Spectral Correlation loss captures fine-grained local structures and suppresses high-frequency noise. Finally, we propose a Gain-Adaptive Feature Modulation (GAFM) mechanism to address highly spatially variant residual noise. Extensive experiments demonstrate that our method achieves state-of-the-art performance, delivering superior noise suppression and textural fidelity.

Method Overview

Our philosophy is that, although a degraded input image lacks an external normal-light counterpart, it still contains sufficient physical and structural cues to guide the LLIE process. The framework is built upon three core components:

  • Internal Physical Reference: A local exposure-simulated scheme featuring robust white balancing and adaptive shadow desaturation extracts a pseudo-GT to guide global color and brightness restoration.
  • Internal Structural References: A dual-domain collaborative strategy utilizes an Illumination-Aligned Perceptual (IAP) loss to preserve global topology, and a Shift-Invariant Spectral Correlation (SISC) loss to capture fine-grained local textures without inducing spatial blur.
  • Gain-Guided Blind-Spot Denoising: A Gain-Adaptive Feature Modulation (GAFM) mechanism dynamically translates the self-estimated illumination map into an internal spatial gain prior, enabling spatially-aware denoising.
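
The third component can be illustrated with a minimal sketch. It assumes the standard Retinex model I = R ∘ L, under which brightening a pixel amounts to multiplying it by a gain of 1 / L, so darker regions carry more strongly amplified noise. The function names (`gain_from_illumination`, `modulate`), the log-gain conditioning, and the scalar weights are hypothetical stand-ins for learned layers; this is not the paper's GAFM module.

```python
import math

def gain_from_illumination(illum, eps=1e-3):
    """Per-pixel gain g = 1 / max(L, eps) implied by the Retinex model I = R * L."""
    return [[1.0 / max(v, eps) for v in row] for row in illum]

def modulate(features, gain, w_scale=0.5, w_shift=0.1):
    """FiLM-style modulation f * (1 + gamma) + beta, conditioned on the log-gain.

    w_scale and w_shift are hypothetical placeholders for learned parameters.
    """
    out = []
    for f_row, g_row in zip(features, gain):
        row = []
        for f, g in zip(f_row, g_row):
            log_g = math.log(g)        # 0 where no amplification is needed
            gamma = w_scale * log_g    # multiplicative term grows with the gain
            beta = w_shift * log_g     # additive term grows with the gain
            row.append(f * (1.0 + gamma) + beta)
        out.append(row)
    return out

# Toy 2x2 illumination map: bright top-left, very dark bottom-right.
illum = [[1.0, 0.5], [0.1, 0.05]]
gain = gain_from_illumination(illum)
features = [[1.0, 1.0], [1.0, 1.0]]
modulated = modulate(features, gain)
```

In this toy setup the well-lit pixel passes through unchanged, while features at darker pixels are modulated more strongly, which is the kind of spatial awareness the gain prior is meant to provide.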

Framework Architecture

Figure 2: Overview of our IRLE framework. (a) Stage 1: Illumination estimation and structure extraction via a Dual-Domain Collaborative Retinex network, guided by a local exposure-simulated pseudo-GT. (b) Stage 2: Gain-guided blind-spot denoising, which handles spatially variant noise. (c) The detailed architecture of the Gain-Aware Block. (d) The Gain-Adaptive Feature Modulation (GAFM) module.

Theoretical Analysis

We further analyze why internally referenced supervision improves both illumination recovery and structural preservation. The luminance statistics show whether an enhanced image follows a natural normal-light distribution, while the frequency-band visualization explains why the spectral constraint can preserve texture without inheriting amplified noise.

Panels: (a) CLIP-LIT on LOLv1; (b) RetinexDIP on LOLv1; (c) Li et al. on LOLv2-Real.

Figure 4: Luminance distribution comparisons via kernel density estimation (KDE). The orange dashed line represents our method, which consistently aligns with the Ground Truth distribution, whereas competitors show noticeable distribution shifts.
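
A minimal sketch of how such luminance density curves could be computed is given here. It assumes Rec. 601 luma weights and a fixed-bandwidth Gaussian kernel; the luminance definition and bandwidth used for the figure are not stated in the text, so both are assumptions, and the pixel values are toy data.

```python
import math

def luminance(rgb_pixels):
    """Rec. 601 luma for a list of (r, g, b) tuples with values in [0, 1]."""
    return [0.299 * r + 0.587 * g + 0.114 * b for r, g, b in rgb_pixels]

def gaussian_kde(samples, grid, bandwidth=0.05):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    density = []
    for x in grid:
        total = sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        density.append(norm * total)
    return density

# Toy pixels standing in for a dark output and a normal-light reference.
dark = [(0.05, 0.06, 0.04), (0.10, 0.08, 0.09), (0.07, 0.07, 0.07)]
bright = [(0.55, 0.60, 0.50), (0.45, 0.50, 0.48), (0.62, 0.58, 0.60)]

grid = [i / 100 for i in range(-50, 151)]  # covers [-0.5, 1.5] in steps of 0.01
kde_dark = gaussian_kde(luminance(dark), grid)
kde_bright = gaussian_kde(luminance(bright), grid)

# Location of each density peak; a distribution shift shows up as a peak offset.
peak_dark = grid[kde_dark.index(max(kde_dark))]
peak_bright = grid[kde_bright.index(max(kde_bright))]
```

Comparing the peak locations, or overlaying the two curves, gives a quick view of how far an enhanced image's brightness statistics sit from those of the reference.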

Figure 5: Cross-Frequency Correlation (CFC) analysis. The CFC matrices reveal that mid-frequency bands preserve stable structural correlations between low-light and normal-light images, while extreme low- and high-frequency bands are more affected by illumination degradation and amplified noise. This motivates SISC to retain reliable texture bands and suppress noise-dominated components.
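
The intuition behind a shift-invariant, band-limited spectral comparison can be shown on a 1-D toy signal. The sketch relies on two standard facts: the DFT magnitude is unchanged by a circular shift, and Pearson correlation is unchanged by a global brightness scale. The band limits, the signal, and the helper names are hypothetical; this is neither the paper's SISC loss nor its CFC matrix.

```python
import cmath
import math

def dft_magnitude(signal):
    """Magnitude of the discrete Fourier transform (naive O(n^2) version)."""
    n = len(signal)
    mags = []
    for k in range(n):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(signal))
        mags.append(abs(coeff))
    return mags

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

n = 32
# Reference "texture": three sinusoids at frequency bins 3, 5, and 7.
reference = [
    math.sin(2 * math.pi * 3 * t / n)
    + 0.5 * math.sin(2 * math.pi * 5 * t / n)
    + 0.25 * math.sin(2 * math.pi * 7 * t / n)
    for t in range(n)
]
# Toy "low-light" signal: dimmed to 10%, circularly shifted by 4 samples,
# plus an alternating pattern at the highest frequency acting as noise.
low_light = [0.1 * reference[(t - 4) % n] + 0.01 * (-1) ** t for t in range(n)]

# Direct sample-wise comparison is hurt by the misalignment.
spatial_corr = pearson(reference, low_light)

# Comparing magnitude spectra inside a (hypothetical) mid band is not.
mid = slice(2, 9)
mid_corr = pearson(dft_magnitude(reference)[mid], dft_magnitude(low_light)[mid])
```

Here the sample-wise correlation is poor because of the shift, while the mid-band spectral correlation stays close to 1 despite the dimming, the shift, and the high-frequency perturbation.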

Quantitative Evaluation

Extensive experiments demonstrate that our internally referenced framework achieves state-of-the-art PSNR and competitive SSIM across multiple standard benchmarks. Table 1 reports the full LOL quantitative comparison from the main paper, including both Normal and GT-Mean metrics.

Table 1: Quantitative comparison on LOL datasets. Both standard and GT-Mean metrics are reported. The best and second-best results among unsupervised methods are highlighted in bold and underline, respectively.

Method                  LOLv1                           LOLv2-Real                      LOLv2-Synthetic
                        Normal          GT-Mean         Normal          GT-Mean         Normal          GT-Mean
                        PSNR    SSIM    PSNR    SSIM    PSNR    SSIM    PSNR    SSIM    PSNR    SSIM    PSNR    SSIM
Zero-DCE (CVPR'20)      14.86   0.559   21.06   0.535   18.06   0.574   20.78   0.542   17.76   0.813   21.50   0.849
RRDNet (ICME'20)        11.46   0.460   18.96   0.484   13.96   0.483   19.15   0.495   14.87   0.657   18.39   0.758
RUAS (CVPR'21)          16.41   0.500   18.65   0.518   15.33   0.488   19.06   0.510   13.40   0.644   17.79   0.695
EnlightenGAN (TIP'21)   17.56   0.665   21.33   0.649   18.68   0.673   21.04   0.663   16.49   0.775   19.32   0.823
RetinexDIP (TCSVT'21)   11.67   0.484   19.74   0.471   14.51   0.521   19.48   0.487   16.01   0.733   20.12   0.800
SCI (CVPR'22)           14.78   0.522   18.97   0.501   17.30   0.534   19.47   0.509   15.43   0.748   18.64   0.788
PSENet (WACV'23)        17.50   0.543   20.93   0.546   17.63   0.531   20.64   0.550   16.62   0.777   20.67   0.824
PairLIE (CVPR'23)       19.51   0.736   23.17   0.753   19.89   0.778   24.03   0.803   19.07   0.797   21.68   0.820
GDP (CVPR'23)           15.82   0.541   19.09   0.578   14.40   0.494   19.32   0.559   12.12   0.497   15.83   0.667
NeRCo (ICCV'23)         19.74   0.743   22.41   0.755   19.66   0.717   23.63   0.750   17.59   0.734   19.66   0.752
CLIP-LIT (ICCV'23)      12.39   0.493   20.03   0.442   15.18   0.529   19.45   0.468   16.19   0.775   20.75   0.817
CoLIE (ECCV'24)         13.76   0.481   20.37   0.479   15.08   0.501   20.22   0.496   14.30   0.654   19.04   0.786
CLODE (ICLR'25)         19.60   0.718   22.59   0.736   17.87   0.681   22.57   0.703   17.21   0.783   20.63   0.797
Li et al. (ICLR'25)     19.82   0.751   23.97   0.779   20.35   0.795   26.14   0.828   17.82   0.802   20.78   0.820
Ours                    20.60   0.760   24.65   0.788   20.72   0.792   26.24   0.826   19.62   0.815   22.82   0.833
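
The difference between the Normal and GT-Mean columns can be illustrated with a short sketch. It assumes the GT-Mean protocol as it is commonly implemented in LLIE evaluation: the prediction is rescaled so that its mean brightness matches the ground truth before PSNR is computed. Whether the paper uses a global or per-channel mean, and whether it clips after rescaling, is not stated, so those details are assumptions here.

```python
import math

def psnr(pred, target, peak=1.0):
    """Peak signal-to-noise ratio in dB for flat lists of values in [0, peak]."""
    mse = sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)
    if mse == 0:
        return float("inf")
    return 10.0 * math.log10(peak ** 2 / mse)

def gt_mean_psnr(pred, target, peak=1.0):
    """PSNR after scaling `pred` so its mean matches the mean of `target`."""
    scale = (sum(target) / len(target)) / (sum(pred) / len(pred))
    # Clipping to the valid range is an assumption of this sketch.
    adjusted = [min(max(p * scale, 0.0), peak) for p in pred]
    return psnr(adjusted, target, peak)

# Toy example: the prediction has the right structure but half the brightness.
target = [0.2, 0.4, 0.6, 0.8]
pred = [0.1, 0.2, 0.3, 0.4]

normal_score = psnr(pred, target)
gt_mean_score = gt_mean_psnr(pred, target)
```

The Normal score penalizes the global brightness offset, whereas the GT-Mean score removes it and isolates errors in structure, color, and noise. This is why the GT-Mean values in Table 1 are consistently higher than the Normal ones.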

Efficiency Analysis

Table 2: Computational complexity and performance comparison. The restoration quality (Normal PSNR) is evaluated on the LOLv1 dataset. Inference time (RT) and FPS are measured on 400 × 600 inputs using a single NVIDIA RTX 3090 GPU.

Method                  PSNR ↑   Params (M) ↓   MACs (G) ↓   RT (ms) ↓    FPS ↑
GDP (CVPR'23)           15.82    552.814        4872.000     1336965.00   0.0007
RRDNet (ICME'20)        11.46    0.128          31.063       21793.18     0.0459
RetinexDIP (TCSVT'21)   11.67    0.707          3.414        11325.17     0.0883
CoLIE (ECCV'24)         13.76    0.133          8.657        1187.32      0.84
Li et al. (ICLR'25)     19.82    0.345          78.730       295.83       3.38
NeRCo (ICCV'23)         19.74    23.046         1136.000     293.67       3.41
CLIP-LIT (ICCV'23)      12.39    0.279          66.670       14.53        68.80
PairLIE (CVPR'23)       19.51    0.342          82.929       13.69        73.06
EnlightenGAN (TIP'21)   17.56    54.410         108.878      5.63         177.76
PSENet (WACV'23)        17.50    0.015          0.557        3.76         265.65
RUAS (CVPR'21)          16.41    0.003          0.795        3.50         285.91
Zero-DCE (CVPR'20)      14.86    0.079          19.008       3.44         290.81
SCI (CVPR'22)           14.78    0.00026        0.130        0.32         3113.86
Ours (Stage 1)          20.39    0.887          32.479       5.09         196.40
Ours (Total)            20.60    2.204          79.139       36.26        27.58
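
The RT and FPS columns are related by FPS ≈ 1000 / RT(ms), which can be checked with a few lines of arithmetic. The sketch also estimates the Stage 2 cost as the difference between the two "Ours" rows; that rests on the assumption that "Total" is simply Stage 1 plus Stage 2, which the table does not state explicitly.

```python
def fps_from_ms(runtime_ms):
    """Frames per second implied by a per-image runtime in milliseconds."""
    return 1000.0 / runtime_ms

# (runtime in ms, reported FPS) copied from Table 2.
reported = {
    "PairLIE": (13.69, 73.06),
    "Ours (Stage 1)": (5.09, 196.40),
    "Ours (Total)": (36.26, 27.58),
}

gaps = {}
for name, (rt_ms, fps) in reported.items():
    implied = fps_from_ms(rt_ms)
    # Small gaps are expected because the table rounds the runtimes.
    gaps[name] = abs(implied - fps) / fps
    print(f"{name}: implied {implied:.2f} FPS, reported {fps:.2f} FPS")

# Assumption: "Total" = Stage 1 + Stage 2, so the difference approximates Stage 2.
stage1 = {"params_m": 0.887, "macs_g": 32.479, "rt_ms": 5.09}
total = {"params_m": 2.204, "macs_g": 79.139, "rt_ms": 36.26}
stage2 = {key: round(total[key] - stage1[key], 3) for key in total}
print(stage2)
```

Under that assumption, the denoising stage accounts for most of the runtime, while Stage 1 alone already reaches 20.39 dB at roughly 196 FPS.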

Qualitative Results

Qualitative visual comparisons on LOLv1, including three cases and ten methods for each case

Figure 3: Visual comparisons on LOLv1. Existing methods often exhibit color casts or lose delicate details due to over-smoothing. Our method maintains natural colors and effectively removes spatially variant noise while preserving structures.

Acknowledgements

This work was partially supported by computational resources from TPU Research Cloud (TRC).