Internally Referenced Low-Light Enhancement

Figure 1: Internal physical and structural references. Top: Internal Physical Reference. A local exposure-simulated pseudo-GT provides an internal physical reference for global color and brightness restoration. Bottom: Internal Structural References. Our spectral-domain design provides internal structural references that preserve structures and suppress noise without the blur artifacts caused by spatial misalignment.

Abstract

Self-supervised low-light image enhancement (LLIE) is appealing because it removes the need for external paired data. However, without external references, networks struggle to disentangle illumination, delicate textures, and amplified noise. To resolve this challenge, we propose an Internally Referenced LLIE framework that extracts reliable physical and structural references from the degraded input image itself.

First, we introduce a local exposure-simulated scheme to extract a low-frequency pseudo ground-truth. This serves as an internal physical reference to guide global illumination estimation and correct color casts. Second, we propose a dual-domain preservation strategy with spatial and spectral constraints to construct internal structural references. Specifically, an Illumination-Aligned Perceptual loss preserves global structures under illumination shifts, while a Shift-Invariant Spectral Correlation loss captures fine-grained local structures and suppresses high-frequency noise. Finally, we propose a Gain-Adaptive Feature Modulation (GAFM) mechanism to address highly spatially-variant residual noise. Extensive experiments demonstrate that our method achieves state-of-the-art performance, delivering superior noise suppression and textural fidelity.

Method Overview

Our premise is that, although a degraded input image lacks an external normal-light counterpart, it still contains sufficient physical and structural cues to guide the LLIE process. The framework is built upon three core components:

Internal Physical Reference: A local exposure-simulated scheme featuring robust white balancing and adaptive shadow desaturation extracts a pseudo-GT to guide global color and brightness restoration.
Internal Structural References: A dual-domain collaborative strategy utilizes an Illumination-Aligned Perceptual (IAP) loss to preserve global topology, and a Shift-Invariant Spectral Correlation (SISC) loss to capture fine-grained local textures without inducing spatial blur.
Gain-Guided Blind-Spot Denoising: A Gain-Adaptive Feature Modulation (GAFM) mechanism dynamically translates the self-estimated illumination map into an internal spatial gain prior, enabling spatially-aware denoising.
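
The shift invariance that the spectral constraint relies on can be illustrated with the Fourier shift theorem: a circular translation changes only the phase of the spectrum, not its amplitude. The NumPy sketch below is not the SISC loss itself, whose formulation is not given here. The random test image and the (2, 3) pixel shift are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
img = rng.random((32, 32))                          # stand-in for an image patch
shifted = np.roll(img, shift=(2, 3), axis=(0, 1))   # circular misalignment

# A pixel-wise comparison is heavily penalized by the misalignment.
spatial_err = np.mean((img - shifted) ** 2)

# Fourier amplitudes are unchanged by a circular shift (only the phase moves).
amp = np.abs(np.fft.fft2(img))
amp_shifted = np.abs(np.fft.fft2(shifted))
spectral_err = np.mean((amp - amp_shifted) ** 2)

print(f"spatial MSE: {spatial_err:.4f}, amplitude MSE: {spectral_err:.2e}")
```

For non-circular misalignment the equality holds only approximately, because content entering or leaving the image border changes the spectrum.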

Figure 2: Overview of our IRLE. (a) Stage 1: Illumination estimation and structure extraction via a Dual-Domain Collaborative Retinex network, guided by a local exposure-simulated pseudo-GT. (b) Stage 2: Gain-guided blind-spot denoising, which handles spatially-variant noise. (c) The detailed architecture of the Gain-Aware Block. (d) The Gain-Adaptive Feature Modulation (GAFM) module.
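
To see why a gain prior is spatially variant, recall that Retinex-style enhancement divides the input by the estimated illumination, so noise in dark regions is amplified more strongly than in bright ones. The sketch below derives a per-pixel gain map from an illumination map and applies a FiLM-style scale and shift to a feature tensor. It is an illustrative assumption, not the GAFM module: `gain_from_illumination`, `modulate`, and the fixed weights `w_scale` and `w_shift` are hypothetical stand-ins for learned layers.

```python
import numpy as np

def gain_from_illumination(illum, eps=1e-3):
    """Per-pixel amplification implied by dividing the input by the illumination."""
    return 1.0 / np.clip(illum, eps, 1.0)

def modulate(features, gain, w_scale, w_shift):
    """FiLM-style modulation driven by the log-gain (hypothetical, not GAFM).

    features: (C, H, W); gain: (H, W); w_scale, w_shift: (C,) fixed weights
    standing in for what a learned layer would predict.
    """
    g = np.log(gain)[None, :, :]                  # 0 where nothing is amplified
    scale = 1.0 + w_scale[:, None, None] * g
    shift = w_shift[:, None, None] * g
    return features * scale + shift

rng = np.random.default_rng(0)
illum = np.ones((4, 4))
illum[:, :2] = 0.1                                # left half of the scene is dark
gain = gain_from_illumination(illum)              # 10x on the left, 1x on the right

feats = rng.standard_normal((3, 4, 4))
out = modulate(feats, gain,
               w_scale=np.array([0.5, 0.0, -0.2]),
               w_shift=np.zeros(3))
print(gain[0])
```

In this toy setup, features in well-lit regions pass through unchanged, while features in dark regions are rescaled according to the local gain.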

Theoretical Analysis

We further analyze why internally referenced supervision improves both illumination recovery and structural preservation. The luminance statistics show whether an enhanced image follows a natural normal-light distribution, while the frequency-band visualization explains why the spectral constraint can preserve texture without inheriting amplified noise.

(a) CLIP-LIT (LOLv1)

(b) RetinexDIP (LOLv1)

(c) Li et al. (LOLv2-Real)

Figure 4: Luminance distribution comparisons via kernel density estimation (KDE). The orange dashed line represents our method, which consistently aligns with the Ground Truth distribution, whereas competitors show noticeable distribution shifts.
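
A luminance KDE of this kind can be sketched in a few lines. The example below assumes Rec. 601 luma weights and a Gaussian kernel with the normal-reference bandwidth; neither choice is stated in the source, and the two synthetic images merely stand in for a ground-truth image and an under-exposed result.

```python
import numpy as np

def luminance(rgb):
    """Rec. 601 luma for an RGB image in [0, 1] with shape (H, W, 3)."""
    return rgb @ np.array([0.299, 0.587, 0.114])

def gaussian_kde(samples, grid):
    """Gaussian KDE using the normal-reference bandwidth 1.06 * std * n^(-1/5)."""
    samples = np.asarray(samples, dtype=np.float64).ravel()
    n = samples.size
    h = 1.06 * samples.std(ddof=1) * n ** (-0.2)
    z = (grid[:, None] - samples[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (n * h * np.sqrt(2.0 * np.pi))

rng = np.random.default_rng(0)
gt = rng.uniform(0.3, 0.8, size=(32, 32, 3))   # synthetic "normal-light" image
dark = 0.2 * gt                                 # synthetic under-exposed result

grid = np.linspace(-0.5, 1.5, 512)
dx = grid[1] - grid[0]
gt_density = gaussian_kde(luminance(gt), grid)
dark_density = gaussian_kde(luminance(dark), grid)

gt_mode = grid[np.argmax(gt_density)]
dark_mode = grid[np.argmax(dark_density)]
print(f"mass: {gt_density.sum() * dx:.3f}, modes: {gt_mode:.2f} vs {dark_mode:.2f}")
```

The kernel matrix has one entry per (grid point, pixel) pair, so for full-resolution images it is advisable to subsample pixels before calling `gaussian_kde`.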

Figure 5: Cross-Frequency Correlation (CFC) analysis. The CFC matrices reveal that mid-frequency bands preserve stable structural correlations between low-light and normal-light images, while extreme low- and high-frequency bands are more affected by illumination degradation and amplified noise. This motivates SISC to retain reliable texture bands and suppress noise-dominated components.
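
The exact definition of the CFC matrices is not given in this section. As a simplified illustration of the underlying idea, the sketch below splits two images into radial frequency bands with FFT masks and computes the Pearson correlation between matching bands only. The band count, the helper names `radial_bands` and `band_correlation`, and the synthetic low-light model (dimmed signal plus white noise) are all assumptions.

```python
import numpy as np

def radial_bands(img, n_bands=4):
    """Split a 2-D image into band-pass components with radial FFT masks.

    The masks partition the frequency plane, so the bands sum back to img.
    """
    h, w = img.shape
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fx**2 + fy**2)
    edges = np.linspace(0.0, radius.max(), n_bands + 1)
    spectrum = np.fft.fft2(img)
    bands = []
    for k in range(n_bands):
        lower = radius >= edges[k]
        # The last band includes its upper edge so no frequency is dropped.
        upper = radius <= edges[k + 1] if k == n_bands - 1 else radius < edges[k + 1]
        bands.append(np.fft.ifft2(spectrum * (lower & upper)).real)
    return bands

def band_correlation(a, b, n_bands=4):
    """Pearson correlation between matching frequency bands of two images."""
    return np.array([
        np.corrcoef(x.ravel(), y.ravel())[0, 1]
        for x, y in zip(radial_bands(a, n_bands), radial_bands(b, n_bands))
    ])

rng = np.random.default_rng(0)
# Synthetic "normal-light" image dominated by low frequencies (2-D random walk).
normal = np.cumsum(np.cumsum(rng.standard_normal((64, 64)), axis=0), axis=1)
normal = (normal - normal.min()) / (normal.max() - normal.min())
# Synthetic "low-light" image: dimmed signal plus white, sensor-like noise.
low = 0.1 * normal + 0.01 * rng.standard_normal(normal.shape)

corr = band_correlation(low, normal)
print(np.round(corr, 3))
```

This toy example only reproduces the high-frequency part of the observation, where white noise swamps the weak signal and the correlation collapses. It does not model the illumination degradation that affects the lowest band in real data, so it is not a substitute for the analysis in Figure 5.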

Quantitative Evaluation

Extensive experiments demonstrate that our internally referenced framework achieves state-of-the-art PSNR and competitive SSIM across multiple standard benchmarks. Table 1 reports the full LOL quantitative comparison from the main paper, including both standard (Normal) and GT-Mean metrics.

Table 1: Quantitative comparison on LOL datasets. Both standard and GT-Mean metrics are reported. The best and second-best results among unsupervised methods are highlighted in bold and underline, respectively.

Method	LOLv1				LOLv2-Real				LOLv2-Synthetic
	Normal		GT-Mean		Normal		GT-Mean		Normal		GT-Mean
	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM	PSNR	SSIM
Zero-DCE (CVPR'20)	14.86	0.559	21.06	0.535	18.06	0.574	20.78	0.542	17.76	0.813	21.50	0.849
RRDNet (ICME'20)	11.46	0.460	18.96	0.484	13.96	0.483	19.15	0.495	14.87	0.657	18.39	0.758
RUAS (CVPR'21)	16.41	0.500	18.65	0.518	15.33	0.488	19.06	0.510	13.40	0.644	17.79	0.695
EnlightenGAN (TIP'21)	17.56	0.665	21.33	0.649	18.68	0.673	21.04	0.663	16.49	0.775	19.32	0.823
RetinexDIP (TCSVT'21)	11.67	0.484	19.74	0.471	14.51	0.521	19.48	0.487	16.01	0.733	20.12	0.800
SCI (CVPR'22)	14.78	0.522	18.97	0.501	17.30	0.534	19.47	0.509	15.43	0.748	18.64	0.788
PSENet (WACV'23)	17.50	0.543	20.93	0.546	17.63	0.531	20.64	0.550	16.62	0.777	20.67	0.824
PairLIE (CVPR'23)	19.51	0.736	23.17	0.753	19.89	0.778	24.03	0.803	19.07	0.797	21.68	0.820
GDP (CVPR'23)	15.82	0.541	19.09	0.578	14.40	0.494	19.32	0.559	12.12	0.497	15.83	0.667
NeRCo (ICCV'23)	19.74	0.743	22.41	0.755	19.66	0.717	23.63	0.750	17.59	0.734	19.66	0.752
CLIP-LIT (ICCV'23)	12.39	0.493	20.03	0.442	15.18	0.529	19.45	0.468	16.19	0.775	20.75	0.817
CoLIE (ECCV'24)	13.76	0.481	20.37	0.479	15.08	0.501	20.22	0.496	14.30	0.654	19.04	0.786
CLODE (ICLR'25)	19.60	0.718	22.59	0.736	17.87	0.681	22.57	0.703	17.21	0.783	20.63	0.797
Li et al. (ICLR'25)	19.82	0.751	23.97	0.779	20.35	0.795	26.14	0.828	17.82	0.802	20.78	0.820
Ours	20.60	0.760	24.65	0.788	20.72	0.792	26.24	0.826	19.62	0.815	22.82	0.833
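
For readers unfamiliar with the two metric variants, the sketch below computes PSNR in the standard way and in a GT-Mean variant. It assumes the common convention of rescaling the prediction so that its global mean matches that of the ground truth before scoring; the exact protocol used for Table 1 (for example, whether the mean is taken over grayscale values) is not specified here. The function names and the synthetic images are illustrative.

```python
import numpy as np

def psnr(pred, gt, peak=1.0):
    """Peak signal-to-noise ratio in dB for images scaled to [0, peak]."""
    mse = np.mean((pred - gt) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

def gt_mean_psnr(pred, gt, peak=1.0):
    """PSNR after rescaling pred so its global mean matches gt (assumed convention)."""
    scaled = np.clip(pred * (gt.mean() / pred.mean()), 0.0, peak)
    return psnr(scaled, gt, peak)

rng = np.random.default_rng(0)
gt = rng.uniform(0.2, 0.9, size=(64, 64, 3))
# A prediction with the right structure but only half the brightness.
pred = 0.5 * gt + 0.01 * rng.standard_normal(gt.shape)

normal_psnr = psnr(pred, gt)
gt_mean = gt_mean_psnr(pred, gt)
print(f"Normal PSNR: {normal_psnr:.2f} dB, GT-Mean PSNR: {gt_mean:.2f} dB")
```

The gap between the two scores in this toy case shows what GT-Mean isolates: it discounts a global brightness error, so the remaining score reflects structure, color, and noise.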

Efficiency Analysis

Table 2: Computational complexity and performance comparison. The restoration quality (Normal PSNR) is evaluated on the LOLv1 dataset. Inference time (RT) and FPS are measured on 400 × 600 inputs using a single NVIDIA RTX 3090 GPU.

Method	PSNR ↑	Params (M) ↓	MACs (G) ↓	RT (ms) ↓	FPS ↑
GDP (CVPR'23)	15.82	552.814	4872.000	1336965.00	0.0007
RRDNet (ICME'20)	11.46	0.128	31.063	21793.18	0.0459
RetinexDIP (TCSVT'21)	11.67	0.707	3.414	11325.17	0.0883
CoLIE (ECCV'24)	13.76	0.133	8.657	1187.32	0.84
Li et al. (ICLR'25)	19.82	0.345	78.730	295.83	3.38
NeRCo (ICCV'23)	19.74	23.046	1136.000	293.67	3.41
CLIP-LIT (ICCV'23)	12.39	0.279	66.670	14.53	68.80
PairLIE (CVPR'23)	19.51	0.342	82.929	13.69	73.06
EnlightenGAN (TIP'21)	17.56	54.410	108.878	5.63	177.76
PSENet (WACV'23)	17.50	0.015	0.557	3.76	265.65
RUAS (CVPR'21)	16.41	0.003	0.795	3.50	285.91
Zero-DCE (CVPR'20)	14.86	0.079	19.008	3.44	290.81
SCI (CVPR'22)	14.78	0.00026	0.130	0.32	3113.86
Ours (Stage 1)	20.39	0.887	32.479	5.09	196.40
Ours (Total)	20.60	2.204	79.139	36.26	27.58
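
The FPS column follows from the runtime column as FPS = 1000 / RT (ms). The short sketch below recomputes it for our two rows and, under the assumption that the two stages run sequentially, estimates the share of the runtime attributable to Stage 2. Small differences from the tabulated FPS values are presumably due to rounding of the reported runtimes.

```python
def fps_from_ms(runtime_ms):
    """Frames per second implied by a per-image runtime in milliseconds."""
    return 1000.0 / runtime_ms

# Runtimes (ms) copied from Table 2.
stage1_ms = 5.09
total_ms = 36.26

stage1_fps = fps_from_ms(stage1_ms)
total_fps = fps_from_ms(total_ms)
# Assumption: the stages run one after the other, so Stage 2 takes the remainder.
stage2_ms = total_ms - stage1_ms

print(f"Stage 1: {stage1_fps:.2f} FPS, Total: {total_fps:.2f} FPS, "
      f"Stage 2: {stage2_ms:.2f} ms")
```

Under that assumption, Stage 2 accounts for most of the total runtime while raising Normal PSNR on LOLv1 from 20.39 to 20.60, so Stage 1 alone may be preferable when throughput matters more than the final denoising step.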

Qualitative Results

Figure 3: Visual comparisons on LOLv1 (three cases, ten methods per case). Existing methods often exhibit color casts or lose delicate details due to over-smoothing. Our method maintains natural colors and effectively removes spatially-variant noise while preserving structures.

Acknowledgements

This work was partially supported by computational resources from TPU Research Cloud (TRC).