DIFFNMR: DIFFUSION MODELS FOR NUCLEAR MAGNETIC RESONANCE SPECTRA ELUCIDATION
Abstract: Nuclear Magnetic Resonance (NMR) spectroscopy is a key method for molecular structure elucidation. However, interpreting NMR spectra to deduce molecular structures remains challenging because of the complexity of spectral data and the vastness of chemical space. Here we introduce DiffNMR, a novel end-to-end framework that uses a conditional discrete diffusion model for de novo molecular structure elucidation from NMR spectra. DiffNMR refines molecular graphs iteratively through a diffusion-based generative process, which promotes global consistency and mitigates the error accumulation inherent in autoregressive methods. The framework integrates a two-stage pretraining strategy that aligns spectral and molecular representations via a diffusion autoencoder (Diff-AE) and contrastive learning. It also incorporates retrieval initialization and similarity filtering during inference. Our experimental results demonstrate that DiffNMR achieves competitive performance on NMR-based structure elucidation and, in particular, outperforms autoregressive models in domain generalization and robustness, offering an efficient and robust solution for automated molecular analysis.