Data Source
Raw waveform data from the Majorana Demonstrator germanium detector array.
The dataset consists of approximately three million high-purity germanium (HPGe) detector waveforms stored in HDF5 format. It is a 1% subset of the ²²⁸Th calibration runs, released on Zenodo by the Majorana Demonstrator Collaboration for AI and machine learning applications.
Each raw waveform is a digitized array of ADC counts that records charge collection over time inside the germanium crystal. Working from these raw traces is essential to our goal of rejecting complex background events, as required for a near-background-free search for neutrinoless double-beta decay.
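Raw ADC traces generally sit on a baseline offset, so a common first preprocessing step is to subtract the pre-pulse level before comparing pulse shapes. The sketch below assumes, hypothetically, that the first 500 samples of each trace precede the pulse onset; the function name `subtract_baseline` and that sample count are illustrative choices, not part of the released data format.

```python
import numpy as np


def subtract_baseline(waveform: np.ndarray, n_baseline: int = 500) -> np.ndarray:
    """Return a float copy of the waveform with its pre-pulse baseline removed.

    Assumption (hypothetical): the first `n_baseline` samples precede the
    pulse onset and contain only the resting level plus electronic noise.
    """
    wf = np.asarray(waveform, dtype=np.float64)
    # Estimate the resting level from the pre-pulse region only.
    baseline = wf[:n_baseline].mean()
    return wf - baseline
```

A more robust variant could derive the baseline window from the onset time tp0 instead of a fixed sample count.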
Data Structure
Format and organization of the raw waveform files.
Waveforms are stored in HDF5 format, with each file containing approximately 65,000 waveforms sampled at 100 MHz.
Each raw waveform is an array of 3,800 digitized ADC counts representing the charge pulse over time.
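The sampling rate and trace length together fix the time axis of every waveform. A short sketch of that arithmetic, using only the numbers stated in this section (the constant names are illustrative):

```python
import numpy as np

SAMPLING_RATE_HZ = 100e6        # 100 MHz sampling rate
SAMPLES_PER_WAVEFORM = 3_800    # digitized ADC samples per raw waveform

# Time between consecutive samples: 1 / 100 MHz = 10 ns.
dt_ns = 1e9 / SAMPLING_RATE_HZ

# Time (in ns) of each sample relative to the first sample of the trace.
time_ns = np.arange(SAMPLES_PER_WAVEFORM) * dt_ns

# Total duration of one trace in microseconds.
trace_length_us = SAMPLES_PER_WAVEFORM * dt_ns / 1e3

# Rough file count for ~3 million waveforms at ~65,000 waveforms per file.
approx_files = 3_000_000 / 65_000
```

Each trace therefore covers 38 µs at 10 ns per sample, and the full subset corresponds to roughly 46 files (both input counts are approximate).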
Each record includes the waveform, its onset time (tp0), the learning targets, and metadata such as the detector ID and run number.
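The onset time tp0 makes it possible to align pulses before feeding them to a model. The units of tp0 are not stated here; this sketch assumes it is a sample index into the 3,800-sample array. The function `window_around_onset` and its `pre`/`post` defaults are hypothetical, chosen only to illustrate a fixed-length crop centered on the rising edge.

```python
import numpy as np


def window_around_onset(waveform, tp0, pre=200, post=800):
    """Cut a fixed-length window of `pre + post` samples around the onset.

    Assumption (hypothetical): tp0 is a sample index. Windows that run past
    either end of the trace are padded with the first/last sample value.
    """
    wf = np.asarray(waveform, dtype=np.float64)
    # Clamp the onset to the valid index range of the trace.
    t = int(np.clip(round(float(tp0)), 0, len(wf)))
    padded = np.pad(wf, (pre, post), mode="edge")
    # Original index i sits at padded index i + pre, so the window
    # [t - pre, t + post) in original coordinates starts at padded index t.
    return padded[t : t + pre + post]
```

Aligning on tp0 means every cropped window places the pulse onset at the same position (index `pre`), which removes trigger-time jitter as a nuisance variable for the classifiers.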
Event Labels
Ground-truth annotations used for supervised learning tasks.
Four binary pulse-shape targets: low and high A/E (low_avse, high_avse), delayed charge recovery (dcr), and late charge (lq). For each, 1 indicates an event accepted as signal-like and 0 indicates an event rejected as background.
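Because the four targets are independent binary labels, they can be handled as a multi-label problem. The sketch below assumes the labels are available as per-event arrays keyed by the target names above; combining them with a logical AND to flag events that pass every cut is an illustrative choice, not something the dataset prescribes.

```python
import numpy as np

TARGETS = ("low_avse", "high_avse", "dcr", "lq")


def stack_targets(labels: dict) -> np.ndarray:
    """Stack the four binary targets into an (n_events, 4) int8 array.

    Assumption: `labels` maps each target name to a per-event sequence of 0/1.
    """
    return np.stack(
        [np.asarray(labels[name], dtype=np.int8) for name in TARGETS], axis=1
    )


def passes_all(y: np.ndarray) -> np.ndarray:
    """Boolean mask: True where every target marks the event as accepted (1)."""
    return np.all(y == 1, axis=1)
```

Keeping the targets as separate columns allows one model with four sigmoid outputs or four independent classifiers; the combined mask is only a convenience for selecting fully signal-like events.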
The energy_label provides the calibrated event energy in keV, serving as the continuous target for our regression models.
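A simple non-ML reference point for the energy regression is a linear map from pulse amplitude to energy_label. This sketch uses the maximum baseline-subtracted ADC value as a crude amplitude estimate (a shaping filter such as a trapezoidal filter is the more usual choice for HPGe energy estimation) and fits energy ≈ gain × amplitude + offset by least squares. The function names and the 500-sample baseline window are hypothetical.

```python
import numpy as np


def pulse_amplitude(waveforms: np.ndarray, n_baseline: int = 500) -> np.ndarray:
    """Maximum baseline-subtracted ADC value of each waveform (rows = events).

    Assumption (hypothetical): the first `n_baseline` samples are pre-pulse.
    """
    wf = np.asarray(waveforms, dtype=np.float64)
    baseline = wf[:, :n_baseline].mean(axis=1, keepdims=True)
    return (wf - baseline).max(axis=1)


def fit_linear_calibration(amplitudes, energies_kev):
    """Least-squares fit of energy_kev ~ gain * amplitude + offset."""
    # np.polyfit returns coefficients from highest degree down: [slope, intercept].
    gain, offset = np.polyfit(amplitudes, energies_kev, deg=1)
    return gain, offset
```

A learned regressor should at least match this baseline's residuals against energy_label before its extra complexity is justified.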