SpectraFormer dataset

Description

This is the dataset used to train SpectraFormer, a transformer-based machine learning model for unmixing Raman spectra of the graphene buffer layer on a SiC substrate.

See more: arXiv paper, GitHub repo.

Each data file contains the following coordinates: wave_number (Raman shift values, $\mathrm{cm}^{-1}$) and the spatial coordinates X_0 and X_1; depth maps additionally contain X_2.

Dataset Structure

The dataset is organized into three sample categories:

  • 4H-SiC-Piranha/ - 4H SiC polytype
  • 6H_spectra_20250423/ - 6H SiC polytype, subdivided by acquisition parameters (e.g., 10s_1p/, 5s_10p/, 5s_5p/)
  • main/ - Primary sample set with standard configurations
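
As a rough illustration of working with this layout, the sketch below groups every .nc file under a local copy of the dataset by its top-level folder. The function name index_dataset and the example root path are hypothetical; only the folder names listed above come from the dataset.

```python
from collections import defaultdict
from pathlib import Path


def index_dataset(root):
    """Group .nc files by the top-level folder they live in (hypothetical helper)."""
    root = Path(root)
    groups = defaultdict(list)
    for path in sorted(root.rglob("*.nc")):
        rel = path.relative_to(root)
        # Files placed directly in the root are grouped under "."
        category = rel.parts[0] if len(rel.parts) > 1 else "."
        groups[category].append(rel.as_posix())
    return dict(groups)


if __name__ == "__main__":
    # Assumed local download location; adjust to wherever the files were saved.
    for category, files in index_dataset("data/parsed_data_spatial/SiC-high-f").items():
        print(category, len(files))
```

Nested subfolders such as 10s_1p/ stay visible in the relative paths, so files can be filtered further by acquisition parameters if needed.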

File Naming Convention

Each file follows this naming pattern: {system_type}_{spatial_dims}_{original_name}.nc

Filename components:

  • spatial_dims - spatial map dimensions (e.g., 15x15 = 15×15 spatial points)
  • Acquisition parameters in original_name (X stands for a number):
    • Xs (e.g., 10s, 5s) - acquisition time in seconds
    • Xp (e.g., 1p, 5p, 10p) - laser power as a percentage (1%, 5%, 10%)
    • Xacc (e.g., 1acc, 2acc) - number of accumulations
    • 100x - objective magnification (100× objective)

Example: 6H_spectra_20250423_15x15_10s_1p_2.nc = 6H sample, 15×15 spatial points, 10 s acquisition time, 1% laser power, 2nd acquisition file
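
The naming pattern can be decoded programmatically. The following is a minimal sketch, assuming the spatial_dims token is the first underscore-delimited piece of the form NxM and that everything after it is original_name. The function name parse_filename and the dictionary keys are illustrative, not part of the dataset.

```python
import re

# Token patterns taken from the naming convention above; the dictionary keys
# are illustrative names, not fields defined by the dataset.
_TOKENS = {
    "acquisition_time_s": re.compile(r"(\d+)s"),
    "laser_power_percent": re.compile(r"(\d+)p"),
    "accumulations": re.compile(r"(\d+)acc"),
}


def parse_filename(filename):
    """Split '{system_type}_{spatial_dims}_{original_name}.nc' into its parts."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    # Assumption: spatial_dims is the first '_'-delimited token such as 15x15.
    match = re.search(r"(?:^|_)(\d+(?:x\d+)+)(?:_|$)", stem)
    if match is None:
        raise ValueError(f"no spatial_dims token in {filename!r}")
    info = {
        "system_type": stem[: match.start()],
        "spatial_dims": tuple(int(n) for n in match.group(1).split("x")),
        "original_name": stem[match.end():],
    }
    # Pick out the acquisition parameters that appear in original_name.
    for token in info["original_name"].split("_"):
        for key, pattern in _TOKENS.items():
            found = pattern.fullmatch(token)
            if found:
                info[key] = int(found.group(1))
    return info


print(parse_filename("6H_spectra_20250423_15x15_10s_1p_2.nc"))
```

Tokens that match none of the patterns, such as the trailing file index, are left untouched inside original_name.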

Data Format & Dimensions

Files are stored in NetCDF4 format with the following structure:

Coordinates:

  • X_0, X_1 - spatial coordinates
  • X_2 (optional) - depth coordinate for depth-profiling maps
  • wave_number - Raman shift in cm^(-1)

Data variable:

  • __xarray_dataarray_variable__ - Raman intensity counts at each spatial and spectral point

Each file contains a spatially-resolved Raman spectrum map, allowing analysis of spectral variations across the sample surface.
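
To make the structure concrete, here is a small sketch of two common reductions on such a map: the spatially averaged spectrum and the intensity image at a single Raman shift. It runs on a synthetic array that only mimics the dimension names described above. The helper names, the array sizes, and the 1580.0 shift value are arbitrary choices for illustration.

```python
import numpy as np
import xarray as xr


def mean_spectrum(da):
    """Average over every dimension except wave_number (X_0, X_1 and, if present, X_2)."""
    spatial_dims = [d for d in da.dims if d != "wave_number"]
    return da.mean(dim=spatial_dims)


def intensity_map(da, shift):
    """Spatial map of counts at the wave_number value closest to `shift`."""
    return da.sel(wave_number=shift, method="nearest")


# Synthetic stand-in for a loaded file: 3 x 4 spatial points, 50 spectral points.
rng = np.random.default_rng(0)
demo = xr.DataArray(
    rng.random((3, 4, 50)),
    dims=("X_0", "X_1", "wave_number"),
    coords={
        "X_0": np.arange(3),
        "X_1": np.arange(4),
        "wave_number": np.linspace(1000.0, 3000.0, 50),
    },
)

avg = mean_spectrum(demo)          # one spectrum with dims ('wave_number',)
band = intensity_map(demo, 1580.0)  # one image with dims ('X_0', 'X_1')
print(avg.dims, band.dims)
```

The same two functions should apply unchanged to an array returned by xr.load_dataarray, provided the file uses the coordinate names listed above.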

Data Processing

Raw data from spectroscopy measurements (stored as .txt files with coordinates, wave numbers, and counts) was parsed and converted to NetCDF4 format using a spatial binning approach. This enables efficient multi-dimensional analysis with xarray.
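
The conversion script itself is not reproduced here. As a loose sketch of the idea only, the code below assumes each raw row is an (x, y, wave number, counts) tuple and that binning means mapping every distinct coordinate value to a grid index. Both assumptions, the function name rows_to_cube, and the mapping of x and y onto X_0 and X_1 are hypothetical and may differ from the actual pipeline.

```python
def rows_to_cube(rows):
    """Arrange (x, y, wave_number, counts) rows into a nested [X_0][X_1][wave_number] grid.

    Hypothetical sketch: the real raw-file layout and binning rule may differ.
    """
    rows = list(rows)
    xs = sorted({r[0] for r in rows})
    ys = sorted({r[1] for r in rows})
    ws = sorted({r[2] for r in rows})
    # Lookup tables from coordinate value to grid index.
    ix = {v: i for i, v in enumerate(xs)}
    iy = {v: i for i, v in enumerate(ys)}
    iw = {v: i for i, v in enumerate(ws)}
    # Cells with no measurement stay None.
    cube = [[[None] * len(ws) for _ in ys] for _ in xs]
    for x, y, w, counts in rows:
        cube[ix[x]][iy[y]][iw[w]] = counts
    coords = {"X_0": xs, "X_1": ys, "wave_number": ws}
    return coords, cube


rows = [
    (0.0, 0.0, 100.0, 5),
    (0.0, 0.0, 200.0, 6),
    (1.0, 0.0, 100.0, 7),
    (1.0, 0.0, 200.0, 8),
]
coords, cube = rows_to_cube(rows)
print(coords)
print(cube)
```

A grid built this way has the same coordinate layout as the NetCDF files, so it could be wrapped in an xarray.DataArray with the dimension names X_0, X_1, and wave_number.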

Usage

Download the folder contents into data/parsed_data_spatial/SiC-high-f to train your model. Load individual files with xarray:

import xarray as xr

# Load one map as an xarray.DataArray
da = xr.load_dataarray('6H_spectra_20250423_15x15_5s_5p_1.nc')

# Access coordinates and data
print(dict(da.sizes))    # e.g. {'X_0': 15, 'X_1': 15, 'wave_number': 1800}
print(da.wave_number)    # Raman shift values in cm^(-1)
print(da)