Consistent Direct Time-of-Flight Video Depth Super-Resolution
Direct time-of-flight (dToF) sensors are promising for next-generation on-device 3D sensing. However, limited by manufacturing capabilities in a compact module, the dToF data has a low spatial resolution (e.g., ~20×30 for iPhone dToF), and it requires a super-resolution step before being passed to downstream tasks. In this paper, we solve this super-resolution problem by fusing the low-resolution dToF data with the corresponding high-resolution RGB guidance. Unlike the conventional RGB-guided depth enhancement approaches, which perform the fusion in a per-frame manner, we propose the first multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the low-resolution dToF imaging. In addition, dToF sensors provide unique depth histogram information for each local patch, and we incorporate this dToF-specific feature in our network design to further alleviate spatial ambiguity. To evaluate our models on complex dynamic indoor environments and to provide a large-scale dToF sensor dataset, we introduce DyDToF, the first synthetic RGB-dToF video dataset that features dynamic objects and a realistic dToF simulator following the physical imaging process. We believe the methods and dataset are beneficial to a broad community as dToF depth sensing is becoming mainstream on mobile devices. Our code and data are publicly available: https://github.com/facebookresearch/DVSR/
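The per-patch depth histogram mentioned in this abstract can be illustrated with a small sketch. It is an idealized illustration only, not the DyDToF simulator or the DVSR network input: the function names `patch_histograms` and `peak_depth`, the uniform binning, and the toy depth map are all assumptions made for this example. Each low-resolution dToF pixel is modeled as a histogram of the high-resolution depths falling inside its patch, so a patch that straddles an object edge shows two populated bins instead of one.

```python
def patch_histograms(depth, patch, n_bins, max_depth):
    """Build one depth histogram per non-overlapping patch of a 2D depth map.

    depth is a list of rows (metres); patch is the side length in pixels.
    Uniform bins over [0, max_depth] are an assumption of this sketch.
    """
    rows, cols = len(depth), len(depth[0])
    bin_width = max_depth / n_bins
    out = []
    for r0 in range(0, rows - patch + 1, patch):
        row_out = []
        for c0 in range(0, cols - patch + 1, patch):
            hist = [0] * n_bins
            for r in range(r0, r0 + patch):
                for c in range(c0, c0 + patch):
                    # Clamp so that depth == max_depth lands in the last bin.
                    b = min(int(depth[r][c] / bin_width), n_bins - 1)
                    hist[b] += 1
            row_out.append(hist)
        out.append(row_out)
    return out


def peak_depth(hist, max_depth):
    """Depth at the centre of the most populated bin (first one on ties)."""
    bin_width = max_depth / len(hist)
    k = max(range(len(hist)), key=hist.__getitem__)
    return (k + 0.5) * bin_width


# Toy 4x4 depth map: a near surface (1 m) next to a far surface (3 m).
toy_depth = [
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 1.0, 3.0, 3.0],
    [1.0, 3.0, 3.0, 3.0],
    [1.0, 3.0, 3.0, 3.0],
]
hists = patch_histograms(toy_depth, patch=2, n_bins=4, max_depth=4.0)
```

In this toy case the bottom-left patch contains both surfaces, so its histogram has two equally populated bins. A single peak depth cannot represent that patch, which is the kind of spatial ambiguity the histogram information helps to resolve.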
SVDC: Consistent Direct Time-of-Flight Video Depth Completion with Frequency Selective Fusion
Lightweight direct Time-of-Flight (dToF) sensors are ideal for 3D sensing on mobile devices. However, due to the manufacturing constraints of compact devices and the inherent physical principles of imaging, dToF depth maps are sparse and noisy. In this paper, we propose a novel video depth completion method, called SVDC, by fusing the sparse dToF data with the corresponding RGB guidance. Our method employs a multi-frame fusion scheme to mitigate the spatial ambiguity resulting from the sparse dToF imaging. Misalignment between consecutive frames during multi-frame fusion could cause blending between object edges and the background, which results in a loss of detail. To address this, we introduce an adaptive frequency selective fusion (AFSF) module, which automatically selects convolution kernel sizes to fuse multi-frame features. Our AFSF utilizes a channel-spatial enhancement attention (CSEA) module to enhance features and generates an attention map as fusion weights. The AFSF ensures edge detail recovery while suppressing high-frequency noise in smooth regions. To further enhance temporal consistency, we propose a cross-window consistency loss to ensure consistent predictions across different windows, effectively reducing flickering. Our proposed SVDC achieves optimal accuracy and consistency on the TartanAir and Dynamic Replica datasets. Code is available at https://github.com/Lan1eve/SVDC.
Near Field iToF LIDAR Depth Improvement from Limited Number of Shots
Indirect Time of Flight LiDARs can indirectly calculate the scene's depth from the phase shift angle between transmitted and received laser signals with amplitudes modulated at a predefined frequency. Unfortunately, this method generates ambiguity in calculated depth when the phase shift angle value exceeds 2π. Current state-of-the-art methods use raw samples generated using two distinct modulation frequencies to overcome this ambiguity problem. However, this comes at the cost of increasing laser components' stress and raising their temperature, which reduces their lifetime and increases power consumption. In our work, we study two different methods to recover the entire depth range of the LiDAR using fewer raw data sample shots from a single modulation frequency with the support of the sensor's grayscale output to reduce the laser components' stress and power consumption.
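The phase ambiguity described in this abstract follows from the standard iToF relation between phase shift and depth, d = c·φ / (4π·f), which wraps once the round-trip phase reaches 2π. The sketch below only illustrates that relation. The 20 MHz modulation frequency and the function names are assumptions chosen for the example and are not taken from the paper.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(mod_freq_hz):
    """Largest depth whose round-trip phase stays below 2*pi."""
    return C / (2.0 * mod_freq_hz)


def wrapped_phase(true_depth_m, mod_freq_hz):
    """Phase shift the sensor would observe for a target, wrapped to [0, 2*pi)."""
    phase = 4.0 * math.pi * mod_freq_hz * true_depth_m / C
    return phase % (2.0 * math.pi)


def depth_from_phase(phase_rad, mod_freq_hz):
    """Depth implied by a wrapped phase measurement at one frequency."""
    return C * phase_rad / (4.0 * math.pi * mod_freq_hz)


f_mod = 20e6                      # hypothetical modulation frequency (Hz)
r_max = unambiguous_range(f_mod)  # about 7.49 m at 20 MHz

near = 2.0           # target inside the unambiguous range
far = near + r_max   # target exactly one wrap further away

# Both targets yield (numerically) the same phase, hence the same depth estimate.
near_estimate = depth_from_phase(wrapped_phase(near, f_mod), f_mod)
far_estimate = depth_from_phase(wrapped_phase(far, f_mod), f_mod)
```

With a single frequency, `near_estimate` and `far_estimate` are indistinguishable, which is why a second modulation frequency or, as in this work, additional cues such as the grayscale output are needed to recover the full range.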
Multi-Modal Neural Radiance Field for Monocular Dense SLAM with a Light-Weight ToF Sensor
Light-weight time-of-flight (ToF) depth sensors are compact and cost-efficient, and thus widely used on mobile devices for tasks such as autofocus and obstacle detection. However, due to the sparse and noisy depth measurements, these sensors have rarely been considered for dense geometry reconstruction. In this work, we present the first dense SLAM system with a monocular camera and a light-weight ToF sensor. Specifically, we propose a multi-modal implicit scene representation that supports rendering both the signals from the RGB camera and light-weight ToF sensor which drives the optimization by comparing with the raw sensor inputs. Moreover, in order to guarantee successful pose tracking and reconstruction, we exploit a predicted depth as an intermediate supervision and develop a coarse-to-fine optimization strategy for efficient learning of the implicit representation. Finally, the temporal information is explicitly exploited to deal with the noisy signals from light-weight ToF sensors to improve the accuracy and robustness of the system. Experiments demonstrate that our system effectively exploits the signals of light-weight ToF sensors and achieves competitive results both on camera tracking and dense scene reconstruction. Project page: https://zju3dv.github.io/tof_slam/.
Direct Frequency-Mode-Stable Laser Amplification at Terahertz Burst Rates
Generation of high-fidelity amplified pulse bursts with a regular interpulse interval yields, in the spectral domain, an equidistant pattern of narrowband spectral modes, similar to frequency combs produced by cw mode-locked lasers, but with greatly increased pulse energy. Despite their great potential for nonlinear spectroscopy, material processing, etc., such long frequency-stable bursts are difficult to generate and amplify because of prominent temporal intensity modulation even after strong dispersive pulse stretching. This study presents a burst generation method based on a master-oscillator regenerative-amplifier system that allows for chirped-pulse amplification (CPA) with high scalability in pulse number. A gradual smoothing of temporal intensity profiles at an increasing number of pulses is discovered, demonstrating an unexpected recovery of the CPA performance at terahertz (THz) intraburst repetition rates. In consequence, a self-referenced stable burst spectral peak structure with megahertz (MHz) peak width is generated, without risk of amplifier damage caused by interference of chirped pulses. This result eliminates limitations in burst amplification and paves the way for advancements in ultrashort-pulse burst technology, particularly for its use in nonlinear optical applications.
Analytical simulations of the resonant transmission of electrons in a closed nanocircuit for terahertz applications where a tunneling junction is shunted by a metallic nanowire
Earlier, in the CINT program at Los Alamos National Laboratory, we focused ultrafast mode-locked lasers on the tip-sample junction of a scanning tunneling microscope to generate currents at hundreds of harmonics of the laser pulse repetition frequency. Each harmonic has a signal-to-noise ratio of 20 dB with a 10-dB linewidth of only 3 Hz. Now we model closed quantum nanocircuits with rectangular, triangular, or delta-function barrier, shunted by a beryllium filament for quasi-coherent electron transport over mean-free paths as great as 68 nm. The time-independent Schrödinger equation is solved with the boundary conditions that the wavefunction and its derivative are continuous at both connections. These four boundary conditions are used to form a four-by-four complex matrix equation with only zeros in the right-hand column vector which is required to have a non-trivial solution with each of the closed nanocircuits. Each model has four parameters: (1) the barrier length, (2) the height and shape of the barrier, (3) the length of the pre-barrier, and (4) the electron energy. Any three of these may be specified and then the fourth is varied to bring the determinant to zero to find the solutions on lines or surfaces in the space defined by the four parameters. First, we use a simplistic model having a rectangular barrier. The second model has a triangular barrier as a first approximation to field emission, and we are considering applying this approach for a self-contained nanoscale extension of our earlier effort to generate the harmonics at Los Alamos. The third model has a delta-function barrier, and the fourth model is an extension of the first one where the width of the rectangular barrier is varied inversely with its height.
Minimal evolution times for fast, pulse-based state preparation in silicon spin qubits
Standing as one of the most significant barriers to reaching quantum advantage, state-preparation fidelities on noisy intermediate-scale quantum processors suffer from quantum-gate errors, which accumulate over time. A potential remedy is pulse-based state preparation. We numerically investigate the minimal evolution times (METs) attainable by optimizing (microwave and exchange) pulses on silicon hardware. We investigate two state preparation tasks. First, we consider the preparation of molecular ground states and find the METs for H₂, HeH⁺, and LiH to be 2.4 ns, 4.4 ns, and 27.2 ns, respectively. Second, we consider transitions between arbitrary states and find the METs for transitions between arbitrary four-qubit states to be below 50 ns. For comparison, connecting arbitrary two-qubit states via one- and two-qubit gates on the same silicon processor requires approximately 200 ns. This comparison indicates that pulse-based state preparation is likely to utilize the coherence times of silicon hardware more efficiently than gate-based state preparation. Finally, we quantify the effect of silicon device parameters on the MET. We show that increasing the maximal exchange amplitude from 10 MHz to 1 GHz accelerates the METs, e.g., for H₂ from 84.3 ns to 2.4 ns. This demonstrates the importance of fast exchange. We also show that increasing the maximal amplitude of the microwave drive from 884 kHz to 56.6 MHz shortens state transitions, e.g., for two-qubit states from 1000 ns to 25 ns. Our results bound both the state-preparation times for general quantum algorithms and the execution times of variational quantum algorithms with silicon spin qubits.
Embedded Pilot-Aided Channel Estimation for OTFS in Delay-Doppler Channels
Orthogonal time frequency space (OTFS) modulation was shown to provide significant error performance advantages over orthogonal frequency division multiplexing (OFDM) in delay-Doppler channels. In order to detect OTFS modulated data, the channel impulse response needs to be known at the receiver. In this paper, we propose embedded pilot-aided channel estimation schemes for OTFS. In each OTFS frame, we arrange pilot, guard, and data symbols in the delay-Doppler plane to suitably avoid interference between pilot and data symbols at the receiver. We develop such symbol arrangements for OTFS over multipath channels with integer and fractional Doppler shifts, respectively. At the receiver, channel estimation is performed based on a threshold method and the estimated channel information is used for data detection via a message passing (MP) algorithm. Thanks to our specific embedded symbol arrangements, both channel estimation and data detection are performed within the same OTFS frame with a minimum overhead. We compare by simulations the error performance of OTFS using the proposed channel estimation and OTFS with ideally known channel information and observe only a marginal performance loss. We also demonstrate that the proposed channel estimation in OTFS significantly outperforms OFDM with known channel information. Finally, we present extensions of the proposed schemes to MIMO and multi-user uplink/downlink.
PeriodWave: Multi-Period Flow Matching for High-Fidelity Waveform Generation
Recently, universal waveform generation tasks have been investigated conditioned on various out-of-distribution scenarios. Although GAN-based methods have shown their strength in fast waveform generation, they are vulnerable to train-inference mismatch scenarios such as two-stage text-to-speech. Meanwhile, diffusion-based models have shown their powerful generative performance in other domains; however, they stay out of the limelight due to slow inference speed in waveform generation tasks. Above all, there is no generator architecture that can explicitly disentangle the natural periodic features of high-resolution waveform signals. In this paper, we propose PeriodWave, a novel universal waveform generation model. First, we introduce a period-aware flow matching estimator that can capture the periodic features of the waveform signal when estimating the vector fields. Additionally, we utilize a multi-period estimator that avoids overlaps to capture different periodic features of waveform signals. Although increasing the number of periods can improve the performance significantly, this requires more computational costs. To reduce this issue, we also propose a single period-conditional universal estimator that can feed-forward parallel by period-wise batch inference. Additionally, we utilize discrete wavelet transform to losslessly disentangle the frequency information of waveform signals for high-frequency modeling, and introduce FreeU to reduce the high-frequency noise for waveform generation. The experimental results demonstrated that our model outperforms the previous models both in Mel-spectrogram reconstruction and text-to-speech tasks. All source code will be available at https://github.com/sh-lee-prml/PeriodWave.
Harnessing Selective State Space Models to Enhance Semianalytical Design of Fabrication-Ready Multilayered Huygens' Metasurfaces: Part II - Generative Inverse Design (MetaMamba)
We present a generative framework for inverse design of five-layer transmissive Huygens' metasurfaces (HMSs), addressing a longstanding challenge in achieving full-phase, high-efficiency unit cell designs with minimal full-wave simulations. The key to achieving this is our reliance on the field-based semianalytical (SA) scheme developed in Part I of this paper, which allows rapid and highly effective synthesis of such multilayer composites, however with limited accuracy. To overcome the prohibitive data demands of traditional pipelines, we employ Mamba, a selective state space model well suited for long-range sequence modeling as the backbone of our learning framework. A bidirectional Mamba (Bi-Mamba) forward surrogate is first trained on SA-generated data and subsequently fine-tuned with full-wave CST samples. An ablation over a 1080-sample CST pool shows that as few as 270 full-wave calibration samples suffice to reach near-CST-level agreement at a fraction of the simulation cost. An autoregressive Mamba inverse generator is subsequently trained on surrogate-augmented data, treating unit-cell synthesis as a sequential generation task. The resulting one-to-many generative model produces diverse unit cell geometries conditioned on target scattering responses. It achieves CST-validated designs with field transmission magnitude 0.9 across the full 0-2π phase range at 20 GHz. Moreover, a CST-calibrated surrogate trained to accurately predict frequency responses (18-22 GHz) enables functional post-selection of inverse generated designs. Together, the hybrid SA-generative methodology in this two-part compilation establishes a scalable and data-efficient solution for multilayer HMS synthesis, with natural extensions toward broadband, oblique-incidence, and higher-dimensional electromagnetic inverse-design problems.
EvidenceMoE: A Physics-Guided Mixture-of-Experts with Evidential Critics for Advancing Fluorescence Light Detection and Ranging in Scattering Media
Fluorescence LiDAR (FLiDAR), a Light Detection and Ranging (LiDAR) technology employed for distance and depth estimation across medical, automotive, and other fields, encounters significant computational challenges in scattering media. The complex nature of the acquired FLiDAR signal, particularly in such environments, makes isolating photon time-of-flight (related to target depth) and intrinsic fluorescence lifetime exceptionally difficult, thus limiting the effectiveness of current analytical and computational methodologies. To overcome this limitation, we present a Physics-Guided Mixture-of-Experts (MoE) framework tailored for specialized modeling of diverse temporal components. In contrast to conventional MoE approaches, our expert models are informed by underlying physics, such as the radiative transport equation governing photon propagation in scattering media. Central to our approach is EvidenceMoE, which integrates Evidence-Based Dirichlet Critics (EDCs). These critic models assess the reliability of each expert's output by providing per-expert quality scores and corrective feedback. A Decider Network then leverages this information to adaptively fuse expert predictions into a robust final estimate. We validate our method using realistically simulated FLiDAR data for non-invasive cancer cell depth detection generated from photon transport models in tissue. Our framework demonstrates strong performance, achieving a normalized root mean squared error (NRMSE) of 0.030 for depth estimation and 0.074 for fluorescence lifetime.
Empirical Modeling of Variance in Medium Frequency R-Mode Time-of-Arrival Measurements
The R-Mode system, an advanced terrestrial integrated navigation system, is designed to address the vulnerabilities of global navigation satellite systems (GNSS) and explore the potential of a complementary navigation system. This study aims to enhance the accuracy of performance simulation for the medium frequency (MF) R-Mode system by modeling the variance of time-of-arrival (TOA) measurements based on actual data. Drawing inspiration from the method used to calculate the standard deviation of time-of-reception (TOR) measurements in Loran, we adapted and applied this approach to the MF R-Mode system. Data were collected from transmitters in Palmi and Chungju, South Korea, and the parameters for modeling the variance of TOA were estimated.
PulseDL-II: A System-on-Chip Neural Network Accelerator for Timing and Energy Extraction of Nuclear Detector Signals
Front-end electronics equipped with high-speed digitizers are being used and proposed for future nuclear detectors. Recent literature reveals that deep learning models, especially one-dimensional convolutional neural networks, are promising when dealing with digital signals from nuclear detectors. Simulations and experiments demonstrate the satisfactory accuracy and additional benefits of neural networks in this area. However, specific hardware accelerating such models for online operations still needs to be studied. In this work, we introduce PulseDL-II, a system-on-chip (SoC) specially designed for applications of event feature (time, energy, etc.) extraction from pulses with deep learning. Based on the previous version, PulseDL-II incorporates a RISC CPU into the system structure for better functional flexibility and integrity. The neural network accelerator in the SoC adopts a three-level (arithmetic unit, processing element, neural network) hierarchical architecture and facilitates parameter optimization of the digital design. Furthermore, we devise a quantization scheme compatible with deep learning frameworks (e.g., TensorFlow) within a selected subset of layer types. We validate the correct operations of PulseDL-II on field programmable gate arrays (FPGA) alone and with an experimental setup comprising a direct digital synthesis (DDS) and analog-to-digital converters (ADC). The proposed system achieved 60 ps time resolution and 0.40% energy resolution at signal to noise ratio (SNR) of 47.4 dB.
nnAudio: An on-the-fly GPU Audio to Spectrogram Conversion Toolbox Using 1D Convolution Neural Networks
Converting time domain waveforms to frequency domain spectrograms is typically considered to be a preprocessing step done before model training. This approach, however, has several drawbacks. First, it takes a lot of hard disk space to store different frequency domain representations. This is especially true during the model development and tuning process, when exploring various types of spectrograms for optimal performance. Second, if another dataset is used, one must process all the audio clips again before the network can be retrained. In this paper, we integrate the time domain to frequency domain conversion as part of the model structure, and propose a neural network based toolbox, nnAudio, which leverages 1D convolutional neural networks to perform time domain to frequency domain conversion during feed-forward. It allows on-the-fly spectrogram generation without the need to store any spectrograms on the disk. This approach also allows back-propagation on the waveforms-to-spectrograms transformation layer, which implies that this transformation process can be made trainable, and hence further optimized by gradient descent. nnAudio reduces the waveforms-to-spectrograms conversion time for 1,770 waveforms (from the MAPS dataset) from 10.64 seconds with librosa to only 0.001 seconds for Short-Time Fourier Transform (STFT), 18.3 seconds to 0.015 seconds for Mel spectrogram, 103.4 seconds to 0.258 seconds for constant-Q transform (CQT), when using GPU on our DGX workstation with CPU: Intel(R) Xeon(R) CPU E5-2698 v4 @ 2.20GHz Tesla v100 32Gb GPUs. (Only 1 GPU is being used for all the experiments.) We also further optimize the existing CQT algorithm, so that the CQT spectrogram can be obtained without aliasing in a much faster computation time (from 0.258 seconds to only 0.001 seconds).
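The idea of computing a spectrogram with 1D convolutions can be sketched without any deep learning framework: each frequency bin is a pair of fixed kernels (a cosine and a negative sine of length `n_fft`), and sliding them over the waveform with a stride equal to the hop size gives the real and imaginary parts of the STFT. The code below is a minimal pure-Python illustration of that principle, assuming a rectangular window and no padding. It is not the nnAudio API, and all function names here are hypothetical.

```python
import math


def fourier_kernels(n_fft):
    """Cosine and negative-sine kernels for frequency bins 0 .. n_fft // 2."""
    bins = n_fft // 2 + 1
    cos_k = [[math.cos(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    sin_k = [[-math.sin(2 * math.pi * k * n / n_fft) for n in range(n_fft)]
             for k in range(bins)]
    return cos_k, sin_k


def conv1d(signal, kernel, stride):
    """Strided sliding dot product without padding.

    This is the cross-correlation that deep learning frameworks call a
    1D convolution.
    """
    width = len(kernel)
    return [sum(signal[start + i] * kernel[i] for i in range(width))
            for start in range(0, len(signal) - width + 1, stride)]


def stft_magnitude(signal, n_fft, hop):
    """Magnitude spectrogram indexed as spec[bin][frame]."""
    cos_k, sin_k = fourier_kernels(n_fft)
    spec = []
    for cos_row, sin_row in zip(cos_k, sin_k):
        real = conv1d(signal, cos_row, hop)
        imag = conv1d(signal, sin_row, hop)
        spec.append([math.hypot(r, i) for r, i in zip(real, imag)])
    return spec


# A sine that completes exactly 4 cycles per 32-sample frame.
n_fft, hop = 32, 16
wave = [math.sin(2 * math.pi * 4 * n / n_fft) for n in range(64)]
spec = stft_magnitude(wave, n_fft, hop)
```

For this test tone the energy is concentrated in bin 4 of every frame. Because the kernels are ordinary convolution weights, a framework implementation could run them on a GPU and, as the abstract notes, treat them as trainable parameters.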
HoloBeam: Learning Optimal Beamforming in Far-Field Holographic Metasurface Transceivers
Holographic Metasurface Transceivers (HMTs) are emerging as cost-effective substitutes to large antenna arrays for beamforming in Millimeter and TeraHertz wave communication. However, to achieve desired channel gains through beamforming in HMT, phase-shifts of a large number of elements need to be appropriately set, which is challenging. Also, these optimal phase-shifts depend on the location of the receivers, which could be unknown. In this work, we develop a learning algorithm using a fixed-budget multi-armed bandit framework to beamform and maximize received signal strength at the receiver for far-field regions. Our algorithm, named HoloBeam, exploits the parametric form of channel gains of the beams, which can be expressed in terms of two phase-shifting parameters. Even after parameterization, the problem is still challenging as phase-shifting parameters take continuous values. To overcome this, HoloBeam works with the discrete values of phase-shifting parameters and exploits their unimodal relations with channel gains to learn the optimal values faster. We upper bound the probability of HoloBeam incorrectly identifying the (discrete) optimal phase-shift parameters in terms of the number of pilots used in learning. We show that this probability decays exponentially with the number of pilot signals. We demonstrate that HoloBeam outperforms state-of-the-art algorithms through extensive simulations.
PlatoNeRF: 3D Reconstruction in Plato's Cave via Single-View Two-Bounce Lidar
3D reconstruction from a single-view is challenging because of the ambiguity from monocular cues and lack of information about occluded regions. Neural radiance fields (NeRF), while popular for view synthesis and 3D reconstruction, are typically reliant on multi-view images. Existing methods for single-view 3D reconstruction with NeRF rely on either data priors to hallucinate views of occluded regions, which may not be physically accurate, or shadows observed by RGB cameras, which are difficult to detect in ambient light and low albedo backgrounds. We propose using time-of-flight data captured by a single-photon avalanche diode to overcome these limitations. Our method models two-bounce optical paths with NeRF, using lidar transient data for supervision. By leveraging the advantages of both NeRF and two-bounce light measured by lidar, we demonstrate that we can reconstruct visible and occluded geometry without data priors or reliance on controlled ambient lighting or scene albedo. In addition, we demonstrate improved generalization under practical constraints on sensor spatial- and temporal-resolution. We believe our method is a promising direction as single-photon lidars become ubiquitous on consumer devices, such as phones, tablets, and headsets.
Consistent Time-of-Flight Depth Denoising via Graph-Informed Geometric Attention
Depth images captured by Time-of-Flight (ToF) sensors are prone to noise, requiring denoising for reliable downstream applications. Previous works either focus on single-frame processing, or perform multi-frame processing without considering depth variations at corresponding pixels across frames, leading to undesirable temporal inconsistency and spatial ambiguity. In this paper, we propose a novel ToF depth denoising network leveraging motion-invariant graph fusion to simultaneously enhance temporal stability and spatial sharpness. Specifically, despite depth shifts across frames, graph structures exhibit temporal self-similarity, enabling cross-frame geometric attention for graph fusion. Then, by incorporating an image smoothness prior on the fused graph and data fidelity term derived from ToF noise distribution, we formulate a maximum a posteriori problem for ToF denoising. Finally, the solution is unrolled into iterative filters whose weights are adaptively learned from the graph-informed geometric attention, producing a high-performance yet interpretable network. Experimental results demonstrate that the proposed scheme achieves state-of-the-art performance in terms of accuracy and consistency on the synthetic DVToF dataset and exhibits robust generalization on the real Kinectv2 dataset. Source code will be released at https://github.com/davidweidawang/GIGA-ToF.
mini-TimeCube as a Neutron Scatter Camera
We present Monte Carlo (MC) simulation results from a study of a compact plastic-scintillator detector suitable for imaging fast neutrons in the 1-10 MeV energy range: the miniTimeCube (mTC). Originally designed for antineutrino detection, the mTC consists of 24 MultiChannel Plate (MCP) photodetectors surrounding a 13 cm cube of boron-doped plastic scintillator. Our simulation results show that waveform digitization of 1536 optically sensitive channels surrounding the scintillator should allow for spatiotemporal determination of individual neutron-proton scatters in the detector volume to ~100 picoseconds and ~5 mm. A Bayesian estimation framework is presented for multiple-scatter reconstruction, and is used to estimate the incoming direction and energy of simulated individual neutrons. Finally, we show how populations of reconstructed neutrons can be used to estimate the direction and energy spectrum of nearby simulated neutron sources.
VoXtream2: Full-stream TTS with dynamic speaking rate control
Full-stream text-to-speech (TTS) for interactive systems must start speaking with minimal delay while remaining controllable as text arrives incrementally. We present VoXtream2, a zero-shot full-stream TTS model with dynamic speaking-rate control that can be updated mid-utterance on the fly. VoXtream2 combines a distribution matching mechanism over duration states with classifier-free guidance across conditioning signals to improve controllability and synthesis quality. Prompt-text masking enables textless audio prompting, removing the need for prompt transcription. Across standard zero-shot benchmarks and a dedicated speaking-rate test set, VoXtream2 achieves competitive objective and subjective results against public baselines despite a smaller model and less training data. In full-stream mode, it runs 4 times faster than real time with 74 ms first-packet latency on a consumer GPU.
Taking ROCKET on an Efficiency Mission: Multivariate Time Series Classification with LightWaveS
Nowadays, with the rising number of sensors in sectors such as healthcare and industry, the problem of multivariate time series classification (MTSC) is getting increasingly relevant and is a prime target for machine and deep learning approaches. Their expanding adoption in real-world environments is causing a shift in focus from the pursuit of ever-higher prediction accuracy with complex models towards practical, deployable solutions that balance accuracy and parameters such as prediction speed. An MTSC model that has attracted attention recently is ROCKET, based on random convolutional kernels, both because of its very fast training process and its state-of-the-art accuracy. However, the large number of features it utilizes may be detrimental to inference time. Examining its theoretical background and limitations enables us to address potential drawbacks and present LightWaveS: a framework for accurate MTSC, which is fast both during training and inference. Specifically, utilizing wavelet scattering transformation and distributed feature selection, we manage to create a solution that employs just 2.5% of the ROCKET features, while achieving accuracy comparable to recent MTSC models. LightWaveS also scales well across multiple compute nodes and with the number of input channels during training. In addition, it can significantly reduce the input size and provide insight to an MTSC problem by keeping only the most useful channels. We present three versions of our algorithm and their results on distributed training time and scalability, accuracy, and inference speedup. We show that we achieve speedup ranging from 9x to 53x compared to ROCKET during inference on an edge device, on datasets with comparable accuracy.
A Two-Dimensional Deep Network for RF-based Drone Detection and Identification Towards Secure Coverage Extension
As drones become increasingly prevalent in human life, they also raise security concerns such as unauthorized access and control, as well as collisions and interference with manned aircraft. Therefore, ensuring the ability to accurately detect and identify different drones holds significant implications for coverage extension. Assisted by machine learning, radio frequency (RF) detection can recognize the type and flight mode of drones based on the sampled drone signals. In this paper, we first utilize Short-Time Fourier Transform (STFT) to extract two-dimensional features from the raw signals, which contain both time-domain and frequency-domain information. Then, we employ a Convolutional Neural Network (CNN) built with ResNet structure to achieve multi-class classifications. Our experimental results show that the proposed ResNet-STFT can achieve higher accuracy and faster convergence on the extended dataset. Additionally, it exhibits balanced performance compared to other baselines on the raw dataset.
Fast Muon Tracking with Machine Learning Implemented in FPGA
In this work, we present a new approach for fast tracking on multiwire proportional chambers with neural networks. The tracking networks are developed and adapted for the first-level trigger at hadron collider experiments. We use Monte Carlo samples generated by Geant4 with a custom muon chamber, which resembles part of the thin gap chambers from the ATLAS experiment, for training and performance evaluations. The chamber has a total of seven gas gaps, where the first and last gas gaps are displaced by ~1.5 m. Each gas gap has 50 channels with a size of 18-20 mm. Two neural network models are developed and presented: a convolutional neural network and a neural network optimized for the detector configuration of this study. In the latter network, a convolution layer is provided for each of three groups formed from 2-3 gas gaps of the chamber, and the outputs are fed into multilayer perceptrons in sequence. Both networks are transformed into hardware description language and implemented in Virtex UltraScale+ FPGA. The angular resolution is 2 mrad, which is comparable to the maximum resolution of the detector estimated by the minimum χ² method. The latency achieved by the implemented firmware is less than 100 ns, and the throughput rate is 160 MHz.
Multiple-photon disambiguation on stripline-anode Micro-Channel Plates
Large-Area Picosecond Photo-Detectors (LAPPDs) show great potential for expanding the performance envelope of Micro-Channel Plates (MCPs) to areas of up to 20 x 20 cm and larger. Such scaling introduces new challenges, including how to meet the electronics readout burden of ever larger area MCPs. One solution is to replace the traditional grid anode used for readout with a microwave stripline anode, thus allowing the channel count to scale with MCP width rather than area. However, stripline anodes introduce new issues not commonly dealt with in grid-anodes, especially as their length increases. One of these issues is the near simultaneous arrival of multiple photons on the detector, creating possible confusion about how to reconstruct their arrival times and positions. We propose a maximum a posteriori solution to the problem and verify its performance in simulated scintillator and water-Cherenkov detectors.
Deep Synoptic Array Science: Searching for Long Duration Radio Transients with the DSA-110
We describe the design and commissioning tests for the DSA-110 Not-So-Fast Radio Burst (NSFRB) search pipeline, a 1.4 GHz image-plane single-pulse search sensitive to 134 ms-160.8 s radio bursts. Extending the pulse width range of the Fast Radio Burst (FRB) search by 3 orders of magnitude, the NSFRB search is sensitive to the recently-discovered Galactic Long Period Radio Transients (LPRTs). The NSFRB search operates in real-time, utilizing a custom GPU-accelerated search code, cerberus, implemented in Python with JAX. We summarize successful commissioning sensitivity tests with continuum sources and pulsar B0329+54, estimating the $6\sigma$ flux (fluence) threshold to be ~290 mJy (~40 Jy ms). Future tests of recovery of longer timescale transients, e.g. CHIME J1634+44, are planned to supplement injection testing and B0329+54 observations. An offline DSA-110 NSFRB Galactic Plane Survey was conducted to search for LPRTs, covering $-3.5^\circ < b < 5.7^\circ$ and $141^\circ < l < 225^\circ$ (~770 square degrees) in Galactic coordinates. We estimate an upper limit Poissonian burst rate ~1 hr$^{-1}$ per square degree (~7 hr$^{-1}$ per $3^\circ \times 3^\circ$ survey grid cell) maximized across the inner $|b| < 0.25^\circ$ of the surveyed region. By imposing the ~290 mJy flux limit on two representative models (the magnetar plastic flow model and the White Dwarf-M Dwarf binary model), we reject with 95% confidence the presence of White Dwarf-M Dwarf binary LPRTs with periods between ~10-70 s within ~95% of the surveyed region. Combined with the prevalence of LPRTs in the Galactic Plane, our results motivate further consideration of both White Dwarf-M Dwarf binary models and isolated magnetar models. We will continue to explore novel LPRT search strategies during real-time operations, such as triggered periodicity searches and additional targeted surveys.
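Two of the quoted numbers can be cross-checked with simple arithmetic. The sketch below assumes that the fluence threshold is the flux threshold multiplied by the shortest searched pulse width, and that the survey footprint is a simple latitude-longitude rectangle on the sphere; both are assumptions made here for illustration, not statements from the abstract.

```python
import math

# Values quoted in the abstract.
flux_threshold_jy = 0.290          # ~290 mJy
shortest_width_ms = 134.0          # shortest pulse width searched
b_min_deg, b_max_deg = -3.5, 5.7   # Galactic latitude range
l_min_deg, l_max_deg = 141.0, 225.0  # Galactic longitude range

# Assumption: fluence = flux x pulse width, evaluated at the shortest width.
fluence_jy_ms = flux_threshold_jy * shortest_width_ms

# Area of a latitude-longitude rectangle on the unit sphere, in square degrees:
# (l_max - l_min) * (sin b_max - sin b_min), converted from radians to degrees.
area_sq_deg = (l_max_deg - l_min_deg) * math.degrees(
    math.sin(math.radians(b_max_deg)) - math.sin(math.radians(b_min_deg))
)

print(f"fluence threshold: {fluence_jy_ms:.1f} Jy ms")
print(f"survey area: {area_sq_deg:.0f} square degrees")
```

Under these assumptions the results land close to the quoted ~40 Jy ms and ~770 square degrees.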
Learnable Adaptive Time-Frequency Representation via Differentiable Short-Time Fourier Transform
The short-time Fourier transform (STFT) is widely used for analyzing non-stationary signals. However, its performance is highly sensitive to its parameters, and manual or heuristic tuning often yields suboptimal results. To overcome this limitation, we propose a unified differentiable formulation of the STFT that enables gradient-based optimization of its parameters. This approach addresses the limitations of traditional STFT parameter tuning methods, which often rely on computationally intensive discrete searches. It enables fine-tuning of the time-frequency representation (TFR) based on any desired criterion. Moreover, our approach integrates seamlessly with neural networks, allowing joint optimization of the STFT parameters and network weights. The efficacy of the proposed differentiable STFT in enhancing TFRs and improving performance in downstream tasks is demonstrated through experiments on both simulated and real-world data.
Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction
Speech bandwidth extension (BWE) refers to widening the frequency bandwidth range of speech signals, making the speech sound brighter and fuller. This paper proposes a generative adversarial network (GAN) based BWE model with parallel prediction of Amplitude and Phase spectra, named AP-BWE, which achieves both high-quality and efficient wideband speech waveform generation. The proposed AP-BWE generator is entirely based on convolutional neural networks (CNNs). It features a dual-stream architecture with mutual interaction, where the amplitude stream and the phase stream communicate with each other and respectively extend the high-frequency components from the input narrowband amplitude and phase spectra. To improve the naturalness of the extended speech signals, we employ a multi-period discriminator at the waveform level and design a pair of multi-resolution amplitude and phase discriminators at the spectral level. Experimental results demonstrate that our proposed AP-BWE achieves state-of-the-art performance in terms of speech quality for BWE tasks targeting sampling rates of both 16 kHz and 48 kHz. In terms of generation efficiency, due to the all-convolutional architecture and all-frame-level operations, the proposed AP-BWE can generate 48 kHz waveform samples 292.3 times faster than real-time on a single RTX 4090 GPU and 18.1 times faster than real-time on a single CPU. Notably, to our knowledge, AP-BWE is the first to achieve the direct extension of the high-frequency phase spectrum, which is beneficial for improving the effectiveness of existing BWE methods.
TempoRL: laser pulse temporal shape optimization with Deep Reinforcement Learning
Optimal performance of High Power Lasers (HPL) is essential for the success of a wide variety of experimental tasks related to light-matter interactions. Traditionally, HPL parameters are optimised in an automated fashion relying on black-box numerical methods. However, these can be demanding in terms of computational resources and usually disregard transient and complex dynamics. Model-free Deep Reinforcement Learning (DRL) offers a promising alternative framework for optimising HPL performance, since it allows the control parameters to be tuned as a function of system states subject to nonlinear temporal dynamics without requiring an explicit model of those dynamics. Furthermore, DRL aims to find an optimal control policy rather than a static parameter configuration, which makes it particularly suitable for dynamic processes involving sequential decision-making. This is particularly relevant as laser systems are typically characterised by dynamic rather than static traits, hence the need for a strategy that chooses the control to apply based on the current context instead of a single optimal control configuration. This paper investigates the potential of DRL in improving the efficiency and safety of HPL control systems. We apply this technique to optimise the temporal profile of laser pulses in the L1 pump laser hosted at the ELI Beamlines facility. We show how to adapt DRL to the setting of spectral phase control by solely tuning dispersion coefficients of the spectral phase and reaching pulses similar to transform limited with full-width at half-maximum (FWHM) of ca. 1.6 ps.
RF-ULM: Deep Learning for Radio-Frequency Ultrasound Localization Microscopy
In Ultrasound Localization Microscopy (ULM), achieving high-resolution images relies on the precise localization of contrast agent particles across consecutive beam-formed frames. However, our study uncovers an enormous potential: The process of delay-and-sum beamforming leads to an irreversible reduction of Radio-Frequency (RF) data, while its implications for localization remain largely unexplored. The rich contextual information embedded within RF wavefronts, including their hyperbolic shape and phase, offers great promise for guiding Deep Neural Networks (DNNs) in challenging localization scenarios. To fully exploit this data, we propose to directly localize scatterers in RF signals. Our approach involves a custom super-resolution DNN using learned feature channel shuffling and a novel semi-global convolutional sampling block tailored for reliable and accurate wavefront localization. Additionally, we introduce a geometric point transformation that facilitates seamless mapping between RF and B-mode coordinate space. To understand the impact of beamforming on ULM, we validate the effectiveness of our method by conducting an extensive comparison with State-Of-The-Art (SOTA) techniques. We present the inaugural in vivo results from an RF-trained DNN, highlighting its real-world practicality. Our findings show that RF-ULM bridges the domain gap between synthetic and real datasets, offering a considerable advantage in terms of precision and complexity. To enable the broader research community to benefit from our findings, our code and the associated SOTA methods are made available at https://github.com/hahnec/rf-ulm.
Parallelizing Optical Flow Estimation on an Ultra-Low Power RISC-V Cluster for Nano-UAV Navigation
Optical flow estimation is crucial for autonomous navigation and localization of unmanned aerial vehicles (UAV). On micro and nano UAVs, real-time calculation of the optical flow is run on low power and resource-constrained microcontroller units (MCUs). Thus, lightweight algorithms for optical flow have been proposed targeting real-time execution on traditional single-core MCUs. This paper introduces an efficient parallelization strategy for optical flow computation targeting new-generation multicore low power RISC-V based microcontroller units. Our approach enables higher frame rates at lower clock speeds. It has been implemented and evaluated on the eight-core cluster of a commercial octa-core MCU (GAP8) reaching a parallelization speedup factor of 7.21 allowing for a frame rate of 500 frames per second when running on a 50 MHz clock frequency. The proposed parallel algorithm significantly boosts the camera frame rate on micro unmanned aerial vehicles, which enables higher flight speeds: the maximum flight speed can be doubled, while using less than a third of the clock frequency of previous single-core implementations.
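The reported speedup can be put in context with a short calculation. The sketch below derives parallel efficiency and the per-frame cycle budget directly from the quoted figures. The serial-fraction estimate additionally assumes that Amdahl's law describes the workload, which is a modelling assumption made here and not a claim from the paper.

```python
cores = 8                 # eight-core cluster
speedup = 7.21            # reported parallelization speedup
clock_hz = 50_000_000     # 50 MHz
frames_per_second = 500   # reported frame rate

# Parallel efficiency: achieved speedup relative to the ideal (= core count).
efficiency = speedup / cores

# Amdahl's law: speedup = 1 / (s + (1 - s) / cores); solved for the serial
# fraction s (assumes the workload fits this simple model).
serial_fraction = (cores / speedup - 1) / (cores - 1)

# Clock cycles available per frame at the stated clock and frame rate.
cycles_per_frame = clock_hz // frames_per_second

print(f"parallel efficiency: {efficiency:.1%}")
print(f"implied serial fraction (Amdahl): {serial_fraction:.2%}")
print(f"cycle budget per frame: {cycles_per_frame}")
```

An efficiency of roughly 90% on eight cores corresponds, under the Amdahl assumption, to a serial portion of under 2% of the original single-core runtime.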
FISHER: A Foundation Model for Multi-Modal Industrial Signal Comprehensive Representation
With the rapid deployment of SCADA systems, effectively analyzing industrial signals and detecting abnormal states has become an urgent need for the industry. Due to the significant heterogeneity of these signals, which we summarize as the M5 problem, previous works only focus on small sub-problems and employ specialized models, failing to utilize the synergies between modalities and the powerful scaling law. However, we argue that the M5 signals can be modeled in a unified manner due to the intrinsic similarity. As a result, we propose FISHER, a Foundation model for multi-modal Industrial Signal compreHEnsive Representation. To support arbitrary sampling rates, FISHER considers the increment of sampling rate as the concatenation of sub-band information. Specifically, FISHER takes the STFT sub-band as the modeling unit and adopts a teacher-student SSL framework for pre-training. We also develop the RMIS benchmark, which evaluates the representations of M5 industrial signals on multiple health management tasks. Compared with top SSL models, FISHER showcases versatile and outstanding capabilities with a general performance gain up to 5.03%, along with much more efficient scaling curves. We also investigate the scaling law on downstream tasks and derive potential avenues for future works. FISHER is now open-sourced at https://github.com/jianganbai/FISHER
KAN-powered large-target detection for automotive radar
This paper presents a novel radar signal detection pipeline focused on detecting large targets such as cars and SUVs. Traditional methods, such as Ordered-Statistic Constant False Alarm Rate (OS-CFAR), commonly used in automotive radar, are designed for point or isotropic target models. These may not adequately capture the Range-Doppler (RD) scattering patterns of larger targets, especially in high-resolution radar systems. Additional modules such as association and tracking are necessary to refine and consolidate the detections over multiple dwells. To address these limitations, we propose a detection technique based on the probability density function (pdf) of RD segments, leveraging the Kolmogorov-Arnold neural network (KAN) to learn the data and generate interpretable symbolic expressions for binary hypotheses. Besides the Monte-Carlo study showing better performance for the proposed KAN expression over OS-CFAR, it is shown to exhibit a probability of detection (PD) of 96% when transfer learned with field data. The false alarm rate (PFA) is comparable with OS-CFAR designed with PFA = 10^{-6}. Additionally, the study also examines the impact of the number of pdf bins representing an RD segment on the performance of the KAN-based detection.
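For readers unfamiliar with the OS-CFAR baseline mentioned above, a minimal one-dimensional sketch follows. It is not the paper's pipeline: the window sizes, the rank, the scale factor `alpha`, and the synthetic power profile are all illustrative assumptions, and `alpha` is not derived from any target false alarm rate.

```python
def os_cfar(power, num_ref=8, num_guard=2, rank=12, alpha=4.0):
    """Return indices of cells declared as detections.

    num_ref   : reference cells on each side of the cell under test
    num_guard : guard cells on each side (excluded from the noise estimate)
    rank      : 1-based order statistic taken from the 2 * num_ref sorted cells
    alpha     : threshold scale factor (illustrative value)
    """
    detections = []
    half = num_ref + num_guard
    for i in range(half, len(power) - half):
        left = power[i - half:i - num_guard]
        right = power[i + num_guard + 1:i + half + 1]
        reference = sorted(left + right)
        # The rank-th smallest reference cell serves as the noise estimate.
        noise_level = reference[rank - 1]
        if power[i] > alpha * noise_level:
            detections.append(i)
    return detections


# Synthetic profile (assumption): rippled noise floor plus two nearby targets.
power = [1.0 + 0.1 * ((i * 7) % 5) for i in range(64)]
power[20] = 30.0
power[23] = 25.0

hits = os_cfar(power)
print(hits)
```

Each target falls inside the other's reference window here, yet the order statistic still picks a noise-level cell as the estimate, so both are detected. The detector tests one cell at a time, which is the point-target assumption the abstract contrasts with extended targets.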
TiDy-PSFs: Computational Imaging with Time-Averaged Dynamic Point-Spread-Functions
Point-spread-function (PSF) engineering is a powerful computational imaging technique wherein a custom phase mask is integrated into an optical system to encode additional information into captured images. Used in combination with deep learning, such systems now offer state-of-the-art performance at monocular depth estimation, extended depth-of-field imaging, lensless imaging, and other tasks. Inspired by recent advances in spatial light modulator (SLM) technology, this paper answers a natural question: Can one encode additional information and achieve superior performance by changing a phase mask dynamically over time? We first prove that the set of PSFs described by static phase masks is non-convex and that, as a result, time-averaged PSFs generated by dynamic phase masks are fundamentally more expressive. We then demonstrate, in simulation, that time-averaged dynamic (TiDy) phase masks can offer substantially improved monocular depth estimation and extended depth-of-field imaging performance.
Ghost-FWL: A Large-Scale Full-Waveform LiDAR Dataset for Ghost Detection and Removal
LiDAR has become an essential sensing modality in autonomous driving, robotics, and smart-city applications. However, ghost points (or ghosts), which are false reflections caused by multi-path laser returns from glass and reflective surfaces, severely degrade 3D mapping and localization accuracy. Prior ghost removal relies on geometric consistency in dense point clouds, failing on mobile LiDAR's sparse, dynamic data. We address this by exploiting full-waveform LiDAR (FWL), which captures complete temporal intensity profiles rather than just peak distances, providing crucial cues for distinguishing ghosts from genuine reflections in mobile scenarios. As this is a new task, we present Ghost-FWL, the first and largest annotated mobile FWL dataset for ghost detection and removal. Ghost-FWL comprises 24K frames across 10 diverse scenes with 7.5 billion peak-level annotations, which is 100x larger than existing annotated FWL datasets. Benefiting from this large-scale dataset, we establish an FWL-based baseline model for ghost detection and propose FWL-MAE, a masked autoencoder for efficient self-supervised representation learning on FWL data. Experiments show that our baseline outperforms existing methods in ghost removal accuracy, and our ghost removal further enhances downstream tasks such as LiDAR-based SLAM (66% trajectory error reduction) and 3D object detection (50x false positive reduction). The dataset and code are publicly available and can be accessed via the project page: https://keio-csg.github.io/Ghost-FWL
Spatial-temporal manipulations of visible nanosecond sub-pulse sequences in an actively Q-switched Pr:YLF laser
Pulsed visible lasers, whether Q-switched or mode-locked, have been attracting intense attention in both solid-state and fiber lasers. Here, we report on the simultaneous manipulation of reconfigurable sub-pulse sequences and customizable high-order vortex beams in an actively Q-switched visible laser. On the one hand, pulse sequences with up to 4 sub-pulses could be generated and fully controlled by means of an acousto-optic modulator driven by an arbitrary waveform generator. Both pulse number and pulse intensity can be manipulated through the programmable step-signal, which is also theoretically simulated through the rate equations. On the other hand, assisted by the off-axis pumping technique and the astigmatic mode conversion, the laser cavity could emit high-quality vortex beams carrying Laguerre-Gaussian modes up to 30th order. To the best of our knowledge, this is the most flexible active manipulation not only of the intensity distribution of the transverse modes but also of the temporal distribution of the pulse sequences in a visible laser. The versatile manipulating techniques in this work could be immediately implemented into all other solid-state lasers to obtain sub-pulse vortex beams, which may provide enhanced functionality and flexibility for a large range of laser systems.
iSTFTNet: Fast and Lightweight Mel-Spectrogram Vocoder Incorporating Inverse Short-Time Fourier Transform
In recent text-to-speech synthesis and voice conversion systems, a mel-spectrogram is commonly applied as an intermediate representation, and the necessity for a mel-spectrogram vocoder is increasing. A mel-spectrogram vocoder must solve three inverse problems: recovery of the original-scale magnitude spectrogram, phase reconstruction, and frequency-to-time conversion. A typical convolutional mel-spectrogram vocoder solves these problems jointly and implicitly using a convolutional neural network, including temporal upsampling layers, when directly calculating a raw waveform. Such an approach allows skipping redundant processes during waveform synthesis (e.g., the direct reconstruction of high-dimensional original-scale spectrograms). By contrast, the approach solves all problems in a black box and cannot effectively employ the time-frequency structures existing in a mel-spectrogram. We thus propose iSTFTNet, which replaces some output-side layers of the mel-spectrogram vocoder with the inverse short-time Fourier transform (iSTFT) after sufficiently reducing the frequency dimension using upsampling layers, reducing the computational cost from black-box modeling and avoiding redundant estimations of high-dimensional spectrograms. During our experiments, we applied our ideas to three HiFi-GAN variants and made the models faster and more lightweight with a reasonable speech quality. Audio samples are available at https://www.kecl.ntt.co.jp/people/kaneko.takuhiro/projects/istftnet/.
Patient-Adaptive Focused Transmit Beamforming using Cognitive Ultrasound
Focused transmit beamforming is the most commonly used acquisition scheme for echocardiograms, but suffers from relatively low frame rates, and in 3D, even lower volume rates. Fast imaging based on unfocused transmits has disadvantages such as motion decorrelation and limited harmonic imaging capabilities. This work introduces a patient-adaptive focused transmit scheme that has the ability to drastically reduce the number of transmits needed to produce a high-quality ultrasound image. The method relies on posterior sampling with a temporal diffusion model to perceive and reconstruct the anatomy based on partial observations, while subsequently taking an action to acquire the most informative transmits. This active perception modality outperforms random and equispaced subsampling on the 2D EchoNet-Dynamic dataset and a 3D Philips dataset, where we actively select focused elevation planes. Furthermore, we show it achieves better performance in terms of generalized contrast-to-noise ratio when compared to the same number of diverging waves transmits on three in-house echocardiograms. Additionally, we can estimate ejection fraction using only 2% of the total transmits and show that the method is robust to outlier patients. Finally, our method can be run in real-time on GPU accelerators from 2023. The code is publicly available at https://tue-bmd.github.io/ulsa/
Simulate Any Radar: Attribute-Controllable Radar Simulation via Waveform Parameter Embedding
We present SA-Radar (Simulate Any Radar), a radar simulation approach that enables controllable and efficient generation of radar cubes conditioned on customizable radar attributes. Unlike prior generative or physics-based simulators, SA-Radar integrates both paradigms through a waveform-parameterized attribute embedding. We design ICFAR-Net, a 3D U-Net conditioned on radar attributes encoded via waveform parameters, which captures signal variations induced by different radar configurations. This formulation bypasses the need for detailed radar hardware specifications and allows efficient simulation of range-azimuth-Doppler (RAD) tensors across diverse sensor settings. We further construct a mixed real-simulated dataset with attribute annotations to robustly train the network. Extensive evaluations on multiple downstream tasks, including 2D/3D object detection and radar semantic segmentation, demonstrate that SA-Radar's simulated data is both realistic and effective, consistently improving model performance when used standalone or in combination with real data. Our framework also supports simulation in novel sensor viewpoints and edited scenes, showcasing its potential as a general-purpose radar data engine for autonomous driving applications. Code and additional materials are available at https://zhuxing0.github.io/projects/SA-Radar.
Analytical sensitivity curves of the second-generation time-delay interferometry
Forthcoming space-based gravitational-wave (GW) detectors will employ second-generation time-delay interferometry (TDI) to suppress laser frequency noise and achieve the sensitivity required for GW detection. We introduce an inverse light-path operator $P_{i_{1}i_{2}i_{3}\ldots i_{n-1}i_{n}}$, which enables simple representation of second-generation TDI combinations and a concise description of light propagation. Analytical expressions and high-accuracy approximate formulas are derived for the sky- and polarization-averaged response functions, noise power spectral densities (PSDs), and sensitivity curves of TDI Michelson, $(\alpha,\beta,\gamma)$, Monitor, Beacon, Relay, and Sagnac combinations, as well as their orthogonal A, E, T channels. Our results show that: (i) second-generation TDIs have the same sensitivities as their first-generation counterparts; (ii) the A, E, T sensitivities and the optimal sensitivity are independent of the TDI generation and specific combination; (iii) the A and E channels have equal averaged responses, noise PSDs, and sensitivities, while the T channel has much weaker response and sensitivity at low frequencies ($2\pi fL/c \lesssim 3$); (iv) except for the $(\alpha,\beta,\gamma)$ and $\zeta$ combinations and the T channel, all sensitivity curves exhibit a flat section in the range $f_{n} < f \lesssim 1.5/(2\pi L/c)$, where the noise-balance frequency $f_{n}$ separates the proof-mass- and optical-path-dominated regimes, while the response-transition frequency $\sim 1.5/(2\pi L/c)$ separates the response function's low- and high-frequency behaviors; (v) the averaged response, noise PSD, and sensitivity of $\zeta$ scale with those of the T channel. These analytical and approximate formulations provide useful benchmarks for instrument optimization and data-analysis studies for future space-based GW detectors.
Full-Duplex-Bench-v2: A Multi-Turn Evaluation Framework for Duplex Dialogue Systems with an Automated Examiner
While full-duplex speech agents enable natural, low-latency interaction by speaking and listening simultaneously, their consistency and task performance in multi-turn settings remain underexplored. We introduce Full-Duplex-Bench-v2 (FDB-v2), a streaming framework that integrates with an automated examiner that enforces staged goals under two pacing setups (Fast vs. Slow). FDB-v2 covers four task families: daily, correction, entity tracking, and safety. We report turn-taking fluency, multi-turn instruction following, and task-specific competence. The framework is extensible, supporting both commercial APIs and open source models. When we test full-duplex systems with FDB-v2, they often get confused when people talk at the same time, struggle to handle corrections smoothly, and sometimes lose track of who or what is being talked about. Through an open-sourced, standardized streaming protocol and a task set, FDB-v2 makes it easy to extend to new task families, allowing the community to tailor and accelerate evaluation of multi-turn full-duplex systems.
OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Prostate cancer is one of the most common and lethal cancers among men, making its early detection critically important. Although ultrasound imaging offers greater accessibility and cost-effectiveness compared to MRI, traditional transrectal ultrasound methods suffer from low sensitivity, especially in detecting anteriorly located tumors. Ultrasound computed tomography provides quantitative tissue characterization, but its clinical implementation faces significant challenges, particularly under anatomically constrained limited-angle acquisition conditions specific to prostate imaging. To address these unmet needs, we introduce OpenPros, the first large-scale benchmark dataset explicitly developed for limited-view prostate USCT. Our dataset includes over 280,000 paired samples of realistic 2D speed-of-sound (SOS) phantoms and corresponding ultrasound full-waveform data, generated from anatomically accurate 3D digital prostate models derived from real clinical MRI/CT scans and ex vivo ultrasound measurements, annotated by medical experts. Simulations are conducted under clinically realistic configurations using advanced finite-difference time-domain and Runge-Kutta acoustic wave solvers, both provided as open-source components. Through comprehensive baseline experiments, we demonstrate that state-of-the-art deep learning methods surpass traditional physics-based approaches in both inference efficiency and reconstruction accuracy. Nevertheless, current deep learning models still fall short of delivering clinically acceptable high-resolution images with sufficient accuracy. By publicly releasing OpenPros, we aim to encourage the development of advanced machine learning algorithms capable of bridging this performance gap and producing clinically usable, high-resolution, and highly accurate prostate ultrasound images. The dataset is publicly accessible at https://open-pros.github.io/.
An Empirical Study of Large-Scale Data-Driven Full Waveform Inversion
This paper investigates the impact of big data on deep learning models to help solve the full waveform inversion (FWI) problem. While it is well known that big data can boost the performance of deep learning models in many tasks, its effectiveness has not been validated for FWI. To address this gap, we present an empirical study that investigates how deep learning models in FWI behave when trained on OpenFWI, a collection of large-scale, multi-structural, synthetic datasets published recently. In particular, we train and evaluate the FWI models on a combination of 10 2D subsets in OpenFWI that contain 470K pairs of seismic data and velocity maps in total. Our experiments demonstrate that training on the combined dataset yields an average improvement of 13.03% in MAE, 7.19% in MSE and 1.87% in SSIM compared to each split dataset, and an average improvement of 28.60%, 21.55% and 8.22% in the leave-one-out generalization test. We further demonstrate that model capacity needs to scale in accordance with data size for optimal improvement, where our largest model yields an average improvement of 20.06%, 13.39% and 0.72% compared to the smallest one.
MVDR Beamforming for Cyclostationary Processes
Conventional acoustic beamformers assume that noise is stationary within short time frames. This assumption prevents them from exploiting correlations between frequencies in almost-periodic noise sources such as musical instruments, fans, and engines. These signals exhibit periodically varying statistics and are better modeled as cyclostationary processes. This paper introduces the cyclic MVDR (cMVDR) beamformer, an extension of the conventional MVDR that leverages both spatial and spectral correlations to improve noise reduction, particularly in low-SNR scenarios. The method builds on frequency-shifted (FRESH) filtering, where shifted versions of the input are combined to attenuate or amplify components that are coherent across frequency. To address inharmonicity, where harmonic partials deviate from exact integer multiples of the fundamental frequency, we propose a data-driven strategy that estimates resonant frequencies via periodogram analysis and computes the frequency shifts from their spacing. Analytical and experimental results demonstrate that performance improves with increasing spectral correlation. On real recordings, the cMVDR achieves up to 5 dB gain in scale-invariant signal-to-distortion ratio (SI-SDR) over the MVDR and remains effective even with a single microphone. Code is available at https://github.com/Screeen/cMVDR.
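For reference, the conventional MVDR beamformer that the cMVDR extends is usually written as the solution of a constrained minimization. The notation below is an assumption chosen here, not the paper's: per frequency bin, $\mathbf{R}_n$ is the noise spatial covariance matrix, $\mathbf{d}$ is the steering vector (relative transfer function) of the target, and $\mathbf{w}$ is the vector of beamformer weights.

```latex
\begin{aligned}
\mathbf{w}_{\mathrm{MVDR}}
  &= \arg\min_{\mathbf{w}} \; \mathbf{w}^{H} \mathbf{R}_n \mathbf{w}
     \quad \text{subject to} \quad \mathbf{w}^{H} \mathbf{d} = 1, \\
\mathbf{w}_{\mathrm{MVDR}}
  &= \frac{\mathbf{R}_n^{-1} \mathbf{d}}{\mathbf{d}^{H} \mathbf{R}_n^{-1} \mathbf{d}} .
\end{aligned}
```

The constraint keeps the target undistorted while the objective minimizes output noise power. This standard form treats each frequency bin independently, which is why it cannot use the cross-frequency correlations of cyclostationary noise that the abstract describes.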
E2ESlack: An End-to-End Graph-Based Framework for Pre-Routing Slack Prediction
Pre-routing slack prediction remains a critical area of research in Electronic Design Automation (EDA). Despite numerous machine learning-based approaches targeting this task, there is still a lack of a truly end-to-end framework that engineers can use to obtain TNS/WNS metrics from raw circuit data at the placement stage. Existing works have demonstrated effectiveness in Arrival Time (AT) prediction but lack a mechanism for Required Arrival Time (RAT) prediction, which is essential for slack prediction and obtaining TNS/WNS metrics. In this work, we propose E2ESlack, an end-to-end graph-based framework for pre-routing slack prediction. The framework includes a TimingParser that supports DEF, SDF and LIB files for feature extraction and graph construction, an arrival time prediction model and a fast RAT estimation module. To the best of our knowledge, this is the first work capable of predicting path-level slacks at the pre-routing stage. We perform extensive experiments and demonstrate that our proposed RAT estimation method outperforms the SOTA ML-based prediction method and also a pre-routing STA tool. Additionally, the proposed E2ESlack framework achieves TNS/WNS values comparable to post-routing STA results while saving up to 23x runtime.
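The abstract's point that RAT is needed alongside AT follows from how the metrics are conventionally defined in static timing analysis: setup slack is RAT minus AT, WNS is the worst (minimum) endpoint slack, and TNS is the sum of the negative endpoint slacks. The sketch below illustrates those definitions. The endpoint names and picosecond values are made up, and `slack_metrics` is a hypothetical helper, not part of E2ESlack.

```python
def slack_metrics(arrival_ps, required_ps):
    """Compute per-endpoint setup slack, WNS, and TNS (all in picoseconds)."""
    # Setup slack: time to spare between required and actual arrival.
    slacks = {pin: required_ps[pin] - arrival_ps[pin] for pin in arrival_ps}
    # Worst negative slack: the minimum slack over all endpoints.
    wns = min(slacks.values())
    # Total negative slack: sum over violating endpoints only.
    tns = sum(s for s in slacks.values() if s < 0)
    return slacks, wns, tns


# Made-up endpoints and timing values, for illustration only.
arrival_ps = {"ff1/D": 950, "ff2/D": 1120, "ff3/D": 1005}
required_ps = {"ff1/D": 1000, "ff2/D": 1000, "ff3/D": 1000}

slacks, wns, tns = slack_metrics(arrival_ps, required_ps)
print(slacks)
print("WNS:", wns, "ps")
print("TNS:", tns, "ps")
```

Without a RAT value per endpoint, none of these quantities can be computed, however accurate the predicted arrival times are.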
UWB TDoA Error Correction using Transformers: Patching and Positional Encoding Strategies
Despite their high accuracy, UWB-based localization systems suffer inaccuracies when deployed in industrial locations with many obstacles due to multipath effects and non-line-of-sight (NLOS) conditions. In such environments, current error mitigation approaches for time difference of arrival (TDoA) localization typically exclude NLOS links. However, this exclusion leads to geometric dilution of precision problems and is infeasible when the majority of links are NLOS. To address these limitations, we propose a transformer-based TDoA position correction method that uses raw channel impulse responses (CIRs) from all available anchor nodes to compute position corrections. We introduce different CIR ordering, patching and positional encoding strategies for the transformer, and analyze each proposed technique's scalability and performance gains. Based on experiments on real-world UWB measurements, our approach can provide accuracies of up to 0.39 m in a complex environment consisting of (almost) only NLOS signals, which is an improvement of 73.6 % compared to the TDoA baseline.
On the Sensing Performance of OFDM-based ISAC under the Influence of Oscillator Phase Noise
Integrated sensing and communication (ISAC) is a novel capability expected for sixth generation (6G) cellular networks. To that end, several challenges must be addressed to enable both mono- and bistatic sensing in existing deployments. A common impairment in both architectures is oscillator phase noise (PN), which not only degrades communication performance, but also severely impairs radar sensing. To enable a broader understanding of orthogonal frequency-division multiplexing (OFDM)-based sensing impaired by PN, this article presents an analysis of sensing performance in OFDM-based ISAC for different waveform parameter choices and settings in both mono- and bistatic architectures. In this context, the distortion of the adopted digital constellation modulation is analyzed and the resulting PN-induced effects in range-Doppler radar images are investigated both without and with PN compensation. These effects include peak power loss of target reflections and higher sidelobe levels, especially in the Doppler shift direction. In the conducted analysis, these effects are measured by the peak power loss ratio, peak-to-sidelobe level ratio, and integrated sidelobe level ratio parameters, the two latter being evaluated in both range and Doppler shift directions. In addition, the signal-to-interference ratio is analyzed to allow not only quantifying the distortion of a target reflection, but also measuring the interference floor level in a radar image. The achieved results make it possible to quantify not only the PN-induced impairments to a single target, but also how the induced degradation may impair the sensing performance of OFDM-based ISAC systems in multi-target scenarios.
Harnessing Selective State Space Models to Enhance Semianalytical Design of Fabrication-Ready Multilayered Huygens' Metasurfaces: Part I - Field-based Semianalytical Synthesis
Planar metasurfaces can profoundly control electromagnetic scattering. At microwave frequencies, such devices are typically implemented using multilayer cascades of patterned metallic sheets, whose design often requires time-consuming full-wave optimization. Here, we extend analytical models originally developed for sparse loaded-wire metagratings to accurately describe densely packed Jerusalem-cross meta-atoms embedded in standard printed circuit board (PCB) dielectric stacks. The model captures both near- and far-field coupling within and between layers, enabling efficient prediction of the dual-polarized response. Using this framework, we identify highly transmissive meta-atoms whose phase is controlled by the leg lengths of the Jerusalem crosses (microscopic design stage). This (phase)-(leg-length) "lookup table" allows rapid synthesis of Huygens' metasurfaces (macroscopic design stage), demonstrated through a full-wave-validated metalens exhibiting low-reflection beam manipulation. Notably, we implement a judicious scaling method to further extend the model to predict wideband meta-atom responses. In the companion paper (Part II), a hybrid machine-learning approach leverages this semianalytical framework to enhance accuracy without requiring the conventional exhaustive full-wave training, enabling ultrafast inverse design across the full parameter space. Overall, the presented methodology -- the standalone semianalytical scheme (Part I) and the machine-learning enhanced version (Part II) -- establishes an effective open-source toolkit for versatile, rapid, and highly accurate synthesis of fabrication-ready dual-polarized transmissive Huygens' meta-atoms and metasurfaces.
Orbital Transformers for Predicting Wavefunctions in Time-Dependent Density Functional Theory
We aim to learn wavefunctions simulated by time-dependent density functional theory (TDDFT), which can be efficiently represented as linear combination coefficients of atomic orbitals. In real-time TDDFT, the electronic wavefunctions of a molecule evolve over time in response to an external excitation, enabling first-principles predictions of physical properties such as optical absorption, electron dynamics, and high-order response. However, conventional real-time TDDFT relies on time-consuming propagation of all occupied states with fine time steps. In this work, we propose OrbEvo, which is based on an equivariant graph transformer architecture and learns to evolve the full electronic wavefunction coefficients across time steps. First, to account for external field, we design an equivariant conditioning to encode both strength and direction of external electric field and break the symmetry from SO(3) to SO(2). Furthermore, we design two OrbEvo models, OrbEvo-WF and OrbEvo-DM, using wavefunction pooling and density matrix as interaction method, respectively. Motivated by the central role of the density functional in TDDFT, OrbEvo-DM encodes the density matrix aggregated from all occupied electronic states into feature vectors via tensor contraction, providing a more intuitive approach to learn the time evolution operator. We adopt a training strategy specifically tailored to limit the error accumulation of time-dependent wavefunctions over autoregressive rollout. To evaluate our approach, we generate TDDFT datasets consisting of 5,000 different molecules in the QM9 dataset and 1,500 molecular configurations of the malonaldehyde molecule in the MD17 dataset. Results show that our OrbEvo model accurately captures quantum dynamics of excited states under external field, including time-dependent wavefunctions, time-dependent dipole moment, and optical absorption spectra.
Predicting Time-Dependent Flow Over Complex Geometries Using Operator Networks
Fast, geometry-generalizing surrogates for unsteady flow remain challenging. We present a time-dependent, geometry-aware Deep Operator Network that predicts velocity fields for moderate-Re flows around parametric and non-parametric shapes. The model encodes geometry via a signed distance field (SDF) trunk and flow history via a CNN branch, trained on 841 high-fidelity simulations. On held-out shapes, it attains ~5% relative L2 single-step error and up to 1000x speedups over CFD. We provide physics-centric rollout diagnostics, including phase error at probes and divergence norms, to quantify long-horizon fidelity. These reveal accurate near-term transients but error accumulation in fine-scale wakes, most pronounced for sharp-cornered geometries. We analyze failure modes and outline practical mitigations. Code, splits, and scripts are openly released at: https://github.com/baskargroup/TimeDependent-DeepONet to support reproducibility and benchmarking.
Waver: Wave Your Way to Lifelike Video Generation
We present Waver, a high-performance foundation model for unified image and video generation. Waver can directly generate videos with durations ranging from 5 to 10 seconds at a native resolution of 720p, which are subsequently upscaled to 1080p. The model simultaneously supports text-to-video (T2V), image-to-video (I2V), and text-to-image (T2I) generation within a single, integrated framework. We introduce a Hybrid Stream DiT architecture to enhance modality alignment and accelerate training convergence. To ensure training data quality, we establish a comprehensive data curation pipeline and manually annotate and train an MLLM-based video quality model to filter for the highest-quality samples. Furthermore, we provide detailed training and inference recipes to facilitate the generation of high-quality videos. Building on these contributions, Waver excels at capturing complex motion, achieving superior motion amplitude and temporal consistency in video synthesis. Notably, it ranks among the Top 3 on both the T2V and I2V leaderboards at Artificial Analysis (data as of 2025-07-30 10:00 GMT+8), consistently outperforming existing open-source models and matching or surpassing state-of-the-art commercial solutions. We hope this technical report will help the community more efficiently train high-quality video generation models and accelerate progress in video generation technologies. Official page: https://github.com/FoundationVision/Waver.
SSMRadNet: A Sample-wise State-Space Framework for Efficient and Ultra-Light Radar Segmentation and Object Detection
We introduce SSMRadNet, the first multi-scale State Space Model (SSM) based detector for Frequency Modulated Continuous Wave (FMCW) radar that sequentially processes raw ADC samples through two SSMs. One SSM learns a chirp-wise feature by sequentially processing samples from all receiver channels within one chirp, and a second SSM learns a representation of a frame by sequentially processing chirp-wise features. The latent representations of a radar frame are decoded to perform segmentation and detection tasks. Comprehensive evaluations on the RADIal dataset show SSMRadNet has 10-33x fewer parameters and 60-88x less computation (GFLOPs) while being 3.7x faster than state-of-the-art transformer and convolution-based radar detectors at competitive performance for segmentation tasks.
Ab uno disce omnes: Single-harmonic search for extreme mass-ratio inspirals
Extreme mass-ratio inspirals (EMRIs) are one of the key sources of gravitational waves for space-based detectors such as LISA. However, their detection remains a major data analysis challenge due to the signals' complexity and length. We present a semi-coherent, time-frequency search strategy for detecting EMRI harmonics without relying on full waveform templates. We perform an injection and search campaign of single mildly-eccentric equatorial EMRIs in stationary Gaussian noise. The detection statistic is constructed solely from the EMRI frequency evolution, which is modeled phenomenologically using a Singular Value Decomposition basis. The pipeline and the detection statistic are implemented in time-frequency, enabling efficient searches over one year of data in approximately one hour on a single GPU. The search pipeline achieves 94% detection probability at SNR = 30 for a false-alarm probability of 10^{-2}, recovering the frequency evolution of the dominant harmonic to 1% relative error. By mapping the EMRI parameters consistent with the recovered frequency evolution, we show that the semi-coherent detection statistic enables a sub-percent precision estimation of the EMRI intrinsic parameters. These results establish a computationally efficient framework for constructing EMRI proposals for the LISA global fit.
Generating arbitrary polarization states by manipulating the thicknesses of a pair of uniaxial birefringent plates
We report an optical method of generating arbitrary polarization states by manipulating the thicknesses of a pair of uniaxial birefringent plates, the optical axes of which are set at a crossing angle of π/4. The method has the remarkable feature of being able to generate a distribution of arbitrary polarization states in a group of highly discrete spectra without spatially separating the individual spectral components. The target polarization-state distribution is obtained as an optimal solution through an exploration. Within a realistic exploration range, a sufficient number of near-optimal solutions are found. This property is also reproduced well by a concise model based on a distribution of exploration points on a Poincaré sphere, showing that the number of near-optimal solutions behaves according to a power law with respect to the number of spectral components of concern. As a typical example of an application, by applying this method to a set of phase-locked highly discrete spectra, we numerically demonstrate the continuous generation of a vector-like optical electric field waveform, the helicity of which is alternated within a single optical cycle in the time domain.
Chirp Localization via Fine-Tuned Transformer Model: A Proof-of-Concept Study
Spectrograms are pivotal in time-frequency signal analysis, widely used in audio processing and computational neuroscience. Chirp-like patterns in electroencephalogram (EEG) spectrograms (marked by linear or exponential frequency sweep) are key biomarkers for seizure dynamics, but automated tools for their detection, localization, and feature extraction are lacking. This study bridges this gap by fine-tuning a Vision Transformer (ViT) model on synthetic spectrograms, augmented with Low-Rank Adaptation (LoRA) to boost adaptability. We generated 100000 synthetic spectrograms with chirp parameters, creating the first large-scale benchmark for chirp localization. These spectrograms mimic neural chirps using linear or exponential frequency sweep, Gaussian noise, and smoothing. A ViT model, adapted for regression, predicted chirp parameters. LoRA fine-tuned the attention layers, enabling efficient updates to the pre-trained backbone. Training used MSE loss and the AdamW optimizer, with a learning rate scheduler and early stopping to curb overfitting. Only three features were targeted: Chirp Start Time (Onset Time), Chirp Start Frequency (Onset Frequency), and Chirp End Frequency (Offset Frequency). Performance was evaluated via Pearson correlation between predicted and actual labels. Results showed strong alignment: 0.9841 correlation for chirp start time, with stable inference times (137 to 140s) and minimal bias in error distributions. This approach offers a tool for chirp analysis in EEG time-frequency representation, filling a critical methodological void.
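The evaluation metric mentioned in this abstract, the Pearson correlation between predicted and actual labels, can be sketched in a few lines. This is a minimal standard-library illustration; the `pearson` helper and the sample onset times are hypothetical and do not come from the study.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Covariance and variances, left unnormalised because the factor cancels.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0.0 or var_y == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical chirp onset times in seconds: ground truth vs. model output.
actual = [0.5, 1.0, 1.5, 2.0]
predicted = [0.6, 0.9, 1.6, 2.1]
r = pearson(actual, predicted)
print(f"Pearson r = {r:.4f}")
```

A value close to 1 indicates that predictions track the labels linearly; it does not by itself rule out a constant offset or scale error, which is why the abstract also reports bias in the error distributions.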
Radio Frequency Fingerprint Identification for LoRa Using Spectrogram and CNN
Radio frequency fingerprint identification (RFFI) is an emerging device authentication technique that relies on intrinsic hardware characteristics of wireless devices. We designed an RFFI scheme for Long Range (LoRa) systems based on spectrogram and convolutional neural network (CNN). Specifically, we used spectrogram to represent the fine-grained time-frequency characteristics of LoRa signals. In addition, we revealed that the instantaneous carrier frequency offset (CFO) is drifting, which will result in misclassification and significantly compromise the system stability; we demonstrated CFO compensation is an effective mitigation. Finally, we designed a hybrid classifier that can adjust CNN outputs with the estimated CFO. The mean value of CFO remains relatively stable, hence it can be used to rule out CNN predictions whose estimated CFO falls out of the range. We performed experiments in real wireless environments using 20 LoRa devices under test (DUTs) and a Universal Software Radio Peripheral (USRP) N210 receiver. By comparing with the IQ-based and FFT-based RFFI schemes, our spectrogram-based scheme can reach the best classification accuracy, i.e., 97.61% for 20 LoRa DUTs.
Accelerating High-Fidelity Waveform Generation via Adversarial Flow Matching Optimization
This paper introduces PeriodWave-Turbo, a high-fidelity and highly efficient waveform generation model via adversarial flow matching optimization. Recently, conditional flow matching (CFM) generative models have been successfully adopted for waveform generation tasks, leveraging a single vector field estimation objective for training. Although these models can generate high-fidelity waveform signals, they require significantly more ODE steps compared to GAN-based models, which only need a single generation step. Additionally, the generated samples often lack high-frequency information due to noisy vector field estimation, which fails to ensure high-frequency reproduction. To address this limitation, we enhance pre-trained CFM-based generative models by incorporating a fixed-step generator modification. We utilized reconstruction losses and adversarial feedback to accelerate high-fidelity waveform generation. Through adversarial flow matching optimization, the model requires only 1,000 steps of fine-tuning to achieve state-of-the-art performance across various objective metrics. Moreover, we significantly reduce the number of inference steps from 16 to 2 or 4. Additionally, by scaling up the backbone of PeriodWave from 29M to 70M parameters for improved generalization, PeriodWave-Turbo achieves unprecedented performance, with a perceptual evaluation of speech quality (PESQ) score of 4.454 on the LibriTTS dataset. Audio samples, source code and checkpoints will be available at https://github.com/sh-lee-prml/PeriodWave.
FLM-Audio: Natural Monologues Improves Native Full-Duplex Chatbots via Dual Training
Full-duplex dialog models are designed to listen and speak simultaneously with rapid responses to fast-changing user input. Among existing approaches, native full-duplex models merge different channels (e.g. listen and speak) in a single time step, overcoming the high response latency inherent to time-division multiplexing (TDM) alternatives. Yet, a key challenge remains: aligning textual monologues with audio streams that operate at different bitrates. The prevailing solution relies on word-level alignment, but this can degrade the language ability of large pre-trained models. Moreover, it requires highly accurate timestamps for every token, which introduces cascading errors and increases pre-processing costs. In this paper, we propose textual monologues as continuous token sequences, namely "natural" monologues, which mimic humanoid cognitive behavior in dialogs. For temporal alignment, we alternate the position of the natural monologue - leading or trailing the audio - across different training stages. This "dual" training paradigm proves highly effective in building FLM-Audio, our 7B spoken dialog model that demonstrates superior responsiveness, duplexity, and chatting experiences, as confirmed by experimental results.
Reconstruction of inclined extensive air showers using radio signals: from arrival times and amplitudes to direction and energy
Radio detection is now an established technique for the study of ultra-high-energy (UHE) cosmic rays with energies above ~10^{17} eV. The next generation of radio experiments aims to extend this technique to the observation of UHE earth-skimming neutrinos, which requires the detection of very inclined extensive air showers (EAS). In this article we present a new reconstruction method for the arrival direction and the energy of EAS. It combines a point-source-like description of the radio wavefront with a phenomenological model: the Angular Distribution Function (ADF). The ADF describes the angular distribution of the radio signal amplitude in the 50-200 MHz frequency range, with a particular focus on the Cherenkov angle, a crucial feature of the radio amplitude pattern. The method is applicable to showers with zenith angles larger than 60°, and in principle up to neutrino-induced showers with up-going trajectories. It is tested here on a simulated data set of EAS induced by cosmic rays. A resolution better than 4 arc-minutes (0.07°) is achieved on arrival direction, as well as an intrinsic resolution of 5% on the electromagnetic energy, and around 15% on the primary energy.
Efficient Physics-Based Learned Reconstruction Methods for Real-Time 3D Near-Field MIMO Radar Imaging
Near-field multiple-input multiple-output (MIMO) radar imaging systems have recently gained significant attention. In this paper, we develop novel non-iterative deep learning-based reconstruction methods for real-time near-field MIMO imaging. The goal is to achieve high image quality with low computational cost at compressive settings. The developed approaches have two stages. In the first approach, a physics-based initial stage performs an adjoint operation to back-project the measurements to the image space, and a deep neural network (DNN)-based second stage converts the 3D backprojected measurements to a magnitude-only reflectivity image. Since scene reflectivities often have random phase, the DNN directly processes the magnitude of the adjoint result. As the DNN, a 3D U-Net is used to jointly exploit range and cross-range correlations. To comparatively evaluate the significance of exploiting physics in a learning-based approach, two additional approaches that replace the physics-based first stage with fully connected layers are also developed as purely learning-based methods. The performance is also analyzed by changing the DNN architecture for the second stage to include complex-valued processing (instead of magnitude-only processing), 2D convolution kernels (instead of 3D), and ResNet architecture (instead of U-Net). Moreover, we develop a synthesizer to generate a large-scale dataset for training with 3D extended targets. We illustrate the performance through experimental data and extensive simulations. The results show the effectiveness of the developed physics-based learned reconstruction approach in terms of both run-time and image quality at highly compressive settings. Our source codes and dataset are made available at GitHub.
Model-agnostic search for the quasinormal modes of gravitational wave echoes
Post-merger gravitational wave echoes provide a unique opportunity to probe the near-horizon structure of astrophysical black holes, which may be modified due to non-perturbative quantum gravity phenomena. However, since the waveform is subject to large theoretical uncertainties, it is necessary to develop model-agnostic search methods for detecting echoes from observational data. A promising strategy is to identify the characteristic quasinormal modes (QNMs) associated with echoes, in frequency space, which complements existing searches of quasiperiodic pulses in time. In this study, we build upon our previous work targeting these modes by incorporating relative phase information to optimize the Bayesian search algorithm. Using a new phase-marginalized likelihood, the performance can be significantly improved for well-resolved QNMs. This enables an efficient model-agnostic search for QNMs of different shapes by using a simple search template. To demonstrate the robustness of the search algorithm, we construct four complementary benchmarks for the echo waveform that span a diverse range of different theoretical possibilities for the near-horizon structure. We then validate our Bayesian search algorithms by injecting the benchmark models into different realizations of Gaussian noise. Using two types of phase-marginalized likelihoods, we find that the search algorithm can efficiently detect the corresponding QNMs. Therefore, our search strategy provides a concrete Bayesian and model-agnostic approach to "quantum black hole seismology".
Cyclic Multichannel Wiener Filter for Acoustic Beamforming
Acoustic beamforming models typically assume wide-sense stationarity of speech signals within short time frames. However, voiced speech is better modeled as a cyclostationary (CS) process, a random process whose mean and autocorrelation are T_1-periodic, where α_1 = 1/T_1 corresponds to the fundamental frequency of vowels. Higher harmonic frequencies are found at integer multiples of the fundamental. This work introduces a cyclic multichannel Wiener filter (cMWF) for speech enhancement derived from a cyclostationary model. This beamformer exploits spectral correlation across the harmonic frequencies of the signal to further reduce the mean-squared error (MSE) between the target and the processed input. The proposed cMWF is optimal in the MSE sense and reduces to the MWF when the target is wide-sense stationary. Experiments on simulated data demonstrate considerable improvements in scale-invariant signal-to-distortion ratio (SI-SDR) but also indicate high sensitivity to the accuracy of the estimated fundamental frequency α_1, which limits effectiveness on real data.
ANN-based position and speed sensorless estimation for BLDC motors
BLDC motor applications require precise position and speed measurements, traditionally obtained with sensors. This article presents a method for estimating those measurements without position sensors using terminal phase voltages with attenuated spurious, acquired with an FPGA that also operates a PWM-controlled inverter. Voltages are labelled with electrical and virtual rotor states using an encoder that provides training and testing data for two three-layer ANNs with perceptron-based cascade topology. The first ANN estimates the position from features of voltages with incremental timestamps, and the second ANN estimates the speed from features of position differentials considering timestamps in an acquisition window. Sensor-based training and sensorless testing at 125 to 1,500 rpm with a loaded 8-pole-pair motor obtained absolute errors of 0.8 electrical degrees and 22 rpm. Results show that the overall position estimation significantly improved on conventional and advanced methods, and that the speed estimation slightly improved on conventional methods but was worse than advanced ones.
The Latency Wall: Benchmarking Off-the-Shelf Emotion Recognition for Real-Time Virtual Avatars
In the realm of Virtual Reality (VR) and Human-Computer Interaction (HCI), real-time emotion recognition shows promise for supporting individuals with Autism Spectrum Disorder (ASD) in improving social skills. This task requires a strict latency-accuracy trade-off, with motion-to-photon (MTP) latency kept below 140 ms to maintain contingency. However, most off-the-shelf Deep Learning models prioritize accuracy over the strict timing constraints of commodity hardware. As a first step toward accessible VR therapy, we benchmark State-of-the-Art (SOTA) models for Zero-Shot Facial Expression Recognition (FER) on virtual characters using the UIBVFED dataset. We evaluate Medium and Nano variants of YOLO (v8, v11, and v12) for face detection, alongside general-purpose Vision Transformers including CLIP, SigLIP, and ViT-FER. Our results on CPU-only inference demonstrate that while face detection on stylized avatars is robust (100% accuracy), a "Latency Wall" exists in the classification stage. The YOLOv11n architecture offers the optimal balance for detection (~54 ms). However, general-purpose Transformers like CLIP and SigLIP fail to achieve viable accuracy (<23%) or speed (>150 ms) for real-time loops. This study highlights the necessity for lightweight, domain-specific architectures to enable accessible, real-time AI in therapeutic settings.
WaveFlow: A Compact Flow-based Model for Raw Audio
In this work, we propose WaveFlow, a small-footprint generative flow for raw audio, which is directly trained with maximum likelihood. It handles the long-range structure of 1-D waveform with a dilated 2-D convolutional architecture, while modeling the local variations using expressive autoregressive functions. WaveFlow provides a unified view of likelihood-based models for 1-D data, including WaveNet and WaveGlow as special cases. It generates high-fidelity speech as WaveNet, while synthesizing several orders of magnitude faster as it only requires a few sequential steps to generate very long waveforms with hundreds of thousands of time-steps. Furthermore, it can significantly reduce the likelihood gap that has existed between autoregressive models and flow-based models for efficient synthesis. Finally, our small-footprint WaveFlow has only 5.91M parameters, which is 15x smaller than WaveGlow. It can generate 22.05 kHz high-fidelity audio 42.6x faster than real-time (at a rate of 939.3 kHz) on a V100 GPU without engineered inference kernels.
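The two throughput figures quoted in this abstract are related by simple arithmetic: the synthesis rate is the audio sample rate multiplied by the real-time speedup factor. The short check below, with constant names chosen here purely for illustration, reproduces the stated 939.3 kHz from 22.05 kHz and 42.6.

```python
# Figures taken from the abstract; the constant names are illustrative only.
SAMPLE_RATE_HZ = 22_050  # 22.05 kHz audio
REALTIME_SPEEDUP = 42.6  # synthesis speed relative to real time

# Samples generated per wall-clock second, expressed in kHz.
rate_khz = SAMPLE_RATE_HZ * REALTIME_SPEEDUP / 1000.0
print(f"{rate_khz:.1f} kHz")
```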
Separating source-intrinsic and Lorentz invariance violation induced delays in the very high energy emission of blazar flares
Aims: The aim of the present study is to explore how to disentangle energy-dependent time delays due to a possible Lorentz invariance violation (LIV) at Planck scale from intrinsic delays expected in standard blazar flares. Methods: We first characterise intrinsic time delays in BL Lacs and Flat Spectrum Radio Quasars in standard one-zone time-dependent synchrotron self-Compton or external Compton models, during flares produced by particle acceleration and cooling processes. We simulate families of flares with both intrinsic and external LIV-induced energy-dependent delays. Discrimination between intrinsic and LIV delays is then investigated in two different ways. A technique based on Euclidean distance calculation between delays obtained in the synchrotron and in the inverse-Compton spectral bumps is used to assess their degree of correlation. A complementary study is performed using spectral hardness versus intensity diagrams in both energy ranges. Results: We show that the presence of non-negligible LIV effects, which essentially act only at very high energies (VHE), can drastically reduce the strong correlation expected between the X-ray and the VHE gamma-ray emission in leptonic scenarios. The LIV phenomenon can then be hinted at by measuring the Euclidean distance d_{E} from simultaneous X-ray and gamma-ray flare monitoring. Large values of minimal distance d_{E,min} would directly indicate the influence of non-intrinsic time delays possibly due to LIV in SSC flares. LIV effects can also significantly modify the VHE hysteresis patterns in hardness-intensity diagrams and even change their direction of rotation as compared to the X-ray behaviour. Both observables could be used to discriminate between LIV and intrinsic delays, provided high quality flare observations are available.
The Turing Synthetic Radar Dataset: A dataset for pulse deinterleaving
We present the Turing Synthetic Radar Dataset, a comprehensive dataset to serve both as a benchmark for radar pulse deinterleaving research and as an enabler of new research methods. The dataset addresses the critical problem of separating interleaved radar pulses from multiple unknown emitters for electronic warfare applications and signal intelligence. Our dataset contains a total of 6000 pulse trains over two receiver configurations, totalling almost 3 billion pulses, featuring realistic scenarios with up to 110 emitters and significant parameter space overlap. To encourage dataset adoption and establish standardised evaluation procedures, we have launched an accompanying Turing Deinterleaving Challenge, for which models need to associate pulses in interleaved pulse trains to the correct emitter by clustering and maximising metrics such as the V-measure. The Turing Synthetic Radar Dataset is one of the first publicly available, comprehensively simulated pulse train datasets aimed to facilitate sophisticated model development in the electronic warfare community.
Impact of local bunching factors in single-pass THz free electron lasers
In simulations for modern free-electron lasers (FEL), shot noise plays a crucial role. While it is inversely proportional to the number of electrons, shot noise is typically modeled using macroparticles, with their bunching factors corresponding to the bunching factors of the much larger number of electrons. For short-wavelength FELs, the macroparticles are assumed to be uniformly distributed on the scale of the resonant wavelength, since shot noise dominates the initial radiation - for instance, in the self-amplified spontaneous emission (SASE) regime. In this paper, we show that this assumption does not hold at longer wavelengths, particularly in the THz range, where the bunch current profile is not uniform even within the length of the resonant wavelength. Instead, the current profile dominates the initial bunching factors, which can be several orders of magnitude higher than shot noise. The slice-based bunching factors and bunching phases are derived for Gaussian distributions and compared with shot noise under the assumption that the current within each slice remains constant. Using the THz FEL at the photoinjector test facility at DESY in Zeuthen (PITZ) as a case study, the influence of the current profile has been benchmarked through simulations under very low bunch charge, where the full number of electrons can be modeled using the Genesis1.3 code. Additional simulations with the nominal working parameters of PITZ THz FEL have been compared with experimental data, indicating better agreement when the actual current profile is taken into account.
A Machine Learning Pipeline for Hunting Hidden Axion Signals in Pulsar Dispersion Measurements
In the axion model, electromagnetic waves interacting with axions induce frequency-dependent time delays, determined by the axion mass and decay constant. These small delays are difficult to detect, making traditional methods ineffective. To address this, we computed time delays for various parameters and found a prominent dispersion signal when the wave frequency equals half the axion mass. Based on this, we developed a machine learning-based pipeline, achieving 95% classification accuracy and demonstrating strong detection capability in low signal-to-noise data. Applying this to PSR J1933-6211, we found no axion-induced delays within current sensitivity limits. While existing constraints are limited by atomic clock resolution in radio telescopes, future advances in optical clocks and broader bandwidths will enable more extensive searches. In particular, combining high-precision optical clocks with next-generation radio telescopes, such as the Qitai Radio Telescope, could improve decay constant constraints by four orders of magnitude for axion masses in the 10^{-6} to 10^{-4} eV range.
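To see why this mass range is relevant to radio telescopes, the resonance condition can be converted into an observing frequency. The sketch below assumes the condition is stated in natural units, i.e. the photon energy h·ν equals half the axion rest energy; that reading, and the function name `resonant_frequency_hz`, are assumptions made for this example rather than details taken from the paper.

```python
# Planck constant in eV*s (exact in the 2019 SI definition, rounded here)
PLANCK_EV_S = 4.135667696e-15


def resonant_frequency_hz(axion_mass_ev):
    """Frequency nu at which h*nu = m_a*c^2 / 2 (assumed resonance condition)."""
    return axion_mass_ev / (2.0 * PLANCK_EV_S)


# Edges of the mass range quoted in the abstract
f_low = resonant_frequency_hz(1e-6)   # roughly 1.2e8 Hz (about 121 MHz)
f_high = resonant_frequency_hz(1e-4)  # roughly 1.2e10 Hz (about 12 GHz)
```

Under this assumption, the 10^{-6} to 10^{-4} eV range maps onto frequencies from about 0.1 GHz to about 12 GHz, which is the band typically covered by radio observations.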
Prediction of the Position of External Markers Using a Recurrent Neural Network Trained With Unbiased Online Recurrent Optimization for Safe Lung Cancer Radiotherapy
During lung radiotherapy, the position of infrared reflective objects on the chest can be recorded to estimate the tumor location. However, radiotherapy systems have a latency inherent to robot control limitations that impedes the radiation delivery precision. Prediction with online learning of recurrent neural networks (RNN) allows for adaptation to non-stationary respiratory signals, but classical methods such as RTRL and truncated BPTT are respectively slow and biased. This study investigates the capabilities of unbiased online recurrent optimization (UORO) to forecast respiratory motion and enhance safety in lung radiotherapy. We used 9 observation records of the 3D position of 3 external markers on the chest and abdomen of healthy individuals breathing during intervals from 73s to 222s. The sampling frequency was 10Hz, and the amplitudes of the recorded trajectories range from 6mm to 40mm in the superior-inferior direction. We forecast the 3D location of each marker simultaneously with a horizon value between 0.1s and 2.0s, using an RNN trained with UORO. We compare its performance with that of an RNN trained with RTRL, of LMS, and of offline linear regression. We provide closed-form expressions for quantities involved in the loss gradient calculation in UORO, thereby making its implementation efficient. Training and cross-validation were performed during the first minute of each sequence. On average over the horizon values considered and the 9 sequences, UORO achieves the lowest root-mean-square (RMS) error and maximum error among the compared algorithms. These errors are respectively equal to 1.3mm and 8.8mm, and the prediction time per time step was lower than 2.8ms (Dell Intel Core i9-9900K 3.60 GHz). Linear regression has the lowest RMS error for the horizon values 0.1s and 0.2s, followed by LMS for horizon values between 0.3s and 0.5s, and UORO for horizon values greater than 0.6s.
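To make the baseline and the error metrics concrete, here is a minimal sketch of an online linear forecaster with a fixed horizon, together with RMS and maximum error functions. It uses a normalized LMS update on a synthetic one-dimensional sinusoid; the filter order, step size, and test signal are assumptions for illustration, and the code does not reproduce the paper's UORO-trained RNN or its marker data.

```python
import math


def rms_error(pred, true):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def max_error(pred, true):
    """Largest absolute error between two equal-length sequences."""
    return max(abs(p - t) for p, t in zip(pred, true))


def lms_forecast(signal, horizon, order=4, mu=0.5):
    """Normalized LMS: predict signal[t + horizon] from the last `order` samples.

    Returns a dict mapping each target index to its prediction.
    """
    w = [0.0] * order
    preds = {}
    for t in range(order - 1, len(signal)):
        # Learn from the input window whose target signal[t] was just observed
        s = t - horizon
        if s >= order - 1:
            xs = signal[s - order + 1:s + 1]
            err = signal[t] - sum(wi * xi for wi, xi in zip(w, xs))
            norm = 1e-8 + sum(xi * xi for xi in xs)
            w = [wi + mu * err * xi / norm for wi, xi in zip(w, xs)]
        # Forecast `horizon` steps ahead from the current window
        if t + horizon < len(signal):
            x = signal[t - order + 1:t + 1]
            preds[t + horizon] = sum(wi * xi for wi, xi in zip(w, x))
    return preds


# Synthetic breathing-like trace: 0.25 Hz sinusoid sampled at 10 Hz for 60 s
trace = [math.sin(2 * math.pi * 0.25 * i / 10) for i in range(600)]
forecast = lms_forecast(trace, horizon=5)  # 5 steps = 0.5 s at 10 Hz
targets = sorted(forecast)
rms = rms_error([forecast[i] for i in targets], [trace[i] for i in targets])
worst = max_error([forecast[i] for i in targets], [trace[i] for i in targets])
```

The weights are only updated once the true value for an earlier forecast has arrived, which mirrors the constraint any online method faces when the horizon is longer than one sample.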
TR-DQ: Time-Rotation Diffusion Quantization
Diffusion models have been widely adopted in image and video generation. However, their complex network architecture leads to high inference overhead during generation. Existing diffusion quantization methods primarily focus on the quantization of the model structure while ignoring the impact of time-step variation during sampling. At the same time, most current approaches fail to account for significant activations that cannot be eliminated, resulting in substantial performance degradation after quantization. To address these issues, we propose Time-Rotation Diffusion Quantization (TR-DQ), a novel quantization method incorporating time-step and rotation-based optimization. TR-DQ first divides the sampling process based on time-steps and applies a rotation matrix to smooth activations and weights dynamically. For different time-steps, a dedicated hyperparameter is introduced for adaptive timing modeling, which enables dynamic quantization across different time-steps. Additionally, we explore the compression potential of Classifier-Free Guidance (CFG-wise) to establish a foundation for subsequent work. TR-DQ achieves state-of-the-art (SOTA) performance on image generation and video generation tasks and a 1.38-1.89x speedup and 1.97-2.58x memory reduction in inference compared to existing quantization methods.
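The general idea behind rotation-based smoothing can be shown on a toy example. The sketch below rotates an activation vector by an orthogonal matrix R and applies the transposed rotation to the weights, so the layer output is unchanged while the activation outlier is spread across channels. The 4x4 normalized Hadamard matrix and the toy values are assumptions for this illustration; TR-DQ's actual rotation and its time-step-dependent handling are not described in the abstract and are not reproduced here.

```python
def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def hadamard4():
    """4x4 Hadamard matrix scaled by 1/2 so that R times R-transpose is I."""
    h = [[1, 1, 1, 1],
         [1, -1, 1, -1],
         [1, 1, -1, -1],
         [1, -1, -1, 1]]
    return [[v / 2 for v in row] for row in h]


# Toy activation row with one large outlier channel, and a 4x2 weight matrix
x = [[10.0, 0.1, 0.1, 0.1]]
w = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.2], [0.3, 0.1]]

r = hadamard4()
x_rot = matmul(x, r)             # rotated activations
w_rot = matmul(transpose(r), w)  # counter-rotated weights

y = matmul(x, w)                 # original layer output
y_rot = matmul(x_rot, w_rot)     # same output computed in the rotated basis

peak_before = max(abs(v) for v in x[0])
peak_after = max(abs(v) for v in x_rot[0])
```

Because the rotation is orthogonal, it is exact in full precision; any benefit appears only once the rotated tensors are quantized, since a smaller peak magnitude allows a finer quantization step for the same bit width.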
Prospects for identifying pulsar candidates in radio surveys using scintillation
In our previous paper, we developed a technique for identifying pulsar candidates in interferometric radio images using their distinctive scintillation signatures. Building on this technique, the present study simulates a pulsar population using the PsrPopPy Python module to investigate the technique's limitations and detection capabilities. Among pulsars detectable exclusively by this technique, 50% have duty cycles exceeding the mean value of 0.09 observed in time-domain detections. Our pulsar population simulations revealed a set of observational parameters that optimize pulsar detection. An observation frequency of ~1420 MHz and a channel width of ~10 kHz emerge as the optimal configuration to maximize the pulsar detection efficiency. By applying a scintillation-based technique to future radio telescopes like DSA-2000, we can detect 56% of normal pulsars and 84% of MSPs in addition to those detected using non-imaging, time-domain surveys. These detected pulsars cannot be verified by time-domain searches.
Real-time respiratory motion forecasting with online learning of recurrent neural networks for accurate targeting in externally guided radiotherapy
In lung radiotherapy, infrared cameras can track reflective objects on the chest to estimate tumor motion due to breathing, but treatment system latencies hinder radiation beam precision. Real-time recurrent learning (RTRL) is a potential solution that can learn patterns within non-stationary respiratory data but has high complexity. This study assesses the capabilities of resource-efficient online RNN algorithms, namely unbiased online recurrent optimization (UORO), sparse-1 step approximation (SnAp-1), and decoupled neural interfaces (DNI) to forecast respiratory motion during radiotherapy treatment accurately. We use time series containing the 3D positions of external markers on the chest of healthy subjects. We propose efficient implementations for SnAp-1 and DNI that compress the influence and immediate Jacobian matrices and accurately update the linear coefficients used in credit assignment estimation, respectively. Data was originally sampled at 10Hz; we resampled it at 3.33Hz and 30Hz to analyze the effect of the sampling rate on performance. We use UORO, SnAp-1, and DNI to forecast each marker's 3D position with horizons h<=2.1s (the time interval in advance for which the prediction is made) and compare them with RTRL, least mean squares, kernel support vector regression, and linear regression. RNNs trained online achieved similar or better accuracy than most previous works using larger training databases and deep learning, even though we used only the first minute of each sequence to predict motion within that exact sequence. SnAp-1 had the lowest normalized root mean square errors (nRMSEs) averaged over the horizon values considered, equal to 0.335 and 0.157, at 3.33Hz and 10.0Hz, respectively. Similarly, UORO had the lowest nRMSE at 30Hz, equal to 0.086. DNI's inference time (6.8ms per time step at 30Hz, Intel Core i7-13700 CPU) was the lowest among the RNN methods.
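The effect of the sampling rate on the forecasting task can be sketched as follows. The abstract does not say how the 10Hz data was resampled, so linear interpolation is an assumption here, as are the function names; the second helper simply converts a horizon in seconds into a number of time steps at a given rate.

```python
def resample_linear(samples, src_hz, dst_hz):
    """Resample a uniformly sampled sequence by linear interpolation.

    Requires at least two input samples.
    """
    n = len(samples)
    duration = (n - 1) / src_hz
    n_out = int(duration * dst_hz + 1e-9) + 1
    out = []
    for k in range(n_out):
        pos = k / dst_hz * src_hz      # fractional index into `samples`
        i = min(int(pos), n - 2)       # left neighbour, clamped at the end
        frac = pos - i
        out.append(samples[i] * (1 - frac) + samples[i + 1] * frac)
    return out


def horizon_steps(horizon_s, rate_hz):
    """Number of time steps covering a horizon given in seconds."""
    return round(horizon_s * rate_hz)


# A short ramp sampled at 10 Hz, resampled to the two other rates
ramp = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
up = resample_linear(ramp, 10.0, 30.0)          # three times as many intervals
down = resample_linear(ramp, 10.0, 10.0 / 3.0)  # every third sample

# The maximum horizon of 2.1 s expressed in steps at each rate
steps = [horizon_steps(2.1, hz) for hz in (10.0 / 3.0, 10.0, 30.0)]
```

The same 2.1s horizon corresponds to 7, 21, and 63 steps at the three rates, so a higher sampling rate means the network must look further ahead in steps even though the physical look-ahead time is unchanged.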
