Abstract: Real-time audio communication over IP has become essential to our daily lives. Packet-switched networks, however, are inherently prone to jitter and packet losses, creating a strong need for effective packet loss concealment (PLC) techniques. Although deep-learning-based solutions have made significant progress in this direction for speech, extending such methods to Networked Music Performance (NMP) applications presents significant challenges, including high fidelity requirements, higher sampling rates, and the stringent temporal constraints associated with simultaneous interaction between remote musicians. We present PARCnet, a hybrid PLC method that uses a feed-forward neural network to estimate the time-domain residual signal of a parallel linear autoregressive model. Objective metrics and a listening test show that PARCnet achieves state-of-the-art results while enabling real-time operation on CPU.
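The abstract describes a hybrid structure: a linear autoregressive (AR) model extrapolates the missing packet from valid past samples, while a feed-forward network estimates the time-domain residual that the linear model cannot capture. The sketch below illustrates that idea in plain NumPy; it is a minimal, hypothetical example, and the function names (`fit_ar_coefficients`, `ar_forecast`, `conceal_packet`) and the `residual_net` callable are illustrative assumptions, not PARCnet's actual API, which is available in the official repository.

```python
import numpy as np

def fit_ar_coefficients(history: np.ndarray, order: int = 128) -> np.ndarray:
    """Estimate AR coefficients from past samples via least squares (illustrative)."""
    # Each row of the design matrix holds `order` past samples, newest first.
    rows = len(history) - order
    X = np.stack([history[i:i + order][::-1] for i in range(rows)])
    y = history[order:]
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coeffs

def ar_forecast(history: np.ndarray, coeffs: np.ndarray, n_samples: int) -> np.ndarray:
    """Recursively extrapolate n_samples with the fitted AR model."""
    order = len(coeffs)
    buffer = list(history[-order:])
    pred = []
    for _ in range(n_samples):
        # Dot the coefficients with the most recent `order` samples, newest first.
        x = float(np.dot(coeffs, buffer[::-1][:order]))
        pred.append(x)
        buffer.append(x)
    return np.array(pred)

def conceal_packet(history: np.ndarray, packet_len: int, residual_net=None, order: int = 128) -> np.ndarray:
    """Hybrid concealment sketch: linear AR forecast plus an optional learned residual."""
    coeffs = fit_ar_coefficients(history, order)
    linear_part = ar_forecast(history, coeffs, packet_len)
    if residual_net is not None:
        # Assumption: a trained feed-forward model maps the recent context to the
        # time-domain residual of the AR prediction and is added back here.
        linear_part = linear_part + residual_net(history, packet_len)
    return linear_part
```

Splitting the task this way means the learned component only has to model what the inexpensive linear predictor misses, which is consistent with the real-time CPU operation reported above.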
This page features several audio examples of PLC obtained using PARCnet, which was introduced in:
A. I. Mezza, M. Amerena, A. Bernardini, and A. Sarti, “Hybrid Packet Loss Concealment for Real-Time Networked Music Applications,” in IEEE Open Journal of Signal Processing, doi: 10.1109/OJSP.2023.3343318.
Source Code
Visit the official GitHub repository to access the source code and pre-trained PARCnet models.
Audio Examples
Each of the five excerpts is presented under nine conditions: Reference (Clean), Zero filling, Verma et al., PLAAE, TF-GAN, LPCnet, FRN, AR(128), and PARCnet (Ours).
References
Excerpts taken from MAESTRO: C. Hawthorne, A. Stasyuk, A. Roberts, I. Simon, C.-Z. A. Huang, S. Dieleman, E. Elsen, J. Engel, and D. Eck, “Enabling factorized piano music modeling and generation with the MAESTRO dataset,” in Proc. Int. Conf. Learning Representations, 2019.
Verma et al.: P. Verma, A. I. Mezza, C. Chafe, and C. Rottondi, “A deep learning approach for low-latency packet loss concealment of audio signals in networked music performance applications,” in Proc. Conf. of Open Innovations Association, 2020, pp. 268–275.
PLAAE: S. Pascual, J. Serrà, and J. Pons, “Adversarial auto-encoding for packet loss concealment,” in Proc. IEEE Workshop Appl. Signal Process. Audio Acoust., 2021, pp. 71–75.
TF-GAN: J. Wang, Y. Guan, C. Zheng, R. Peng, and X. Li, “A temporal-spectral generative adversarial network based end-to-end packet loss concealment for wideband speech transmission,” J. Acoust. Soc. Am., vol. 150, no. 4, pp. 2577–2588, 2021.
LPCnet: J.-M. Valin, A. Mustafa, C. Montgomery, T. B. Terriberry, M. Klingbeil, P. Smaragdis, and A. Krishnaswamy, “Real-time packet loss concealment with mixed generative and predictive model,” in Proc. Interspeech, 2022, pp. 570–574.
FRN: V.-A. Nguyen, A. H. T. Nguyen, and A. W. H. Khong, “Improving performance of real-time full-band blind packet-loss concealment with predictive network,” in Proc. IEEE Int. Conf. Acoust. Speech Signal Process., 2023, pp. 1–5.