Hi,
I noticed that reconstruction quality is worse with the Large v2 checkpoint than with the Medium v2 checkpoint.
I want to make sure I am using the config and settings correctly.
import torch
import torchaudio

from decoder.pretrained import WavTokenizer
from encoder.utils import convert_audio

device = torch.device('cpu')

config_path = "configs/wavtokenizer_smalldata_frame75_3s_nq1_code4096_dim512_kmeans200_attn.yaml"
# Large v2 checkpoint:
model_path = "novateur/wavtokenizer_large_speech_320_v2.ckpt"
# OR the Medium v2 checkpoint (uncomment to compare; the same config is used for both):
# model_path = "novateur/WavTokenizer-medium-speech-75token/wavtokenizer_medium_speech_320_24k_v2.ckpt"

audio_path = "test_audio.wav"
audio_outpath = "wav_test_audio.wav"

wavtokenizer = WavTokenizer.from_pretrained0802(config_path, model_path)
wavtokenizer = wavtokenizer.to(device)

# Load the test clip and convert it to 24 kHz mono.
wav, sr = torchaudio.load(audio_path)
wav = convert_audio(wav, sr, 24000, 1)
bandwidth_id = torch.tensor([0])
wav = wav.to(device)

# Encode to continuous features and discrete codes.
features, discrete_code = wavtokenizer.encode_infer(wav, bandwidth_id=bandwidth_id)
print(features.shape)
print(discrete_code.shape)

# Print the codes in groups of 75.
for i in range(0, discrete_code.shape[-1], 75):
    print(discrete_code[:, :, i:i+75], end='\n\n')

# Decode back to audio and save as 16-bit PCM at 24 kHz.
audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
torchaudio.save(audio_outpath, audio_out, sample_rate=24000, encoding='PCM_S', bits_per_sample=16)
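For reference, a small sketch of why the codes are printed in groups of 75. It assumes the "320" in the checkpoint names is the hop size in samples at 24 kHz, which is an inference from the file names rather than something confirmed here. Under that assumption 75 codes span one second, so each printed group covers one second of audio. The helper names are hypothetical and not part of WavTokenizer.

```python
SAMPLE_RATE = 24000  # Hz, the rate passed to convert_audio above
HOP_SIZE = 320       # samples per code; ASSUMED from "320" in the checkpoint names


def codes_to_seconds(n_codes, sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Approximate audio duration covered by n_codes codes."""
    return n_codes * hop_size / sample_rate


def group_codes(codes, group_size=75):
    """Split a flat list of codes into consecutive groups of group_size."""
    return [codes[i:i + group_size] for i in range(0, len(codes), group_size)]


if __name__ == "__main__":
    # 75 codes at hop 320 and 24 kHz cover one second under the assumption above.
    print(codes_to_seconds(75))
    # 200 dummy codes split into groups of 75, 75, and 50.
    print([len(g) for g in group_codes(list(range(200)))])
```

The last group is shorter whenever the clip length is not a whole number of seconds.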
Test audio file:
https://limewire.com/d/XpsaM#fpKfVRRdUd
Notice that breathing sounds are badly distorted in Large v2 compared to Medium v2.
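To make the comparison a bit less subjective, here is a rough sketch that compares frame-level loudness between the original clip and a reconstruction. It is not part of WavTokenizer, the function names are hypothetical, and the 480-sample frame (20 ms at 24 kHz) is an arbitrary choice. It only measures level differences, so it is a crude check for quiet segments such as breaths and does not replace listening or a proper perceptual metric.

```python
import math


def frame_levels_db(samples, frame_len=480, eps=1e-10):
    """Mean-square level in dB for each non-overlapping frame of samples."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        mean_sq = sum(x * x for x in frame) / frame_len
        levels.append(10.0 * math.log10(mean_sq + eps))
    return levels


def level_error_db(reference, reconstruction, frame_len=480):
    """Per-frame level difference (reconstruction minus reference) in dB."""
    ref = frame_levels_db(reference, frame_len)
    rec = frame_levels_db(reconstruction, frame_len)
    n = min(len(ref), len(rec))  # the two signals may differ slightly in length
    return [rec[i] - ref[i] for i in range(n)]


if __name__ == "__main__":
    # Toy signals: the "reconstruction" has half the amplitude of the reference,
    # so every frame should come out roughly 6 dB quieter.
    reference = [0.5] * 960
    reconstruction = [0.25] * 960
    print(level_error_db(reference, reconstruction))
```

With the script above, the inputs could be taken from the tensors as plain lists, for example `wav[0].tolist()` and `audio_out[0].tolist()`, assuming both are shaped `(1, num_samples)`. Running it once per checkpoint and comparing the errors on the frames that contain breaths would show whether Large v2 deviates more than Medium v2 there.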
@jishengpeng