This repository was archived by the owner on Oct 6, 2025. It is now read-only.
Time Cost Estimation for Training Piper from Scratch #706
Unanswered · Lennox-Dai asked this question in Q&A
I haven't been able to find Piper models for some languages, such as Japanese, and when I tried fine-tuning from models in other languages, such as Chinese, the results were not ideal. So I would like to train a Japanese model from scratch.
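For context, my fine-tuning attempts followed the usual piper_train recipe of resuming from an existing checkpoint. This is a sketch with placeholder paths and a placeholder base-model name; `--resume_from_checkpoint` is the standard PyTorch Lightning trainer flag that piper_train passes through:

```bash
# Sketch of a fine-tuning run: the same invocation as training from scratch,
# plus --resume_from_checkpoint pointing at the base (e.g. Chinese) model.
# All paths and the base checkpoint name are placeholders.
python3 -m piper_train \
  --dataset-dir /path/to/training_dir \
  --accelerator "gpu" \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs 10000 \
  --resume_from_checkpoint /path/to/zh_CN_base.ckpt \
  --checkpoint-epochs 10 \
  --precision 32
```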
My GPU resources are 8×H800, and I am using the DDP strategy for multi-GPU training. The dataset consists of over 7,000 high-quality single-speaker Japanese speech samples, totaling about 10 hours of audio. So far I have trained up to the checkpoint epoch=17979-step=724820.ckpt, but the results are unsatisfactory: the generated speech still sounds like noise.
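One thing worth ruling out is a preprocessing or phonemization mismatch, since noise-like output can come from the phoneme side rather than from training time. My dataset was prepared along these lines (a sketch only; the input path, metadata format, and sample rate are placeholders, and it assumes espeak-ng provides a `ja` voice for phonemization):

```bash
# Sketch of the preprocessing step that produces $DATASET_DIR (assumed values:
# ljspeech-format metadata, 22.05 kHz audio, espeak-ng "ja" voice available).
python3 -m piper_train.preprocess \
  --language ja \
  --input-dir /path/to/japanese_dataset \
  --output-dir "$DATASET_DIR" \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
```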
My training command is as follows:
torchrun --nproc_per_node=8 --nnodes=1 --master_port=29506 -m piper_train \
  --dataset-dir "$DATASET_DIR" \
  --accelerator "gpu" \
  --strategy "ddp" \
  --devices 8 \
  --batch-size 64 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs "$EPOCH" \
  --checkpoint-epochs 10 \
  --precision 32 \
  --max-phoneme-ids 600
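To separate "not enough training" from "something is broken", I can spot-check intermediate checkpoints by synthesizing a few utterances straight from the training data. This is a sketch; the checkpoint path assumes the default PyTorch Lightning layout under the dataset directory:

```bash
# Synthesize the first few training utterances with an intermediate checkpoint
# to hear whether quality is actually improving over time.
head -n5 "$DATASET_DIR/dataset.jsonl" | \
  python3 -m piper_train.infer \
    --sample-rate 22050 \
    --checkpoint "$DATASET_DIR/lightning_logs/version_0/checkpoints/epoch=17979-step=724820.ckpt" \
    --output-dir /tmp/test_wavs
```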
My loss_disc_all fluctuates between 1.2 and 1.8, and loss_gen_all is between 40 and 55.
Could this poor result be due to insufficient training time, or is the dataset too small? If you have any suggestions for improving my training strategy, I would greatly appreciate the guidance. Thank you so much!
Replies: 1 comment

Hi, do you have any news about a Japanese model in Piper TTS?