This repository was archived by the owner on Oct 6, 2025. It is now read-only.
Time Cost Estimation for Training Piper from Scratch #706
Unanswered · Lennox-Dai asked this question in Q&A
I haven't been able to find Piper models for some languages, such as Japanese, and when I tried fine-tuning from models in other languages, such as Chinese, the results were not ideal. So I would like to train a Japanese model from scratch.
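For context, my fine-tuning attempts followed the usual piper_train recipe of resuming from an existing checkpoint. This is a sketch with placeholder paths and a placeholder base-model name; `--resume_from_checkpoint` is the standard PyTorch Lightning trainer flag that piper_train passes through:

```bash
# Sketch of a fine-tuning run: the same invocation as training from scratch,
# plus --resume_from_checkpoint pointing at the base (e.g. Chinese) model.
# All paths and the base checkpoint name are placeholders.
python3 -m piper_train \
  --dataset-dir /path/to/training_dir \
  --accelerator "gpu" \
  --devices 1 \
  --batch-size 32 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs 10000 \
  --resume_from_checkpoint /path/to/zh_CN_base.ckpt \
  --checkpoint-epochs 10 \
  --precision 32
```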
My GPU resources are 8×H800, and I am using the DDP strategy for multi-GPU training. The dataset consists of over 7,000 high-quality single-speaker Japanese speech samples, totaling about 10 hours of audio. So far I have trained up to the checkpoint epoch=17979-step=724820.ckpt, but the results are unsatisfactory: the generated speech still sounds like noise.
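One thing worth ruling out is a preprocessing or phonemization mismatch, since noise-like output can come from the phoneme side rather than from training time. My dataset was prepared along these lines (a sketch only; the input path, metadata format, and sample rate are placeholders, and it assumes espeak-ng provides a `ja` voice for phonemization):

```bash
# Sketch of the preprocessing step that produces $DATASET_DIR (assumed values:
# ljspeech-format metadata, 22.05 kHz audio, espeak-ng "ja" voice available).
python3 -m piper_train.preprocess \
  --language ja \
  --input-dir /path/to/japanese_dataset \
  --output-dir "$DATASET_DIR" \
  --dataset-format ljspeech \
  --single-speaker \
  --sample-rate 22050
```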
My training command is as follows:
torchrun --nproc_per_node=8 --nnodes=1 --master_port=29506 -m piper_train \
  --dataset-dir "$DATASET_DIR" \
  --accelerator "gpu" \
  --strategy "ddp" \
  --devices 8 \
  --batch-size 64 \
  --validation-split 0.0 \
  --num-test-examples 0 \
  --max_epochs "$EPOCH" \
  --checkpoint-epochs 10 \
  --precision 32 \
  --max-phoneme-ids 600
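To separate "not enough training" from "something is broken", I can spot-check intermediate checkpoints by synthesizing a few utterances straight from the training data. This is a sketch; the checkpoint path assumes the default PyTorch Lightning layout under the dataset directory:

```bash
# Synthesize the first few training utterances with an intermediate checkpoint
# to hear whether quality is actually improving over time.
head -n5 "$DATASET_DIR/dataset.jsonl" | \
  python3 -m piper_train.infer \
    --sample-rate 22050 \
    --checkpoint "$DATASET_DIR/lightning_logs/version_0/checkpoints/epoch=17979-step=724820.ckpt" \
    --output-dir /tmp/test_wavs
```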
My loss_disc_all fluctuates between 1.2 and 1.8, and loss_gen_all is between 40 and 55.
Could this poor result be due to insufficient training time, or is the dataset too small? If you have any suggestions for improving my training strategy, I would greatly appreciate the guidance. Thank you so much!
Replies: 1 comment

Hi, do you have any news about a Japanese model in Piper TTS?