Description
We have been using dsync to migrate data from an old file system to a new file system (GPFS -> CephFS). The data migration works well and is meeting most of our needs.
In our data sets we have many small files (>95% of file count) and a smaller number of large files (>95% storage used). The dsync operations generally work well and perform the tree walks efficiently. The data copy performance is ultimately limited by the file sizes.
Something we have noticed, however, is that our batches can develop a very long tail. Throughput starts strong but then trails off to a trickle for a long time. The next batch then picks up again: throughput starts strong but again trails off into a long tail.
We suspect this is due to a large file in the batch: the rank processing that file hasn't finished, leaving the other ranks idle while they await the next batch. It seems that the file list is shared across ranks, but each file action (copy) is carried out by a single rank.
Is our intuition correct?
If so, is there a way to improve the copy portion of dsync? One solution could be to copy file data in parallel, assigning portions of a single transfer to idle ranks. This would enable all ranks to contribute to completing the transfer of the large file in that batch. Another option might be to allow the idle ranks to start on the next batch, avoiding a stall due to lack of work.
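To illustrate the first idea, here is a minimal sketch (in Python, purely for illustration; this is not dsync code, and all names and the chunk size are hypothetical) of how a large file's byte range could be partitioned into fixed-size work units that idle ranks could each claim and copy independently:

```python
CHUNK_SIZE = 64 * 1024 * 1024  # hypothetical work-unit size: 64 MiB

def partition_file(file_size, chunk_size=CHUNK_SIZE):
    """Split a file's byte range into (offset, length) work units."""
    return [(offset, min(chunk_size, file_size - offset))
            for offset in range(0, file_size, chunk_size)]

def copy_chunk(src_path, dst_path, offset, length):
    """Copy one byte range; each idle rank would execute one of these.

    Assumes the destination file has already been created at full size,
    so ranks can write their slices independently.
    """
    with open(src_path, "rb") as src, open(dst_path, "r+b") as dst:
        src.seek(offset)
        dst.seek(offset)
        remaining = length
        while remaining > 0:
            buf = src.read(min(remaining, 4 * 1024 * 1024))
            if not buf:
                break
            dst.write(buf)
            remaining -= len(buf)

# Example: a 200 MiB file becomes 4 independent work units,
# instead of one rank copying all 200 MiB alone.
units = partition_file(200 * 1024 * 1024)
```

The key point is that once the large file is expressed as many small work units, the same work distribution that already balances the tree walk could balance the data copy, so one large file no longer serializes the tail of a batch.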
We'd be interested in your feedback on this assessment and suggestions for improvement.
Thanks.