Hello! Thank you for the new word segmentation. I think you should use the VISTEC-TP-TH-21 dataset. It is licensed under CC-BY-SA and is the largest social media domain dataset for Thai text processing.
VISTEC-TP-TH-21: https://github.com/mrpeerat/OSKut/tree/main/VISTEC-TP-TH-2021