Description
Hello author. After training SCEdit, I ran into problems when running inference with my own weights (using segmentation guidance): 1. The inferred image is completely different from the images produced during training. 2. With the seed unchanged, different input images give exactly the same result. The inference command is as follows:
CUDA_VISIBLE_DEVICES=1 python /data/twinkle/app/scepter/scepter/tools/run_inference.py --cfg /data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml --num_samples 1 --prompt 'Convert to a segmentation map based on the prompt: disk and cup segmentation map' --save_folder /data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1 --image_size 512 --pretrained_model /data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth --image /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png --control_mode segmentation --task control --seed 2023
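As a sanity check for symptom 1, one can first confirm that the trained control_blocks parameters are actually present in the saved checkpoint; if they are missing, inference silently falls back to the untuned base model. A minimal sketch, assuming the .pth file is a plain PyTorch checkpoint (the 'state_dict' wrapping and key naming are assumptions, not confirmed scepter internals):

import torch

# Load the training checkpoint on CPU and list its parameter names.
ckpt = torch.load(
    '/data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth',
    map_location='cpu')
# Some checkpoints nest the weights under a 'state_dict' key (assumption).
state = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt

# The tuned SCEdit weights should show up under control-related keys.
control_keys = [k for k in state if 'control' in k]
print(f'{len(control_keys)} control-related keys found')
print(control_keys[:5])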
The inferred images (the first is the segmentation guidance image, the second is the inferred result):

During training, the inference images saved in eval_probe:

The YAML file is as follows:
ENV:
  BACKEND: nccl
SOLVER:
  NAME: LatentDiffusionSolver
  RESUME_FROM:
  LOAD_MODEL_ONLY: True
  USE_FSDP: False
  SHARDING_STRATEGY:
  USE_AMP: True
  DTYPE: float16
  CHANNELS_LAST: True
  MAX_STEPS: 100000
  MAX_EPOCHS: -1
  NUM_FOLDS: 1
  ACCU_STEP: 1
  EVAL_INTERVAL: 1000
  RESCALE_LR: False
  WORK_DIR: ./cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur
  LOG_FILE: std_log.txt
  FILE_SYSTEM:
    NAME: "ModelscopeFs"
    TEMP_DIR: "./cache/cache_data"
  FREEZE:
    FREEZE_PART: [ "first_stage_model", "cond_stage_model", "model" ]
    TRAIN_PART: [ "control_blocks" ]
  MODEL:
    NAME: LatentDiffusionSCEControl
    PARAMETERIZATION: eps
    TIMESTEPS: 1000
    MIN_SNR_GAMMA:
    ZERO_TERMINAL_SNR: False
    PRETRAINED_MODEL: ms://AI-ModelScope/[email protected]
    IGNORE_KEYS: [ ]
    SCALE_FACTOR: 0.18215
    SIZE_FACTOR: 8
    DEFAULT_N_PROMPT:
    SCHEDULE_ARGS:
      "NAME": "scaled_linear"
      "BETA_MIN": 0.00085
      "BETA_MAX": 0.012
    USE_EMA: False
    #
    DIFFUSION_MODEL:
      NAME: DiffusionUNet
      IN_CHANNELS: 4
      OUT_CHANNELS: 4
      MODEL_CHANNELS: 320
      NUM_HEADS: 8
      NUM_RES_BLOCKS: 2
      ATTENTION_RESOLUTIONS: [ 4, 2, 1 ]
      CHANNEL_MULT: [ 1, 2, 4, 4 ]
      CONV_RESAMPLE: True
      DIMS: 2
      USE_CHECKPOINT: False
      USE_SCALE_SHIFT_NORM: False
      RESBLOCK_UPDOWN: False
      USE_SPATIAL_TRANSFORMER: True
      TRANSFORMER_DEPTH: 1
      CONTEXT_DIM: 768
      DISABLE_MIDDLE_SELF_ATTN: False
      USE_LINEAR_IN_TRANSFORMER: False
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
    #
    FIRST_STAGE_MODEL:
      NAME: AutoencoderKL
      EMBED_DIM: 4
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
      BATCH_SIZE: 4
      #
      ENCODER:
        NAME: Encoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DOUBLE_Z: True
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
      #
      DECODER:
        NAME: Decoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
        GIVE_PRE_END: False
        TANH_OUT: False
    #
    TOKENIZER:
      NAME: ClipTokenizer
      PRETRAINED_PATH: ms://AI-ModelScope/clip-vit-large-patch14
      LENGTH: 77
      CLEAN: True
    #
    COND_STAGE_MODEL:
      NAME: FrozenCLIPEmbedder
      FREEZE: True
      LAYER: last
      PRETRAINED_MODEL: ms://AI-ModelScope/clip-vit-large-patch14
    #
    LOSS:
      NAME: ReconstructLoss
      LOSS_TYPE: l2
    #
    CONTROL_MODEL:
      NAME: CSCTuners
      PRE_HINT_IN_CHANNELS: 3
      PRE_HINT_OUT_CHANNELS: 256
      DENSE_HINT_KERNAL: 3
      SCALE: 1.0
      SC_TUNER_CFG:
        NAME: SCTuner
        TUNER_NAME: SCEAdapter
        DOWN_RATIO: 1.0
    CONTROL_ANNO:
      NAME: SegmentationAnnotator
      UNET_WEIGHT: /data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth
      SEGMENTATION_PATH: /data/twinkle/app/scepter/Accusyn_segmentation/Blur/
  SAMPLE_ARGS:
    SAMPLER: ddim
    SAMPLE_STEPS: 50
    SEED: 2023
    GUIDE_SCALE: 7.5
    GUIDE_RESCALE: 0.5
    DISCRETIZATION: trailing
    IMAGE_SIZE: [512, 512]
    RUN_TRAIN_N: False
  OPTIMIZER:
    NAME: AdamW
    LEARNING_RATE: 0.0001
    BETAS: [ 0.9, 0.999 ]
    EPS: 1e-8
    WEIGHT_DECAY: 1e-2
    AMSGRAD: False
  TRAIN_DATA:
    NAME: ImageTextPairMSDataset
    MODE: train
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    PROMPT_PREFIX: ""
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 4
    NUM_WORKERS: 4
    SAMPLER:
      NAME: LoopSampler
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  EVAL_DATA:
    NAME: ImageTextPairMSDataset
    MODE: eval
    # MS_DATASET_NAME: style_custom_dataset
    # MS_DATASET_NAMESPACE: damo
    # MS_DATASET_SUBNAME: 3D
    # PROMPT_PREFIX: ""
    # MS_DATASET_SPLIT: train_short
    # MS_REMAP_KEYS: { 'Image:FILE': 'Target:FILE' }
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    PROMPT_PREFIX: ""
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 10
    NUM_WORKERS: 4
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  TRAIN_HOOKS:
    -
      NAME: BackwardHook
      PRIORITY: 0
    -
      NAME: LogHook
      LOG_INTERVAL: 50
    -
      NAME: CheckpointHook
      INTERVAL: 1000
    -
      NAME: ProbeDataHook
      PROB_INTERVAL: 1000
  EVAL_HOOKS:
    -
      NAME: ProbeDataHook
      PROB_INTERVAL: 1000
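For reference, SCHEDULE_ARGS above selects Stable Diffusion's standard "scaled_linear" noise schedule, which interpolates linearly in sqrt(beta) and then squares. A minimal standalone sketch of how such a schedule is commonly computed (illustrative, not scepter's actual code):

import torch

def scaled_linear_betas(beta_min=0.00085, beta_max=0.012, timesteps=1000):
    # Linear in sqrt(beta), then squared: the SD v1.x schedule.
    return torch.linspace(beta_min ** 0.5, beta_max ** 0.5, timesteps) ** 2

betas = scaled_linear_betas()
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t used by q(x_t | x_0)
print(betas[0].item(), betas[-1].item())  # ~0.00085 ... ~0.012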
The segmentation annotator code I implemented myself:
import os
import sys

import cv2
import numpy as np
import torch
from PIL import Image

from scepter.modules.annotator.base_annotator import BaseAnnotator
from scepter.modules.annotator.registry import ANNOTATORS
from scepter.modules.utils.config import dict_to_yaml

# Make the bundled UNet implementation importable as a top-level module.
sys.path.append('/data/twinkle/anaconda3/envs/scepter/lib/python3.8/site-packages/scepter/modules/annotator/unet_model')
from unet_model import UNet  # only one of the two original UNet imports is needed

i = 0  # debug counter for forward calls


# Register SegmentationAnnotator with ANNOTATORS.
@ANNOTATORS.register_class()
class SegmentationAnnotator(BaseAnnotator):
    para_dict = {}

    def __init__(self, cfg, logger=None):
        super().__init__(cfg, logger=logger)
        self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
        self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')

    def forward(self, image):
        global i
        i += 1
        # Make sure the image is a numpy array.
        if isinstance(image, Image.Image):
            image = np.array(image)
        elif isinstance(image, torch.Tensor):
            image = image.detach().cpu().numpy()
        elif isinstance(image, np.ndarray):
            image = image.copy()
        else:
            raise ValueError(f'Unsupported data type {type(image)}, only supports np.ndarray, torch.Tensor, Pillow Image.')
        with torch.no_grad():
            # Load and initialize the UNet segmentation model.
            # Note: the weights are reloaded from disk on every forward call.
            device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
            unet = UNet(n_channels=3, n_classes=3).to(device)
            unet.load_state_dict(torch.load(self.unet_weight, map_location=device))
            unet.eval()
            # Save the input image to disk, then read it back with OpenCV.
            # Note: cv2.imread returns the channels in BGR order.
            input_image = Image.fromarray(image.astype(np.uint8))  # convert to PIL Image
            basepath = self.segmentation_path
            # Create the input folder if it does not exist.
            if not os.path.exists(os.path.join(basepath, 'input')):
                os.makedirs(os.path.join(basepath, 'input'))
            input_image.save(os.path.join(basepath, 'input/input_image.png'))  # save the input image
            img = cv2.imread(os.path.join(basepath, 'input/input_image.png'))
            # Convert to a float tensor on the device and scale to [-1, 1].
            img_tensor = torch.from_numpy(img).to(device, dtype=torch.float32)
            img_tensor = (img_tensor / 127.5) - 1.0
            # Predict: add a batch dimension and reorder HWC -> NCHW.
            img_tensor = img_tensor.unsqueeze(0).permute(0, 3, 1, 2)
            pred_unet = unet(img_tensor)
            # Per-pixel class index.
            pred = torch.argmax(pred_unet, dim=1).squeeze(0).cpu().numpy()
            # cv2.resize does not accept int64 input; classes 0-2 fit in uint8.
            pred_resized = cv2.resize(pred.astype(np.uint8), (512, 512), interpolation=cv2.INTER_NEAREST)
            pred_resized = (pred_resized * 255 / 2).astype(np.uint8)
            # Color mapping for the class indices.
            color_map = {
                0: [0, 0, 0],    # background - black
                2: [255, 0, 0],  # cup - red
                1: [0, 0, 255],  # disk - blue
            }
            # Build a color image by mapping each class index to its color.
            colored_image = np.zeros((512, 512, 3), dtype=np.uint8)
            for label, color in color_map.items():
                colored_image[pred == label] = color
            colored_image_save = Image.fromarray(colored_image)  # convert the numpy array to an image
            if not os.path.exists(os.path.join(basepath, 'output')):
                os.makedirs(os.path.join(basepath, 'output'))
            colored_image_save.save(os.path.join(basepath, 'output/output_image.png'))  # save the image
            return colored_image

    def save_result(self, result, save_path):
        # Make sure result is in (H, W, C) format.
        if result.shape != (512, 512, 3):
            raise ValueError(f'Expected result shape (512, 512, 3), but got {result.shape}')
        # Save the result as an image with PIL.
        image = Image.fromarray(result.astype(np.uint8))
        image.save(save_path)
        print(f'Result saved to {save_path}')

    @staticmethod
    def get_config_template():
        return dict_to_yaml('ANNOTATORS', __class__.__name__, SegmentationAnnotator.para_dict, set_name=True)
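One structural note on the annotator: forward reloads the UNet weights from disk on every call. A minimal sketch of hoisting the model into __init__ instead, so it is constructed once and reused (same names and paths as the class above; forward would then use self.unet and self.device directly):

# Sketch only: load the UNet once at construction time.
def __init__(self, cfg, logger=None):
    super().__init__(cfg, logger=logger)
    self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
    self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')
    self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    self.unet = UNet(n_channels=3, n_classes=3).to(self.device)
    self.unet.load_state_dict(torch.load(self.unet_weight, map_location=self.device))
    self.unet.eval()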
Modifications in utils:
