An error occurred during SCEdit inference #113

@twinkleyang1

Description

Hello author: after training SCEdit, I ran into a problem when running inference with my own weights (using segmentation guidance). 1. The inferred image is completely different from the images produced during training. 2. Even when the input image is different (with the seed unchanged), the result is exactly the same. The inference command is as follows:

CUDA_VISIBLE_DEVICES=1 python /data/twinkle/app/scepter/scepter/tools/run_inference.py --cfg /data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml --num_samples 1 --prompt 'Convert to a segmentation map based on the prompt: disk and cup segmentation map' --save_folder /data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1 --image_size 512 --pretrained_model /data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth --image /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png --control_mode segmentation --task control --seed 2023
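
For reference, a quick way to check whether the trained control_blocks weights are actually present in the checkpoint being loaded (a minimal sketch, not what I have verified; it assumes the .pth file is a plain state dict, possibly wrapped under a 'model' or 'state_dict' key):

    # Minimal sketch: list the top-level parameter groups in the trained checkpoint.
    # Assumption: the checkpoint is a plain state dict, possibly wrapped under a
    # 'model' or 'state_dict' key; adjust if CheckpointHook saves a different layout.
    import torch

    ckpt = torch.load('ldm_step-100000.pth', map_location='cpu')
    state = ckpt.get('model', ckpt.get('state_dict', ckpt)) if isinstance(ckpt, dict) else ckpt
    prefixes = sorted({k.split('.')[0] for k in state.keys()})
    print(prefixes)  # the trained part, 'control_blocks', should appear here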

The inferred images (the first is the segmentation guidance image, the second is the inferred result):
[image]

During training, the inference images saved by eval_probe:
[image]

The YAML file is as follows:
ENV:
  BACKEND: nccl
SOLVER:
  NAME: LatentDiffusionSolver
  RESUME_FROM:
  LOAD_MODEL_ONLY: True
  USE_FSDP: False
  SHARDING_STRATEGY:
  USE_AMP: True
  DTYPE: float16
  CHANNELS_LAST: True
  MAX_STEPS: 100000
  MAX_EPOCHS: -1
  NUM_FOLDS: 1
  ACCU_STEP: 1
  EVAL_INTERVAL: 1000
  RESCALE_LR: False

  WORK_DIR: ./cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur
  LOG_FILE: std_log.txt

  FILE_SYSTEM:
    NAME: "ModelscopeFs"
    TEMP_DIR: "./cache/cache_data"

  FREEZE:
    FREEZE_PART: [ "first_stage_model", "cond_stage_model", "model" ]
    TRAIN_PART: [ "control_blocks" ]

  MODEL:
    NAME: LatentDiffusionSCEControl
    PARAMETERIZATION: eps
    TIMESTEPS: 1000
    MIN_SNR_GAMMA:
    ZERO_TERMINAL_SNR: False
    PRETRAINED_MODEL: ms://AI-ModelScope/[email protected]
    IGNORE_KEYS: [ ]
    SCALE_FACTOR: 0.18215
    SIZE_FACTOR: 8
    DEFAULT_N_PROMPT:
    SCHEDULE_ARGS:
      "NAME": "scaled_linear"
      "BETA_MIN": 0.00085
      "BETA_MAX": 0.012
    USE_EMA: False
    #
    DIFFUSION_MODEL:
      NAME: DiffusionUNet
      IN_CHANNELS: 4
      OUT_CHANNELS: 4
      MODEL_CHANNELS: 320
      NUM_HEADS: 8
      NUM_RES_BLOCKS: 2
      ATTENTION_RESOLUTIONS: [ 4, 2, 1 ]
      CHANNEL_MULT: [ 1, 2, 4, 4 ]
      CONV_RESAMPLE: True
      DIMS: 2
      USE_CHECKPOINT: False
      USE_SCALE_SHIFT_NORM: False
      RESBLOCK_UPDOWN: False
      USE_SPATIAL_TRANSFORMER: True
      TRANSFORMER_DEPTH: 1
      CONTEXT_DIM: 768
      DISABLE_MIDDLE_SELF_ATTN: False
      USE_LINEAR_IN_TRANSFORMER: False
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
    #
    FIRST_STAGE_MODEL:
      NAME: AutoencoderKL
      EMBED_DIM: 4
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
      BATCH_SIZE: 4
      #
      ENCODER:
        NAME: Encoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DOUBLE_Z: True
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
      #
      DECODER:
        NAME: Decoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
        GIVE_PRE_END: False
        TANH_OUT: False
    #
    TOKENIZER:
      NAME: ClipTokenizer
      PRETRAINED_PATH: ms://AI-ModelScope/clip-vit-large-patch14
      LENGTH: 77
      CLEAN: True
    #
    COND_STAGE_MODEL:
      NAME: FrozenCLIPEmbedder
      FREEZE: True
      LAYER: last
      PRETRAINED_MODEL: ms://AI-ModelScope/clip-vit-large-patch14
    #
    LOSS:
      NAME: ReconstructLoss
      LOSS_TYPE: l2
    #
    CONTROL_MODEL:
      NAME: CSCTuners
      PRE_HINT_IN_CHANNELS: 3
      PRE_HINT_OUT_CHANNELS: 256
      DENSE_HINT_KERNAL: 3
      SCALE: 1.0
      SC_TUNER_CFG:
        NAME: SCTuner
        TUNER_NAME: SCEAdapter
        DOWN_RATIO: 1.0
    CONTROL_ANNO:
      NAME: SegmentationAnnotator
      UNET_WEIGHT: /data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth
      SEGMENTATION_PATH: /data/twinkle/app/scepter/Accusyn_segmentation/Blur/

  SAMPLE_ARGS:
    SAMPLER: ddim
    SAMPLE_STEPS: 50
    SEED: 2023
    GUIDE_SCALE: 7.5
    GUIDE_RESCALE: 0.5
    DISCRETIZATION: trailing
    IMAGE_SIZE: [512, 512]
    RUN_TRAIN_N: False

  OPTIMIZER:
    NAME: AdamW
    LEARNING_RATE: 0.0001
    BETAS: [ 0.9, 0.999 ]
    EPS: 1e-8
    WEIGHT_DECAY: 1e-2
    AMSGRAD: False

  TRAIN_DATA:
    NAME: ImageTextPairMSDataset
    MODE: train
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    PROMPT_PREFIX: ""
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 4
    NUM_WORKERS: 4
    SAMPLER:
      NAME: LoopSampler
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5,  0.5,  0.5 ]
        STD: [ 0.5,  0.5,  0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]

  EVAL_DATA:
    NAME: ImageTextPairMSDataset
    MODE: eval
    # MS_DATASET_NAME: style_custom_dataset
    # MS_DATASET_NAMESPACE: damo
    # MS_DATASET_SUBNAME: 3D
    # PROMPT_PREFIX: ""
    # MS_DATASET_SPLIT: train_short
    # MS_REMAP_KEYS: { 'Image:FILE': 'Target:FILE' }
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    PROMPT_PREFIX: ""
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 10
    NUM_WORKERS: 4
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5,  0.5,  0.5 ]
        STD: [ 0.5,  0.5,  0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]

  TRAIN_HOOKS:
    -
      NAME: BackwardHook
      PRIORITY: 0
    -
      NAME: LogHook
      LOG_INTERVAL: 50
    -
      NAME: CheckpointHook
      INTERVAL: 1000
    -
      NAME: ProbeDataHook
      PROB_INTERVAL: 1000

  EVAL_HOOKS:
    -
      NAME: ProbeDataHook
      PROB_INTERVAL: 1000
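
On point 2: SAMPLE_ARGS pins SEED: 2023 and the command also passes --seed 2023, so the initial noise latent is identical on every run; if the control signal never reaches the UNet, the output will be identical no matter which guidance image is supplied. A minimal sketch of that reasoning (the 1x4x64x64 shape is the SD 1.5 latent size for 512x512 images):

    # Minimal sketch: with a fixed seed the initial diffusion latent is bitwise
    # identical across runs, so run-to-run variation can only come from the
    # conditioning (prompt / control hint).
    import torch

    def init_latent(seed, shape=(1, 4, 64, 64)):
        g = torch.Generator().manual_seed(seed)
        return torch.randn(shape, generator=g)

    print(torch.equal(init_latent(2023), init_latent(2023)))  # True on every run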

The segmentation code I implemented myself:
import os
import sys

import cv2
import numpy as np
import torch
from PIL import Image

from scepter.modules.annotator.base_annotator import BaseAnnotator
from scepter.modules.annotator.registry import ANNOTATORS
from scepter.modules.utils.config import dict_to_yaml

# Make the bundled UNet implementation importable.
sys.path.append('/data/twinkle/anaconda3/envs/scepter/lib/python3.8/site-packages/scepter/modules/annotator/unet_model')
from unet_model.unet_model import UNet

# Global call counter, kept for debugging how often the annotator is invoked.
i = 0


# Register SegmentationAnnotator with ANNOTATORS.
@ANNOTATORS.register_class()
class SegmentationAnnotator(BaseAnnotator):
    para_dict = {}

    def __init__(self, cfg, logger=None):
        super().__init__(cfg, logger=logger)
        self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
        self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')

    def forward(self, image):
        global i
        i += 1

        # Ensure the image is a numpy array.
        if isinstance(image, Image.Image):
            image = np.array(image)
        elif isinstance(image, torch.Tensor):
            image = image.detach().cpu().numpy()
        elif isinstance(image, np.ndarray):
            image = image.copy()
        else:
            raise ValueError(f'Unsupported data type {type(image)}, only supports np.ndarray, torch.Tensor, Pillow Image.')

        # Load and initialize the UNet segmentation model.
        # Note: the weights are reloaded from disk on every call.
        device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
        unet = UNet(n_channels=3, n_classes=3).to(device)
        unet.load_state_dict(torch.load(self.unet_weight, map_location=device))
        unet.eval()

        # Save the input image to disk. The fixed filename means each call
        # overwrites the previous input.
        input_image = Image.fromarray(image.astype(np.uint8))
        basepath = self.segmentation_path
        os.makedirs(os.path.join(basepath, 'input'), exist_ok=True)
        input_image.save(os.path.join(basepath, 'input/input_image.png'))

        # Reload the saved image with OpenCV (BGR order) and scale to [-1, 1].
        img = cv2.imread(os.path.join(basepath, 'input/input_image.png'))
        img_tensor = torch.from_numpy(img).to(device, dtype=torch.float32)
        img_tensor = (img_tensor / 127.5) - 1.0
        img_tensor = img_tensor.unsqueeze(0)         # add batch dimension: (1, H, W, C)
        img_tensor = img_tensor.permute(0, 3, 1, 2)  # channels-first: (1, C, H, W)

        # Predict per-pixel class indices without tracking gradients.
        with torch.no_grad():
            pred_unet = unet(img_tensor)
        pred = torch.argmax(pred_unet, dim=1).squeeze(0).cpu().numpy()

        # Resize the label map to 512x512 with nearest-neighbor interpolation and
        # map class ids {0, 1, 2} to {0, 127, 255}; cast to uint8 first, since
        # cv2.resize does not support int64. (Currently unused: the color mapping
        # below uses pred directly, which must already be 512x512.)
        pred_resized = cv2.resize(pred.astype(np.uint8), (512, 512), interpolation=cv2.INTER_NEAREST)
        pred_resized = (pred_resized.astype(np.float32) * 255 / 2).astype(np.uint8)

        # Color map for the segmentation classes.
        color_map = {
            0: [0, 0, 0],      # background - black
            2: [255, 0, 0],    # cup - red
            1: [0, 0, 255],    # disk - blue
        }

        # Build the colored segmentation image.
        image_array = pred
        colored_image = np.zeros((512, 512, 3), dtype=np.uint8)
        for label, color in color_map.items():
            colored_image[image_array == label] = color

        # Save the colored result; this fixed filename is also overwritten on
        # every call.
        colored_image_save = Image.fromarray(colored_image)
        os.makedirs(os.path.join(basepath, 'output'), exist_ok=True)
        colored_image_save.save(os.path.join(basepath, 'output/output_image.png'))

        return colored_image

    def save_result(self, result, save_path):
        # Ensure result is in (H, W, C) format.
        if result.shape != (512, 512, 3):
            raise ValueError(f'Expected result shape (512, 512, 3), but got {result.shape}')

        # Save the result as an image with PIL.
        image = Image.fromarray(result.astype(np.uint8))
        image.save(save_path)
        print(f'Result saved to {save_path}')

    @staticmethod
    def get_config_template():
        return dict_to_yaml('ANNOTATORS', __class__.__name__, SegmentationAnnotator.para_dict, set_name=True)
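
As an aside, forward() above reloads the UNet weights from disk on every call. A minimal sketch of loading them once in __init__ instead (same UNet class and cfg keys as above; the device choice is an assumption):

    # Sketch: cache the segmentation UNet in __init__ so forward() reuses it.
    def __init__(self, cfg, logger=None):
        super().__init__(cfg, logger=logger)
        self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
        self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')
        self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
        self.unet = UNet(n_channels=3, n_classes=3).to(self.device)
        self.unet.load_state_dict(torch.load(self.unet_weight, map_location=self.device))
        self.unet.eval()  # forward() can then call self.unet(img_tensor) directly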

Modifications in utils:

[image]
