Description
Hello author. After training SCEdit, I ran into problems when running inference with my own weights (using segmentation guidance): 1. The inferred image is completely different from the images produced during training. 2. With the seed unchanged, different input images give exactly the same result. The inference command is as follows:
CUDA_VISIBLE_DEVICES=1 python /data/twinkle/app/scepter/scepter/tools/run_inference.py --cfg /data/twinkle/app/scepter/scepter/methods/scedit/ctr/sd15_512_sce_ctr_segmentation_Accusyn_Blur.yaml --num_samples 1 --prompt 'Convert to a segmentation map based on the prompt: disk and cup segmentation map' --save_folder /data/twinkle/app/scepter/Newpaper_Accusyn_Blur_1 --image_size 512 --pretrained_model /data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth --image /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test/target/335L1.png --control_mode segmentation --task control --seed 2023
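As a sanity check for symptom 1, one can first confirm that the trained control_blocks parameters are actually present in the saved checkpoint; if they are missing, inference silently falls back to the untuned base model. A minimal sketch, assuming the .pth file is a plain PyTorch checkpoint (the 'state_dict' wrapping and key naming are assumptions, not confirmed scepter internals):

import torch

# Load the training checkpoint on CPU and list its parameter names.
ckpt = torch.load(
    '/data/twinkle/app/scepter/cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur/checkpoints/ldm_step-100000.pth',
    map_location='cpu')
# Some checkpoints nest the weights under a 'state_dict' key (assumption).
state = ckpt['state_dict'] if isinstance(ckpt, dict) and 'state_dict' in ckpt else ckpt

# The tuned SCEdit weights should show up under control-related keys.
control_keys = [k for k in state if 'control' in k]
print(f'{len(control_keys)} control-related keys found')
print(control_keys[:5])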
The inferred images (the first is the segmentation guidance image, the second is the inferred result):

During training, the inference images saved in eval_probe:

The YAML file is as follows:
ENV:
  BACKEND: nccl
SOLVER:
  NAME: LatentDiffusionSolver
  RESUME_FROM:
  LOAD_MODEL_ONLY: True
  USE_FSDP: False
  SHARDING_STRATEGY:
  USE_AMP: True
  DTYPE: float16
  CHANNELS_LAST: True
  MAX_STEPS: 100000
  MAX_EPOCHS: -1
  NUM_FOLDS: 1
  ACCU_STEP: 1
  EVAL_INTERVAL: 1000
  RESCALE_LR: False
  WORK_DIR: ./cache/save_data/Newpaper_sd15_512_sce_ctr_Accusyn_Blur
  LOG_FILE: std_log.txt
  FILE_SYSTEM:
    NAME: "ModelscopeFs"
    TEMP_DIR: "./cache/cache_data"
  FREEZE:
    FREEZE_PART: [ "first_stage_model", "cond_stage_model", "model" ]
    TRAIN_PART: [ "control_blocks" ]
  MODEL:
    NAME: LatentDiffusionSCEControl
    PARAMETERIZATION: eps
    TIMESTEPS: 1000
    MIN_SNR_GAMMA:
    ZERO_TERMINAL_SNR: False
    PRETRAINED_MODEL: ms://AI-ModelScope/[email protected]
    IGNORE_KEYS: [ ]
    SCALE_FACTOR: 0.18215
    SIZE_FACTOR: 8
    DEFAULT_N_PROMPT:
    SCHEDULE_ARGS:
      "NAME": "scaled_linear"
      "BETA_MIN": 0.00085
      "BETA_MAX": 0.012
    USE_EMA: False
    #
    DIFFUSION_MODEL:
      NAME: DiffusionUNet
      IN_CHANNELS: 4
      OUT_CHANNELS: 4
      MODEL_CHANNELS: 320
      NUM_HEADS: 8
      NUM_RES_BLOCKS: 2
      ATTENTION_RESOLUTIONS: [ 4, 2, 1 ]
      CHANNEL_MULT: [ 1, 2, 4, 4 ]
      CONV_RESAMPLE: True
      DIMS: 2
      USE_CHECKPOINT: False
      USE_SCALE_SHIFT_NORM: False
      RESBLOCK_UPDOWN: False
      USE_SPATIAL_TRANSFORMER: True
      TRANSFORMER_DEPTH: 1
      CONTEXT_DIM: 768
      DISABLE_MIDDLE_SELF_ATTN: False
      USE_LINEAR_IN_TRANSFORMER: False
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
    #
    FIRST_STAGE_MODEL:
      NAME: AutoencoderKL
      EMBED_DIM: 4
      PRETRAINED_MODEL:
      IGNORE_KEYS: []
      BATCH_SIZE: 4
      #
      ENCODER:
        NAME: Encoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DOUBLE_Z: True
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
      #
      DECODER:
        NAME: Decoder
        CH: 128
        OUT_CH: 3
        NUM_RES_BLOCKS: 2
        IN_CHANNELS: 3
        ATTN_RESOLUTIONS: [ ]
        CH_MULT: [ 1, 2, 4, 4 ]
        Z_CHANNELS: 4
        DROPOUT: 0.0
        RESAMP_WITH_CONV: True
        GIVE_PRE_END: False
        TANH_OUT: False
    #
    TOKENIZER:
      NAME: ClipTokenizer
      PRETRAINED_PATH: ms://AI-ModelScope/clip-vit-large-patch14
      LENGTH: 77
      CLEAN: True
    #
    COND_STAGE_MODEL:
      NAME: FrozenCLIPEmbedder
      FREEZE: True
      LAYER: last
      PRETRAINED_MODEL: ms://AI-ModelScope/clip-vit-large-patch14
    #
    LOSS:
      NAME: ReconstructLoss
      LOSS_TYPE: l2
    #
    CONTROL_MODEL:
      NAME: CSCTuners
      PRE_HINT_IN_CHANNELS: 3
      PRE_HINT_OUT_CHANNELS: 256
      DENSE_HINT_KERNAL: 3
      SCALE: 1.0
      SC_TUNER_CFG:
        NAME: SCTuner
        TUNER_NAME: SCEAdapter
        DOWN_RATIO: 1.0
    CONTROL_ANNO:
      NAME: SegmentationAnnotator
      UNET_WEIGHT: /data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth
      SEGMENTATION_PATH: /data/twinkle/app/scepter/Accusyn_segmentation/Blur/
  SAMPLE_ARGS:
    SAMPLER: ddim
    SAMPLE_STEPS: 50
    SEED: 2023
    GUIDE_SCALE: 7.5
    GUIDE_RESCALE: 0.5
    DISCRETIZATION: trailing
    IMAGE_SIZE: [512, 512]
    RUN_TRAIN_N: False
  OPTIMIZER:
    NAME: AdamW
    LEARNING_RATE: 0.0001
    BETAS: [ 0.9, 0.999 ]
    EPS: 1e-8
    WEIGHT_DECAY: 1e-2
    AMSGRAD: False
  TRAIN_DATA:
    NAME: ImageTextPairMSDataset
    MODE: train
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    PROMPT_PREFIX: ""
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Train
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 4
    NUM_WORKERS: 4
    SAMPLER:
      NAME: LoopSampler
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  EVAL_DATA:
    NAME: ImageTextPairMSDataset
    MODE: eval
    # MS_DATASET_NAME: style_custom_dataset
    # MS_DATASET_NAMESPACE: damo
    # MS_DATASET_SUBNAME: 3D
    # PROMPT_PREFIX: ""
    # MS_DATASET_SPLIT: train_short
    # MS_REMAP_KEYS: { 'Image:FILE': 'Target:FILE' }
    MS_DATASET_NAME: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    MS_DATASET_NAMESPACE: ""
    MS_DATASET_SPLIT: train
    MS_DATASET_SUBNAME: ""
    MS_REMAP_KEYS: null
    MS_REMAP_PATH: /data/twinkle/app/01_paper/Dataset/SCEdit/Blur/Test
    PROMPT_PREFIX: ""
    REPLACE_STYLE: False
    PIN_MEMORY: True
    BATCH_SIZE: 10
    NUM_WORKERS: 4
    TRANSFORMS:
      - NAME: LoadImageFromFile
        RGB_ORDER: RGB
        BACKEND: pillow
      - NAME: Resize
        SIZE: 512
        INTERPOLATION: bilinear
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: CenterCrop
        SIZE: 512
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: ToNumpy
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'image_preprocess' ]
      - NAME: ImageToTensor
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: pillow
      - NAME: Normalize
        MEAN: [ 0.5, 0.5, 0.5 ]
        STD: [ 0.5, 0.5, 0.5 ]
        INPUT_KEY: [ 'img' ]
        OUTPUT_KEY: [ 'img' ]
        BACKEND: torchvision
      - NAME: Rename
        INPUT_KEY: [ 'img', 'image_preprocess' ]
        OUTPUT_KEY: [ 'image', 'image_preprocess' ]
      - NAME: Select
        KEYS: [ 'image', 'prompt', 'image_preprocess' ]
        META_KEYS: [ 'data_key' ]
  TRAIN_HOOKS:
    -
      NAME: BackwardHook
      PRIORITY: 0
    -
      NAME: LogHook
      LOG_INTERVAL: 50
    -
      NAME: CheckpointHook
      INTERVAL: 1000
    -
      NAME: ProbeDataHook
      PROB_INTERVAL: 1000
  EVAL_HOOKS:
    -
      NAME: ProbeDataHook
      PROB_INTERVAL: 1000
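For reference, SCHEDULE_ARGS above selects Stable Diffusion's standard "scaled_linear" noise schedule, which interpolates linearly in sqrt(beta) and then squares. A minimal standalone sketch of how such a schedule is commonly computed (illustrative, not scepter's actual code):

import torch

def scaled_linear_betas(beta_min=0.00085, beta_max=0.012, timesteps=1000):
    # Linear in sqrt(beta), then squared: the SD v1.x schedule.
    return torch.linspace(beta_min ** 0.5, beta_max ** 0.5, timesteps) ** 2

betas = scaled_linear_betas()
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)  # \bar{alpha}_t used by q(x_t | x_0)
print(betas[0].item(), betas[-1].item())  # ~0.00085 ... ~0.012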
The segmentation annotator code I implemented myself:
import os
import sys

import cv2
import numpy as np
import torch
from PIL import Image

from scepter.modules.annotator.base_annotator import BaseAnnotator
from scepter.modules.annotator.registry import ANNOTATORS
from scepter.modules.utils.config import dict_to_yaml

# Make the bundled UNet implementation importable as a top-level module.
sys.path.append('/data/twinkle/anaconda3/envs/scepter/lib/python3.8/site-packages/scepter/modules/annotator/unet_model')
from unet_model import UNet  # only one of the two original UNet imports is needed

i = 0  # debug counter for forward calls


# Register SegmentationAnnotator with ANNOTATORS.
@ANNOTATORS.register_class()
class SegmentationAnnotator(BaseAnnotator):
    para_dict = {}

    def __init__(self, cfg, logger=None):
        super().__init__(cfg, logger=logger)
        self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
        self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')

    def forward(self, image):
        global i
        i += 1
        # Make sure the image is a numpy array.
        if isinstance(image, Image.Image):
            image = np.array(image)
        elif isinstance(image, torch.Tensor):
            image = image.detach().cpu().numpy()
        elif isinstance(image, np.ndarray):
            image = image.copy()
        else:
            raise ValueError(f'Unsupported data type {type(image)}, only supports np.ndarray, torch.Tensor, Pillow Image.')
        with torch.no_grad():
            # Load and initialize the UNet segmentation model.
            # Note: the weights are reloaded from disk on every forward call.
            device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
            unet = UNet(n_channels=3, n_classes=3).to(device)
            unet.load_state_dict(torch.load(self.unet_weight, map_location=device))
            unet.eval()
            # Save the input image to disk, then read it back with OpenCV.
            # Note: cv2.imread returns the channels in BGR order.
            input_image = Image.fromarray(image.astype(np.uint8))  # convert to PIL Image
            basepath = self.segmentation_path
            # Create the input folder if it does not exist.
            if not os.path.exists(os.path.join(basepath, 'input')):
                os.makedirs(os.path.join(basepath, 'input'))
            input_image.save(os.path.join(basepath, 'input/input_image.png'))  # save the input image
            img = cv2.imread(os.path.join(basepath, 'input/input_image.png'))
            # Convert to a float tensor on the device and scale to [-1, 1].
            img_tensor = torch.from_numpy(img).to(device, dtype=torch.float32)
            img_tensor = (img_tensor / 127.5) - 1.0
            # Predict: add a batch dimension and reorder HWC -> NCHW.
            img_tensor = img_tensor.unsqueeze(0).permute(0, 3, 1, 2)
            pred_unet = unet(img_tensor)
            # Per-pixel class index.
            pred = torch.argmax(pred_unet, dim=1).squeeze(0).cpu().numpy()
            # cv2.resize does not accept int64 input; classes 0-2 fit in uint8.
            pred_resized = cv2.resize(pred.astype(np.uint8), (512, 512), interpolation=cv2.INTER_NEAREST)
            pred_resized = (pred_resized * 255 / 2).astype(np.uint8)
            # Color mapping for the class indices.
            color_map = {
                0: [0, 0, 0],    # background - black
                2: [255, 0, 0],  # cup - red
                1: [0, 0, 255],  # disk - blue
            }
            # Build a color image by mapping each class index to its color.
            colored_image = np.zeros((512, 512, 3), dtype=np.uint8)
            for label, color in color_map.items():
                colored_image[pred == label] = color
            colored_image_save = Image.fromarray(colored_image)  # convert the numpy array to an image
            if not os.path.exists(os.path.join(basepath, 'output')):
                os.makedirs(os.path.join(basepath, 'output'))
            colored_image_save.save(os.path.join(basepath, 'output/output_image.png'))  # save the image
            return colored_image

    def save_result(self, result, save_path):
        # Make sure result is in (H, W, C) format.
        if result.shape != (512, 512, 3):
            raise ValueError(f'Expected result shape (512, 512, 3), but got {result.shape}')
        # Save the result as an image with PIL.
        image = Image.fromarray(result.astype(np.uint8))
        image.save(save_path)
        print(f'Result saved to {save_path}')

    @staticmethod
    def get_config_template():
        return dict_to_yaml('ANNOTATORS', __class__.__name__, SegmentationAnnotator.para_dict, set_name=True)
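One structural note on the annotator: forward reloads the UNet weights from disk on every call. A minimal sketch of hoisting the model into __init__ instead, so it is constructed once and reused (same names and paths as the class above; forward would then use self.unet and self.device directly):

# Sketch only: load the UNet once at construction time.
def __init__(self, cfg, logger=None):
    super().__init__(cfg, logger=logger)
    self.unet_weight = cfg.get('UNET_WEIGHT', '/data/twinkle/app/01_paper/weight/unet/unet_Blur/dataset_Blur.pth')
    self.segmentation_path = cfg.get('SEGMENTATION_PATH', '/data/twinkle/app/scepter/Accusyn_segmentation/test/')
    self.device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
    self.unet = UNet(n_channels=3, n_classes=3).to(self.device)
    self.unet.load_state_dict(torch.load(self.unet_weight, map_location=self.device))
    self.unet.eval()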
Modifications in utils:
