Case #4: Using Stable Diffusion XL and ComfyUI for Synthetic Dataset Augmentation

Overview

Synthetic datasets require a large number of variations, and producing them traditionally takes significant visual expertise and time. Generative augmentations make it possible to create massive numbers of variations via prompts and to close the appearance domain gap, producing photorealistic results with little manual effort.

This example demonstrates a basic workflow for modifying an existing synthetic dataset using ComfyUI and a generative diffusion model. We use techniques well suited to robust synthetic data generation, such as ‘regional prompting’ and ‘inpainting’, preparing you to build your own augmentation pipeline.

Preparing a Synthetic Dataset

We provide a sample dataset of images for the workflow below. If you’d like to make your own dataset, this Replicator script demonstrates a custom writer that generates stable semantic color IDs, along with direct output of depth map images for use within the ComfyUI graph.
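
If you write such a writer yourself, the following minimal sketch shows the general shape of a Replicator custom writer that requests the four annotator outputs used later in this workflow. It assumes the standard omni.replicator.core Writer, AnnotatorRegistry, and BackendDispatch APIs; the class name, output folder, and the omitted color-remapping logic are illustrative only.

    import omni.replicator.core as rep
    from omni.replicator.core import AnnotatorRegistry, BackendDispatch, Writer

    class StableSemanticWriter(Writer):
        """Sketch of a custom writer producing RGB, semantics, normals, and depth."""

        def __init__(self, output_dir="_out_sdg"):
            self._frame_id = 0
            self.backend = BackendDispatch({"paths": {"out_dir": output_dir}})
            # Request the four outputs consumed by the ComfyUI graph.
            self.annotators = [
                AnnotatorRegistry.get_annotator("rgb"),
                AnnotatorRegistry.get_annotator("semantic_segmentation", init_params={"colorize": True}),
                AnnotatorRegistry.get_annotator("normals"),
                AnnotatorRegistry.get_annotator("distance_to_camera"),
            ]

        def write(self, data):
            # Write the RGB frame; a full writer would also remap the per-frame
            # semantic colors to a stable, class-keyed palette and convert the
            # normals/depth arrays to images before saving them the same way.
            self.backend.write_image(f"rgb_{self._frame_id}.png", data["rgb"])
            self._frame_id += 1

    rep.WriterRegistry.register(StableSemanticWriter)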

The outputs of the Replicator script from your own scene should closely match the dataset images below. These four image outputs (RGB, Semantic Segmentation, Normals, and Depth) will all be used as inputs in the ComfyUI workflow.

../_images/case4_1.png

One key part of this workflow is the use of 3D assets. If you are missing assets in your library, you can create them with generative AI. NVIDIA has made a NIM available to partners, Edify 3D, for this purpose. As an enterprise, you can contact one of the companies that have built models using NVIDIA Edify 3D to augment your dataset. With Edify 3D-powered NIMs, you can quickly create simple objects that you might not have in your asset library. These assets can then be treated almost like a 3D ControlNet: they serve as a base for augmentation rather than as the final 3D asset itself. The generated 3D assets provide the rough 3D data and mask conditioning the models need to generate photoreal results in their place.

Requirements and Installation

Before you get started, review the following requirements, then follow the ComfyUI installation instructions below.

  1. Check that your hardware matches the Technical Requirements for the Omniverse Platform.

  2. Install ComfyUI by following the installation instructions on their GitHub page.

    1. Windows - It is easiest to install the portable standalone build for Windows. However, a manual install will also work.

    2. Linux - Use the manual install method.

  3. Install ComfyUI-Manager by following the installation instructions on their GitHub page. Pay special attention to the different methods.

    1. If you installed the portable version of ComfyUI on Windows, choose method 2.

    2. If you did a manual install on Windows or Linux, choose method 1.

  4. Download and install the required models.

    1. Download the SDXL base model file ‘sd_xl_base_1.0.safetensors’. Move the file to the ComfyUI installation folder under ComfyUI\models\checkpoints.

    2. Two ControlNet models should be downloaded.

      1. The ControlNet Union model ‘diffusion_pytorch_model.safetensors’.

      2. The Stability Control-LoRA Canny model ‘control-lora-canny-rank128.safetensors’.

    Move both files to the ComfyUI installation folder under ComfyUI\models\controlnet. (A scripted download sketch is included after this list.)

  5. Download the example files provided for this workflow from the synthetic-data-examples on GitHub.

    1. A small sample dataset of images to be augmented by the ComfyUI graph.

    2. The augmentation ComfyUI graph.
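
As an alternative to downloading the checkpoints by hand in step 4, a small script can fetch them with huggingface_hub. This is only a sketch: the repository IDs below are assumptions and should be verified against the official model pages before use.

    from huggingface_hub import hf_hub_download

    # SDXL base checkpoint -> ComfyUI/models/checkpoints
    hf_hub_download(
        repo_id="stabilityai/stable-diffusion-xl-base-1.0",
        filename="sd_xl_base_1.0.safetensors",
        local_dir="ComfyUI/models/checkpoints",
    )

    # ControlNet Union model -> ComfyUI/models/controlnet (assumed repository ID)
    hf_hub_download(
        repo_id="xinsir/controlnet-union-sdxl-1.0",
        filename="diffusion_pytorch_model.safetensors",
        local_dir="ComfyUI/models/controlnet",
    )

    # Stability Control-LoRA Canny -> ComfyUI/models/controlnet
    # (assumed repository ID; the file downloads into a subfolder matching its
    # repo path, so you may need to move it up into models/controlnet afterwards)
    hf_hub_download(
        repo_id="stabilityai/control-lora",
        filename="control-LoRAs-rank128/control-lora-canny-rank128.safetensors",
        local_dir="ComfyUI/models/controlnet",
    )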

Set Up and Open the ComfyUI Augmentation Graph Example

ComfyUI is a powerful backend and GUI for building and executing pipelines using diffusion models and more.

  1. Run ComfyUI.

    1. Windows Portable - Run ‘ComfyUI_windows_portable\run_nvidia_gpu.bat’.

    2. Windows and Linux Manual Install - From the command line, navigate to the ComfyUI folder and run python main.py.

  2. The ComfyUI server process will run in a command line window. There you can see what ComfyUI is doing if you need to check for errors or warnings, or to follow the progress of image generations.

    1. When fully loaded, it will open a window in your internet browser with the local address http://127.0.0.1:8188/. This connects the visual frontend to the ComfyUI server now running in the background.

    2. Next, load the file ‘sdg_SDXL_augmentation_example.json’ that you downloaded earlier from the example files. Click the Load button on the ComfyUI menu, navigate to the location of the downloaded .json graph, and open it.

      ../_images/case4_5.png
    3. An error window will appear listing missing dependencies, with red “missing” nodes visible behind it. This is expected.

      ../_images/error_window.png

      We can fix this through the following steps:

      1. Click the Manager button at the bottom of the ComfyUI menu.

        ../_images/manager_button.png


        Note

        If you do not see a Manager button, see the ComfyUI-Manager instructions in the “Requirements and Installation” section of this page.

      2. Select Install Missing Custom Nodes.

        ../_images/case4_6.png
      3. Select Install for each of the listed node dependencies. Alternatively, you can select the checkbox next to ID (upper left) to select them all and then select Install. These may take several minutes to install, depending on your internet speed and hardware. You can check the progress of these downloads in ComfyUI’s command line window.

      ../_images/case4_7.png
      4. Once all have been installed, click the red Restart button at the bottom. This will restart the ComfyUI local server to enable the newly added dependencies. Note that you do not need to close the browser window, but a Reconnecting… dialog will be displayed until the server has restarted.

      Note

      This first restart may take several minutes, downloading more dependencies as needed. You can always view the progress in the command line window.

      After installing the dependencies, your graph should no longer display any bright red missing nodes.

    4. At this stage your ComfyUI graph should look similar to the following image. Next, we will assign the models and image references in the marked sections A, B and C.

      ../_images/comfyui_graph.png
    5. In section A, assign the sd_xl_base_1.0.safetensors model in the Load Checkpoint node. Click directly on the highlighted section and select the model from the pop-up list. This must be done to refresh the model assignment: even if it already appears correct, queueing a prompt later will fail with an error if the model is not reassigned here.

      ../_images/assign_model.png
    6. In section B, assign the diffusion_pytorch_model.safetensors and control-lora-canny-rank128.safetensors models to each of the Load Advanced ControlNet Model nodes. As before, click directly on the highlighted sections and select the models from the pop-up list; reassign them here even if they already appear correct, otherwise queueing a prompt later will fail with an error.

      ../_images/section_b.png
    7. In section C, assign the sample dataset images to the four nodes.

      ../_images/dataset_images.png
    8. Click on Choose file to upload on each of the nodes, and select the images listed below. These images are found in the small sample dataset downloaded previously in the Requirements and Installation section above.

      • ‘Semantic Segmentation Image’ Node

        ‘dataset/semantic_2.png’

      • ‘Depth Image’ Node

        ‘dataset/depth_2.png’

      • ‘Normals Image’ Node

        ‘dataset/normals_2.png’

      • ‘RGB Image’ Node

        ‘dataset/rgb_2.png’

    9. In the ComfyUI menu, click the Queue Prompt button, featured prominently at the top. A green progress bar will display and update at the top of the browser window. When complete, the generated image is displayed in the Save Image node at the far right edge of the graph. (A sketch for queueing prompts programmatically follows this list.)

      1. Saved images are written by default to the ComfyUI installation ComfyUI\output folder.

    10. Troubleshooting: If you encounter an error similar to Failed to find .../comfyui_controlnet_aux/ckpts/lllyasviel/Annotators/ControlNetHED.pth in the ComfyUI command line window, it is likely a Windows path length issue. Try reinstalling ComfyUI in a location with a shorter path.
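
Queueing a prompt through the UI, as in step 9 above, is convenient for a single image. For batch augmentation of a whole dataset, the same graph can be queued programmatically against the local ComfyUI server. The following minimal sketch assumes the graph has been exported with ComfyUI’s ‘Save (API Format)’ option to a file named ‘sdg_SDXL_augmentation_api.json’ (the filename is illustrative).

    import json
    import urllib.request

    # Load the API-format export of the augmentation graph.
    with open("sdg_SDXL_augmentation_api.json", "r", encoding="utf-8") as f:
        workflow = json.load(f)

    # At this point you could rewrite node inputs (prompts, image filenames)
    # per dataset frame before queueing.
    payload = json.dumps({"prompt": workflow}).encode("utf-8")
    request = urllib.request.Request(
        "http://127.0.0.1:8188/prompt",
        data=payload,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        print(response.read().decode("utf-8"))  # response contains the queued prompt_id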

Augmentation Graph Breakdown

The ComfyUI graph itself is a developer tool for building and iterating on pipelines. Another way to think about it is ‘programming with models’. Below is an image of the example graph showing its different sections and their purposes.

../_images/aug_graph.png

Overall, the graph uses ‘regional prompting’ with the masks from the semantic segmentation image. This allows us to describe the visual properties of the different parts of the image.

Let’s explore the different parts of the graph for a more complete understanding of how it works.

  1. Load Checkpoint

    • This is the node that loads the base SDXL model.

  2. Regional Prompting

    • The ‘Conditioning (SetMask)’ nodes take a prompt and a mask to determine the conditioning of the different parts of the image. These nodes and prompts form the heart of the graph and are the most important piece influencing the output generations.

    • For example, we can change the prompt so that the generated forklift is yellow, red, rusty, old, new, and so on.

      ../_images/case4_12.png
  3. Load Images

    • These nodes load the dataset images previously created using Omniverse and Replicator.

  4. Masks, ControlNet Images, Denoising Weightmap

    • In this section, we create masks from the segmentation image for each of the regions we want to inpaint/outpaint. (A standalone sketch of this mask extraction appears after this list.)

    • We also create an outline image using the ‘HED’ preprocessor to be used on the ControlNet in section 5.

    • Lastly, we create a mask that tells the inpainting how much denoising should be applied to different sections of the image. We do this because sometimes we only wish to lightly denoise some objects, changing them slightly, instead of a full “reimagining” with a denoise of 100% (or white in the mask).

  5. ControlNets

    • The ControlNet models are used to constrain and guide the generation of the outputs. They are critical for controlling the structure of the outputs. Without them, the model would be free to imagine anything loosely based on the prompts, with no adherence to the exact placement of the objects in the scene.

  6. KSampler

    • This is the node that does the sampling work in the graph: it iteratively denoises the image over a number of steps until generation completes.
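
The per-class masks built in section 4 can also be reproduced outside of ComfyUI, which is useful when debugging a dataset. The sketch below derives a binary mask for a single class from the semantic segmentation image; the forklift color is a placeholder, so substitute the stable semantic color ID your Replicator writer assigned to that class.

    import numpy as np
    from PIL import Image

    FORKLIFT_COLOR = (140, 25, 255)  # hypothetical stable semantic color ID

    # Load the colorized segmentation image and build a binary mask for the class.
    seg = np.array(Image.open("dataset/semantic_2.png").convert("RGB"))
    mask = np.all(seg == FORKLIFT_COLOR, axis=-1).astype(np.uint8) * 255

    # White pixels mark the region to inpaint/condition; black pixels are left alone.
    Image.fromarray(mask).save("mask_forklift.png")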

Alter Properties of the Graph

Many properties of this graph can be altered. To name a few:

  1. Prompts

    • Change the local and global prompts to create variation in the generated output images.

  2. KSampler

    • control_after_generate - Set this to randomize if you want a different generation each time.

    • steps - 20-30 is a good range

    • cfg - A value between 2 and 4 works well.

    • sampler_name - Try different samplers. Some are slower but produce higher quality results.

    • scheduler - Similar to the sampler, different scheduling can have different effects on outputs.

  3. ControlNets

    • Try adjusting the strength or start_percent to see how they guide the image.

    • You could also try different ControlNet types, such as normals.

  4. Denoise Strength

    • The denoise strength for each of the mask classes can be adjusted. For example, a low denoise strength on the forklift will only lightly alter the forklift in the image, while a high denoise of ‘1.0’ will completely reimagine the forklift according to the prompt provided for that class. (An illustrative sketch of how these settings map onto a standard diffusers pipeline follows this list.)

      ../_images/forklift_denoise.png
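
The KSampler and denoise settings above have rough analogues outside of ComfyUI. Purely as an illustration (this is not part of the example graph), the sketch below shows how the same knobs appear in a plain diffusers SDXL image-to-image pipeline: strength plays the role of the denoise strength, num_inference_steps corresponds to steps, and guidance_scale corresponds to cfg.

    import torch
    from diffusers import StableDiffusionXLImg2ImgPipeline
    from diffusers.utils import load_image

    pipe = StableDiffusionXLImg2ImgPipeline.from_pretrained(
        "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
    ).to("cuda")

    init_image = load_image("dataset/rgb_2.png").resize((1024, 1024))
    result = pipe(
        prompt="a rusty yellow forklift in a warehouse, photorealistic",
        image=init_image,
        strength=0.5,            # analogous to the per-class denoise strength
        num_inference_steps=25,  # 'steps' on the KSampler
        guidance_scale=3.0,      # 'cfg' on the KSampler
    ).images[0]
    result.save("augmented_rgb_2.png")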

Augmentation Results and Analysis

Shown below is an example augmentation from the sample dataset with the SDXL base model.

../_images/case4_14.png

The quality of image outputs can be considerably improved through the use of fine-tuned base models or other techniques such as LoRAs. Shown below are some examples using more advanced models and techniques.

../_images/case4_15.png


Next Steps: Review the Known Issues