Online Generation#

This tutorial provides an example of using Isaac Sim and Replicator for online training of machine learning models by feeding the generated training data directly into the model's training pipeline from memory, eliminating the need for disk storage.

The example uses ground truth data from Isaac Sim to demonstrate how to set up a PyTorch DataLoader and train Deep Neural Networks (DNNs) online: the generated data stays in memory for the entire training process and is never written to disk. The full example can be executed through the Isaac Sim standalone Python environment.

The tutorial examines the example script section by section. Scene generation and ground truth collection are integrated into a PyTorch DataLoader, which is then used to train a Mask R-CNN instance segmentation model.

Mesh Converter#

Before you can generate data, the ShapeNet assets must be converted to USD. To prepare for the conversion:

  1. Download the ShapeNetCore dataset to a local directory.

  2. Set a variable to tell the script where to find the ShapeNet dataset locally:

export SHAPENET_LOCAL_DIR=<path/to/shapenet>

Convert only the geometry to allow for quick loading of assets into the scene. With the SHAPENET_LOCAL_DIR variable set, run the following script to create a new directory at {SHAPENET_LOCAL_DIR}_nomat, where the geometry-only USD files are stored.

./python.sh standalone_examples/api/omni.replicator.isaac/usd_convertor.py --categories plane watercraft rocket --max_models 100

The above command tells the script to convert the plane, watercraft, and rocket categories, converting a maximum of 100 models per category.

Note

Other category examples include: table, monitor, phone, chair, bowl, bench, plane, car, microwave, piano, pillow, sofa, and bottle.

DataLoader#

To run the example:

  • On Linux, use the following command:

./python.sh standalone_examples/replicator/online_generation/generate_shapenet.py \
--root $SHAPENET_LOCAL_DIR'_nomat' \
--categories plane watercraft rocket \
--max_asset_size 50
  • On Windows, use the following command:

python.bat standalone_examples/replicator/online_generation/generate_shapenet.py --root %SHAPENET_LOCAL_DIR%_nomat --categories plane watercraft rocket --max_asset_size 50

The generate_shapenet.py script generates an endless stream of randomized data to train on. Below is a visualization of the data the dataset produces with the plane, watercraft, and rocket categories selected. Open the locally saved image files at _out_gen_imgs/domain_randomization_test_image_*.png to see the visualization.

Instance Segmentation Dataset
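
If you want to iterate the dataset directly and save a few frames yourself, a minimal sketch is shown below. It assumes that the RandomObjects dataset class from generate_shapenet.py (introduced in the sections that follow) is importable, that matplotlib is available in the Python environment, and that SHAPENET_LOCAL_DIR is set; the output directory name is chosen only for illustration.

# Hypothetical usage sketch: iterate the dataset and save a few RGB frames.
import os

import matplotlib.pyplot as plt

from generate_shapenet import RandomObjects  # assumes the script is on PYTHONPATH

root = os.environ["SHAPENET_LOCAL_DIR"] + "_nomat"
dataset = RandomObjects(root, ["plane", "watercraft", "rocket"], max_asset_size=50)

os.makedirs("_out_preview_imgs", exist_ok=True)
for idx, (image, _target) in enumerate(dataset):
    if idx >= 4:
        break
    # image is a channel-first float tensor in [0, 1] living on the GPU
    plt.imsave(f"_out_preview_imgs/preview_{idx}.png", image.permute(1, 2, 0).cpu().numpy())

dataset.kit.close()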

The DataLoader Core#

To create a DataLoader, subclass PyTorch's torch.utils.data.IterableDataset so that the dataset generates an endless stream of random scenes, each with corresponding ground truth. The basic structure of the dataset is shown below:

class MyAwesomeDataset(torch.utils.data.IterableDataset):
    def __init__(self):
        # Setup the scene, lights, walls, camera, etc.
        setup_scene()
        # Setup replicator randomizer graph
        setup_replicator()

    def __next__(self):
        # Trigger a randomization and a render of the scene
        self.rep.orchestrator.step()
        # Collect groundtruth
        gt = {
            "rgb": self.rgb.get_data(device="cuda"),
            "boundingBox2DTight": self.bbox_2d_tight.get_data(device="cpu"),
            "instanceSegmentation": self.instance_seg.get_data(device="cuda"),
        }
        # [..]
        return image, target
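
The snippet above assumes that the Replicator annotators used in __next__ (self.rgb, self.bbox_2d_tight, self.instance_seg) were attached to a render product during initialization. A minimal sketch of how that attachment might look is shown below; the helper name is hypothetical and the exact annotator names can vary between Replicator versions, so treat this as an assumption rather than the exact code used by the example.

def _setup_annotators(self):
    # Hypothetical helper: create a render product for the scene camera and
    # attach the annotators that __next__ reads from.
    self.render_product = self.rep.create.render_product(self.camera, RESOLUTION)

    self.rgb = self.rep.AnnotatorRegistry.get_annotator("rgb")
    self.bbox_2d_tight = self.rep.AnnotatorRegistry.get_annotator("bounding_box_2d_tight")
    self.instance_seg = self.rep.AnnotatorRegistry.get_annotator("instance_segmentation_fast")

    for annotator in (self.rgb, self.bbox_2d_tight, self.instance_seg):
        annotator.attach([self.render_product])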

Using the outline, you can assemble your dataset by filling in the __next__ method. You trigger a randomization and a render of the scene on line 6. The next step is to collect the ground truth, as shown on lines 8-13. The code that follows, on lines 15-66, prepares the data for the model to consume. This code is, in large part, specific to the model you are using and to your application.

 1def __iter__(self):
 2    return self
 3
 4def __next__(self):
 5    # Step - trigger a randomization and a render
 6    self.rep.orchestrator.step()
 7
 8    # Collect Groundtruth
 9    gt = {
10        "rgb": self.rgb.get_data(device="cuda"),
11        "boundingBox2DTight": self.bbox_2d_tight.get_data(device="cpu"),
12        "instanceSegmentation": self.instance_seg.get_data(device="cuda"),
13    }
14
15    # RGB
16    # Drop alpha channel
17    image = self.wp.to_torch(gt["rgb"])[..., :3]
18
19    # Normalize between 0. and 1. and change order to channel-first.
20    image = image.float() / 255.0
21    image = image.permute(2, 0, 1)
22
23    # Bounding Box
24    gt_bbox = gt["boundingBox2DTight"]["data"]
25
26    # Create mapping from categories to index
27    bboxes = torch.tensor(gt_bbox[["x_min", "y_min", "x_max", "y_max"]].tolist(), device="cuda")
28    id_to_labels = gt["boundingBox2DTight"]["info"]["idToLabels"]
29    prim_paths = gt["boundingBox2DTight"]["info"]["primPaths"]
30
31    # For each bounding box, map semantic label to label index
32    cat_to_id = {cat: i + 1 for i, cat in enumerate(self.categories)}
33    semantic_labels_mapping = {int(k): v.get("class", "") for k, v in id_to_labels.items()}
34    semantic_labels = [cat_to_id[semantic_labels_mapping[i]] for i in gt_bbox["semanticId"]]
35    labels = torch.tensor(semantic_labels, device="cuda")
36
37    # Calculate the area of each bounding box
38    areas = (bboxes[:, 2] - bboxes[:, 0]) * (bboxes[:, 3] - bboxes[:, 1])
39    # Identify invalid bounding boxes to filter final output
40    valid_areas = (areas > 0.0) * (areas < (image.shape[1] * image.shape[2]))
41
42    # Instance Segmentation
43    instance_data = self.wp.to_torch(gt["instanceSegmentation"]["data"]).squeeze()
44    path_to_instance_id = {v: int(k) for k, v in gt["instanceSegmentation"]["info"]["idToLabels"].items()}
45
46    instance_list = [im[0] for im in gt_bbox]
47    masks = torch.zeros((len(instance_list), *instance_data.shape), dtype=bool, device="cuda")
48
49    # Filter for the mask of each object
50    for i, prim_path in enumerate(prim_paths):
51        # Merge child instances of prim_path as one instance
52        for instance in path_to_instance_id:
53            if prim_path in instance:
54                masks[i] += torch.isin(instance_data, path_to_instance_id[instance])
55
56    target = {
57        "boxes": bboxes[valid_areas],
58        "labels": labels[valid_areas],
59        "masks": masks[valid_areas],
60        "image_id": torch.LongTensor([self.cur_idx]),
61        "area": areas[valid_areas],
62        "iscrowd": torch.BoolTensor([False] * len(bboxes[valid_areas])),  # Assume no crowds
63    }
64
65    self.cur_idx += 1
66    return image, target

Details about the rest of the DataLoader, including the initialization step and the methods used within __next__, are explained in the sections below.

Initialization Step#

  1. Launch Omniverse Kit using the SimulationApp and the rendering configurations.

  2. After the app starts, the default Isaac extensions are hot-loaded so you can import from them.

  3. Set up Replicator and your Nucleus server, which are used in this example to manage the domain randomization assets.

Domain randomization is handled entirely through Replicator in this example.

import signal
import sys

import torch

from isaacsim import SimulationApp

# Setup default variables
RESOLUTION = (1024, 1024)
OBJ_LOC_MIN = (-50, 5, -50)
OBJ_LOC_MAX = (50, 5, 50)
CAM_LOC_MIN = (100, 0, -100)
CAM_LOC_MAX = (100, 100, 100)
SCALE_MIN = 15
SCALE_MAX = 40

# Default rendering parameters
RENDER_CONFIG = {"headless": False}


class RandomObjects(torch.utils.data.IterableDataset):
    def __init__(
        self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
    ):
        assert len(categories) > 1
        assert (split > 0) and (split <= 1.0)

        self.kit = SimulationApp(RENDER_CONFIG)

        import carb
        import omni.replicator.core as rep
        import warp as wp

        self.rep = rep
        self.wp = wp

        from omni.isaac.nucleus import get_assets_root_path

        self.assets_root_path = get_assets_root_path()
        if self.assets_root_path is None:
            carb.log_error("Could not find Isaac Sim assets folder")
            return
        .
        .
        .

The self._find_usd_assets() method searches the root directory for USD files within the category directories you’ve specified and returns their paths. When you want to add a new asset to your scene, pick a path at random and attach it as a reference to a new prim in the scene. Use split to select a subset of training samples so that you can keep a hold-out set for validation. Finally, self.setup_scene() creates a room, lights, and a camera.

class RandomObjects(torch.utils.data.IterableDataset):
    def __init__(
        self, root, categories, max_asset_size=None, num_assets_min=3, num_assets_max=5, split=0.7, train=True
    ):
        .
        .
        .
        # If ShapeNet categories are specified with their names, convert to synset ID
        # Remove this if using with a different dataset than ShapeNet
        category_ids = [utils.LABEL_TO_SYNSET.get(c, c) for c in categories]
        self.categories = category_ids
        self.range_num_assets = (num_assets_min, max(num_assets_min, num_assets_max))
        try:
            self.references = self._find_usd_assets(root, category_ids, max_asset_size, split, train)
        except ValueError as err:
            carb.log_error(str(err))
            self.kit.close()
            sys.exit()

        # Setup the scene, lights, walls, camera, etc.
        self.setup_scene()

        # Setup replicator randomizer graph
        self.setup_replicator()

        self.cur_idx = 0
        self.exiting = False

        signal.signal(signal.SIGINT, self._handle_exit)

    def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
        ... # (see code for implementation details)

    def setup_scene(self):
        ... # (see code for implementation details)
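
The bodies of _find_usd_assets and setup_scene are omitted above. As a rough sketch of what _find_usd_assets might do (an illustration under the assumptions described here, not the exact implementation shipped with Isaac Sim), it can glob each category directory for USD files, skip files larger than max_asset_size, and apply the train/validation split:

import glob
import os

def _find_usd_assets(self, root, categories, max_asset_size, split, train=True):
    """Hypothetical sketch: collect USD references per category and split them."""
    references = {}
    for category in categories:
        assets = sorted(glob.glob(os.path.join(root, category, "*/*.usd")))
        # Optionally skip assets larger than max_asset_size (interpreted as MB)
        if max_asset_size is not None:
            assets = [a for a in assets if os.stat(a).st_size / 1e6 <= max_asset_size]
        # Keep the first `split` fraction for training, the remainder for validation
        num_train = int(len(assets) * split)
        references[category] = assets[:num_train] if train else assets[num_train:]
        if not references[category]:
            raise ValueError(f"No USD assets found for category {category} under {root}")
    return references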

Setting up a Replicator Graph#

Set up randomizers to vary the content and appearance of every frame by leveraging Omni.Replicator. Replicator enables the creation of a randomization graph, which executes the specified randomizations:

  1. Set static components, in this case two sphere lights.

  2. Set up a Replicator on_frame trigger, which fires the randomizations on each new frame.

  3. Create the randomization components. The first modifies the color attribute of our two lights.

  4. Randomize the camera position and set its look_at value to the origin so that the camera always orients itself towards that point.

  5. Set up asset randomizers for each asset category and randomize their position, rotation, scale, and material texture. Using the instantiate method, create a prototype of the asset in cache; new instances reference the created prototype.

def _instantiate_category(self, category, references):
    with self.rep.randomizer.instantiate(references, size=1, mode="reference"):
        self.rep.modify.semantics([("class", category)])
        self.rep.modify.pose(
            position=self.rep.distribution.uniform(OBJ_LOC_MIN, OBJ_LOC_MAX),
            rotation=self.rep.distribution.uniform((0, -180, 0), (0, 180, 0)),
            scale=self.rep.distribution.uniform(SCALE_MIN, SCALE_MAX),
        )
        self.rep.randomizer.texture(self._get_textures(), project_uvw=True)

def setup_replicator(self):
    """Setup the replicator graph with various attributes."""

    # Create two sphere lights
    light1 = self.rep.create.light(light_type="sphere", position=(-450, 350, 350), scale=100, intensity=30000.0)
    light2 = self.rep.create.light(light_type="sphere", position=(450, 350, 350), scale=100, intensity=30000.0)

    with self.rep.new_layer():
        with self.rep.trigger.on_frame():
            # Randomize light colors
            with self.rep.create.group([light1, light2]):
                self.rep.modify.attribute("color", self.rep.distribution.uniform((0.1, 0.1, 0.1), (1.0, 1.0, 1.0)))

            # Randomize camera position
            with self.camera:
                self.rep.modify.pose(
                    position=self.rep.distribution.uniform((100, 0, -100), (100, 100, 100)), look_at=(0, 0, 0)
                )

            # Randomize asset positions and textures
            for category, references in self.references.items():
                self._instantiate_category(category, references)

    # Run replicator for a single iteration without triggering any writes
    self.rep.orchestrator.preview()

Train#

Getting Started#

After you have a DataLoader, you can start training. To run the training example:

  • On Linux, use the following command:

./python.sh standalone_examples/replicator/online_generation/train_shapenet.py \
--root $SHAPENET_LOCAL_DIR'_nomat' \
--categories plane watercraft rocket \
--visualize \
--max_asset_size 50
  • On Windows, use the following command:

python.bat standalone_examples/replicator/online_generation/train_shapenet.py --root %SHAPENET_LOCAL_DIR%_nomat --categories plane watercraft rocket --visualize --max_asset_size 50

Validate that you see the loss going down in your terminal; after approximately 100 iterations, you should start to see instance segmentation and object detection results being visualized. The --max_asset_size 50 argument tells the dataset to skip assets over 50 MB in size, which helps avoid out-of-memory errors caused by loading larger assets. This value can be increased depending on the capacity of the GPU in use. The specific optimizer used in this example maintains a gradient history that grows with the iteration count, so if you run out of VRAM on your hardware, you can lower the --max_iters command line argument to address this.

Open the locally saved image files at _out_train_imgs/train_*.png to see results like the image below during training.

Instance Segmentation Training

The Code#

  1. Set up the device, dataset, DataLoader, model, and optimizer.

    device = "cuda"
    
    # Setup data
    train_set = RandomObjects(
        args.root, args.categories, num_assets_min=3, num_assets_max=5, max_asset_size=args.max_asset_size
    )
    train_loader = DataLoader(train_set, batch_size=2, collate_fn=lambda x: tuple(zip(*x)))
    
    # Setup Model
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=False, num_classes=1 + len(args.categories))
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=args.learning_rate)
    

  2. Set up the training loop. After sending the data to the GPU, perform a forward pass through the model, calculate the loss, and perform a backward pass to update the model weights.

for i, train_batch in enumerate(train_loader):
    if i > args.max_iters:
        break

    model.train()
    images, targets = train_batch
    images = [i.to(device) for i in images]
    targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
    loss_dict = model(images, targets)
    loss = sum(loss for loss in loss_dict.values())

    print(f"ITER {i} | {loss:.6f}")

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
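
With the --visualize flag set, train_shapenet.py periodically writes the model's predictions to _out_train_imgs/. The helper below is a hypothetical sketch of how such a visualization step could be implemented; it is not the exact code from the example, and it assumes matplotlib is available in the Python environment.

import os

import matplotlib.pyplot as plt
import torch


def save_prediction_visualization(model, images, iteration, out_dir="_out_train_imgs", score_threshold=0.5):
    """Hypothetical helper: run the model in eval mode and save predicted boxes for one image."""
    os.makedirs(out_dir, exist_ok=True)
    model.eval()
    with torch.no_grad():
        prediction = model(images[:1])[0]

    # Draw predicted boxes above the score threshold on top of the input image
    np_image = images[0].permute(1, 2, 0).cpu().numpy()
    fig, ax = plt.subplots()
    ax.imshow(np_image)
    for box, score in zip(prediction["boxes"], prediction["scores"]):
        if score < score_threshold:
            continue
        x0, y0, x1, y1 = box.cpu().numpy()
        ax.add_patch(plt.Rectangle((x0, y0), x1 - x0, y1 - y0, fill=False, edgecolor="lime"))
    fig.savefig(os.path.join(out_dir, f"train_{iteration}.png"))
    plt.close(fig)

In the loop above, such a helper could be called every so often, for example with `if args.visualize and i % 100 == 0: save_prediction_visualization(model, images, i)`.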