Using SegFormer++ without OpenMMLab

Building a Model

Use build_model.py to build preset and custom SegFormer++ models.

Navigate to model_without_OpenMMLab.

from segformer_plusplus.build_model import create_model
# backbone: choose from ['b0', 'b1', 'b2', 'b3', 'b4', 'b5']
# tome_strategy: choose from ['bsm_hq', 'bsm_fast', 'n2d_2x2']
out_channels = 19  # number of classes, e.g. 19 for Cityscapes
model = create_model('b5', 'bsm_hq', out_channels=out_channels, pretrained=True)

Running this code snippet yields our SegFormer++_HQ model pretrained on ImageNet.
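The comments in the snippet above list the accepted values for backbone and tome_strategy. As a small sketch, those choices can be checked before calling create_model so that a typo fails early with a readable message. The constants BACKBONES and TOME_STRATEGIES and the helper check_preset are hypothetical names for illustration; they are not part of build_model.py.

```python
from itertools import product

# Accepted values, copied from the comments in the snippet above
BACKBONES = ("b0", "b1", "b2", "b3", "b4", "b5")
TOME_STRATEGIES = ("bsm_hq", "bsm_fast", "n2d_2x2")


def check_preset(backbone: str, tome_strategy: str) -> None:
    """Hypothetical helper: raise early if a preset name is misspelled."""
    if backbone not in BACKBONES:
        raise ValueError(f"unknown backbone {backbone!r}, expected one of {BACKBONES}")
    if tome_strategy not in TOME_STRATEGIES:
        raise ValueError(
            f"unknown tome_strategy {tome_strategy!r}, expected one of {TOME_STRATEGIES}"
        )


# Every backbone/strategy combination the comments allow
ALL_PRESETS = list(product(BACKBONES, TOME_STRATEGIES))

check_preset("b5", "bsm_hq")  # the configuration used above; passes silently
```

Calling check_preset(backbone, tome_strategy) right before create_model(backbone, tome_strategy, ...) keeps the error close to the misspelled argument.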

Use random_benchmark.py to measure a model's throughput in frames per second (FPS):

from segformer_plusplus.random_benchmark import random_benchmark
v = random_benchmark(model)

This computes the model's FPS using our benchmark script.
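For orientation, an FPS measurement generally comes down to timing repeated forward passes after a few warm-up runs. The sketch below is a generic version using only the standard library; measure_fps and its parameters are hypothetical, and this is not the implementation of random_benchmark. Because CUDA kernels run asynchronously, a GPU model should be timed with torch.cuda.synchronize passed as the sync argument.

```python
import time
from typing import Callable


def measure_fps(step: Callable[[], object], warmup: int = 10, iterations: int = 100,
                sync: Callable[[], None] = lambda: None) -> float:
    """Hypothetical helper: return how many calls of `step` run per second.

    `sync` is called before each clock read; pass torch.cuda.synchronize
    when timing a model on the GPU so that queued kernels are included.
    """
    if iterations <= 0:
        raise ValueError("iterations must be positive")
    for _ in range(warmup):  # warm-up runs are not timed
        step()
    sync()
    start = time.perf_counter()
    for _ in range(iterations):
        step()
    sync()
    elapsed = time.perf_counter() - start
    return iterations / elapsed
```

With a model and a dummy input x on the GPU, a call could look like measure_fps(lambda: model(x), sync=torch.cuda.synchronize), wrapped in torch.no_grad().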

Loading a Checkpoint

Checkpoints are provided in this repository. They can be loaded into the model with PyTorch:

import torch

# Path to a checkpoint downloaded via the links in the README
checkpoint_path = "path_to_your_checkpoint.pth"
checkpoint = torch.load(checkpoint_path, map_location="cpu")
model.load_state_dict(checkpoint)

An example can be found in start_cityscape_benchmark.py.
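Depending on how a checkpoint was saved, torch.load may return the weights wrapped in a dictionary (for example under a "state_dict" key) or with parameter names prefixed by "module." from torch.nn.DataParallel. Whether the provided checkpoints use either layout is an assumption to verify; the hypothetical helper unwrap_state_dict below strips both wrappers if present and leaves a plain state dict unchanged. It works on ordinary dictionaries, so it can be tried without PyTorch.

```python
from typing import Any, Dict


def unwrap_state_dict(checkpoint: Dict[str, Any], key: str = "state_dict",
                      prefix: str = "module.") -> Dict[str, Any]:
    """Hypothetical helper: return a flat state dict ready for load_state_dict."""
    # Some training scripts save {"state_dict": ..., "optimizer": ..., "epoch": ...}
    inner = checkpoint.get(key)
    state = inner if isinstance(inner, dict) else checkpoint
    # DataParallel prefixes every parameter name with "module."
    return {
        (name[len(prefix):] if name.startswith(prefix) else name): value
        for name, value in state.items()
    }
```

Usage: model.load_state_dict(unwrap_state_dict(torch.load(checkpoint_path, map_location="cpu"))).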

Image Preparation

Load an image with PIL and convert it to RGB:

from PIL import Image
image_path = "path_to_your_image.png"
image = Image.open(image_path).convert("RGB")

Then convert the image into a float tensor of shape (1, C, H, W) with values scaled to [0, 1], and move it to the available device:

import torch
import numpy as np

img_tensor = torch.from_numpy(np.array(image) / 255.0)
img_tensor = img_tensor.permute(2, 0, 1).float().unsqueeze(0)  # (1, C, H, W)
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
img_tensor = img_tensor.to(device)

Now build the model and load the checkpoint weights:

from segformer_plusplus.build_model import create_model

out_channels = 19
model = create_model(
    backbone='b5',
    tome_strategy='bsm_hq',
    out_channels=out_channels,
    pretrained=False
).to(device)

model.load_state_dict(torch.load("path_to_checkpoint", map_location=device))
model.eval()

Inference:

with torch.no_grad():
    output = model(img_tensor)
    segmentation_map = torch.argmax(output, dim=1).squeeze().cpu().numpy()

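The argmax above produces a map with the same spatial size as the logits. If output.shape[-2:] turns out to be smaller than the input image (an assumption to check for your configuration, since SegFormer-style decode heads can predict at reduced resolution), a common fix is to upsample the logits with torch.nn.functional.interpolate(output, size=img_tensor.shape[-2:], mode="bilinear", align_corners=False) before the argmax. As a NumPy-only alternative that works on the finished segmentation_map, the hypothetical helper below performs nearest-neighbour resizing, which never mixes class indices.

```python
import numpy as np


def resize_labels_nearest(labels: np.ndarray, height: int, width: int) -> np.ndarray:
    """Hypothetical helper: nearest-neighbour resize of an (h, w) class-index map."""
    h, w = labels.shape
    # For each output row/column, pick the source pixel that covers it
    rows = (np.arange(height) * h) // height
    cols = (np.arange(width) * w) // width
    return labels[rows[:, None], cols[None, :]]
```

Usage: segmentation_map = resize_labels_nearest(segmentation_map, image.height, image.width).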
Visualize the results (the color palette below is for the Cityscapes classes):

import numpy as np

# Official Cityscapes colors for train IDs 0-18
cityscapes_colors = np.array([
    [128,  64, 128], # 0: road
    [244,  35, 232], # 1: sidewalk
    [ 70,  70,  70], # 2: building
    [102, 102, 156], # 3: wall
    [190, 153, 153], # 4: fence
    [153, 153, 153], # 5: pole
    [250, 170,  30], # 6: traffic light
    [220, 220,   0], # 7: traffic sign
    [107, 142,  35], # 8: vegetation
    [152, 251, 152], # 9: terrain
    [ 70, 130, 180], # 10: sky
    [220,  20,  60], # 11: person
    [255,   0,   0], # 12: rider
    [  0,   0, 142], # 13: car
    [  0,   0,  70], # 14: truck
    [  0,  60, 100], # 15: bus
    [  0,  80, 100], # 16: train
    [  0,   0, 230], # 17: motorcycle
    [119,  11,  32], # 18: bicycle
], dtype=np.uint8)

# Map each class to its corresponding color
height, width = segmentation_map.shape
color_image = np.zeros((height, width, 3), dtype=np.uint8)
for class_index in range(len(cityscapes_colors)):
    color_image[segmentation_map == class_index] = cityscapes_colors[class_index]
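The loop above scans the image once per class. NumPy's integer-array indexing performs the same mapping in one step: indexing a (num_classes, 3) palette with an (H, W) label map returns an (H, W, 3) image. This assumes every value in segmentation_map is a valid class index (0-18 here); the loop leaves unknown values black, whereas indexing raises an IndexError for values past the end of the palette. The helper name colorize and the two-class palette are for illustration only.

```python
import numpy as np


def colorize(segmentation_map: np.ndarray, palette: np.ndarray) -> np.ndarray:
    """Map an (H, W) array of class indices to an (H, W, 3) RGB image."""
    # palette[i] is the RGB color of class i; fancy indexing looks up every pixel at once
    return palette[segmentation_map]


# Tiny example with two classes (road and sidewalk colors from the palette above)
palette = np.array([[128, 64, 128], [244, 35, 232]], dtype=np.uint8)
labels = np.array([[0, 1], [1, 0]])
rgb = colorize(labels, palette)
```

With the full palette, color_image = colorize(segmentation_map, cityscapes_colors) replaces the loop.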

Display and save the output:

import matplotlib.pyplot as plt

plt.figure(figsize=(6, 6))
plt.imshow(color_image)
plt.title("Semantic Segmentation Visualization")
plt.axis('off')
plt.show()
# Save the colorized output as an image (important on systems without a GUI)
plt.imsave("segmentation_output.png", color_image)
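To judge the prediction against the scene, it can help to blend the colorized map onto the input image. The sketch below is a plain alpha blend in NumPy; the helper name overlay and the default alpha of 0.5 are hypothetical choices. It requires color_image to have the same height and width as the image, so a lower-resolution prediction has to be upsampled first.

```python
import numpy as np


def overlay(image_rgb: np.ndarray, color_image: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical helper: alpha-blend a colorized mask onto an RGB image."""
    if image_rgb.shape != color_image.shape:
        raise ValueError("image and mask must have the same (H, W, 3) shape")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    # Weighted sum in float, then round and clip back to the uint8 range
    blended = (1.0 - alpha) * image_rgb.astype(np.float32) + alpha * color_image.astype(np.float32)
    return np.clip(np.round(blended), 0, 255).astype(np.uint8)
```

Usage: plt.imsave("segmentation_overlay.png", overlay(np.array(image), color_image)).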

Note: matplotlib must be installed for visualization.

Token-Merge Settings

For information on the token merging settings, look here.
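For orientation only, here is a generic NumPy sketch of bipartite soft matching, the merging scheme introduced by Token Merging (ToMe, Bolya et al.). That the "bsm" in bsm_hq and bsm_fast refers to this family of methods is an assumption; the function below is not the SegFormer++ implementation, it does not model n2d_2x2, and the name bipartite_soft_matching_merge is hypothetical. Tokens are split alternately into sets A and B, each A token is matched to its most similar B token by cosine similarity, and the r best-matching A tokens are averaged into their partners, so N tokens become N - r.

```python
import numpy as np


def bipartite_soft_matching_merge(x: np.ndarray, r: int) -> np.ndarray:
    """Merge r tokens of x (shape (N, C)); returns an array of shape (N - r, C)."""
    a, b = x[0::2], x[1::2]  # alternate tokens into sets A and B
    r = min(r, a.shape[0])
    if r <= 0:
        return x.copy()
    # Cosine similarity between every A token and every B token
    a_n = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b_n = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    scores = a_n @ b_n.T  # shape (len(A), len(B))
    best_b = scores.argmax(axis=1)  # most similar B token for each A token
    order = np.argsort(-scores.max(axis=1))  # A tokens, best match first
    merged_a, kept_a = order[:r], order[r:]
    # Each B token becomes the mean of itself and all A tokens merged into it
    b_sum = b.astype(np.float64).copy()
    b_count = np.ones(b.shape[0])
    np.add.at(b_sum, best_b[merged_a], a[merged_a])
    np.add.at(b_count, best_b[merged_a], 1.0)
    return np.concatenate([a[np.sort(kept_a)], b_sum / b_count[:, None]], axis=0)
```

ToMe itself additionally tracks how many tokens each merged token represents and weights attention accordingly; the plain mean above omits that for brevity.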