Design of a Novel Spectral Learnable Dynamic Feature Map Semantic Segmentation Model / Mugheera Saleem

By: Saleem, Mugheera
Contributor(s): Supervisor: Dr. Zaib Ali
Material type: Text
Publisher: Islamabad: SMME-NUST; 2025
Description: 131 p. Soft copy; 30 cm
Subject(s): MS Robotics and Intelligent Machine Engineering
DDC classification: 629.8
Online resources: Click here to access online
Item type: Thesis
Home library: School of Mechanical & Manufacturing Engineering (SMME)
Shelving location: E-Books
Call number: 629.8
Status: Available
Barcode: SMME-TH-1166
Total holds: 0

In computer vision models, downsampling is a strategy for spatially compressing contextual
information while increasing model capacity by adding channel depth to the layer outputs.
Traditionally, downsampling ratios in segmentation models have been fixed by the model
architecture and treated as hyperparameters. Although some research has introduced
learnable downsampling in image classification, similar strategies have not been adopted
in segmentation because of the difficulty of managing dynamic feature maps during
upsampling. This thesis presents AdaUNet, an efficient semantic segmentation
model with a novel encoder-decoder design. The encoder incorporates a differentiable stride
learning mechanism and spectral attention to adaptively determine downsampling rates,
reducing redundant spatial information and computational costs. The decoder uses a hypernetwork-based super-resolution model called Continuous Upsampling Filters (CUF) to
smoothly recover high-resolution outputs. This design makes AdaUNet the first
segmentation model to optimize the size of its intermediate feature maps, reducing them by
up to 50 times compared with traditional fixed-pooling methods and drastically cutting FLOPs and
activation memory. At half the native image resolution of Cityscapes, AdaUNet achieves
61.8% mean IoU with just 7.4M parameters and 29.36 GFLOPs, outperforming models such as
SegFormer and HRNet-V2. On the CamVid (256×256) dataset, the model scores 72.33%
mean IoU with only 3.14 GFLOPs. Furthermore, a Cityscapes-pretrained AdaUNet surpasses
an ImageNet-1k-pretrained (ResNet-101) DeepLabv3 model by 5% on CamVid while requiring
around 20 times fewer FLOPs and 8 times fewer parameters. The proposed model is highly
suitable for resource-constrained environments where high accuracy and low computational
cost are critical.
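The learnable-downsampling idea at the heart of the encoder can be illustrated with a toy sketch. This is an assumption-laden illustration, not the thesis's actual mechanism: a softmax over learnable stride logits gives a differentiable, soft choice among candidate pooling strides, which is one common way to make a discrete downsampling rate trainable. The function names (`avg_pool`, `soft_downsample`) and the candidate strides are hypothetical.

```python
import numpy as np

def avg_pool(x, stride):
    """Average-pool a 2-D feature map by an integer stride (dims assumed divisible)."""
    h, w = x.shape
    return x.reshape(h // stride, stride, w // stride, stride).mean(axis=(1, 3))

def soft_downsample(x, stride_logits, strides=(1, 2, 4)):
    """Differentiable 'stride selection': softmax weights over candidate strides.

    Each candidate's pooled result is upsampled back (nearest-neighbour) so the
    weighted sum is well-defined; gradients with respect to stride_logits then
    flow through the softmax, letting the downsampling rate be learned.
    """
    w = np.exp(stride_logits - stride_logits.max())
    w = w / w.sum()                                  # softmax over candidates
    out = np.zeros_like(x)
    for wi, s in zip(w, strides):
        pooled = avg_pool(x, s)
        up = np.repeat(np.repeat(pooled, s, axis=0), s, axis=1)
        out += wi * up
    return out

# Logits strongly favouring stride 1 leave the map nearly unchanged.
x = np.arange(64, dtype=float).reshape(8, 8)
y = soft_downsample(x, np.array([10.0, 0.0, 0.0]))
```

At inference, such a scheme can be "hardened" by taking the argmax stride, which is where the reported FLOP savings from smaller intermediate feature maps would come from.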
