Cross-CvT: An Encoder-Decoder Multi-Level CrossAttentional Architecture for Semantic Segmentation / (Record no. 614839)

000 -LEADER
fixed length control field 02379nam a22001577a 4500
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 629.8
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Shah, Syed Muhammad Ammar
245 ## - TITLE STATEMENT
Title Cross-CvT: An Encoder-Decoder Multi-Level CrossAttentional Architecture for Semantic Segmentation /
Statement of responsibility, etc. Syed Muhammad Ammar Shah
264 ## - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture Islamabad :
Name of producer, publisher, distributor, manufacturer SMME-NUST;
Date of production, publication, distribution, manufacture, or copyright notice 2025.
300 ## - PHYSICAL DESCRIPTION
Extent 100 p.
Other physical details Soft Copy
Dimensions 30 cm
500 ## - GENERAL NOTE
General note Convolutional Neural Network (CNN) based algorithms have been widely used in encoder-decoder frameworks for semantic segmentation due to their ability to extract local information efficiently, but they lack the receptive field to handle long-range dependencies, especially in shallow layers. Transformer-based algorithms can extract global features through their inherent attention mechanism but require large amounts of data and computational power to reach their full potential. Hybrid CNN-Transformer algorithms are being explored to combine the strengths of both approaches. This work introduces one such algorithm, Cross-CvT, inspired by the Convolutional Vision Transformer (CvT) paradigm. The encoder adopts the standard CvT design, employing convolutional patch embeddings and convolutional transformer blocks, where each MLP feed-forward layer is replaced by an inverted residual block to introduce local context. The decoder mirrors this design but replaces the convolutional patch embeddings with transposed convolutions for learned upsampling. Skip connections link corresponding encoder and decoder stages, augmented by cross-attention modules that allow decoder feature queries to attend to encoder outputs, enabling rich multi-scale feature fusion. The proposed architecture preserves the transformer's global context while reintroducing CNN-like inductive biases for detailed high-resolution segmentation. We evaluate Cross-CvT on the Cityscapes benchmark, where it achieves a mean Intersection over Union of 52.3%, competitive with state-of-the-art approaches and highlighting the effectiveness of the Cross-CvT design for semantic segmentation.
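The cross-attention fusion described in the note (decoder features forming queries that attend to encoder skip-connection features as keys and values) can be illustrated with a minimal NumPy sketch. This is a generic single-head scaled dot-product cross-attention, not the thesis's implementation; the token counts, embedding size, and weight names (`Wq`, `Wk`, `Wv`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_feats, encoder_feats, Wq, Wk, Wv):
    """Decoder tokens supply queries; encoder tokens supply keys and values,
    so each decoder position aggregates encoder (skip) information."""
    Q = decoder_feats @ Wq                      # (n_dec, d)
    K = encoder_feats @ Wk                      # (n_enc, d)
    V = encoder_feats @ Wv                      # (n_enc, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (n_dec, n_enc)
    return softmax(scores, axis=-1) @ V         # (n_dec, d)

rng = np.random.default_rng(0)
d = 16
dec = rng.standard_normal((64, d))    # 64 decoder tokens (illustrative)
enc = rng.standard_normal((256, d))   # 256 encoder tokens from a skip connection
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
fused = cross_attention(dec, enc, Wq, Wk, Wv)
print(fused.shape)  # (64, 16)
```

Each row of the attention-weight matrix sums to one, so every decoder token receives a convex combination of encoder values, which is what enables the multi-scale feature fusion across the skip connection.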
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element MS Robotics and Intelligent Machine Engineering
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Supervisor: Dr. Zaib Ali
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier <a href="http://10.250.8.41:8080/xmlui/handle/123456789/54874">http://10.250.8.41:8080/xmlui/handle/123456789/54874</a>
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type Thesis
Holdings
Withdrawn status
Permanent Location School of Mechanical & Manufacturing Engineering (SMME)
Current Location School of Mechanical & Manufacturing Engineering (SMME)
Shelving location E-Books
Date acquired 09/24/2025
Full call number 629.8
Barcode SMME-TH-1167
Koha item type Thesis
© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.