Cross-CvT: An Encoder-Decoder Multi-Level CrossAttentional Architecture for Semantic Segmentation / (Record no. 614839)

000 -LEADER
fixed length control field 02379nam a22001577a 4500
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 629.8
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Shah, Syed Muhammad Ammar
245 ## - TITLE STATEMENT
Title Cross-CvT: An Encoder-Decoder Multi-Level CrossAttentional Architecture for Semantic Segmentation /
Statement of responsibility, etc. Syed Muhammad Ammar Shah
264 ## - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE
Place of production, publication, distribution, manufacture Islamabad :
Name of producer, publisher, distributor, manufacturer SMME-NUST;
Date of production, publication, distribution, manufacture, or copyright notice 2025.
300 ## - PHYSICAL DESCRIPTION
Extent 100 p.
Other physical details Soft Copy
Dimensions 30 cm
500 ## - GENERAL NOTE
General note Convolutional Neural Network (CNN) based algorithms have been widely used in encoder-decoder frameworks for semantic segmentation due to their ability to extract local information efficiently, but they lack the receptive field to handle long-range dependencies, especially in shallow layers. Transformer-based algorithms can extract global features through their inherent attention mechanism but require large amounts of data and computational power to reach their full potential. Hybrid CNN-Transformer algorithms are being explored to combine the strengths of both approaches. This work introduces one such algorithm, Cross-CvT, inspired by the Convolutional Vision Transformer (CvT) paradigm. The encoder adopts the standard CvT design, employing convolutional patch embeddings and convolutional transformer blocks, where each MLP feed-forward layer is replaced by an inverted residual block to introduce local context. The decoder mirrors this design but replaces the convolutional patch embeddings with transposed convolutions for learned upsampling. Skip connections link corresponding encoder and decoder stages, augmented by cross-attention modules that allow decoder feature queries to attend to encoder outputs, enabling rich multi-scale feature fusion. The proposed architecture preserves the transformer's global context while reintroducing CNN-like inductive biases for detailed high-resolution segmentation. We evaluate Cross-CvT on the Cityscapes benchmark, where it achieves a mean Intersection over Union of 52.3%, competitive with state-of-the-art approaches and highlighting the effectiveness of the Cross-CvT design for semantic segmentation.
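The cross-attention fusion described in the note (decoder features forming queries that attend to encoder skip-connection features as keys and values) can be illustrated with a minimal NumPy sketch. This is a generic single-head scaled dot-product cross-attention, not the thesis's implementation; the token counts, embedding size, and weight names (`Wq`, `Wk`, `Wv`) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_feats, encoder_feats, Wq, Wk, Wv):
    """Decoder tokens supply queries; encoder tokens supply keys and values,
    so each decoder position aggregates encoder (skip) information."""
    Q = decoder_feats @ Wq                      # (n_dec, d)
    K = encoder_feats @ Wk                      # (n_enc, d)
    V = encoder_feats @ Wv                      # (n_enc, d)
    scores = Q @ K.T / np.sqrt(Q.shape[-1])     # (n_dec, n_enc)
    return softmax(scores, axis=-1) @ V         # (n_dec, d)

rng = np.random.default_rng(0)
d = 16
dec = rng.standard_normal((64, d))    # 64 decoder tokens (illustrative)
enc = rng.standard_normal((256, d))   # 256 encoder tokens from a skip connection
Wq, Wk, Wv = (rng.standard_normal((d, d)) for _ in range(3))
fused = cross_attention(dec, enc, Wq, Wk, Wv)
print(fused.shape)  # (64, 16)
```

Each row of the attention-weight matrix sums to one, so every decoder token receives a convex combination of encoder values, which is what enables the multi-scale feature fusion across the skip connection.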
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element MS Robotics and Intelligent Machine Engineering
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Supervisor: Dr. Zaib Ali
856 ## - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier <a href="http://10.250.8.41:8080/xmlui/handle/123456789/54874">http://10.250.8.41:8080/xmlui/handle/123456789/54874</a>
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type Thesis
Holdings
Withdrawn status
Permanent Location School of Mechanical & Manufacturing Engineering (SMME)
Current Location School of Mechanical & Manufacturing Engineering (SMME)
Shelving location E-Books
Date acquired 09/24/2025
Full call number 629.8
Barcode SMME-TH-1167
Koha item type Thesis
© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.