Cross-CvT: An Encoder-Decoder Multi-Level Cross-Attentional Architecture for Semantic Segmentation / (Record no. 614839)
| 000 -LEADER | |
|---|---|
| fixed length control field | 02379nam a22001577a 4500 |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
| Classification number | 629.8 |
| 100 ## - MAIN ENTRY--PERSONAL NAME | |
| Personal name | Shah, Syed Muhammad Ammar |
| 245 ## - TITLE STATEMENT | |
| Title | Cross-CvT: An Encoder-Decoder Multi-Level Cross-Attentional Architecture for Semantic Segmentation / |
| Statement of responsibility, etc. | Syed Muhammad Ammar Shah |
| 264 ## - PRODUCTION, PUBLICATION, DISTRIBUTION, MANUFACTURE, AND COPYRIGHT NOTICE | |
| Place of production, publication, distribution, manufacture | Islamabad : |
| Name of producer, publisher, distributor, manufacturer | SMME-NUST; |
| Date of production, publication, distribution, manufacture, or copyright notice | 2025. |
| 300 ## - PHYSICAL DESCRIPTION | |
| Extent | 100 p. |
| Other physical details | Soft Copy |
| Dimensions | 30cm |
| 500 ## - GENERAL NOTE | |
| General note | Convolutional Neural Network (CNN)-based algorithms have been widely used in encoder-decoder frameworks for semantic segmentation because they extract local information efficiently, but they lack the receptive field to capture long-range dependencies, especially in shallow layers. Transformer-based algorithms can extract global features through their inherent attention mechanism but require large amounts of data and computational power to reach their full potential. Hybrid CNN-Transformer algorithms are being explored to combine the strengths of both approaches. This work introduces one such algorithm, Cross-CvT, inspired by the Convolutional Vision Transformer (CvT) paradigm. The encoder adopts the standard CvT design, employing convolutional patch embeddings and convolutional transformer blocks, where each MLP feed-forward layer is replaced by an inverted residual block to introduce local context. The decoder mirrors this design but replaces the convolutional patch embeddings with transposed convolutions for learned upsampling. Skip connections link corresponding encoder and decoder stages, augmented by cross-attention modules that allow decoder feature queries to attend to encoder outputs, enabling rich multi-scale feature fusion. The proposed architecture preserves the transformer's global context while reintroducing CNN-like inductive biases for detailed high-resolution segmentation. We evaluate Cross-CvT on the Cityscapes benchmark, achieving a mean Intersection over Union score of 52.3%, competitive with state-of-the-art approaches in semantic segmentation, which highlights the effectiveness of the Cross-CvT design. |
| 650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM | |
| Topical term or geographic name entry element | MS Robotics and Intelligent Machine Engineering |
| 700 ## - ADDED ENTRY--PERSONAL NAME | |
| Personal name | Supervisor: Dr. Zaib Ali |
| 856 ## - ELECTRONIC LOCATION AND ACCESS | |
| Uniform Resource Identifier | <a href="http://10.250.8.41:8080/xmlui/handle/123456789/54874">http://10.250.8.41:8080/xmlui/handle/123456789/54874</a> |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Source of classification or shelving scheme | |
| Koha item type | Thesis |
| Withdrawn status | Permanent Location | Current Location | Shelving location | Date acquired | Full call number | Barcode | Koha item type |
|---|---|---|---|---|---|---|---|
|  | School of Mechanical & Manufacturing Engineering (SMME) | School of Mechanical & Manufacturing Engineering (SMME) | E-Books | 09/24/2025 | 629.8 | SMME-TH-1167 | Thesis |
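The cross-attention skip fusion described in the general note, where decoder feature queries attend to encoder outputs, can be sketched as follows. This is a minimal single-head NumPy illustration without the learned projections a real transformer block would use; the function names and tensor shapes are assumptions, not the thesis implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(decoder_feats, encoder_feats):
    """Decoder tokens (queries) attend to encoder tokens (keys/values).

    decoder_feats: (n_dec, d) tokens from a decoder stage
    encoder_feats: (n_enc, d) tokens from the matching encoder stage
    returns:       (n_dec, d) encoder context aggregated per decoder token
    """
    d = decoder_feats.shape[-1]
    # Scaled dot-product scores between every decoder and encoder token.
    scores = decoder_feats @ encoder_feats.T / np.sqrt(d)
    # Each decoder token gets a distribution over encoder tokens...
    weights = softmax(scores, axis=-1)
    # ...and pulls in a weighted mix of encoder features (the skip fusion).
    return weights @ encoder_feats

# Toy shapes: 4 decoder tokens, 16 encoder tokens, embedding dim 8.
dec = np.random.RandomState(0).randn(4, 8)
enc = np.random.RandomState(1).randn(16, 8)
fused = cross_attention(dec, enc)
print(fused.shape)  # (4, 8): one fused vector per decoder token
```

In the architecture the note describes, this fusion would run at each of the skip-connected encoder-decoder stage pairs, giving the decoder multi-scale access to encoder detail.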
