Image Description using Deep Learning / (Record no. 615947)

000 -LEADER
fixed length control field 03928nam a22001817a 4500
003 - CONTROL NUMBER IDENTIFIER
control field NUST
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20260127084907.0
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 005.1,ZIA
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Zia, Usman
9 (RLIN) 124510
245 ## - TITLE STATEMENT
Title Image Description using Deep Learning /
Statement of responsibility, etc. Usman Zia
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. Rawalpindi,
Name of publisher, distributor, etc. MCS (NUST),
Date of publication, distribution, etc. 2022
300 ## - PHYSICAL DESCRIPTION
Extent xiii, 114 p.
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Internet technologies generate enormous amounts of data that merge textual and visual content: tagged images, newspaper descriptions, captioned videos, and social media feeds. Such interaction with technology and devices has become part of everyday life, for example explaining an image in the context of news, following instructions by interpreting a diagram or a map, or understanding a presentation while listening to a lecture. Traditionally, content providers added captions manually to make such material more accessible; text-to-speech systems use these captions to produce natural-language descriptions of images and videos. Recent years have seen an upsurge of interest in problems that combine language and visual content to develop methods for automatically generating image descriptions.
Owing to potential applications in computer vision, information retrieval, autonomous vehicles, and natural language processing (NLP), the automatic generation of a sequence of words, known as a caption, for an image has attracted enormous attention in the past decade. Various techniques have been proposed that generate image descriptions using the most suitable annotations in the training set; these training annotations are sometimes rearranged or augmented by NLP algorithms. Despite significant achievements in generating sentences for images, existing models struggle to capture human-like semantics in the generated descriptions.
In this thesis, three novel image description techniques are proposed to generate semantically superior captions for a target image. The first incorporates topic-sensitive word embeddings into description generation. Topic models treat documents as associated with different topics defined by probability distributions over words. The proposed approach uses topic modeling to align the semantic meaning of words with image features and to generate descriptions that are more relevant to the context (topic) of the target image regions. Compared with traditional models, it exploits the high-level semantics of words to represent the diversity of the training corpus.
The convolutional layers of the visual encoders used in traditional models generate feature maps that extract hierarchical information from the visual content, but they do not exploit the dependencies between feature maps, which can cause a loss of information essential to guiding the language model during description generation. The second proposed model incorporates scene information, capturing the overall setting reflected in the visual content alongside object-level features, using a squeeze-and-excitation module and spatial details to boost the accuracy of caption generation. Visual features are coupled with location information and with topic modeling, which captures semantic word relationships, to feed the sequence-to-sequence word generation task.
The third proposed approach addresses the challenges of remote sensing image description caused by the large variance in the visual appearance of objects. A multi-scale visual feature encoder is proposed to extract detailed information from remote sensing images, and an adaptive attention decoder dynamically assigns weights to the multi-scale features and textual cues, strengthening the language model so that it generates novel topic-sensitive descriptions.
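The squeeze-and-excitation module mentioned in the abstract recalibrates channel-wise feature maps by modeling dependencies between them. The following is a minimal NumPy sketch of that idea, not the thesis author's implementation: the bottleneck weights here are random placeholders standing in for parameters a trained model would learn.

```python
import numpy as np

def squeeze_and_excitation(feature_maps, reduction=4, rng=None):
    """Toy squeeze-and-excitation block over (channels, height, width) maps.

    Weights are random stand-ins for learned parameters; only the
    squeeze -> excite -> rescale structure is illustrated.
    """
    rng = rng or np.random.default_rng(0)
    c = feature_maps.shape[0]
    # Squeeze: global average pooling collapses each map to one descriptor.
    z = feature_maps.mean(axis=(1, 2))                      # shape (c,)
    # Excitation: a bottleneck MLP models dependencies between channels.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)                             # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))                 # sigmoid, (c,)
    # Rescale: each feature map is weighted by its channel gate.
    return feature_maps * gates[:, None, None]

maps = np.random.default_rng(1).standard_normal((16, 7, 7))
out = squeeze_and_excitation(maps)
print(out.shape)  # (16, 7, 7)
```

The gating step is what lets the encoder emphasize informative feature maps before they reach the language model.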
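The adaptive attention decoder described for the third approach dynamically weights multi-scale visual features at each decoding step. The sketch below illustrates that weighting in isolation, under assumptions of my own: `scale_features` and `query` are hypothetical placeholders for per-scale encoder vectors and the decoder hidden state, and the projection matrix is random rather than learned.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(scale_features, query, rng=None):
    """Weight multi-scale feature vectors by relevance to a decoder query.

    scale_features: list of same-length vectors, one per spatial scale.
    query: current decoder hidden state. The scoring projection is a
    random stand-in for a learned attention parameter.
    """
    rng = rng or np.random.default_rng(0)
    d = query.shape[0]
    w = rng.standard_normal((d, d)) * 0.1   # placeholder learned projection
    scores = np.array([f @ (w @ query) for f in scale_features])
    weights = softmax(scores)               # per-step weights over scales
    # Context vector: convex combination of the scale features.
    context = sum(a * f for a, f in zip(weights, scale_features))
    return context, weights

feats = [np.ones(8) * s for s in (1.0, 2.0, 3.0)]
ctx, wts = adaptive_attention(feats, np.ones(8))
print(wts.sum())  # ≈ 1.0
```

Because the weights are recomputed from the query at every step, the decoder can lean on fine scales for small objects and coarse scales for scene context.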
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element PhD Computer Software Engineering Thesis
9 (RLIN) 132801
651 ## - SUBJECT ADDED ENTRY--GEOGRAPHIC NAME
Geographic name PhD CSE Thesis
9 (RLIN) 132802
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Supervised by Dr. Abdul Ghafoor
9 (RLIN) 132894
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type Thesis
Holdings
Permanent Location: Military College of Signals (MCS)
Current Location: Military College of Signals (MCS)
Shelving location: Thesis
Date acquired: 01/27/2026
Full call number: 005.1,ZIA
Barcode: MCSPhD CSE-16
Date last seen: 01/27/2026
Price effective from: 01/27/2026
Koha item type: Thesis
Public note: Almirah No.68, Shelf No.5
© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.