Image Description using Deep Learning / (Record no. 615947)

000 -LEADER
fixed length control field 03928nam a22001817a 4500
003 - CONTROL NUMBER IDENTIFIER
control field NUST
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20260127084907.0
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 005.1,ZIA
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Zia, Usman
9 (RLIN) 124510
245 ## - TITLE STATEMENT
Title Image Description using Deep Learning /
Statement of responsibility, etc. Usman Zia
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. Rawalpindi,
Name of publisher, distributor, etc. MCS (NUST),
Date of publication, distribution, etc. 2022
300 ## - PHYSICAL DESCRIPTION
Extent xiii, 114 p.
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Internet technologies generate enormous amounts of data that merge textual and visual content: tagged images, newspaper descriptions, captioned videos, and social media feeds. Such interaction with technology and devices has become part of everyday life, for example explaining an image in the context of news, following instructions by interpreting a diagram or a map, or understanding a presentation while listening to a lecture. Traditionally, content providers added captions manually to make such material more accessible; text-to-speech systems use these captions to produce natural-language descriptions of images and videos. Recent years have seen an upsurge of interest in problems that combine language and visual content to develop methods for automatically generating image descriptions.
Owing to potential applications in computer vision, information retrieval, autonomous vehicles, and natural language processing (NLP), the automatic generation of a sequence of words, known as a caption, for an image has attracted enormous attention in the past decade. Various techniques have been proposed that generate image descriptions using the most suitable annotations in the training set; these training annotations are sometimes rearranged or augmented by NLP algorithms. Despite significant achievements in generating sentences for images, existing models struggle to capture human-like semantics in the generated descriptions.
In this thesis, three novel image description techniques are proposed to generate semantically superior captions for a target image. The first incorporates topic-sensitive word embeddings into description generation. Topic models treat documents as associated with different topics defined by probability distributions over words. The proposed approach uses topic modeling to align the semantic meaning of words with image features and to generate descriptions that are more relevant to the context (topic) of the target image regions. Compared with traditional models, it exploits the high-level semantics of words to represent the diversity of the training corpus.
The convolutional layers of the visual encoders used in traditional models generate feature maps that extract hierarchical information from the visual content, but they do not exploit the dependencies between feature maps, which can cause a loss of information essential to guiding the language model during description generation. The second proposed model incorporates scene information, capturing the overall setting reflected in the visual content alongside object-level features, using a squeeze-and-excitation module and spatial details to boost the accuracy of caption generation. Visual features are coupled with location information and with topic modeling, which captures semantic word relationships, to feed the sequence-to-sequence word generation task.
The third proposed approach addresses the challenges of remote sensing image description caused by the large variance in the visual appearance of objects. A multi-scale visual feature encoder is proposed to extract detailed information from remote sensing images, and an adaptive attention decoder dynamically assigns weights to the multi-scale features and textual cues, strengthening the language model so that it generates novel topic-sensitive descriptions.
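The squeeze-and-excitation module mentioned in the abstract recalibrates channel-wise feature maps by modeling dependencies between them. The following is a minimal NumPy sketch of that idea, not the thesis author's implementation: the bottleneck weights here are random placeholders standing in for parameters a trained model would learn.

```python
import numpy as np

def squeeze_and_excitation(feature_maps, reduction=4, rng=None):
    """Toy squeeze-and-excitation block over (channels, height, width) maps.

    Weights are random stand-ins for learned parameters; only the
    squeeze -> excite -> rescale structure is illustrated.
    """
    rng = rng or np.random.default_rng(0)
    c = feature_maps.shape[0]
    # Squeeze: global average pooling collapses each map to one descriptor.
    z = feature_maps.mean(axis=(1, 2))                      # shape (c,)
    # Excitation: a bottleneck MLP models dependencies between channels.
    w1 = rng.standard_normal((c // reduction, c)) * 0.1
    w2 = rng.standard_normal((c, c // reduction)) * 0.1
    s = np.maximum(w1 @ z, 0.0)                             # ReLU
    gates = 1.0 / (1.0 + np.exp(-(w2 @ s)))                 # sigmoid, (c,)
    # Rescale: each feature map is weighted by its channel gate.
    return feature_maps * gates[:, None, None]

maps = np.random.default_rng(1).standard_normal((16, 7, 7))
out = squeeze_and_excitation(maps)
print(out.shape)  # (16, 7, 7)
```

The gating step is what lets the encoder emphasize informative feature maps before they reach the language model.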
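The adaptive attention decoder described for the third approach dynamically weights multi-scale visual features at each decoding step. The sketch below illustrates that weighting in isolation, under assumptions of my own: `scale_features` and `query` are hypothetical placeholders for per-scale encoder vectors and the decoder hidden state, and the projection matrix is random rather than learned.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(scale_features, query, rng=None):
    """Weight multi-scale feature vectors by relevance to a decoder query.

    scale_features: list of same-length vectors, one per spatial scale.
    query: current decoder hidden state. The scoring projection is a
    random stand-in for a learned attention parameter.
    """
    rng = rng or np.random.default_rng(0)
    d = query.shape[0]
    w = rng.standard_normal((d, d)) * 0.1   # placeholder learned projection
    scores = np.array([f @ (w @ query) for f in scale_features])
    weights = softmax(scores)               # per-step weights over scales
    # Context vector: convex combination of the scale features.
    context = sum(a * f for a, f in zip(weights, scale_features))
    return context, weights

feats = [np.ones(8) * s for s in (1.0, 2.0, 3.0)]
ctx, wts = adaptive_attention(feats, np.ones(8))
print(wts.sum())  # ≈ 1.0
```

Because the weights are recomputed from the query at every step, the decoder can lean on fine scales for small objects and coarse scales for scene context.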
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element PhD Computer Software Engineering Thesis
9 (RLIN) 132801
651 ## - SUBJECT ADDED ENTRY--GEOGRAPHIC NAME
Geographic name PhD CSE Thesis
9 (RLIN) 132802
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Supervised by Dr. Abdul Ghafoor
9 (RLIN) 132894
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type Thesis
Holdings
Permanent Location: Military College of Signals (MCS)
Current Location: Military College of Signals (MCS)
Shelving location: Thesis
Date acquired: 01/27/2026
Full call number: 005.1,ZIA
Barcode: MCSPhD CSE-16
Date last seen: 01/27/2026
Price effective from: 01/27/2026
Koha item type: Thesis
Public note: Almirah No.68, Shelf No.5
© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.