Image Description using Deep Learning / (Record no. 615947)
| 000 - LEADER | |
|---|---|
| fixed length control field | 03928nam a22001817a 4500 |
| 003 - CONTROL NUMBER IDENTIFIER | |
| control field | NUST |
| 005 - DATE AND TIME OF LATEST TRANSACTION | |
| control field | 20260127084907.0 |
| 082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER | |
| Classification number | 005.1,ZIA |
| 100 ## - MAIN ENTRY--PERSONAL NAME | |
| Personal name | Zia, Usman |
| 9 (RLIN) | 124510 |
| 245 ## - TITLE STATEMENT | |
| Title | Image Description using Deep Learning / |
| Statement of responsibility, etc. | Usman Zia |
| 260 ## - PUBLICATION, DISTRIBUTION, ETC. | |
| Place of publication, distribution, etc. | Rawalpindi, |
| Name of publisher, distributor, etc. | MCS (NUST), |
| Date of publication, distribution, etc. | 2022 |
| 300 ## - PHYSICAL DESCRIPTION | |
| Extent | xiii, 114 p. |
| 505 ## - FORMATTED CONTENTS NOTE | |
| Formatted contents note | Internet technologies are generating enormous amounts of data that merge textual and visual content: tagged images, descriptions in newspapers, videos with captions, and social media feeds. Such interaction with technology and devices has become part of everyday life, for example explaining an image in the context of news, following instructions by interpreting a diagram or a map, or understanding a presentation while listening to a lecture. Traditionally, content providers manually added captions to make such material more accessible; these captions are used by text-to-speech systems to generate natural-language descriptions of images and videos. Recent years have seen an upsurge of interest in problems that require combining language and visual content in order to develop methods for automatically generating image descriptions.<br/><br/>Owing to its potential applications in computer vision, information retrieval, autonomous vehicles, and natural language processing (NLP), the automatic generation of a sequence of words, known as a caption, for an image has attracted enormous attention in the past decade. Various techniques have been proposed that generate image descriptions from the most suitable annotations in the training set; these training annotations are sometimes rearranged or enriched by NLP algorithms. Despite significant achievements in generating sentences for images, existing models struggle to capture human-like semantics in the generated descriptions.<br/><br/>In this thesis, three novel image description techniques are proposed to generate semantically superior captions for the target image. The first incorporates topic-sensitive word embeddings for the generation of image descriptions. Topic models consider documents to be associated with different topics based on a probability distribution over words. The proposed approach uses topic modeling to align the semantic meaning of words to image features and generates descriptions that are more relevant to the context (topic) of the target image regions. Compared to traditional models, it exploits high-level word semantics to represent the diversity of the training corpus.<br/><br/>The convolutional layers of the visual encoder used in traditional models generate feature maps to extract hierarchical information from the visual content, but they do not exploit the dependencies between feature maps, which can cause a loss of information essential for guiding the language model during description generation. The second proposed model incorporates scene information, capturing the overall setting reflected in the visual content along with object-level features via a squeeze-and-excitation module, and uses spatial details to boost the accuracy of caption generation. Visual features are coupled with location information and with topic modeling, which captures semantic word relationships, to feed the sequence-to-sequence word generation task.<br/><br/>The third proposed approach addresses the challenges of remote sensing image description caused by the large variance in the visual appearance of objects. A multi-scale visual feature encoder is proposed to extract detailed information from remote sensing images, and an adaptive attention decoder dynamically assigns weights to the multi-scale features and textual cues to strengthen the language model and generate novel topic-sensitive descriptions. |
| 650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM | |
| Topical term or geographic name entry element | PhD Computer Software Engineering Thesis |
| 9 (RLIN) | 132801 |
| 651 ## - SUBJECT ADDED ENTRY--GEOGRAPHIC NAME | |
| Geographic name | PhD CSE Thesis |
| 9 (RLIN) | 132802 |
| 700 ## - ADDED ENTRY--PERSONAL NAME | |
| Personal name | Supervised by Dr. Abdul Ghafoor |
| 9 (RLIN) | 132894 |
| 942 ## - ADDED ENTRY ELEMENTS (KOHA) | |
| Source of classification or shelving scheme | |
| Koha item type | Thesis |
| Withdrawn status | Lost status | Source of classification or shelving scheme | Damaged status | Not for loan | Permanent Location | Current Location | Shelving location | Date acquired | Total Checkouts | Full call number | Barcode | Date last seen | Price effective from | Koha item type | Public note |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| | | | | | Military College of Signals (MCS) | Military College of Signals (MCS) | Thesis | 01/27/2026 | | 005.1,ZIA | MCSPhD CSE-16 | 01/27/2026 | 01/27/2026 | Thesis | Almirah No.68, Shelf No.5 |
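
The 505 note above names a squeeze-and-excitation module as the mechanism the second proposed model uses to capture dependencies between feature maps. For reference only, the sketch below shows the standard squeeze-and-excitation block from the published literature (Hu et al., 2018) in PyTorch; it is a generic illustration of that technique, not the thesis's implementation, and all class names, shapes, and the reduction ratio are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SEBlock(nn.Module):
    """Generic squeeze-and-excitation block (Hu et al., 2018).

    Reweights each channel of a CNN feature map by a learned gate,
    modelling inter-channel dependencies that plain convolutions ignore.
    """

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial average per channel
        self.fc = nn.Sequential(             # excitation: bottleneck MLP producing channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.pool(x).view(b, c)           # (B, C) channel descriptors
        w = self.fc(w).view(b, c, 1, 1)       # per-channel weights in (0, 1)
        return x * w                          # rescale the original feature maps


# Illustrative usage: reweight a batch of encoder feature maps
# before passing them to a captioning decoder (shapes are made up).
feats = torch.randn(4, 256, 14, 14)
print(SEBlock(256)(feats).shape)  # torch.Size([4, 256, 14, 14])
```

The block is drop-in: its output has the same shape as its input, so it can sit after any convolutional stage of a visual encoder, which is presumably how a caption-generation pipeline like the one described would employ it.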
