Analysis and Retrieval of Scanned Documents using Word Spotting Techniques / (Record no. 615844)

000 -LEADER
fixed length control field 03511nam a22001817a 4500
003 - CONTROL NUMBER IDENTIFIER
control field NUST
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20260117160958.0
082 ## - DEWEY DECIMAL CLASSIFICATION NUMBER
Classification number 005.1,HUS
100 ## - MAIN ENTRY--PERSONAL NAME
Personal name Hussain, Muhammad Rashid
9 (RLIN) 21264
245 ## - TITLE STATEMENT
Title Analysis and Retrieval of Scanned Documents using Word Spotting Techniques /
Statement of responsibility, etc. Muhammad Rashid Hussain
260 ## - PUBLICATION, DISTRIBUTION, ETC.
Place of publication, distribution, etc. Rawalpindi,
Name of publisher, distributor, etc. MCS (NUST),
Date of publication, distribution, etc. 2017
300 ## - PHYSICAL DESCRIPTION
Extent xii, 98 p
505 ## - FORMATTED CONTENTS NOTE
Formatted contents note Writing is a codified system of standard symbols: the repetition of agreed-upon simple shapes to represent ideas. Language expressed in symbols is assumed to be universal, easy to interpret, and efficient to use. Handwriting remains one of the most frequently occurring patterns in everyday life. It offers a number of interesting pattern classification problems, including handwriting recognition, writer identification, signature verification, writer demographics classification, and script recognition. There is a pressing need to address these problems and to devise a script-independent framework that can be applied globally, so that the wealth of knowledge contained in handwritten scripts can be exploited. Much research in this area is ongoing. The work presented here is a document indexing and retrieval system that uses word spotting as the matching technique. Word spotting presents an attractive alternative to traditional Optical Character Recognition (OCR) systems: instead of converting the image into text, retrieval is based on matching the images of words using pattern classification techniques. The proposed system extracts words from images of handwritten documents and converts each word into a shape represented by its contour. Converting words into shapes is an innovation proposed in our framework that opens new avenues of research, as this approach has not previously been tried in word spotting. A set of multiple features is then extracted from each shaped word, and instances of the same word are grouped into clusters. These clusters are used to train a multi-class Support Vector Machine (SVM), which learns the different word classes. The documents to be indexed are segmented into words, and the closest cluster for each word is determined using the SVM.
An index file is maintained for each word cluster; it records the documents containing the respective word along with the word's locations within each document. A query word presented to the system is matched against the clusters in the database, and the documents containing occurrences of the query word are presented to the user. Evaluated on the handwritten images of the IAM database, the system reported promising precision and recall rates. Enhancing the feature vector space with a new set of features is also a major contribution. A study was also carried out to analyze the contribution and significance of the different features employed. Principal Component Analysis (PCA) was applied to retain the most relevant features and reduce the dimensionality of the feature vector. The proposed framework was also successfully tested on extremely challenging cursive Urdu scripts. Promising results on both English and Urdu scripts amply demonstrate the script independence of the framework, which can therefore be applied globally.
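The per-cluster index described in the note above can be pictured as an inverted index keyed by cluster label. The following is only a minimal sketch under stated assumptions, not the thesis implementation: `ClusterIndex`, `classify`, and the `(document id, position)` location format are hypothetical names chosen for illustration, and `classify` stands in for the trained multi-class SVM that assigns a word image to its closest cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

# (document id, word position within the document); both fields are illustrative.
Location = Tuple[str, int]


class ClusterIndex:
    """Maps each word-cluster label to the places where that word occurs."""

    def __init__(self, classify: Callable[[object], Hashable]) -> None:
        # `classify` is a hypothetical stand-in for the trained multi-class SVM:
        # it takes a word image (or its feature vector) and returns a cluster label.
        self._classify = classify
        self._postings: Dict[Hashable, List[Location]] = defaultdict(list)

    def add_document(self, doc_id: str, words: Iterable[object]) -> None:
        """Index the segmented words of one document, in reading order."""
        for position, word in enumerate(words):
            self._postings[self._classify(word)].append((doc_id, position))

    def query(self, word: object) -> Dict[str, List[int]]:
        """Return {doc_id: [positions]} for the cluster assigned to the query word."""
        hits: Dict[str, List[int]] = defaultdict(list)
        for doc_id, position in self._postings.get(self._classify(word), []):
            hits[doc_id].append(position)
        return dict(hits)


# Toy usage: plain strings stand in for word images, and lowercasing stands in
# for the classifier, so that different "instances" of a word share a label.
index = ClusterIndex(classify=str.lower)
index.add_document("doc1", ["The", "quick", "fox"])
index.add_document("doc2", ["A", "fox", "saw", "the", "fox"])
result = index.query("FOX")  # {"doc1": [2], "doc2": [1, 4]}
```

Keeping the classifier as an injected function separates the retrieval structure from the feature extraction and SVM stages, which the note describes but does not specify in detail.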
650 ## - SUBJECT ADDED ENTRY--TOPICAL TERM
Topical term or geographic name entry element PhD Computer Software Engineering Thesis
9 (RLIN) 132801
651 ## - SUBJECT ADDED ENTRY--GEOGRAPHIC NAME
Geographic name PhD CSE Thesis
9 (RLIN) 132802
700 ## - ADDED ENTRY--PERSONAL NAME
Personal name Supervised by Dr. Asif Masood
9 (RLIN) 132796
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Source of classification or shelving scheme
Koha item type Thesis
Holdings
Withdrawn status
Lost status
Source of classification or shelving scheme
Damaged status
Not for loan
Permanent Location Military College of Signals (MCS)
Current Location Military College of Signals (MCS)
Shelving location Thesis
Date acquired 01/17/2026
Total Checkouts
Full call number 005.1,HUS
Barcode MCSPhD CS-05
Date last seen 01/17/2026
Price effective from 01/17/2026
Koha item type Thesis
Public note Almirah No.68, Shelf No.5
© 2023 Central Library, National University of Sciences and Technology. All Rights Reserved.