Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/9511
Title: | Machine Learning-Based Text Recognition in Images For Visually Impaired |
Authors: | Lee, Sumin |
Department: | Department of Computer Science |
Issue Date: | 2021 |
Supervisor: | Dr. Leung, Wing Ho Howard; First Reader: Dr. Li, Zhenjiang; Second Reader: Dr. Lau, Rynson W H
Abstract: | Despite significant advances in technology, the development of assistive technologies for the visually impaired has stagnated. At the same time, research suggests that assistive technologies are effective in improving the quality of life of the visually impaired. Existing technologies, especially OCR (Optical Character Recognition) applications, do not provide high-quality text detection and recognition: users usually have to pay for the services, or the services are incompatible with their devices. Moreover, most alternative solutions focus on functionality other than OCR, making it difficult to detect and recognize text from a variety of materials. Current approaches are therefore unreliable and inefficient ways for the visually impaired to read text. This project aims to develop a machine learning-based text recognition model and a mobile application that deploys it: the application captures an image of a natural scene with the camera, machine learning models detect and recognize the text in the captured image, and the recognized text is then converted to speech for the visually impaired user. The machine learning model is designed with TensorFlow and the Keras API in Python. A 2D CNN (Convolutional Neural Network) is trained and tested on the EMNIST dataset, which is pre-processed beforehand to improve the efficiency of model training. The trained model is converted to a TensorFlow Lite model and deployed in the mobile application. The application is developed in Android Studio and uses Firebase ML Kit for text detection; detected text is passed to the deployed model for recognition, and the output is converted to speech with Android's TTS (Text-to-Speech) library. The CNN achieves a satisfactory accuracy of 85%, although it may misclassify characters and digits that resemble one another, and image conditions such as brightness, noise, and blur significantly affect its accuracy. The application's output is likewise affected by the environment in which the image is captured. Nonetheless, the application successfully detects and recognizes text from various media, including handwriting, books, bills, and magazines with distinctive fonts. The project could be further improved by adopting machine learning algorithms other than CNNs, and, since the mobile application is kept simple in this project, adding further features and functions would lead to a better experience for the visually impaired. |
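As a concrete starting point, the sketch below illustrates the training-and-export pipeline the abstract describes: a small 2D CNN built with the Keras API, trained on EMNIST, and converted to TensorFlow Lite for deployment in the Android app. This is a minimal illustration, not the author's code; the choice of the EMNIST "balanced" split (47 classes), the layer sizes, the epoch count, and the output filename are all assumptions.

```python
# Minimal sketch (assumptions noted above): a 2D CNN for EMNIST
# character/digit recognition in TensorFlow/Keras, exported to
# TensorFlow Lite for use in a mobile application.
import tensorflow as tf
import tensorflow_datasets as tfds

NUM_CLASSES = 47  # EMNIST "balanced" split: digits plus merged letter classes


def preprocess(image, label):
    # Pre-processing step: scale the 28x28x1 grayscale pixels to [0, 1].
    return tf.cast(image, tf.float32) / 255.0, label


train_ds, test_ds = tfds.load(
    "emnist/balanced", split=["train", "test"], as_supervised=True)
train_ds = train_ds.map(preprocess).shuffle(10_000).batch(128)
test_ds = test_ds.map(preprocess).batch(128)

# A simple 2D CNN: two conv/pool stages, then a dense classifier head.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.5),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=test_ds, epochs=5)

# Convert the trained Keras model to TensorFlow Lite for the mobile app.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
tflite_model = converter.convert()
with open("emnist_cnn.tflite", "wb") as f:
    f.write(tflite_model)
```

On the Android side, the abstract's pipeline would then run text detection with Firebase ML Kit, feed the detected regions to this .tflite model via the TensorFlow Lite interpreter, and speak the recognized text with Android's TextToSpeech API; those app-side details are outside this Python sketch.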
Appears in Collections: | Computer Science - Undergraduate Final Year Projects |
Files in This Item:
File | Size | Format
---|---|---
fulltext.html | 147 B | HTML
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.