Semantic Video Frame Extraction for Comics

Garg, Shrankhla

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/48

Title:	Semantic Video Frame Extraction for Comics
Authors:	Garg, Shrankhla
Department:	Department of Computer Science
Issue Date:	2017
Supervisor:	Dr. Lau, W. H. Rynson; First Reader: Dr. Lu, Zheng; Second Reader: Prof. Zhang, Qingfu
Abstract:	Comics are a popular form of artwork and a sought-after source of entertainment for all age groups. Although a number of tools are available to aid comic design, it remains a time-consuming and tedious job [1]. Many comic artists compose the artwork manually, selecting the key frames that capture the salient features of a video, so creating a comic demands intense labor and time. This project aims to generate pictorial video summaries by automatically selecting key frames for comic books. This is done by extracting McCloud’s panel-to-panel transitions. The main goal of the project is to present a methodology that produces a relevant and concise representation of a video by automatically choosing only the frames that summarize its characteristics. The narrative behind the video should also be traceable through the extracted panel transitions. The system architecture comprises three models that extract three types of panel transitions as defined by Scott McCloud. The models differ in how the panel transitions are extracted. Some transitions require the detection of scene changes (Scene to Scene), some require the detection and recognition of faces (Subject to Subject), and some require the detection of action changes (Moment to Moment and Action to Action). Hence each model uses different computer vision techniques, such as face detection, face recognition and object detection, as well as image processing techniques, such as edge detection and histogram comparison. The models automatically detect and emphasize meaningful events, which ensures an apt selection of these panel transitions. To detect panel transitions accurately, reliable detection of scene changes within a video is crucial.
Videos contain many types of shot changes, such as fade in, fade out and dissolve. Handling all of these complex shot changes requires a very robust algorithm that can detect scene changes under all these conditions. The implemented algorithm detects 95% of the scene changes in a test video. The ideas and techniques used to extract the panel transitions are solely the joint work of myself and my supervisor (Dr Rynson Lau). All the techniques were devised by us with the help of the other research cited. Through a user study, I have demonstrated that my models can help artists select appropriate transitions more smoothly and more accurately than manual selection. The model achieves a recall of 0.84 and a precision of 0.78. Face recognition for the extraction of scene to scene panel transitions also proved to be a demanding task. Currently, the most popular dataset for face recognition contains images that are well lit and captured at the same time with the head in different positions. A robust face recognition system, however, should be able to recognize faces even when their orientation differs from that of the query image. It should also be able to recognize faces of a person captured across a significant time gap, for example when the training images show a person from 25 to 40 years of age. The implemented system, which uses face recognition techniques such as eigenfaces and principal component analysis, also gives accurate results on a dataset where the faces are in different orientations from the query image.
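Illustrative note: the abstract names histogram comparison as one of the image processing techniques used for scene-change detection but does not give the algorithm. The sketch below is not the project's implementation. It is a minimal example of the general idea, assuming grayscale frames given as rows of 0-255 pixel values. The function names, the bin count and the 0.5 threshold are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of 0-255 pixel values.
Frame = Sequence[Sequence[int]]


def gray_histogram(frame: Frame, bins: int = 16) -> List[float]:
    """Return a normalized intensity histogram with `bins` equal-width buckets."""
    counts = [0] * bins
    total = 0
    for row in frame:
        for pixel in row:
            counts[min(pixel * bins // 256, bins - 1)] += 1
            total += 1
    return [count / total for count in counts]


def histogram_distance(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Half the L1 distance: 0.0 for identical histograms, 1.0 for disjoint ones."""
    return 0.5 * sum(abs(a - b) for a, b in zip(h1, h2))


def detect_scene_changes(
    frames: Sequence[Frame], threshold: float = 0.5, bins: int = 16
) -> List[int]:
    """Return each index i where frame i differs sharply from frame i - 1.

    The threshold is a hypothetical value, not one taken from the project.
    """
    changes: List[int] = []
    previous = None
    for index, frame in enumerate(frames):
        current = gray_histogram(frame, bins)
        if previous is not None and histogram_distance(previous, current) > threshold:
            changes.append(index)
        previous = current
    return changes


if __name__ == "__main__":
    dark = [[10, 20], [15, 12]]
    bright = [[240, 250], [245, 230]]
    # Hard cuts occur where the video switches between dark and bright frames.
    print(detect_scene_changes([dark, dark, bright, bright, dark]))  # [2, 4]
```

Comparing only consecutive frames like this catches hard cuts. Gradual transitions such as the fades and dissolves mentioned above spread the change over many frames, so a detector for them would probably need to compare histograms across a longer window or combine several cues.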
Appears in Collections:	Computer Science - Undergraduate Final Year Projects
