City University of Hong Kong

Title: Web document analysis and its application to anti-phishing
Other Titles: Wang lu wen dang fen xi ji qi zai fan wang diao shi qi zha fang mian de ying yong
Authors: Huang, Guanglin (黃光霖)
Department: Dept. of Computer Science
Degree: Master of Philosophy
Issue Date: 2006
Publisher: City University of Hong Kong
Subjects: Internet -- Security measures
Notes: CityU Call Number: TK5105.875.I57 H827 2006
Includes bibliographical references (leaves 68-75)
Thesis (M.Phil.)--City University of Hong Kong, 2006
vii, 75 leaves : ill. ; 30 cm.
Type: Thesis
Abstract: The Web is growing at an astonishing speed and is now the largest repository of information and knowledge. The many web documents that have accumulated need automatic processing and analysis for intelligent applications. In this thesis, we investigate web document analysis techniques and develop an application to anti-phishing. For web document analysis, a visual-factor-based page segmentation approach is proposed and implemented. Based on the W3C DOM model of HTML, this approach first decomposes the whole web page into many independent salient blocks, each of which is visually and semantically consistent internally but distinguishable from its adjacent blocks. In the second step, the approach aggregates these salient blocks into semantically meaningful blocks according to their positions and visual cues in the web page. In this bottom-up manner, the approach finally builds a hierarchical tree of segmented blocks. We apply our web page segmentation to the anti-phishing problem. Phishing web pages usually exhibit visual styles and structure similar to those of their targets. Based on web page segmentation, we propose three metrics (block level similarity, layout similarity, and overall style similarity) to evaluate the visual similarity between a phishing page and its target. If any one of them exceeds a specific threshold, a phishing alarm is issued. We have built a prototype system to demonstrate the business model of our anti-phishing mechanism, and believe our strategy can be used as an enterprise solution for anti-phishing.
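The alarm rule described in the abstract can be illustrated with a minimal Python sketch. It covers only the final decision step. How the thesis computes each metric is not stated in this record, so the scores are taken as given inputs in the range 0 to 1. The names `SimilarityScores`, `Thresholds`, and `is_phishing_suspect`, the use of one threshold per metric, and the 0.9 default values are all assumptions made for illustration, not the author's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimilarityScores:
    """Hypothetical container for the three metrics named in the abstract."""
    block_level: float    # block level similarity
    layout: float         # layout similarity
    overall_style: float  # overall style similarity


@dataclass(frozen=True)
class Thresholds:
    """Assumed per-metric thresholds; 0.9 is a placeholder, not a thesis value."""
    block_level: float = 0.9
    layout: float = 0.9
    overall_style: float = 0.9


def is_phishing_suspect(scores: SimilarityScores,
                        thresholds: Thresholds = Thresholds()) -> bool:
    """Raise an alarm if any one similarity score exceeds its threshold."""
    return (
        scores.block_level > thresholds.block_level
        or scores.layout > thresholds.layout
        or scores.overall_style > thresholds.overall_style
    )


if __name__ == "__main__":
    # A suspicious page whose layout closely matches the protected target.
    suspicious = SimilarityScores(block_level=0.40, layout=0.95, overall_style=0.30)
    # An unrelated page that resembles the target on no metric.
    unrelated = SimilarityScores(block_level=0.10, layout=0.20, overall_style=0.15)
    print(is_phishing_suspect(suspicious))
    print(is_phishing_suspect(unrelated))
```

Because the rule is a disjunction, a page that closely imitates the target on a single metric is enough to trigger the alarm, which matches the "if one of them exceeds" wording of the abstract.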
Appears in Collections:CS - Master of Philosophy

Items in CityU IR are protected by copyright, with all rights reserved, unless otherwise indicated.

