Run Run Shaw Library, City University of Hong Kong

Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/9093
Full metadata record
DC Field | Value | Language
dc.contributor.author | Liunardo, Mikhael | en_US
dc.date.accessioned | 2019-01-29T06:57:22Z |
dc.date.accessioned | 2019-02-12T06:54:10Z | -
dc.date.available | 2019-01-29T06:57:22Z |
dc.date.available | 2019-02-12T06:54:10Z | -
dc.date.issued | 2018 | en_US
dc.identifier.other | 2018cslm374 | en_US
dc.identifier.uri | http://144.214.8.231/handle/2031/9093 | -
dc.description.abstract | Text mining is a widely researched area of machine learning. With models such as text categorization (Moulinier, 1996), the Naïve Bayes model (McCallum and Nigam, 1998) and word-to-vector models (Mikolov, Chen, Corrado and Dean, 2013), text mining has become a versatile solution to many real-world problems. The wide range of available models gives ample opportunity for application, and numerous studies address the processing of legal case documents. Each model incorporates different algorithms and approaches, which gives each its own strengths. Although many models have been established, their acceptance in real-life situations is relatively low (Remus and Levy, 2015). One reason among many for the rejection of these models is the limited ability of machine learning to process highly contextual information. This research project aims to use text mining and machine learning to address that concern. Legal documents are often complicated and difficult for laypeople to understand (Howe and Wogalter, 1994). The project is meant to create a machine learning system that produces categorized and summarized information derived from the original legal documents. The simplified document produced by the model is designed to make the legal documents easier to understand. The research aims to build a predictive machine learning model that uses a series of algorithms to produce a comprehensive automatic summarization system. The Latent Dirichlet Allocation algorithm of Blei, Ng and Jordan (2003) is implemented to identify the major topics of the legal documents. The Word2vec technique (Mikolov et al., 2013) is then applied to convert sentences into vector matrices, generating a feature space in which the LexRank algorithm (Erkan and Radev, 2004) computes a connectivity matrix of the sentences based on the IDF-modified cosine formula to summarize the corpus. The extracted information is consolidated into a single coherent document at the final stage. | en_US
dc.title | Machine Learning Application: Classification and Summarization of Legal Documents | en_US
dc.contributor.department | Department of Computer Science | en_US
dc.description.supervisor | Supervisor: Dr. Chun, Hon Wai Andy; First Reader: Dr. Hou, Junhui; Second Reader: Dr. Yu, Yuen Tak | en_US
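
The abstract's final step scores sentence pairs with the IDF-modified cosine used by LexRank (Erkan and Radev, 2004). As a rough illustration only, the sketch below computes that similarity over plain bag-of-words counts for three made-up, pre-tokenised sentences. It is not the project's implementation: it omits the LDA and Word2vec stages, applies no similarity threshold, and the function names (`idf_table`, `idf_modified_cosine`, `connectivity_matrix`), the toy sentences and the natural-log IDF are all assumptions introduced here.

```python
import math
from collections import Counter


def idf_table(sentences):
    """IDF per word, treating each tokenised sentence as one document.

    Assumption: idf(w) = ln(N / df(w)); other IDF variants exist.
    """
    n = len(sentences)
    df = Counter(word for sent in sentences for word in set(sent))
    return {word: math.log(n / count) for word, count in df.items()}


def idf_modified_cosine(x, y, idf):
    """IDF-modified cosine between two tokenised sentences x and y."""
    tf_x, tf_y = Counter(x), Counter(y)
    shared = set(tf_x) & set(tf_y)
    # Numerator: sum over shared words of tf(w, x) * tf(w, y) * idf(w)^2.
    numerator = sum(tf_x[w] * tf_y[w] * idf.get(w, 0.0) ** 2 for w in shared)
    # Each norm is the length of the sentence's tf * idf vector.
    norm_x = math.sqrt(sum((tf_x[w] * idf.get(w, 0.0)) ** 2 for w in tf_x))
    norm_y = math.sqrt(sum((tf_y[w] * idf.get(w, 0.0)) ** 2 for w in tf_y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return numerator / (norm_x * norm_y)


def connectivity_matrix(sentences):
    """Pairwise similarity matrix (unthresholded) for a list of sentences."""
    idf = idf_table(sentences)
    return [[idf_modified_cosine(a, b, idf) for b in sentences] for a in sentences]


# Hypothetical toy sentences, not taken from any legal corpus.
sentences = [
    ["the", "tenant", "shall", "pay", "rent"],
    ["the", "landlord", "shall", "repair", "the", "roof"],
    ["rent", "is", "due", "monthly"],
]
matrix = connectivity_matrix(sentences)
```

In this toy input the second and third sentences share no words, so their entry in `matrix` is zero, while every sentence has similarity 1 with itself. A LexRank-style summarizer would go on to rank sentences by centrality in a graph built from such a matrix.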
Appears in Collections: Computer Science - Undergraduate Final Year Projects

Files in This Item:
File | Size | Format
fulltext.html | 147 B | HTML


Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.
