Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/8707
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Liu, Jiacheng | en_US |
dc.date.accessioned | 2017-03-08T06:22:15Z | |
dc.date.accessioned | 2017-09-19T08:51:11Z | |
dc.date.accessioned | 2019-02-12T06:53:22Z | - |
dc.date.available | 2017-03-08T06:22:15Z | |
dc.date.available | 2017-09-19T08:51:11Z | |
dc.date.available | 2019-02-12T06:53:22Z | - |
dc.date.issued | 2016 | en_US |
dc.identifier.other | 2016csljc374 | en_US |
dc.identifier.uri | http://144.214.8.231/handle/2031/8707 | - |
dc.description.abstract | This project aims to build an Internet application that retrieves user information from publicly accessible data and builds a comprehensive understanding of a given online account. The project currently covers Sina's Weibo.com, the most used microblog service in China. Starting from a user ID, name or homepage link, it collects all the required information with a real-time crawler and studies the user from various perspectives. At the end of the study, a comprehensive report is presented as a set of webpages, providing information including but not limited to basic attributes of the user and close online friends, tags or keywords that the user is most interested in, the network of the user's interactions with others, changes in topics or activity over the timeline, and finally predictions about the user's lifestyle. The ultimate goal of this web application is to integrate the user study of multiple social networks, although it concentrates on Weibo.com within the two-semester scope. It was therefore designed and developed with flexibility and scalability in mind, both for future functions and for additional servers. After two semesters of development, the application implements all the functions mentioned above, with 9 work modules and approximately 25,000 lines of code including front end, back end and extensive scripts. In terms of machine learning and data mining, it applies various approaches to classification and clustering. It uses an ensemble of multiple tree models and linear models, such as Random Forest, Gradient Boosting Trees and Logistic Regression. K-means++ and Latent Dirichlet Allocation are used to cluster the microblog contents, gaining insights from different approaches. Regarding software engineering, this project is an attempt at using Grails, a comparatively young and rapidly growing framework built on top of two established giants in Java industry environments, Spring and Hibernate. This is integrated with the renowned front-end framework Bootstrap 3, making this project a hybrid of two state-of-the-art frameworks. The application also exploits the advantages of multiple languages, especially the mature libraries behind them, allowing it to combine the merits of different development environments. Moreover, the application visualizes the insights gained from the multiple modules in the front end, mostly in the form of various interactive graphs and diagrams. This makes the results both informative and easy to perceive, simplifying future social network user studies. | en_US |
dc.rights | This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner. | en_US |
dc.rights | Access is restricted to CityU users. | en_US |
dc.title | User's public privacy mining software | en_US |
dc.contributor.department | Department of Computer Science | en_US |
dc.description.supervisor | Supervisor: Dr. Li, Shuaicheng; First Reader: Dr. Kwok, Lam For; Second Reader: Prof. Jia, Xiaohua | en_US |
Appears in Collections: | Computer Science - Undergraduate Final Year Projects |
Files in This Item:
File | Size | Format |
---|---|---|
fulltext.html | 146 B | HTML |
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.