TITLE BASED SCIENTIFIC JOURNAL CLUSTERING

SANTOSO, YOSEFINA OKTAVIANI (2019) TITLE BASED SCIENTIFIC JOURNAL CLUSTERING. Other thesis, UNIKA SOEGIJAPRANATA SEMARANG.

Files (PDF, prefixed "15.K1.0014 YOSEFINA OKTAVIANI SANTOSO (9)"):
- Cover: COVER.pdf (702 kB)
- Chapter I (BAB I): BAB I.pdf (66 kB)
- Chapter II (BAB II): BAB II.pdf (203 kB), restricted to registered users only
- Chapter III (BAB III): BAB III.pdf (66 kB)
- Chapter IV (BAB IV): BAB IV.pdf (147 kB)
- Chapter V (BAB V): BAB V.pdf (361 kB)
- Chapter VI (BAB VI): BAB VI.pdf (61 kB)
- References (DAFTAR PUSTAKA): DAPUS.pdf (67 kB)
- Appendices (LAMPIRAN): LAMP.pdf (512 kB)

Abstract

Scientific journals are growing rapidly along with the development of science. According to labs.semanticscholar.org/corpus, the number of scientific journals has exceeded 39 million. This large number makes grouping scientific journals challenging, and grouping becomes even more difficult because each scientific journal can cover more than one topic. Special methods are therefore needed to group them. Topic modeling suits this task because it treats each document as a mixture of topics, and one of the best-known topic modeling methods is Latent Dirichlet Allocation (LDA). This research implements the LDA algorithm to perform topic modeling on scientific journals, using their titles as the corpus. In the pre-processing stage, the titles are converted into a bag of words, which is then used in the distribution stage. The results of the distribution stage are used for sampling with the Gibbs Sampling method. The sampling process also makes it possible to test for optimal parameters: this study used perplexity to find the optimal number of iterations and topics. The results show that the LDA algorithm successfully performs topic modeling on scientific journals, generating a list of keywords for each topic and grouping the documents under each topic. Based on the perplexity comparison, the optimal parameters are 3 topics and 500 iterations.

Keywords: Topic Modeling, LDA, perplexity, scientific journal
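The pipeline described in the abstract (titles to bag of words, a random initial topic assignment, Gibbs sampling, then perplexity to compare settings) can be sketched as below. This is a minimal, standard-library-only illustration of collapsed Gibbs sampling for LDA, not the thesis's own implementation. The stop-word list, the symmetric priors alpha=0.1 and beta=0.01, the reading of the "distribution stage" as random initial topic assignment, and all function names (`preprocess`, `lda_gibbs`, `perplexity`, `top_keywords`, `assign_documents`) are hypothetical assumptions. Perplexity here is computed on the training titles themselves, which is only one of several possible evaluation setups.

```python
import math
import random
import re

# Hypothetical stop-word list; the thesis's actual pre-processing rules are not given here.
STOP_WORDS = {"a", "an", "and", "based", "for", "in", "of", "on", "the", "to", "using", "with"}

def preprocess(titles):
    """Lowercase, tokenize, and drop stop words: one bag of words per title."""
    return [[t for t in re.findall(r"[a-z]+", title.lower()) if t not in STOP_WORDS]
            for title in titles]

def lda_gibbs(docs, num_topics, iterations, alpha=0.1, beta=0.01, seed=0):
    """Collapsed Gibbs sampling for LDA (assumed symmetric priors alpha, beta)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V, K = len(vocab), num_topics
    ndk = [[0] * K for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(K)]  # word counts per topic
    nk = [0] * K                       # total tokens per topic
    z = []                             # current topic of every token
    # Assumed "distribution" step: give every token a random initial topic.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][word_id[w]] += 1
            nk[k] += 1
        z.append(zd)
    # Resample each token's topic conditioned on all other assignments.
    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                v, k = word_id[w], z[d][i]
                ndk[d][k] -= 1; nkw[k][v] -= 1; nk[k] -= 1
                weights = [(ndk[d][t] + alpha) * (nkw[t][v] + beta) / (nk[t] + V * beta)
                           for t in range(K)]
                k = rng.choices(range(K), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][v] += 1; nk[k] += 1
    # Point estimates: phi = topic-word distributions, theta = document-topic distributions.
    phi = [[(nkw[k][v] + beta) / (nk[k] + V * beta) for v in range(V)] for k in range(K)]
    theta = [[(ndk[d][k] + alpha) / (len(doc) + K * alpha) for k in range(K)]
             for d, doc in enumerate(docs)]
    return vocab, phi, theta

def perplexity(docs, vocab, phi, theta):
    """exp(-average log-likelihood per token); lower means a better fit."""
    word_id = {w: i for i, w in enumerate(vocab)}
    log_lik, n_tokens = 0.0, 0
    for d, doc in enumerate(docs):
        for w in doc:
            p = sum(theta[d][k] * phi[k][word_id[w]] for k in range(len(phi)))
            log_lik += math.log(p)
            n_tokens += 1
    return math.exp(-log_lik / n_tokens)

def top_keywords(vocab, phi, n):
    """The n most probable words of each topic (the per-topic keyword list)."""
    return [[vocab[v] for v in sorted(range(len(vocab)), key=lambda v: -row[v])[:n]]
            for row in phi]

def assign_documents(theta):
    """Group each document under its most probable topic."""
    return [max(range(len(row)), key=lambda k: row[k]) for row in theta]

if __name__ == "__main__":
    titles = [  # illustrative titles, not data from the thesis
        "Topic modeling with latent Dirichlet allocation",
        "Gibbs sampling for topic models",
        "Convolutional neural networks for image classification",
        "Deep learning for image recognition",
    ]
    docs = preprocess(titles)
    for k in (2, 3):  # compare candidate topic counts by perplexity
        vocab, phi, theta = lda_gibbs(docs, num_topics=k, iterations=200)
        print(k, round(perplexity(docs, vocab, phi, theta), 2), assign_documents(theta))
```

The same loop can be run over candidate iteration counts as well as topic counts, keeping the setting with the lowest perplexity. Note that perplexity measured on the training corpus tends to keep falling as topics are added, so a held-out set of titles is a common alternative.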

Item Type: Thesis (Other)
Subjects: 000 Computer Science, Information and General Works > 050 Magazines, journals & serials
Divisions: Faculty of Computer Science > Department of Informatics Engineering
Depositing User: Mr Lucius Oentoeng
Date Deposited: 10 Jul 2019 08:22
Last Modified: 10 Nov 2020 05:39
URI: http://repository.unika.ac.id/id/eprint/19651
