Authors | Sungwoo Lee, Jaedong Lee, Jaekwang Kim, Jee-Hyong Lee |
---|---|
Conference | International Conference on Robust Statistics |
Conference (abbreviation) | ICORS 2012 |
pp. | |
Conference start date | 2012-08-05 |
Conference end date | 2012-08-10 |
Notes |
We propose a method for estimating blog topic variation using TFS (Term Frequency Smoothing) and PLSA (Probabilistic Latent Semantic Analysis).
In the earliest blogging services, a number of bloggers published their own content about daily life.
Over time, blog services as a business model came under the spotlight.
However, social networking services (SNS) began to replace the blog's function as an outlet for personal life.
Bloggers became concerned about maintaining the steady influx of visitors needed for profit.
In response to these changes, the purpose of blogs shifted to the accumulation of specialized information.
When bloggers want to run a specialized and profitable blog service, the most important problem is how to select a blog topic.
Many related studies have addressed this problem. However, these studies did not consider the temporal features of blogs.
As a result, a number of the extracted words were inaccurate.
We present TFS, which reflects the temporal features of blogs.
Blog content is updated in chronological order.
Moreover, the subjects of blog posts that are updated continuously over time have a high probability of being similar.
We therefore propose the following formula.
First, the term frequencies of documents posted on a given date are added to the term frequencies of documents posted before and after that date.
Then, the term frequencies of the documents on that date are smoothed.
This yields a document-term matrix that reflects the temporal features of the blog.
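
The smoothing step described above can be sketched as follows. This is only an illustration: the abstract does not state the window size or the weights, so the symmetric window of `window` days and the uniform averaging are assumptions, and the function names `daily_term_counts` and `smooth_term_frequency` are hypothetical.

```python
from collections import Counter
from datetime import date, timedelta


def daily_term_counts(posts):
    """Aggregate raw term counts per posting date.

    `posts` is an iterable of (date, list_of_tokens) pairs.
    """
    counts = {}
    for day, tokens in posts:
        counts.setdefault(day, Counter()).update(tokens)
    return counts


def smooth_term_frequency(counts, window=1):
    """Return smoothed term frequencies per date.

    For each date, add the counts of that date and of the `window` days
    before and after it, then divide by the number of days in the window.
    ASSUMPTION: the window size and uniform averaging are illustrative;
    the source does not specify them.
    """
    span = 2 * window + 1
    smoothed = {}
    for day in counts:
        total = Counter()
        for offset in range(-window, window + 1):
            neighbor = day + timedelta(days=offset)
            total.update(counts.get(neighbor, Counter()))
        smoothed[day] = {term: n / span for term, n in total.items()}
    return smoothed


# Toy data: three consecutive days of posts, already tokenized.
posts = [
    (date(2012, 8, 5), ["camera", "lens"]),
    (date(2012, 8, 6), ["camera", "travel"]),
    (date(2012, 8, 7), ["travel", "hotel"]),
]
tfs = smooth_term_frequency(daily_term_counts(posts), window=1)
```

In this toy example, the row for 2012-08-06 gains nonzero weight for "lens" and "hotel" even though neither term was posted on that day, which is the temporal effect the smoothing is meant to capture. Stacking the per-date dictionaries over a fixed vocabulary gives one possible form of the smoothed document-term matrix.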
Through this method and PLSA,
we can estimate more accurately which documents and terms belong to each subject.
We extract the documents and terms belonging to each subject according to the formula.
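
To illustrate how documents and terms might be assigned to subjects, here is a minimal sketch of standard PLSA fitted by expectation-maximization on a small dense document-term matrix. It is not the authors' implementation: the function name `plsa`, the random initialization, the iteration count, and the toy matrix are all assumptions. Fractional entries, such as those produced by smoothing, are accepted as-is.

```python
import random


def plsa(matrix, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM on a dense document-term matrix (list of rows).

    Returns (p_z_d, p_w_z): P(z|d) for each document and P(w|z) for
    each topic. ASSUMPTION: initialization and iteration count are
    illustrative choices, not taken from the source.
    """
    rng = random.Random(seed)
    n_docs, n_terms = len(matrix), len(matrix[0])

    def random_distribution(n):
        row = [rng.random() + 1e-3 for _ in range(n)]
        total = sum(row)
        return [x / total for x in row]

    p_z_d = [random_distribution(n_topics) for _ in range(n_docs)]
    p_w_z = [random_distribution(n_terms) for _ in range(n_topics)]

    for _ in range(n_iter):
        new_zd = [[0.0] * n_topics for _ in range(n_docs)]
        new_wz = [[0.0] * n_terms for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_terms):
                n_dw = matrix[d][w]
                if n_dw == 0:
                    continue
                # E-step: P(z|d,w) is proportional to P(z|d) * P(w|z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_topics)]
                norm = sum(joint) or 1.0
                for z in range(n_topics):
                    expected = n_dw * joint[z] / norm
                    new_zd[d][z] += expected
                    new_wz[z][w] += expected
        # M-step: renormalize the expected counts into distributions.
        p_z_d = [[x / (sum(row) or 1.0) for x in row] for row in new_zd]
        p_w_z = [[x / (sum(row) or 1.0) for x in row] for row in new_wz]
    return p_z_d, p_w_z


# Toy smoothed document-term matrix: 4 documents, 4 terms.
matrix = [
    [2.0, 1.0, 0.0, 0.0],
    [1.5, 1.0, 0.5, 0.0],
    [0.0, 0.0, 2.0, 1.0],
    [0.0, 0.5, 1.0, 2.0],
]
p_z_d, p_w_z = plsa(matrix, n_topics=2)

# Assign each document to its most probable topic.
doc_topics = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
```

Each row of `p_z_d` gives the topic mixture of a document, and each row of `p_w_z` gives the term distribution of a topic, so the highest-probability entries indicate which documents and terms belong to which subject.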