Detail publikace
Time series clustering in large data sets
FEJFAR, J. ŠŤASTNÝ, J.
Český název
Time series clustering in large data sets
Anglický název
Time series clustering in large data sets
Typ
článek v časopise - ostatní, Jost
Jazyk
en
Originální abstrakt
The clustering of time series is a widely researched area. There are many methods for dealing with this task. We are actually using the Self-organizing map (SOM) with the unsupervised learning algorithm for clustering of time series. After the first experiment (Fejfar, Weinlichová, Šťastný, 2009) it seems that the whole concept of the clustering algorithm is correct but that we have to perform time series clustering on much larger dataset to obtain more accurate results and to find the correlation between configured parameters and results more precisely. The second requirement arose in a need for a well-defined evaluation of results. It seems useful to use sound recordings as instances of time series again. There are many recordings to use in digital libraries, many interesting features and patterns can be found in this area. We are searching for recordings with the similar development of information density in this experiment. It can be used for musical form investigation, cover songs detection and many others applications. The objective of the presented paper is to compare clustering results made with different parameters of feature vectors and the SOM itself. We are describing time series in a simplistic way evaluating standard deviations for separated parts of recordings. The resulting feature vectors are clustered with the SOM in batch training mode with different topologies varying from few neurons to large maps. There are other algorithms discussed, usable for finding similarities between time series and finally conclusions for further research are presented. We also present an overview of the related actual literature and projects.
Český abstrakt
The clustering of time series is a widely researched area. There are many methods for dealing with this task. We are actually using the Self-organizing map (SOM) with the unsupervised learning algorithm for clustering of time series. After the first experiment (Fejfar, Weinlichová, Šťastný, 2009) it seems that the whole concept of the clustering algorithm is correct but that we have to perform time series clustering on much larger dataset to obtain more accurate results and to find the correlation between configured parameters and results more precisely. The second requirement arose in a need for a well-defined evaluation of results. It seems useful to use sound recordings as instances of time series again. There are many recordings to use in digital libraries, many interesting features and patterns can be found in this area. We are searching for recordings with the similar development of information density in this experiment. It can be used for musical form investigation, cover songs detection and many others applications. The objective of the presented paper is to compare clustering results made with different parameters of feature vectors and the SOM itself. We are describing time series in a simplistic way evaluating standard deviations for separated parts of recordings. The resulting feature vectors are clustered with the SOM in batch training mode with different topologies varying from few neurons to large maps. There are other algorithms discussed, usable for finding similarities between time series and finally conclusions for further research are presented. We also present an overview of the related actual literature and projects.
Anglický abstrakt
The clustering of time series is a widely researched area. There are many methods for dealing with this task. We are actually using the Self-organizing map (SOM) with the unsupervised learning algorithm for clustering of time series. After the first experiment (Fejfar, Weinlichová, Šťastný, 2009) it seems that the whole concept of the clustering algorithm is correct but that we have to perform time series clustering on much larger dataset to obtain more accurate results and to find the correlation between configured parameters and results more precisely. The second requirement arose in a need for a well-defined evaluation of results. It seems useful to use sound recordings as instances of time series again. There are many recordings to use in digital libraries, many interesting features and patterns can be found in this area. We are searching for recordings with the similar development of information density in this experiment. It can be used for musical form investigation, cover songs detection and many others applications. The objective of the presented paper is to compare clustering results made with different parameters of feature vectors and the SOM itself. We are describing time series in a simplistic way evaluating standard deviations for separated parts of recordings. The resulting feature vectors are clustered with the SOM in batch training mode with different topologies varying from few neurons to large maps. There are other algorithms discussed, usable for finding similarities between time series and finally conclusions for further research are presented. We also present an overview of the related actual literature and projects.
Klíčová slova česky
shlukování, Samoorganizující se mapa
Klíčová slova anglicky
clustering, Self-organizing map
Rok RIV
2011
Vydáno
01.01.2011
ISSN
1211-8516
Časopis
Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
Ročník
2011
Číslo
2
Strany od–do
75–80
Počet stran
6
BIBTEX
@article{BUT74807,
author="Jiří {Fejfar} and Jiří {Šťastný},
title="Time series clustering in large data sets",
journal="Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis",
year="2011",
volume="2011",
number="2",
month="January",
pages="75--80",
issn="1211-8516"
}