====================================================
HG-means Clustering: Mixture of Gaussian instances
====================================================

---- WHAT

Instructions about the format of the mixtures of spherical Gaussians datasets. Each dataset consists of a data file, containing the coordinates of the samples, and an accompanying class file, containing the label (class) of each sample.

---- WHO

Daniel Gribel (dgribel@inf.puc-rio.br) and Thibaut Vidal (vidalt@inf.puc-rio.br)

---- NAME OF DATA FILES

The name of a Mixture of Gaussians data file follows the structure below:

-------------
Gau-M-D.txt
-------------

where M is the number of clusters (Gaussian distributions), and D is the dimensionality of the data. Here is an example of the name of a data file: Gau-50-10.txt.

---- CONTENT OF DATA FILES

The first line of a data file contains the number of data points (n) and the dimensionality of the data (d), separated by a single space. The remaining lines correspond to the coordinates of the data points. Each line contains the values of the d features of one sample, where x_ij corresponds to the j-th feature of the i-th sample. Feature values are separated by a single space, as depicted in the scheme below:

===========================
| n d                     |
---------------------------
| x_11 x_12 x_13 ... x_1d |
---------------------------
| x_21 x_22 x_23 ... x_2d |
---------------------------
| ...  ...  ...  ... ...  |
---------------------------
| x_n1 x_n2 x_n3 ... x_nd |
===========================

All Mixture of Gaussians instances consider means and variations uniformly selected in the ranges [0, 5] and [1, 10], respectively.

---- NAME OF CLASS FILES

The name of the file containing the classes (labels) of the data points follows the structure below:

---------------
Y-Gau-M-D.txt
---------------

where M is the number of clusters (Gaussian distributions), and D is the dimensionality of the data. Here is an example of the name of a class file: Y-Gau-50-10.txt.

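The two naming schemes above can be decoded mechanically. The following is a minimal Python sketch, not part of the original distribution: the helper names parse_instance_name and class_file_for are hypothetical, and the pattern assumes that M and D are written as plain decimal integers, as in the examples.

```python
import re

# Matches "Gau-M-D.txt" (data file) and "Y-Gau-M-D.txt" (class file).
# Assumption: M and D are plain decimal integers, as in "Gau-50-10.txt".
_NAME_RE = re.compile(r"^(Y-)?Gau-(\d+)-(\d+)\.txt$")


def parse_instance_name(filename):
    """Return (is_class_file, m, d) for a Mixture of Gaussians file name."""
    match = _NAME_RE.match(filename)
    if match is None:
        raise ValueError(f"not a Mixture of Gaussians file name: {filename!r}")
    is_class_file = match.group(1) is not None
    return is_class_file, int(match.group(2)), int(match.group(3))


def class_file_for(data_filename):
    """Return the class file name that accompanies a data file name."""
    is_class_file, _, _ = parse_instance_name(data_filename)
    if is_class_file:
        raise ValueError(f"{data_filename!r} is already a class file name")
    return "Y-" + data_filename
```

For instance, parse_instance_name("Gau-50-10.txt") yields M = 50 and D = 10, and class_file_for("Gau-50-10.txt") yields "Y-Gau-50-10.txt".
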
---- CONTENT OF CLASS FILES

The class file lists the class of each sample of the dataset, i.e., the Gaussian distribution that generated the sample, where y_i corresponds to the class (label) of the i-th sample:

=====
y_1
-----
y_2
-----
...
-----
y_n
=====
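
The file layouts described above can be read with a few lines of code. The following is a minimal Python sketch under stated assumptions, not the authors' loader: the function names load_data_file, load_class_file and load_instance are hypothetical, labels are kept as strings because their type is not specified here, and the class file is assumed to have no header line and one label per line, as in the scheme.

```python
def load_data_file(path):
    """Read a data file: a header line 'n d', then n lines of d values."""
    with open(path) as handle:
        n, d = (int(token) for token in handle.readline().split())
        # Skip blank lines (e.g. a trailing newline at the end of the file).
        points = [[float(value) for value in line.split()]
                  for line in handle if line.strip()]
    if len(points) != n:
        raise ValueError(f"expected {n} samples, found {len(points)}")
    for index, row in enumerate(points, start=1):
        if len(row) != d:
            raise ValueError(f"sample {index} has {len(row)} features, expected {d}")
    return points


def load_class_file(path):
    """Read a class file: one label per line, in sample order."""
    # Assumption: labels are kept as strings, since their type is unspecified.
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def load_instance(data_path, class_path):
    """Load a data file and its class file, checking that the sizes agree."""
    points = load_data_file(data_path)
    labels = load_class_file(class_path)
    if len(labels) != len(points):
        raise ValueError(f"{len(points)} samples but {len(labels)} labels")
    return points, labels
```

A call such as load_instance("Gau-50-10.txt", "Y-Gau-50-10.txt") would return the list of samples together with the list of labels, where the i-th label belongs to the i-th sample.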