J. A. Martín Fernández, J. Palarea Albaladejo, J. A. Soto
A widely used data set from Hartigan (1975) describes the percentage composition of the milk of 24 mammals in terms of 5 constituents (water, protein, fat, lactose and ash). A 4-group solution has usually been considered the optimal grouping of these data. Like most clustering techniques, the Fuzzy C-Means (FCM) algorithm is based on a distance measure between objects. An appropriate distance between compositions should fulfill two main principles: scale invariance and subcompositional coherence. The Aitchison distance, which satisfies both principles, is equivalent to the usual Euclidean distance applied to the coordinates of the compositions with respect to an orthonormal basis of the simplex. This metric is defined through log-ratio transformations and induces what is called the Aitchison geometry on the simplex. Here the FCM algorithm is applied to the vectors of coordinates; the results suggest a grouping consistent with previous results, while bringing out particular features of the data.
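The approach described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (closure, ilr, aitchison_distance, fuzzy_c_means), the particular orthonormal basis, the fuzzifier m = 2 and the small array X are all hypothetical choices made for the example. In particular, X holds made-up 5-part compositions, not the Hartigan (1975) milk data.

```python
import numpy as np

def closure(x):
    """Rescale each row so that its parts sum to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum(axis=-1, keepdims=True)

def ilr(x):
    """Log-ratio coordinates w.r.t. one possible orthonormal basis of the simplex."""
    logx = np.log(closure(x))
    n_parts = logx.shape[-1]
    coords = []
    for i in range(1, n_parts):
        # Balance: geometric mean of the first i parts against part i + 1.
        gm = logx[..., :i].mean(axis=-1)
        coords.append(np.sqrt(i / (i + 1)) * (gm - logx[..., i]))
    return np.stack(coords, axis=-1)

def aitchison_distance(x, y):
    """Aitchison distance, computed here via centred log-ratios."""
    lx = np.log(np.asarray(x, dtype=float))
    ly = np.log(np.asarray(y, dtype=float))
    return np.linalg.norm((lx - lx.mean()) - (ly - ly.mean()))

def fuzzy_c_means(Z, c, m=2.0, n_iter=200, tol=1e-8, seed=0):
    """Plain FCM with Euclidean distance on the rows of Z."""
    rng = np.random.default_rng(seed)
    U = rng.random((Z.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # memberships sum to 1 per object
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ Z) / W.sum(axis=0)[:, None]
        d = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # guard against zero distances
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return centers, U

# Hypothetical percentage compositions (5 parts each), for illustration only.
X = np.array([
    [90.0,  2.0,  1.5, 6.0, 0.5],
    [88.0,  3.0,  2.0, 6.2, 0.8],
    [89.0,  2.5,  1.8, 6.0, 0.7],
    [50.0, 12.0, 35.0, 1.5, 1.5],
    [48.0, 11.0, 38.0, 1.2, 1.8],
    [52.0, 10.0, 34.5, 2.0, 1.5],
])

Z = ilr(X)                                      # coordinates on the simplex
centers, U = fuzzy_c_means(Z, c=2)
labels = U.argmax(axis=1)                       # hard labels from memberships
```

Because the basis is orthonormal, the Euclidean distance between two rows of Z coincides with aitchison_distance applied to the corresponding rows of X, and rescaling a composition (for example from percentages to proportions) leaves that distance unchanged. Running FCM on Z is therefore equivalent to running it on the compositions with the Aitchison distance.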
Keywords: fuzzy, log-ratio, simplex, distance
Scheduled
VA3 Classification and Multivariate Analysis 4
20 April 2012, 09:00
Londres Room