A light pollution map of the United States, a picture at night from a satellite orbiting the earth, is shown below.
Which of the following two representations more closely matches the way you think of this map?
Do you consider population density to be the mixture of distributions represented by the red spikes in the first option?
Or perhaps this mixture model is too passive for you, so that you prefer the air traffic representation in the second option showing separate airplane locations at some point in time.
On the other hand, if airplanes can be considered as messages passed between nodes with greater concentrations (i.e., cities with airports), then apcluster, the R package for affinity propagation, offers the more "self-organizing" model shown in the second option, with many possible ways of defining similarity or affinity. Ease of use should not be a problem, since the package is supported by a webinar, a comprehensive manual, and a link to the original Science article. However, the details of the message propagation algorithm take some work to comprehend. Fortunately, one can run the analysis, interpret the output, and know enough to avoid serious mistakes without mastering all the computational intricacies.
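For readers who want a feel for the message passing without reading the apcluster source, here is a minimal sketch in plain Python. It is not the apcluster implementation. It assumes negative squared Euclidean distance as the similarity, a single shared preference (the median similarity when none is given), a fixed number of iterations, and no tie-breaking noise. The function name and its parameters are my own illustrative choices.

```python
import statistics

def affinity_propagation(points, preference=None, damping=0.5, iterations=200):
    """Toy affinity propagation; returns (exemplar indices, exemplar index per point)."""
    n = len(points)
    # Similarity (an assumption of this sketch): negative squared Euclidean distance.
    s = [[-sum((a - b) ** 2 for a, b in zip(p, q)) for q in points] for p in points]
    if preference is None:
        # A common default: the median of the off-diagonal similarities.
        preference = statistics.median(
            s[i][k] for i in range(n) for k in range(n) if i != k)
    for k in range(n):
        s[k][k] = preference  # how much each point "wants" to be an exemplar

    r = [[0.0] * n for _ in range(n)]  # responsibilities, sent from point i to candidate k
    a = [[0.0] * n for _ in range(n)]  # availabilities, sent from candidate k back to point i

    for _ in range(iterations):
        # Responsibility: how well k suits i compared with i's best alternative.
        for i in range(n):
            scores = [a[i][k] + s[i][k] for k in range(n)]
            for k in range(n):
                best_other = max(scores[j] for j in range(n) if j != k)
                r[i][k] = damping * r[i][k] + (1 - damping) * (s[i][k] - best_other)
        # Availability: the support candidate k has gathered from the other points.
        for k in range(n):
            support = sum(max(0.0, r[i][k]) for i in range(n) if i != k)
            for i in range(n):
                if i == k:
                    raw = support
                else:
                    raw = min(0.0, r[k][k] + support - max(0.0, r[i][k]))
                a[i][k] = damping * a[i][k] + (1 - damping) * raw

    # A point is an exemplar when its self-responsibility plus self-availability is positive.
    exemplars = [k for k in range(n) if r[k][k] + a[k][k] > 0]
    if not exemplars:
        return [], []
    # Every other point joins the exemplar it is most similar to.
    labels = [i if i in exemplars else max(exemplars, key=lambda k: s[i][k])
              for i in range(n)]
    return exemplars, labels
```

Calling `affinity_propagation(points, preference=-10.0)` on a list of coordinate tuples returns the exemplars and, for each point, the index of its exemplar. Lowering the preference tends to produce fewer clusters and raising it more, which is one of the ways the analyst, rather than the data alone, shapes the result.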
And the true representation is? As a marketer, I see it as a dynamic process with concentrations supported by the seaports, rivers, railroad tracks, roads, and airports that served commerce over time. Population clusters continually evolve (e.g., imagine Las Vegas without air travel). They are not natural kinds revealed by carving nature at its joints. Diversity comes in many shapes and forms, each requiring its own model with its unique assumptions concerning the underlying structures. More importantly, cluster analysis serves many different purposes with each setting its own criteria. Haven't we learned that one size does not fit all?