Last semester we used two kinds of clustering algorithms in our analysis. The first is distance-based clustering; the second is grid-based clustering. The two are logically similar, since both form clusters based on distance, but they go about it differently, and their results can differ. Below is the logic of the two algorithms.
A. Distance-based clustering
1. Buffer every point by a distance that the analyst sets.
2. Merge the resulting circles into clusters wherever their overlap exceeds a set threshold.
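The two steps above can be sketched in plain Python. This is only an illustration under stated assumptions, not the implementation we ran: the function names, the `min_overlap` parameter, and the choice to measure overlap as the shared share of one circle's area are all assumptions made for the example.

```python
import math

def overlap_fraction(p, q, radius):
    """Share of one buffer circle's area that it shares with the other (equal radii)."""
    d = math.dist(p, q)
    if d >= 2 * radius:
        return 0.0  # the circles do not overlap
    # Area of the lens formed by two intersecting circles of equal radius.
    lens = (2 * radius**2 * math.acos(d / (2 * radius))
            - (d / 2) * math.sqrt(4 * radius**2 - d**2))
    return lens / (math.pi * radius**2)

def distance_clusters(points, radius, min_overlap):
    """Step 1: buffer each point by `radius`. Step 2: merge circles whose
    overlap fraction exceeds `min_overlap` (an assumed way to set the threshold)."""
    parent = list(range(len(points)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if overlap_fraction(points[i], points[j], radius) > min_overlap:
                parent[find(i)] = find(j)

    groups = {}
    for i, p in enumerate(points):
        groups.setdefault(find(i), []).append(p)
    return sorted(groups.values())

print(distance_clusters([(0, 0), (1, 0), (10, 10)], radius=1.0, min_overlap=0.2))
# -> [[(0, 0), (1, 0)], [(10, 10)]]
```

Because merging is transitive, a chain of pairwise-overlapping circles ends up in one cluster even when its two ends do not overlap each other.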
B. Grid-based clustering
1. Set the spacing of the grid lines and divide the target area into grid cells.
2. Assign the points to cells, then look at the cells neighboring each target cell. If any neighboring cell contains points, merge those points into the core of a cluster.
3. Build convex hulls from these cluster cores. A parameter lets you control the size of the clusters.
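The grid-based steps can be sketched the same way. This is a minimal illustration, not the `gridcluster` function itself: the function names are hypothetical, "neighbor" is assumed to mean the eight surrounding cells, the hull is computed with the standard monotone-chain method, and the size-control parameter from step 3 is not modeled.

```python
from collections import defaultdict

def grid_clusters(points, cell_size):
    """Steps 1-2: bin points into square cells, then merge occupied cells
    that touch each other (8-neighborhood, an assumption) into cluster cores."""
    cells = defaultdict(list)
    for x, y in points:
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    seen, clusters = set(), []
    for start in sorted(cells):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], []
        while stack:  # flood fill over neighboring occupied cells
            cx, cy = stack.pop()
            members.extend(cells[(cx, cy)])
            for dx in (-1, 0, 1):
                for dy in (-1, 0, 1):
                    nb = (cx + dx, cy + dy)
                    if nb in cells and nb not in seen:
                        seen.add(nb)
                        stack.append(nb)
        clusters.append(members)
    return clusters

def convex_hull(pts):
    """Step 3: convex hull of a cluster core (Andrew's monotone chain),
    returned in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

pts = [(1, 1), (9, 2), (12, 12), (2, 8), (5, 5), (95, 95)]
for core in grid_clusters(pts, cell_size=10):
    print(convex_hull(core))
# -> [(1, 1), (9, 2), (12, 12), (2, 8)]
# -> [(95, 95)]
```

In this sketch the grid spacing is the only knob: a larger `cell_size` makes more cells count as neighbors, so clusters grow.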
Below is the SQL for grid-based clustering:
WITH clstrtags AS (
    SELECT *, tag.geom AS tgeom
    FROM gridcluster(30, 'urbantag', 'geom') AS grid
    JOIN urbantag AS tag  -- ON ...: the join condition is missing from the original query
    ORDER BY rid, cid
),
-- number of tags per cluster and activity
counts AS (SELECT count(tagid) AS count, clusterid, activity FROM clstrtags GROUP BY clusterid, activity),
-- total number of tags per cluster
countss AS (SELECT count(tagid) AS count, clusterid FROM clstrtags GROUP BY clusterid)
SELECT counts.clusterid, counts.activity AS act, counts.count AS actct, countss.count AS tagid,
       counts.count / (countss.count + 0.00) AS percentage  -- share of the cluster's tags with this activity
FROM counts JOIN countss ON counts.clusterid = countss.clusterid
ORDER BY counts.clusterid;
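To make clear what the final SELECT computes, here is a small Python sketch of the same arithmetic on made-up rows. The function name and the sample data are hypothetical; the input stands in for the (clusterid, activity) pairs of the tags after they have been assigned to clusters.

```python
from collections import Counter

def activity_shares(tags):
    """tags: iterable of (clusterid, activity) pairs, one per tag.
    Returns rows of (clusterid, activity, activity count, cluster total, share)."""
    tags = list(tags)
    per_activity = Counter(tags)                        # like the `counts` CTE
    per_cluster = Counter(cid for cid, _ in tags)       # like the `countss` CTE
    return [
        (cid, act, n, per_cluster[cid], n / per_cluster[cid])
        for (cid, act), n in sorted(per_activity.items())
    ]

# Made-up sample: three tags in cluster 1, one tag in cluster 2.
for row in activity_shares([(1, "walk"), (1, "walk"), (1, "sit"), (2, "sit")]):
    print(row)
```

Each row pairs one activity in one cluster with that cluster's total, so the shares within a cluster sum to 1.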