We can implement this algorithm using the following codes in MATLAB:
dbscan.m
dbscan.m
function [class,type]=dbscan(x,k,Eps) % x = dataset, k = no. of %points
within the radius & Eps as the radius
[m,n]=size(x);
if nargin<3 || isempty(Eps)
[Eps]=epsilon(x,k);
end
x = [[1:m]' x];
[m,n] = size(x);
type = zeros(1,m);
no = 1;
touched = zeros(m,1);
for i = 1:m
if touched(i) == 0;
ob = x(i,:);
D =
dist(ob(2:n),x(:,2:n));
ind =
find(D<=Eps);
if length(ind)>1 &&
length(ind)<k+1
type(i) =
0;
class(i)
= 0;
end
if length(ind)==1
type(i) =
-1;
class(i)
= -1;
touched(i) = 1;
end
if length(ind)>= k+1;
type(i) =
1;
class(ind) = ones(length(ind),1)*max(no);
while ~isempty(ind)
ob
= x(ind(1),:);
touched(ind(1)) = 1;
ind(1) = [];
D =
dist(ob(2:n),x(:,2:n));
i1
= find(D<=Eps);
if length(i1)>1
class(i1) = no;
if length(i1) >= k+1;
type(ob(1)) = 1;
else
type(ob(1)) = 0;
end
for i = 1:length(i1)
if touched(i1(i)) == 0
touched(i1(i)) = 1;
ind = [ind i1(i)];
class(i1(i)) = no;
end
end
end
end
no =
no+1;
end
end
end
i1 = find(class == 0);
class(i1) = -1;
type(i1) = -1;
function [Eps] = epsilon(x,k) % Analytical calculation of rad if %not given
[m,n] = size(x);
Eps =
((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);
function [D] = dist(i,x) %Distance Calculation
[m,n] = size(x);
D = sqrt(sum((((ones(m,1)*i)-x).^2)'));
if n == 1
D =
abs((ones(m,1)*i-x))';
end
plot_cluster.m
data =
[2,1;2,2;2,3;3,2;4,4;10,3;10,7;12,4;11,5;11,3.4];
[class,type] =
dbscan(data,3,3);
for k =
1:length(data)
if(type(k)==1)
plot(data(k,1),data(k,2),'ks');hold on;
elseif(type(k)
==0)
plot(data(k,1),data(k,2),'bo');hold on;
else
plot(data(k,1),data(k,2),'rx');hold on;
end
text(data(k,1),data(k,2),num2str(class(k)))
end
OUTPUT
The output shows the core points as square shaped, border points as circular and noise points as cross. And, the clusters are labeled as 1,2,....
The sample output is shown below:
The sample output is shown below:
I'm trying to use this code to cluster the data found here https://www.dropbox.com/s/y2gfioixz2rt083/XX.mat but it seems not working. I've also reduced the size as shown below but every time its only one cluster, I don't know why. So I'm asking if you could please assist me.
ReplyDeleteMany Thanks for your kind assistance and waiting for your reply.
data=XX(1:300,:);
[class,type] = dbscan(data,100,100);
for k = 1:length(data)
if(type(k)==1)
plot(data(k,1),data(k,2),'ks');hold on;
elseif(type(k) ==0)
plot(data(k,1),data(k,2),'bo');hold on;
else
plot(data(k,1),data(k,2),'gx');hold on;
end
text(data(k,1),data(k,2),num2str(class(k)))
end
Hi haggag,
DeleteBefore answering your question, I'd like to clarify the parameters passed to the dbscan() function. Here, data represents your dataset(m*n), k represents the number of objects in a neighborhood of an object i.e. minimal number of objects that can be considered as a cluster and finally Eps is the neighborhood radius, if not known, then you can easily avoid this parameter or put [], because there is epsilon() function that analytically calculates the radius if not specified.
So, when I tried with your dataset taking different values of k =2, 10,...the program is showing two clusters which seems to be rational based on your data distribution. Thus, my advice is- select appropriate value of k and avoid Eps if you are unknown about its value.
Thanks for your reply.
ReplyDeleteI've tested it again but the data now is 35000x3 so its taking so long, I even stopped the running after couple of hours. Anyway to make it more efficient? Thanks alot
Hi
ReplyDeleteFirts i want to thank for this blog and the effort on this code. I have a question, when i use my data, i can see the label of the clusters.
Ok, that's good. These labels represent the different clusters based on your data set & the DBSCAN parameters. But, if you are trying to say that the labels are not being displayed, then you should check your plot window's background. Otherwise, it will show label '1' near to the square or circle, even if there is only one cluster.
DeleteIt is funny because I use the DBSCAN at first to identify the outliers but only one class was generate, then i remove the outlier and use the function again and two classes were generate. I thought DBSCAN can handle noise or outliers, am i right?
DeleteAlgorithmically, DBSCAN marks those points as outliers/noise which are neither core nor the border points. And, of course, this algorithm is resistant to noise (it can eliminate those noise points) and can generate clusters of different shapes & sizes e.g. 'S', Oval shaped etc.
DeleteSo, as shown in figure above it shows three points as the noise points with single cluster and if I remove those points and run the program, then again it shows the single cluster with same points as the core and border points. So, I think the number of clusters should be the same before & after removing the noise points in your data set too. However, the other points should be kept intact without removal.
Jivan,
ReplyDeleteThanks a lot for your contribution!
Hi
ReplyDeleteWhen I implement DBSCAN in matlab I am getting negative values in CLASS [-1,1,2-1,2...
then how the assignment is done to which cluster it is ..
Can you explain..
Thank you in advance
This comment has been removed by the author.
ReplyDeleteHI!
ReplyDeleteThank you for the code.I have a small question or request- Can you share the source of the formula of "Eps" as used in the function "epsilon(x,k)" ?
Thank You in advance.
The function definition is given in the post above! The Eps function is used to calculate the radius if not given by using gamma function: http://www.mathworks.com/help/matlab/ref/gamma.html
DeletePlease find the definition of Eps:
function [Eps] = epsilon(x,k)
[m,n] = size(x);
Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);
Hi,Jivan!
DeleteThank You for the prompt reply!
Yes,I have understood the formula,but is there any source for the formula? Like any publication,where the formula is proposed?
I went through a few papers and I found a paper properly explaining the efficient method for calculating the EPS parameter in DBSCAN.
Delete'Karami, Amin, and Ronnie Johansson. "Choosing dbscan parameters automatically using differential evolution." International Journal of Computer Applications 91.7 (2014).'
This comment has been removed by the author.
ReplyDeletehi when i was running this code i was getting an error
ReplyDeleteUndefined function or variable 'class'.
Error in dbscan (line 64)
i1 = find(class == 0);
so please help me resolving that error.