Implementation of DBSCAN Algorithm in MATLAB

We can implement this algorithm using the following codes in MATLAB:

dbscan.m

function [class,type]=dbscan(x,k,Eps) % x = dataset, k = no. of %points within the radius & Eps as the radius

[m,n]=size(x);

if nargin<3 || isempty(Eps)
   [Eps]=epsilon(x,k);
end

x = [[1:m]' x];
[m,n] = size(x);
type = zeros(1,m);
no = 1;
touched = zeros(m,1);

for i = 1:m
    if touched(i) == 0;
       ob = x(i,:);
       D = dist(ob(2:n),x(:,2:n));
       ind = find(D<=Eps);
   
       if length(ind)>1 && length(ind)<k+1      
          type(i) = 0;
          class(i) = 0;
       end
       if length(ind)==1
          type(i) = -1;
          class(i) = -1; 
          touched(i) = 1;
       end

       if length(ind)>= k+1;
          type(i) = 1;
          class(ind) = ones(length(ind),1)*max(no);
         
          while ~isempty(ind)
                ob = x(ind(1),:);
                touched(ind(1)) = 1;
                ind(1) = [];
                D = dist(ob(2:n),x(:,2:n));
                i1 = find(D<=Eps);
    
                if length(i1)>1
                   class(i1) = no;
                   if length(i1) >= k+1;
                      type(ob(1)) = 1;
                   else
                      type(ob(1)) = 0;
                   end

                   for i = 1:length(i1)
                       if touched(i1(i)) == 0
                          touched(i1(i)) = 1;
                          ind = [ind i1(i)];  
                          class(i1(i)) = no;
                       end                   
                   end
                end
          end
          no = no+1;
       end
   end
end

i1 = find(class == 0);
class(i1) = -1;
type(i1) = -1;



function [Eps] = epsilon(x,k) % Analytical calculation of rad if %not given

[m,n] = size(x);

Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);



function [D] = dist(i,x) %Distance Calculation

[m,n] = size(x);
D = sqrt(sum((((ones(m,1)*i)-x).^2)'));

if n == 1
   D = abs((ones(m,1)*i-x))';
end

plot_cluster.m
data = [2,1;2,2;2,3;3,2;4,4;10,3;10,7;12,4;11,5;11,3.4];
[class,type] = dbscan(data,3,3);
for k = 1:length(data)
    if(type(k)==1)
       plot(data(k,1),data(k,2),'ks');hold on;
    elseif(type(k) ==0)
            plot(data(k,1),data(k,2),'bo');hold on;
    else
            plot(data(k,1),data(k,2),'rx');hold on;
    end
    text(data(k,1),data(k,2),num2str(class(k)))
   
end

OUTPUT
The output shows the core points as square shaped, border points as circular and noise points as cross. And, the clusters are labeled as 1,2,....

The sample output is shown below:


DBSCAN output with [class,type] = dbscan(data,3,3) and 'data' as specified in the code

16 comments:

  1. I'm trying to use this code to cluster the data found here https://www.dropbox.com/s/y2gfioixz2rt083/XX.mat but it seems not working. I've also reduced the size as shown below but every time its only one cluster, I don't know why. So I'm asking if you could please assist me.

    Many Thanks for your kind assistance and waiting for your reply.

    data=XX(1:300,:);
    [class,type] = dbscan(data,100,100);
    for k = 1:length(data)
    if(type(k)==1)
    plot(data(k,1),data(k,2),'ks');hold on;
    elseif(type(k) ==0)
    plot(data(k,1),data(k,2),'bo');hold on;
    else
    plot(data(k,1),data(k,2),'gx');hold on;
    end
    text(data(k,1),data(k,2),num2str(class(k)))

    end

    ReplyDelete
    Replies
    1. Hi haggag,
      Before answering your question, I'd like to clarify the parameters passed to the dbscan() function. Here, data represents your dataset(m*n), k represents the number of objects in a neighborhood of an object i.e. minimal number of objects that can be considered as a cluster and finally Eps is the neighborhood radius, if not known, then you can easily avoid this parameter or put [], because there is epsilon() function that analytically calculates the radius if not specified.

      So, when I tried with your dataset taking different values of k =2, 10,...the program is showing two clusters which seems to be rational based on your data distribution. Thus, my advice is- select appropriate value of k and avoid Eps if you are unknown about its value.

      Delete
  2. Thanks for your reply.
    I've tested it again but the data now is 35000x3 so its taking so long, I even stopped the running after couple of hours. Anyway to make it more efficient? Thanks alot

    ReplyDelete
  3. Hi
    Firts i want to thank for this blog and the effort on this code. I have a question, when i use my data, i can see the label of the clusters.

    ReplyDelete
    Replies
    1. Ok, that's good. These labels represent the different clusters based on your data set & the DBSCAN parameters. But, if you are trying to say that the labels are not being displayed, then you should check your plot window's background. Otherwise, it will show label '1' near to the square or circle, even if there is only one cluster.

      Delete
    2. It is funny because I use the DBSCAN at first to identify the outliers but only one class was generate, then i remove the outlier and use the function again and two classes were generate. I thought DBSCAN can handle noise or outliers, am i right?

      Delete
    3. Algorithmically, DBSCAN marks those points as outliers/noise which are neither core nor the border points. And, of course, this algorithm is resistant to noise (it can eliminate those noise points) and can generate clusters of different shapes & sizes e.g. 'S', Oval shaped etc.

      So, as shown in figure above it shows three points as the noise points with single cluster and if I remove those points and run the program, then again it shows the single cluster with same points as the core and border points. So, I think the number of clusters should be the same before & after removing the noise points in your data set too. However, the other points should be kept intact without removal.

      Delete
  4. Jivan,

    Thanks a lot for your contribution!

    ReplyDelete
  5. Hi
    When I implement DBSCAN in matlab I am getting negative values in CLASS [-1,1,2-1,2...
    then how the assignment is done to which cluster it is ..
    Can you explain..

    Thank you in advance

    ReplyDelete
  6. This comment has been removed by the author.

    ReplyDelete
  7. HI!
    Thank you for the code.I have a small question or request- Can you share the source of the formula of "Eps" as used in the function "epsilon(x,k)" ?
    Thank You in advance.

    ReplyDelete
    Replies
    1. The function definition is given in the post above! The Eps function is used to calculate the radius if not given by using gamma function: http://www.mathworks.com/help/matlab/ref/gamma.html

      Please find the definition of Eps:

      function [Eps] = epsilon(x,k)
      [m,n] = size(x);
      Eps = ((prod(max(x)-min(x))*k*gamma(.5*n+1))/(m*sqrt(pi.^n))).^(1/n);

      Delete
    2. Hi,Jivan!
      Thank You for the prompt reply!
      Yes,I have understood the formula,but is there any source for the formula? Like any publication,where the formula is proposed?

      Delete
    3. I went through a few papers and I found a paper properly explaining the efficient method for calculating the EPS parameter in DBSCAN.

      'Karami, Amin, and Ronnie Johansson. "Choosing dbscan parameters automatically using differential evolution." International Journal of Computer Applications 91.7 (2014).'

      Delete
  8. This comment has been removed by the author.

    ReplyDelete
  9. hi when i was running this code i was getting an error
    Undefined function or variable 'class'.

    Error in dbscan (line 64)
    i1 = find(class == 0);
    so please help me resolving that error.

    ReplyDelete