
Vectorizing Outer Loop Of Euclidean Distance Using Numpy On Multi-dimensional Data

I have a 2D matrix of values. Each row is a data point. data = np.array( [[2, 2, 3], [4, 2, 4], [1, 1, 4]]) Now if my test point is a single 1D numpy array like: test =

Solution 1:

You can use broadcasting for that:

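from numpy.linalg import norm
# Assumes test is 2D with shape (n_test, 3); test[:, None] then has shape (n_test, 1, 3),
# so the subtraction broadcasts to shape (n_test, n_data, 3).
norm(data - test[:, None], axis=2)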

For two test points (`[2, 3, 3]` and `[4, 1, 2]`, as in the examples below), this gives:

[[ 1.          2.44948974  2.44948974]
 [ 2.44948974  2.23606798  3.60555128]]

Some explanation: this is easier to follow when the two arrays have different shapes, for example four points and two points:

ens1 = np.array(
   [[2, 2, 3],
    [4, 2, 4],
    [1, 1, 4],
    [2, 4, 5]])


ens2 = np.array([[2,3,3],
                 [4,1,2]])  


In [16]: ens1.shape
Out[16]: (4, 3)

In [17]: ens2.shape
Out[17]: (2, 3)   

Then:

In [21]: ens2[:,None].shape 
Out[21]: (2, 1, 3) 

This adds a new dimension, so broadcasting can now perform the 2 × 4 = 8 row subtractions:

In [22]: (ens1-ens2[:,None]).shape
Out[22]: (2, 4, 3)       

Finally, take the norm along the last axis to get the 8 distances:

In [23]: norm(ens1-ens2[:,None],axis=2)
Out[23]: 
array([[ 1.        ,  2.44948974,  2.44948974,  2.23606798],
       [ 2.44948974,  2.23606798,  3.60555128,  4.69041576]])     
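The steps above can be wrapped into a small helper. This is only a minimal sketch: the function name `pairwise_distances` is hypothetical (it does not appear in the original answer), and `np.atleast_2d` is added on the assumption that a single 1D test point should be accepted as well.

```python
import numpy as np
from numpy.linalg import norm


def pairwise_distances(data, test):
    """Return Euclidean distances with shape (n_test, n_data).

    Hypothetical helper for illustration; the name is not from the answer.
    """
    data = np.asarray(data, dtype=float)
    # Promote a single 1D point of shape (k,) to shape (1, k).
    test = np.atleast_2d(np.asarray(test, dtype=float))
    # (n_test, 1, k) - (n_data, k) broadcasts to (n_test, n_data, k).
    diff = test[:, None, :] - data
    # Collapse the coordinate axis to get one distance per pair.
    return norm(diff, axis=2)


ens1 = np.array([[2, 2, 3], [4, 2, 4], [1, 1, 4], [2, 4, 5]])
ens2 = np.array([[2, 3, 3], [4, 1, 2]])

dists = pairwise_distances(ens1, ens2)       # two test points
single = pairwise_distances(ens1, ens2[0])   # one 1D test point
```

Here `dists` has shape `(2, 4)` and `single` has shape `(1, 4)`, so the single-point case keeps the same "one row per test point" layout.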

Solution 2:

What about np.meshgrid?

import numpy as np

data = np.array(
   [[2, 2, 3],
    [4, 2, 4],
    [1, 1, 4]])


test = np.array([[2,3,3],
                 [4,1,2]])   


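d = np.arange(0, 3)  # row indices into data
t = np.arange(0, 2)  # row indices into test
d, t = np.meshgrid(d, t)  # both index grids have shape (2, 3)

# test[t] and data[d] both have shape (2, 3, 3): one pair of rows per grid cell
# print(test[t])
# print(data[d])
print(np.sqrt(np.sum((test[t] - data[d])**2, axis=2)))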

output:

[[ 1.          2.44948974  2.44948974]
 [ 2.44948974  2.23606798  3.60555128]]
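To see why the meshgrid version works, it can help to inspect the intermediate shapes. The sketch below reuses the same `data` and `test`; the names `paired_data` and `paired_test` are introduced here only for illustration.

```python
import numpy as np

data = np.array([[2, 2, 3], [4, 2, 4], [1, 1, 4]])
test = np.array([[2, 3, 3], [4, 1, 2]])

# With the default indexing="xy", both outputs have shape (len(test), len(data)).
d, t = np.meshgrid(np.arange(len(data)), np.arange(len(test)))

# Integer-array indexing on axis 0 keeps the trailing coordinate axis.
paired_data = data[d]   # shape (2, 3, 3): entry [i, j] is data[j]
paired_test = test[t]   # shape (2, 3, 3): entry [i, j] is test[i]

# Sum the squared differences over the coordinate axis, then take the root.
dists = np.sqrt(np.sum((paired_test - paired_data) ** 2, axis=2))
```

Note that `data[d]` and `test[t]` are full copies, one row pair per grid cell, so this approach likely uses more memory than the broadcasting version for large inputs.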

Solution 3:

You could use a list comprehension:

result = np.array([np.sqrt(np.sum((t - data)**2, axis=1)) for t in test])
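As a quick sanity check, the three approaches can be compared on the same inputs. This is a sketch using the `data` and `test` arrays from Solution 2; the variable names `by_broadcast`, `by_meshgrid`, `by_loop` and `agree` are made up for the comparison.

```python
import numpy as np
from numpy.linalg import norm

data = np.array([[2, 2, 3], [4, 2, 4], [1, 1, 4]])
test = np.array([[2, 3, 3], [4, 1, 2]])

# Solution 1: broadcasting over a new middle axis.
by_broadcast = norm(data - test[:, None], axis=2)

# Solution 2: explicit index grids from np.meshgrid.
d, t = np.meshgrid(np.arange(len(data)), np.arange(len(test)))
by_meshgrid = np.sqrt(np.sum((test[t] - data[d]) ** 2, axis=2))

# Solution 3: Python-level loop over test rows, vectorized over data rows.
by_loop = np.array([np.sqrt(np.sum((row - data) ** 2, axis=1)) for row in test])

agree = np.allclose(by_broadcast, by_meshgrid) and np.allclose(by_broadcast, by_loop)
print(agree)
```

The list comprehension still loops in Python over the rows of `test`, so it is only partly vectorized; for a small number of test points that is usually not a problem.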
