Note that there are some explanatory texts on larger screens.

plurals
  1. PO
    text
    copied!<p>Here is a solution. I'm not really sure that it's possible to vectorize it. If you want to make it resistant to "float comparing error" you should modify <code>is_less</code> and <code>is_greater</code>. The whole algo is just a binary search.</p> <pre><code>import numpy as np #lexicographicaly compare two points - a and b def is_less(a, b): i = 0 while i&lt;len(a): if a[i]&lt;b[i]: return True else: if a[i]&gt;b[i]: return False i+=1 return False def is_greater(a, b): i = 0 while i&lt;len(a): if a[i]&gt;b[i]: return True else: if a[i]&lt;b[i]: return False i+=1 return False def binary_search(a, x, lo=0, hi=None): if hi is None: hi = len(a) while lo &lt; hi: mid = (lo+hi)//2 midval = a[mid] if is_less(midval, x): lo = mid+1 elif is_greater(midval, x): hi = mid else: return mid return -1 def lex_sort(v): #sort by 1 and 2 column respectively #return v[np.lexsort((v[:,2],v[:,1]))] order = range(1, v.shape[1]) return v[np.lexsort(tuple(v[:,i] for i in order[::-1]))] def sort_and_index(arr): ind = np.indices((len(arr),)).reshape((len(arr), 1)) arr = np.hstack([ind, arr]) # add an index column as first column arr = lex_sort(arr) arr_cut = arr[:,1:] # an array to do binary search in arr_ind = arr[:,:1] # shuffled indices return arr_ind, arr_cut #lat1 = np.array(([1,2,3], [3,4,5], [5,6,7], [7,8,9])) # ~ 200000 rows lat1 = np.arange(1,800001,1).reshape((200000,4)) #lat2 = np.array(([3,4,5], [5,6,7], [7,8,9], [1,2,3])) # same number of rows as time lat2 = np.arange(101,800101,1).reshape((200000,4)) lat1_ind, lat1_cut = sort_and_index(lat1) time_arr = np.zeros(200000) import time start = time.time() for ii, elem in enumerate(lat2): pos = binary_search(lat1_cut, elem) if pos == -1: #Not found continue pos = lat1_ind[pos][0] #print "element in lat2 with index",ii,"has position",pos,"in lat1" print time.time()-start </code></pre> <p>The commented print is the place where you have corresponding indices of lat1 and lat2. Works for 7 seconds on 200000 rows. </p>
 

Querying!

 
Guidance

SQuiL has stopped working due to an internal error.

If you are curious you may find further information in the browser console, which is accessible through the devtools (F12).

Reload