<p>Ok, this took longer than I expected, but here's a more general answer that works with an arbitrary number of choices per individual. I'm sure there are simpler ways, so it would be great if somebody could chime in with something better for some of the following code.</p>
<pre><code>import numpy as np
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var': [3, 8, 10, 1, 3, 4]})
</code></pre>
<p>which gives</p>
<pre><code>   dist_to_A  dist_to_B  ind_var location  location_var
0          0         50        3        A            10
1          0         50        8        A            10
2          0         50       10        A            10
3         50          0        1        B            14
4         50          0        3        B            14
5         50          0        4        B            14
</code></pre>
<p>Then we do:</p>
<pre><code>df.index.names = ['ind']

# Add choice var
df['choice'] = 1

# Create dictionaries we'll use later
ind_to_loc = dict(df['location'])
# gives ind_to_loc equal to {0: 'A', 1: 'A', 2: 'A', 3: 'B', 4: 'B', 5: 'B'}
ind_dict = dict(df['ind_var'])
# gives {0: 3, 1: 8, 2: 10, 3: 1, 4: 3, 5: 4}
loc_dict = dict(df.groupby('location')['location_var'].agg(lambda x: int(np.mean(x))))
# gives {'A': 10, 'B': 14}
</code></pre>
<p>Now I create a MultiIndex and reindex to get a long shape:</p>
<pre><code>df = df.set_index([df.index, df['location']])
df.index.names = ['ind', 'location']

# re-index to long shape: one row per (individual, candidate location) pair
loc_list = ['A', 'B']
ind_list = [0, 1, 2, 3, 4, 5]
new_shape = [(ind, loc) for ind in ind_list for loc in loc_list]
idx = pd.MultiIndex.from_tuples(new_shape)
df_long = df.reindex(idx, method=None)
df_long.index.names = ['ind', 'loc']
</code></pre>
<p>The long shape looks like this:</p>
<pre><code>         dist_to_A  dist_to_B  ind_var location  location_var  choice
ind loc
0   A            0         50        3        A            10       1
    B          NaN        NaN      NaN      NaN           NaN     NaN
1   A            0         50        8        A            10       1
    B          NaN        NaN      NaN      NaN           NaN     NaN
2   A            0         50       10        A            10       1
    B          NaN        NaN      NaN      NaN           NaN     NaN
3   A          NaN        NaN      NaN      NaN           NaN     NaN
    B           50          0        1        B            14       1
4   A          NaN        NaN      NaN      NaN           NaN     NaN
    B           50          0        3        B            14       1
5   A          NaN        NaN      NaN      NaN           NaN     NaN
    B           50          0        4        B            14       1
</code></pre>
<p>So now fill the NaN values with the dictionaries:</p>
<pre><code># x is an (ind, loc) tuple from the MultiIndex
df_long['ind_var'] = df_long.index.map(lambda x: ind_dict[x[0]])
df_long['location'] = df_long.index.map(lambda x: ind_to_loc[x[0]])
df_long['location_var'] = df_long.index.map(lambda x: loc_dict[x[1]])

# Fill in choice: rows added by the reindex are the alternatives not chosen
df_long['choice'] = df_long['choice'].fillna(0).astype(int)
</code></pre>
<p>Finally, all that is left is creating dist_S.<br> I'll cheat here and assume I can create a nested dictionary like this one:</p>
<pre><code>nested_loc = {'A': {'A': 0, 'B': 50},
              'B': {'A': 50, 'B': 0}}
</code></pre>
<p>(This reads: if you're in location A, then location A is at 0 km and location B is at 50 km.)</p>
<pre><code>def nested_f(row):
    # outer key: where the individual is; inner key: the candidate location
    return nested_loc[row['location']][row['loc']]

df_long = df_long.reset_index()
df_long['dist_S'] = df_long[['loc', 'location']].apply(nested_f, axis=1)
df_long = df_long.drop(['dist_to_A', 'dist_to_B', 'location'], axis=1)
df_long
</code></pre>
<p>gives the desired result</p>
<pre><code>    ind loc  ind_var  location_var  choice  dist_S
0     0   A        3            10       1       0
1     0   B        3            14       0      50
2     1   A        8            10       1       0
3     1   B        8            14       0      50
4     2   A       10            10       1       0
5     2   B       10            14       0      50
6     3   A        1            10       0      50
7     3   B        1            14       1       0
8     4   A        3            10       0      50
9     4   B        3            14       1       0
10    5   A        4            10       0      50
11    5   B        4            14       1       0
</code></pre>
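<p>Since the answer asks for simpler approaches, here is one possible shortcut, offered as a sketch rather than a definitive replacement. It assumes the distance columns always follow the <code>dist_to_&lt;location&gt;</code> naming pattern, so that <code>DataFrame.melt</code> can turn them into the long shape directly and <code>dist_S</code> falls out without the nested dictionary. The names <code>df_alt</code> and <code>loc_var</code> are hypothetical and not part of the original answer.</p>

```python
import pandas as pd

df = pd.DataFrame({
    'location': ['A', 'A', 'A', 'B', 'B', 'B'],
    'dist_to_A': [0, 0, 0, 50, 50, 50],
    'dist_to_B': [50, 50, 50, 0, 0, 0],
    'location_var': [10, 10, 10, 14, 14, 14],
    'ind_var': [3, 8, 10, 1, 3, 4]})

# Melt the dist_to_* columns: one row per (individual, candidate location),
# with the matching distance already sitting in dist_S.
df_alt = (
    df.rename_axis('ind')
      .reset_index()
      .melt(id_vars=['ind', 'location', 'ind_var'],
            value_vars=['dist_to_A', 'dist_to_B'],
            var_name='loc', value_name='dist_S')
)

# Assumption: stripping the 'dist_to_' prefix yields the location label.
df_alt['loc'] = df_alt['loc'].str.replace('dist_to_', '', regex=False)

# The chosen alternative is the one equal to the individual's own location.
df_alt['choice'] = (df_alt['loc'] == df_alt['location']).astype(int)

# location_var belongs to the candidate location, not to the individual.
loc_var = df.drop_duplicates('location').set_index('location')['location_var']
df_alt['location_var'] = df_alt['loc'].map(loc_var)

# Order rows and columns like the result above.
df_alt = (
    df_alt.sort_values(['ind', 'loc'])
          .reset_index(drop=True)
          [['ind', 'loc', 'ind_var', 'location_var', 'choice', 'dist_S']]
)
```

<p>The trade-off is that this relies on the column naming convention, whereas the dictionary-based version above also works when the distances have to come from a separate lookup.</p>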