
Labelled Datatypes Python

I am computing geodesic distances between a point and multiple line segments. Each line segment has a unique identifying number. I want to return distances from my distances functi

Solution 1:

I think your overall problem is that you are creating a DataFrame where the column label is the intercept value. What you probably want is a DataFrame where one column contains the intercept values and another contains the distances. I will try to give you code that I think will help, but it is hard to be certain without your original data, so you may need to modify it somewhat to get it to work perfectly.

First, I would modify vect_dist_funct so if the first argument is a scalar, it creates the correct-length list, and if the second is empty it returns NaN.
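The original vect_dist_funct is not shown, so the following is only a sketch of the two guards described above, written as a wrapper. The names vect_dist_safe and planar_dist are hypothetical, the points are assumed to be coordinate pairs in a 2-D array, and planar_dist is a plain Euclidean stand-in for the real geodesic function.

```python
import numpy as np

def planar_dist(points, polygons):
    # Stand-in for the real distance function (assumption): row-wise Euclidean distance.
    return np.sqrt(((points - polygons) ** 2).sum(axis=1))

def vect_dist_safe(point_or_points, polygons, dist_func=planar_dist):
    polygons = np.asarray(polygons, dtype=float)
    # Guard 1: nothing intersected, so there is no distance to report.
    if polygons.size == 0:
        return np.nan
    points = np.asarray(point_or_points, dtype=float)
    # Guard 2: a single coordinate pair is repeated once per polygon row.
    if points.ndim == 1:
        points = np.tile(points, (len(polygons), 1))
    return dist_func(points, polygons)

result = vect_dist_safe((0.0, 0.0), [[3.0, 4.0], [6.0, 8.0]])
empty = vect_dist_safe((0.0, 0.0), [])
```

In real use, the actual vect_dist_funct would be passed as dist_func, provided it accepts arrays of this shape.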

Next I would add all the useful values as columns to the DataFrame:

points['intersect'] = points['geometry'].apply(lambda x: np.array(tree_idx.intersection(x.bounds)))
points['polygons'] = points['intersect'].apply(lambda x: centroid.loc[x].values)
points['coords0'] = points['geometry'].apply(lambda x: x.coords[0])
points['dist'] = points.apply(lambda x: vect_dist_funct(x.coords0, x.polygons), axis=1)

This will give you a column with all the distances in it. If you really want the intercept values to be accessible, you can then create a DataFrame with just the intercepts and distances, and then put the intercepts as another multiindex level to avoid too many NaN values:

pairs = points.apply(lambda x: pd.DataFrame([x['intersect'], x['dist']], index=['intersect', 'dist']).T.stack(), axis=1)
pairs = pairs.stack(level=0).set_index('intersect', append=True)
pairs.index = pairs.index.droplevel(level=2)

This should give you a Series where the first index is the id, the second is the percent, the third is the intersect, and the value is the distance.
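If the stacking above is hard to adapt, a more explicit (and slower) way to get a similar long-format Series is to loop over the rows. This is a different technique from the one above, shown on made-up data; the index name "id" and the sample values are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the points DataFrame: one array of segment ids
# and one array of distances per point.
points = pd.DataFrame(
    {
        "intersect": [np.array([25618, 25619]), np.array([25620])],
        "dist": [np.array([10.0, 20.0]), np.array([30.0])],
    },
    index=pd.Index([0, 1], name="id"),
)

# One (point id, segment id, distance) record per intersecting segment.
records = [
    (idx, seg, d)
    for idx, row in points.iterrows()
    for seg, d in zip(row["intersect"], row["dist"])
]

# Series of distances indexed by (id, intersect).
pairs = (
    pd.DataFrame(records, columns=["id", "intersect", "dist"])
    .set_index(["id", "intersect"])["dist"]
)
```

A distance can then be looked up with pairs.loc[(0, 25619)].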

Solution 2:

I think a DataFrame whose index is the labels is probably the simplest approach:

distances = {25622 : 296780.2217658355,
 25621 : 296572.4476883276,
 25620 : 296364.21166884096,
 25619 : 296156.4366241771,
 25618 : 295948.6610171968}

df = pd.DataFrame(list(distances.items()), columns=["label", "dist"]).sort_values('dist').set_index('label')
df

Outputs:

                dist
label
25618  295948.661017
25619  296156.436624
25620  296364.211669
25621  296572.447688
25622  296780.221766

Then if you want to access a distance by label name

df.loc[25620]
Out:
dist    296364.211669
Name: 25620, dtype: float64

And then if you want to find labels 'near' that point, you can get the row-number with

row_num = df.index.get_loc(25620)
print(row_num)
Out: 2

And then you can access "near" points by position with df.iloc, for example the next row at row_num + 1:

df.iloc[3]
Out:
dist    296572.447688
Name: 25621, dtype: float64
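As a small sketch of the same idea, the lookup and the positional slice can be combined in a helper. The function name neighbours and the window parameter k are made up for illustration, and the index is assumed to contain each label only once.

```python
import pandas as pd

distances = {25622: 296780.2217658355,
             25621: 296572.4476883276,
             25620: 296364.21166884096,
             25619: 296156.4366241771,
             25618: 295948.6610171968}

df = (pd.DataFrame(list(distances.items()), columns=["label", "dist"])
      .sort_values("dist")
      .set_index("label"))

def neighbours(frame, label, k=1):
    # Position of the label in the distance-sorted frame (unique index assumed).
    pos = frame.index.get_loc(label)
    # Clamp the lower bound so a negative start does not wrap around.
    start = max(pos - k, 0)
    return frame.iloc[start:pos + k + 1]

near = neighbours(df, 25620)
```

Here near holds the rows for the label itself plus up to k rows on either side in distance order.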

Does that cover everything you need?

Solution 3:

After everything, and after trying to make TheBlackCat's answer work for about 3 hours, I have decided to use xarray. So now the pointer function looks like this:

def pointer(point, centroid, tree_idx):
    intersect = list(tree_idx.intersection(point.bounds))
    if len(intersect) > 0:
        points = pd.Series([point.coords[0]]*len(intersect)).values
        polygons = centroid.loc[intersect].values
        dist = vect_dist_funct(points, polygons)
        sorter = np.argsort(dist)
        return xr.DataArray(dist[sorter], [('dim0', np.asarray(intersect)[sorter])])
    else:
        return xr.DataArray(np.nan)

Done. This works for my needs. The distances and the segment IDs they were computed from are kept together, so a transformation on one carries over to the other. The distances can still be used in arithmetic, and xarray also gives me more advanced functionality for grouping, merging, etc.
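To illustrate what keeping the labels attached looks like, here is a small sketch using made-up distances and the same DataArray construction as in pointer. It assumes xarray is installed; the values and the conversion to kilometres are only for demonstration.

```python
import numpy as np
import xarray as xr

# Made-up distances (metres) and the segment ids they belong to.
dist = np.array([296364.2, 295948.7, 296156.4])
segment_ids = np.array([25620, 25618, 25619])

# Sort by distance and attach the ids as coordinate labels on 'dim0'.
sorter = np.argsort(dist)
da = xr.DataArray(dist[sorter], [("dim0", segment_ids[sorter])])

nearest_id = int(da["dim0"][0])       # label of the closest segment
d_25620 = float(da.sel(dim0=25620))   # look up a distance by segment id
in_km = da / 1000.0                   # arithmetic keeps the labels attached
```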

Also, this takes about a minute to run on 0.1% of the data for a state, and 10 minutes for 10% of the data. Extrapolating from the 10% run, I expect 100% of the data to take about 100 minutes. But honestly, even if it took 3 hours per state, I could still finish all 50 states within a day (using multithreading on a 16-core server). So I am happy with this for the time being. Thanks for all the suggestions I got, especially from @TheBlackCat, @michael_j_ward, and @hpaulj.
