Add a blame option #6

burrscurr · 2021-01-05T10:04:10Z

As a user, I don't just want to know the aggregated quality measure of my recorded line (for instance the average deviation), but also which parts of my line were especially bad, thus contributing greatly to the end result.

This could be implemented with a --blame flag, that triggers creating an hierarchical clustering of deviation measurements along the line and outputs all sections that contribute more than a certain threshold (for instance 5%) to the end result.

The text was updated successfully, but these errors were encountered:

burrscurr · 2021-01-13T19:47:13Z

This represents a clustering problem (find areas with many points or high deviations, respectively). As a gps track very well could be composed of 10^5 or more points (especially when re-sampled/interpolated), the clustering algorithm must be efficient. Additionally, the algorithm should be able to find clusters of varying length and be robust regarding noise (e.g. a point that is close to the reference line, but surrounded by high-deviation points).

DBSCAN clustering algorithm fulfills these requirements. To translate the algorithm into linesman's domain, one iterate over the line's points and create a cluster when the deviation of surrounding points (epsilon parameter in the original algorithm) is higher than a certain threshold value (minPts parameter in the original algorithm). When a cluster is found, following points are added to the cluster as long as the points in their surrounding pass the minimum threshold. If a point does not meet the threshold, it is not added to any cluster.

When the clusters were calculated, each cluster can be evaluated regarding its influence on the end result (e.g. in form of a percentage) and be presented to the user.

User's could customize the neighborhood size of points to consider when evaluating whether a point is added to a cluster and the minimum threshold. The former represents the minimal length of a bad (high deviation) section, the latter one the sensitivity, e.g. how bad a section must be to be considered bad. This should both be powerful and understandable for users not familiar with DBSCAN or clustering in general. In any case, reasonable defaults should be provided, maybe based on the average deviation of the line.

Additionally, implementation becomes fairly simple compared to regular DBSCAN, since the points given here have linear order. That makes it really easy to calculate the neighborhood of each point. Complexity should be n^2 in worst case, but if the epsilon parameter is relatively small, more like n (which is great).

burrscurr added the enhancement New feature or request label Jan 5, 2021

burrscurr self-assigned this Jan 5, 2021

burrscurr added the cli Alters command line interface label Jan 5, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a blame option #6

Add a blame option #6

burrscurr commented Jan 5, 2021

burrscurr commented Jan 13, 2021

Add a blame option #6

Add a blame option #6

Comments

burrscurr commented Jan 5, 2021

burrscurr commented Jan 13, 2021