diff --git a/doc/DataSet.rst b/doc/DataSet.rst
index 36d8c1c..f5093e8 100644
--- a/doc/DataSet.rst
+++ b/doc/DataSet.rst
@@ -89,6 +89,14 @@
 
    Merge sourceDataSet in the current DataSet. It will update the value of points with the same identifier if overwrite is set to 1. ​To add columns instead, see the 'transformJoin' method of FluidDataSetQuery.
 
+:message kNearest:
+
+   :arg buffer: A |buffer| containing a data point to match against. The number of frames in the buffer must match the dimensionality of the DataSet.
+
+   :arg k: The number of nearest neighbours to return. The identifiers will be sorted, beginning with the nearest.
+
+   Returns the identifiers of the ``k`` points nearest to the one passed. Note that this is a brute-force search: every query measures the distance to every point, which is comparatively inefficient for repeated queries against large datasets. For such cases, :fluid-obj:`KDTree` will be more efficient.
+
 :message print:
 
    Post an abbreviated content of the DataSet in the window by default, but you can supply a custom action instead.
diff --git a/doc/KDTree.rst b/doc/KDTree.rst
index 9c6d840..2252819 100644
--- a/doc/KDTree.rst
+++ b/doc/KDTree.rst
@@ -7,6 +7,8 @@
 :discussion:
    :fluid-obj:`KDTree` facilitates efficient nearest neighbour searches of multi-dimensional data stored in a :fluid-obj:`DataSet`.
 
+   k-d trees are most useful for *repeated* querying of a dataset, because there is a cost associated with building them. If you just need to do a single lookup, then using the ``kNearest`` message of :fluid-obj:`DataSet` will probably be quicker.
+
    Whilst k-d trees can offer very good performance relative to naïve search algorithms, they suffer from something called “the curse of dimensionality” (like many algorithms for multi-dimensional data). In practice, this means that as the number of dimensions of your data goes up, the relative performance gains of a k-d tree go down.
 
 :control numNeighbours:
diff --git a/example-code/sc/DataSet.scd b/example-code/sc/DataSet.scd
index 4dc6d95..bfb966f 100644
--- a/example-code/sc/DataSet.scd
+++ b/example-code/sc/DataSet.scd
@@ -250,4 +250,29 @@ fork{
 
 }
 )
+::
+strong::Nearest Neighbour Search in a DataSet::
+
+A FluidDataSet can be queried with an input point to return the nearest matches to that point. Note: this can be computationally expensive on a large dataset, because it computes the distance from the queried point to every point in the dataset. If you need to perform multiple nearest neighbour queries on a FluidDataSet, it is recommended to use FluidKDTree. This facility is most useful with smaller, ephemeral datasets such as those returned by FluidDataSetQuery.
+
+code::
+
+// create a small DataSet...
+f = FluidDataSet(s);
+// and fill it with a grid of data
+f.load(Dictionary.newFrom(["cols", 2, "data", Dictionary.newFrom(9.collect{|i|["item-%".format(i), [i.div(3), i.mod(3)] / 2]}.flatten(1))]));
+
+// the data looks like this
+// (item-0 -> [ 0.0, 0.0 ]) (item-1 -> [ 0.0, 0.5 ]) (item-2 -> [ 0.0, 1.0 ])
+// (item-3 -> [ 0.5, 0.0 ]) (item-4 -> [ 0.5, 0.5 ]) (item-5 -> [ 0.5, 1.0 ])
+// (item-6 -> [ 1.0, 0.0 ]) (item-7 -> [ 1.0, 0.5 ]) (item-8 -> [ 1.0, 1.0 ])
+
+// create a query buffer...
+b = Buffer.alloc(s,2);
+
+// and fill it with a point
+b.sendCollection([1,0]);
+
+// and request 9 nearest neighbours
+f.kNearest(b,9,{|x|x.postln;});
 ::
\ No newline at end of file
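
For reviewers who want to see why the new docs describe this as brute force, here is a minimal, language-neutral sketch in Python of what such a search amounts to. It is not the FluCoMa implementation: the function name `k_nearest`, the use of Euclidean distance, and the tie-breaking order are all assumptions made for illustration. The data mirrors the 3x3 grid built in the example above.

```python
import math


def k_nearest(points, query, k):
    """Return the identifiers of the k points closest to query, nearest first.

    Hypothetical helper for illustration only; Euclidean distance is assumed.
    """
    # Mirror the documented constraint: the query must have as many values
    # as the dataset has dimensions.
    if any(len(p) != len(query) for p in points.values()):
        raise ValueError("query dimensionality must match the dataset")
    # Brute force: one distance computation per stored point, on every query.
    ranked = sorted(points, key=lambda ident: math.dist(points[ident], query))
    return ranked[:k]


# Same grid as the SuperCollider example: item-i -> [i div 3, i mod 3] / 2
grid = {f"item-{i}": [(i // 3) / 2, (i % 3) / 2] for i in range(9)}

# item-6 sits exactly on the query point [1, 0], so it is ranked first.
print(k_nearest(grid, [1, 0], 3))
```

Each call visits every point, so the cost of a query grows linearly with the size of the dataset. That is fine for a one-off lookup on a small dataset, and it is the reason the docs point to a k-d tree for repeated queries. Points at equal distance (such as item-3 and item-7 here) may come back in a different order in the real object.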