Spatial Hashing in C++, part 2

  • Posted on: 10 August 2016

This is part 2 of a 3 part series. The first part covers spatial hashing in general, and some of the fundamental implementation details. The second part will cover a few algorithms, and the third part will describe some test cases.

The code is available here, but please note, it is still a work in progress / experiment.

Review and Warmup

In part 1, I briefly discussed what spatial hashing is, and why it might be useful. Then, we talked about how we'd implement some kind of 3D spatial hash container in C++. I went over the fundamental storage, came up with some type of iterator, and went over some basic operations.

In this part, I want to discuss a few "higher-level" algorithms or operations on the hash...the fun stuff.

As a sort of quick review or "warmup" for what's ahead, let's see an insertion function for the hash:

template <typename U>
void insert(U&& t)
{
	idx_t       idx = hash_func(t);
	m_bins[idx].push_back(std::forward<U>(t));
}

hash_func calls a function T::get_xyz(const T& t), which returns some 3D vector type, and then finds out which bin of the spatial has the t should be inserted into. m_bins is a std::map with keys that represent the bins' 3D index, and values of type std::vector, which represents the per-bin storage. I used the U&& / forward forwarding idiom. (Here's an article about that).

Seems simple enough. One issue, however, warrants further discussion: how do we handle T's other than points? For example, how about a triangle? For objects like these, the above function, along with the get_xyz boilerplate discussed in part 1, inserts the object in one bin, when parts of the entity might lie in other bins. Additionally, we could even have an object larger than 2 or 3 bins. This really causes a problem, because the whole point of the spatial hash is to be able to say things like, "for this item, I could only possibly collide with items that are also in this bin". If T's can be many bins wide, that's not the case. This presents a challenge.

As it stands, my hash3 "experiment" operates as follows:

  • regardless of type, hash3 inserts a T into one bin only. Other implementations have gone the other way with this. I wasn't a huge fan of having a lot of duplicate items in the container, but there are pros and cons.
  • Algorithms in hash3 that check for nearby objects to a given T check that element's bin, as well as "1-layer" of surrounding bins, therefore...
  • ...The user is on their own to size the bins such that objects aren't larger than 1 bin in any dimension. Hopefully, sometime in the future, I can cook up a check for this upon insertion and etc.

Okay, let's move on. There are more utility-type functions I should implement, but I want to move on to the higher-level stuff.

Nearest Neighbor Search

Given an object of type T, we want to search the hash for the nearest neighbor to that element. Applications like this are why spatial hashing is useful. Instead of searching all of our particles, for example, we just need to search the bin in which the test element would belong.

First, let's search a bin for an element (as always, I'm just posting snippets. Details can be found with the complete code):

//get the nearest neighbor to 'test' in bin 'idx'
//note this may not be the absolute nearest neighbor to test
//use this like: nearest_neighbor_in_bin(test) != get_bin(idx).end()
//note, nn.dist is really dist2
nearest nearest_neighbor_in_bin(const T& test, const idx_t& idx)
{
	nearest nn;
	nn.it = m_bins[idx].end();
	nn.dist = std::numeric_limits<double>::max();

	for(auto it = m_bins[idx].begin(); it != m_bins[idx].end(); it++)
	{
		const T& t = *it;

		//avoid finding self
		if(test == t){
			continue;
		}

		double cur_dist = dist2(
			value_t::get_xyz(t),value_t::get_xyz(test));

		if(cur_dist < nn.dist){
			nn.it = it;
			nn.dist = cur_dist;
		}
	}

	return nn;
}

Nothing too complicated there, even if it seems so at first. We supply a test item, and the bin it belongs to (or any bin id, I suppose). Then, amongst each item in that bin, we find the nearest T. At the end, we return an iterator to the nearest element, along with the squared distance between the test object and the nearest neighbor. Similar to STL algorithms, we can compare the iterator to get_bin(idx).end() to see if we've come up empty.

So that's a bin search. But, if we really set out to find the nearest neighbor in the entire hash, we have two problems:

  • A T in a neighboring bin might be closer to the closest T in "our" bin -- the bin given by hash_func(test).
  • The bin given by hash_func(test), and its neighbors, may all be empty, but that doesn't mean the whole spatial hash is empty. We may still have a nearest neighbor somewhere.

I think the coolest, and in some ways most efficient way to write up the complete nearest-neighbor algorithm would be to iteratively search further and further away bins. But for now, I did something a little simpler. For a given test object, we search the item's bin, and "1-layer" of surrounding bins. If we still haven't found an item at this point, we resort so searching the whole hash.

Here's some code:

    nearest nearest_neighbor(const T& test)
    {
        idx_t idx = hash_func(test);

        nearest nn;
        nn.it = m_bins[idx].end();
        nn.dist = std::numeric_limits<double>::max();

        if(this->size() == 0){
            return nn;
        }

        //this can be improved.  if we already have
        //a nn in bin idx, we don't need to search
        //bins that are further away, sim. to kd tree
        for(int i = idx.x - 1 ; i < idx.x + 1; i++){
        for(int j = idx.y - 1 ; j < idx.y + 1; j++){
        for(int k = idx.z - 1 ; k < idx.z + 1; k++)
        {
            idx_t idx_cur(i,j,k);
            nearest cur_near = nearest_neighbor_in_bin(
                test,idx_cur);

            if(cur_near.it != this->get_bin(idx_cur).end())
            {
                if(cur_near.dist < nn.dist){
                    nn.it = cur_near.it;
                    nn.dist = cur_near.dist;
                }
            }
        }}}

        //we found something
        if(nn.it != m_bins[idx].end()){
            return nn;
        }

        //resort to O(n) search
        return nearest_neighbor_naive(test);
    }

Note there is at least one nice performance improvement that could be made. If we searched the center bin first, we would have a nn.dist, if we found a neighbor in the center bin. If this distance is less than the minimum "straight-line" distance between test and a neighboring bin, we don't need to search that bin. There couldn't possibly be something closer in that bin.

Not perfect, but this will do for now.

Ray Tracing

Spatial hashing can also accelerate tray tracing. Ray tracing is most often associated with computer graphics and rendering, but also pops up in some numerical simulations and algorithms.

I won't go over ray tracing in detail, but idea of the method is rather simple: given a point in 3D space and a 3D direction, "trace" the direction from the selected point and see what entities are intersected or hit by our "ray". What we do with a "hit" depends on the application. For example, for rendering, if our ray hits an entity, we might want to compute a reflected ray, and then begin tracing that ray.

Of course, once you get into a real ray tracing application, things get more complex and difficult, but as our spatial hash is just the container, the application isn't our focus.

Each bin of our hash is basically a 3D box with side lengths hash3::m_d. If the ray doesn't intersect the bin, it can't possibly intersect the bin's entities.

Well, that is the simple explanation, and unfortunately, due to the design decision we made earlier, things are actually a little more complicated.

Since we decided to keep only one copy of each T in our spatial hash, and since a T's bin is based on T::get_xyz, we can certainly have entities that extend outside of its bin. So our assertion above, that if a ray doesn't intersect a bin, it can't possibly intersect its entities, is not correct. In fact, our bins are a little more "greedy".

As part of the "unique T" philosophy, I (somewhat questionably) left it to the user to make sure each entity in the hash has size less than hash3::m_d. Then, for a given bin of our spatial hash, we can say the following: If a ray doesn't intersect an axis-aligned bounding box defined by "one level" of surrounding bins, it can't possibly intersect any entities in that bin.

So for each bin, we need to generate a sort of "extended bounding box", and then test a ray for intersection. If the ray hits the extended bounding box, we have to check the bin's entities for hits.

The image below illustrates the situation:

To summarize the above, imagine we have a hash3 of triangles. The triangle above is stored in the center bin. The ray does not intersect the center bin, but clearly intersects something stored in that bin. We need to see if the ray intersects the center bin's "extended bounding box". If yes, we check all the center bin's entities for a hit.

Let's see some code we need to implement the above. First, an axis-aligned bounding-box class is useful. Here's a summary of that:


template <typename T>
class aabb
{
public:

	typedef vector3<T>              my_vect3_t;

    aabb() = default;
	aabb(const aabb& other) = default;

	aabb(const my_vect3_t& mn, const my_vect3_t& mx):
		m_min(mn),m_max(mx)
	{}

    //from http://www.scratchapixel.com/lessons/3d-basic-rendering/
    //minimal-ray-tracer-rendering-simple-shapes/ray-box-intersection

	template<typename RAY>
	bool ray_intersect(const RAY& r);
	
	~aabb() = default;

    my_vect3_t m_min;
    my_vect3_t m_max;
};

Our AABB class has 3D vectors representing the min. and max. of our bounding box, and a function to see if a given ray intersects it. The actual intersection function is from from http://www.scratchapixel.com/lessons/3d-basic-rendering/. RAY is a template parameter mostly to avoid maintaining my own ray class for hash3 :). I didn't want this project to creep into a complete graphics library.

Now our actual ray intersection function for the spatial hash:


    template<typename RAY, typename FUNC>
    void ray_intersect(const RAY& ray, const FUNC& f)
    {
        for( auto& bin : m_bins)
        {
            aabb<double> bb =
                get_bb<use_expanded_bb<T>::value>::get(
                    bin.first,m_d);

            if(bb.ray_intersect(ray))
            {
                //not sure what the best thing to do here is
                //but we'll leave it to the client to have
                //its own ref. to ray in one way or another
                for(/*const*/ auto& t: bin.second){
                    f(t);
                }
            }
        }
    }

The above mostly just tests each bin's extended bounding-box for a hit, and then does the ray intersection with each entity in the bin if the extended-bb is hit.

get_bb<use_expanded_bb<T>::value>::get is some trait-like machinery to get a bin's extended bounding-box, or something different if wanted (to implement that specialization). If we had a type (perhaps a point or particle) for which we knew we don't need such a big bounding box to test, we could use this machinery to get a smaller aabb. Performance-wise, making a bin less "ray-greedy" would be a win.

We could use ray_intersect like this:


       ray r(  myvect3d(12,12,12),
                myvect3d(-1/sqrt(3.0),-1/sqrt(3.0),-1/sqrt(3.0)));

        my_hash.ray_intersect(r,[&](particle& p)
        {
            ray_particle_intersection(r,p);
        });

As an example, the image below shows a ray, and shows bins that must be checked for intersection in red. If the above made sense, it should be clear why the ray doesn't need to hit all the red bins. If not, I suppose I've done a poor job explaining things.

Collisions

In part 1, I mentioned that one of the main purposes of using a data structure like a spatial hash is to improve the performance of pair-wise collisions. By pair-wise collisions, I basically mean testing each pair of entities, and doing something if they're touching or near enough to each other. This is a pretty common operation in any game engine, particle simulator, etc.

If we just stored all of our particles, for example, in a vector, we'd end up with some code like this:

for(particle& p1 : particles_vector)
{
	for(particle& p2 : particles_vector){
		collide_f(p1,p2);
	}
}

If we have for example, 10,000 particles, we have to call collide_f 100,000,000 times! Our goal here is call collide_f fewer times.

Obviously, with a spatial hash, we know that T's stored in a far-away bin can't possibly collide or intersect with something in "this" bin, so we don't have to do that test. In fact, we only have to check items in a given entity's bin, and its neighboring bins.

Exact performance gains can be hard to calculate, but we certainly avoid the pure O(n^2) calculation from the code above. While this appears to be a win, and it often is, there are some cases where we won't see much improvement. This will be discussed more in part 3.

Instead of writing a function to collide the whole hash, I just wrote a function to test for collision / collide a given T with the relevant T's in the hash. This gives us a little flexibility, since const T& test may or may not refer to something in the hash.

The function below implements the idea I've discussed:

template<typename FUNC>
void for_each_neighbor(const T& test, const FUNC& f)
{
	idx_t idx = hash_func(test);

	if(this->size() == 0){
		return;
	}

	//for each neighbor (and this) bin
	for(int i = idx.x - 1 ; i < idx.x + 1; i++){
	for(int j = idx.y - 1 ; j < idx.y + 1; j++){
	for(int k = idx.z - 1 ; k < idx.z + 1; k++)
	{
		idx_t idx_cur(i,j,k);

		if(m_bins.find(idx_cur) == m_bins.end()){
			continue;
		}

		typename bin_type<T>::type& cur_bin = m_bins[idx_cur];

		//still a little work to do, what should
		//this type be?
		for(T& t: cur_bin)
		{
			if(t == test){
				continue;
			}

			f(test,t);
		}
	}}}

	return;
}

Here, for a given test item, we call the user-defined function f(T,T) on all the entities in test's bin, and the neighboring bins. There's a little extra machinery to see if we have a neighboring bin at all, and also to avoid self-collisions. Here, f has the responsibility of testing for collision, and also doing whatever happens if they are indeed colliding. Here's an example f:

//note: this is not real physics
auto collide_f = [](particle& p1, particle& p2)
{
	double d2 = (p1.m_r.x-p2.m_r.x)*(p1.m_r.x-p2.m_r.x) +
				(p1.m_r.x-p2.m_r.y)*(p1.m_r.x-p2.m_r.y) +
				(p1.m_r.x-p2.m_r.z)*(p1.m_r.x-p2.m_r.z);

	if(d2 < 1e-5){
		p1.m_v.x *= -1.0;
		p1.m_v.y *= -1.0;
		p1.m_v.z *= -1.0;
	}
}

In part 1, we showed how we developed an iterator for the spatial hash. We can combine that with our new for_each_neighbor function as follows:

std::for_each(my_hash.begin(), my_hash.end(),
	[&](particle &p)
	{
		my_hash.for_each_neighbor(p,collide_f);
	});

The above will compute all collisions in my_hash, as defined by collide_f, and then do the collision action. Note that if we wish to use std::for_each, we have to setup our collision function to only do "half" of the collision action. For the current example, that means we only do the collision action on p1 in collide_f, even though at the end of all colliding, both particles will be altered if they've collided. That might be less than optimal, efficiency wise, but the ability to use the idiom above is pretty nice.

Real Pair-wise Operations

As an aside, if we really did want to do a true pair-wise operation, we could of course do something like:

std::for_each(my_hash.begin(), my_hash.end(),
    [&](particle &p1)
    {
            std::for_each(my_hash.begin(), my_hash.end(),
        [&](particle &p2)
        {
            some_function(p1,p2);
        });
      });

This requires the forward iterator developed in part 1.

Colliding With Other Types

This is just a little kernel of an idea. Another thing to implement!

What if we wanted to collide an item of type other than T with the hash? That might be useful sometimes. One example is the case of having something like hash3<T*>, and we want to do something along the lines of for_each_neighbor(a_derived_type(arg),some_collision_function).

To do this, we would need to change our declaration of for_each_neighbor to:

template<typename U>
void for_each_neighbor(const U& test, const std::function<void(T,U)>& f )
{
//...
}

We then probably have to cook up a type-trait to avoid self-comparison if U is not the same as T.

Wrap-Up

The goal of this article was mainly to introduce a few high-level (ie, fun) algorithms over a spatial hash. To that end, I discussed nearest neighbor, collisions, and ray-tracing.

In both part 1 and part 2, I mentioned a laundry list of to-dos and potential improvements. I think I'm going to dink around with those awhile before I write part 3, so that one might be a while.

Some of these todos are:

  • More real testing, on a class other than particle.
  • Take responsibility for my "rule" that T'scan't be larger than a bin.
  • Experiment with alternatives to std::map
  • Implement for_each_neighbor with other types, as mentioned above.
  • Many more!

In part 3, I plan to do some real benchmarking, for example, how much faster is the hash-based collision than the naive version?

Meta note: I welcome all feedback and critique on twitter! I don't do comments on here because it's a tad labor intensive.