Spatial Hashing in C++, part 1

  • Posted on: 23 July 2016

This is part 1 of a 3 part series. The first part covers spatial hashing in general, and some of the fundamental implementation details. The second part will cover a few algorithms, and the third part will describe some test cases.

The code is available here, but please note, it is still a work in progress

Lately, I have been experimenting with a spatial hashing class template in C++. It has been fun, interesting, and, maybe, useful. As noted above, I plan to write three posts about this. In this post, I'm going to talk a bit about spatial hashing, and discuss some of the "container"-related stuff, like fundamental storage and an iterator.

Introduction to Spatial Hashing

A spatial hash is basically a data structure that subdivides 3D space into uniformly-sized "bins" or "buckets", and then stores our data elements in these bins.

The image below shows a visualization of spatial hashing. It suffers a bit from 3d/perspective issues, but hopefully, the idea is clear. We have a sort of uniform grid, and the boxes on that grid contain our data elements, which appear as points in the image below. In reality, we want our hash to be able to store anything that has a position in 3D space -- not just points. We'll return to this idea later.

Spatial hashing can help us speed up certain operations. For example, imagine we are storing a cloud of 1,000 particles. For each particle, perhaps we want to collide it with all the other particles in the cloud. If we just stored the particles in an array, our collision function would look something like:

std::vector<particle> particles_copy(1000);

//stuff

for(particle& p1 : particles_copy)
{
     for(particle& p2 : particles_copy){
         collide_f(p1,p2);
     }
}

We have to call collide_f 1,000*1,000 = 1 million times (Okay, perhaps half of that, depending on how you go at it).

In a spatial hash, we know that each particle could only possibly collide with other particles in the same bin, so for each particle, we should end up with many fewer calls to collide_f:


spatial_hash<particle> particles;

//stuff

for(particle& p1 : particles)
{
    bin_t b = get_bin(p1);
    for(particle& p2 : b){
        collide_f(p1,p2);
    }
}

A spatial hash can be used to improve the performance of other operations, like nearest neighbor search, or ray-tracing operations.

For a given element, we obtain which bin it should go in, or which bin contains its neighbors, by applying a hash function to the element. In a more simple sense, for a given element, we just need a way to get the right bin. In other words, we need some form of an associative array. We could use something like a hash table, or we could use something like a map or dictionary (okay, these are all really maps, conceptually).

So, that's a brief overview of what we're trying to get done, as well as the reason for doing so. I kept this brief on purpose, since there are a lot of good spatial hashing articles around already:

The C++ Spatial Hash

What would the fundamentals of a C++ spatial hash look like? I’ll write it down, and then discuss it:

template <typename T>
class hash3
{
    my_vect3_t  m_d;
    std::map< int3, std::vector<T> > m_bins;    
};

First we have some 3D vector member, m_d. This is the 3d size of each bin in the spatial hash.

We use a std::map as our associative array. Each bin of our hash is just a simple std::vector of Ts. Note that our std::map can be used in a way that it only stores non-empty bins. For the keys, we use a simple int3 type -- something like:

class int3{
    int i, j,k;
};

Performance wise, there are better options than std::map. We could have used unordered_map or unordered_multimap as in one of the links above, but I'd say the std::map as used above is simplest, especially for non-C++ gurus. We don't even have to write a real hash function! We can always change the underlying storage later on.

Okay, now let's try to write a function that performs the most fundamental operation with our spatial hash. Given a T, how do we find what bin it goes into? Or, for a T, how do we get the bin that contains its near neighbors?

This is our “hash function”, even though it might not really be one. It’s more of an “address calculator” or a “key calculator”. It gives us the key for our std::map for which the value is the bin we want, provided we have a way to get t's spatial information:

int3 hash3::hash_func(const T& t)
{
        idx_t ret;
        my_vect3_t  r = my_vect3_t( value_t::get_xyz(t));

        ret.x = r.x / this->m_d.x;
        ret.y = r.y / this->m_d.y;
        ret.z = r.z / this->m_d.z;

        // a / b is int(a) / int(b) for >=, ie,
        //bins are [**,**)
        if(r.x < 0.0){ret.x--;}
        if(r.y < 0.0){ret.y--;}
        if(r.z < 0.0){ret.z--;}

        return ret;
}

Nothing overly complicated there. In a 1D sense, bin 0 goes from 0.0 to m_d.x, and bin -1 goes from -m_d.x, to 0.0. Using that logic, we compute a int3 to be used with our std::map. We use value_t::get_xyz to get t's position. We'll discuss that more later.

At this point, we can perhaps see how we'd insert something into the hash:

hash3<particle> foo;
particle baz(1.0,1.0,1.0);

foo.m_bins[ foo.hash_func(baz) ].push_back( baz );

Okay, that’s the guts of our C++ spatial hashing template. We have the underlying storage set up, and we have some idea how to do the “hashing”.

Design Goals: What Do We Really Want This Thing To Do?

Now, let’s discuss the design goals of the template. What do we want this thing to do?

  • Like std::vector, we want to be able to use any T we can think of.
  • We want some semblance of an iterator, so we can go through all the elements of the spatial hash with eg std::for_each
  • We want to implement the operations that a user would expect, like insert, collide, find_nearest, etc

Let’s go over the first two items in a little more detail. We'll cover the third point in part 2 of this series. For more insight, you can always take a look at the code.

Use anything for T.

Since we use std::vector as our bin type, which can already handle anything for T, we have a pretty good head start here.

However, we have some additional work to do, since for each different type T, we need a way to get its 3D position, or something that has members .x, .y, and .z. We can probably use some SFINAE trickery to do this, but I opted to enforce some boilerplate on the T’s used in the hash. I suppose there are pros and cons and here.

To see this in action, let’s look at an example particle class:

class particle
{
    public:

    /***
     * to work with hash3, the below needs to be
     * added -- one typedef and the two get_xyz
     * functions. The get_xyz only can return anything
     * that has T.x, T.y, T.z
     **/

    typedef double num_type;
    typedef myvect3d vect_type;

    static const vect_type& get_xyz(const particle& p){
            return p.m_r;
    }

    static const vect_type& get_xyz(const particle* p){
            return p->m_r;
    }

    /***
     * end hash3 boilerplate
     **/

    particle(const vect_type& v, const vect_type& r,
        long id):
        m_r(r),m_v(v),m_idx(id)
    {}

    particle(const particle& x):
        m_r(x.m_r),m_v(x.m_v),m_idx(x.m_idx)
    {
    }

    //more methods here

    vect_type     m_r;
    vect_type     m_v;
    long int    m_idx;

};

The hash3 template will look for get_xyz(const particle& p) if you have a hash3<particle>, and get_xyz(const particle* p) if you have a hash3<particle*>. Perhaps, it would have been a better decision to have these functions be external, rather than static members.

The spatial hash also has its own quick and dirty 3D vector class, but doesn’t require the client code (eg class particle) to use it. class particle must only implement the specified boilerplate (sorry), which returns any type that has .x, .y, and .z, and has operator==.

There is a little more work to do to get hash3 to work nice with both T and T*, but it all stems from the need to get a 3D position for either the T or the T*.

One goal is to minimize the change in semantics for T and T*, eg, places where we need t. or t->. That is not a great idea for a generic container. Luckily, with the above-mentioned boilerplate, we never need . or ->. Unfortunately, we do need something like a remove_ptr trait to get a value_t so we can say value_t::get_xyz(t). This is probably another place where SFINAE trickery could be used, but hash3 currently just has something like:

template<typename T>
struct remove_ptr{
    using type = T;
};

template<typename T>
struct remove_ptr<T*>{
    using type = T;
};

template<typename T>
struct remove_ptr<std::shared_ptr<T>>{
    using type = T;
};

//more *-like things we know about

That's not a perfect solution, so there is room for improvement here.

hash3<particle*> and hash3<particle> being on equal footing is obviously important if we are truly aiming to create a generic container, but also allows for an important performance improvement, which will be discussed later on in part two.

A Simple FowardIterator

The goal here is to be able to iterate over all the elements in the hash, so that we can take advantage of existing STL algorithms, like std::for_each:

        hash3::hash3<particle> storage(particles,0.1,0.1,0.1);

            std::for_each(storage.begin(), storage.end(),
                [&advance_f,dt](particle &p)
                {   
                    //advance the particle
                    advance_f(p,dt);
                });

Conceptually, when we are iterating through the spatial hash, we are iterating through the map, and then, for each bin, we iterate through the bin, which is a std::vector. In some sense, we want our iterator to mock the loop below:

for( const auto& bin : m_bins){
    for(const auto& t : bin.second){
            // ... an element-wise operation
    }
}

The krux of the iterator can be understood by looking at operator++ (int):

        ForwardIterator operator++ (int) // Post-increment
        {
            ForwardIterator tmp(*this);

            iter_vect_iter_t new_vect_it =
                m_hash->next_vect_it(m_m_itr,m_v_itr);

            iter_map_iter_t new_map_it =
                m_hash->next_map_it(m_m_itr,m_v_itr);

            m_m_itr = new_map_it;
            m_v_itr = new_vect_it;

            return tmp;
        }

Basically, as we iterate through the container, we need a way, based on the current iterator, to update the current map iterator, and the current vector iterator.

Here's the update method for the std::map iterator:

    map_iter_t next_map_it(map_iter_t mit, vect_iter_t vit) const{

        //we have more bins.  we iterate to map.begin(), vect.end(),
        //so we should never get to mit.end()
        if(mit != --(m_bins.end()))
        {
            if(vit == --(mit->second.end())){
                mit++;
                return mit;
            }

            return mit;
        }

        return mit;
    }

To summarize the above, we need to increment the map iterator if we're at the end of a bin. We should never get to mit.end(), because, conceptually, the last iterator we're interested in is the last bin's .end(). That's the iterator we use in eg hash3::end():

    iterator end(){
        return iterator(m_bins.rbegin()->second.end(),
                        m_bins.end(),   //not valid or checked, only vector iterator is compared
                        this);
    }

Here's the update for the ForwardIterator's vector iterator:


    //check the state of mit and vit, and return a new vector iterator
    vect_iter_t next_vect_it(map_iter_t mit, vect_iter_t vit) const{

        //we have more bins
        if(mit != --(m_bins.end()))
        {
            if(vit == --(mit->second.end())){
                mit++;
                return mit->second.begin();
            }
            else{
                return ++vit;
            }
        }

        //we're on the last bin, if vit is at .back(), go to
        //.end, if we're already at .end(), we've made a mistake
        if(m_bins.rbegin()->second.end() == vit){
            throw "iterator out of bounds!\n";
        }

        return ++vit;
    }

To summarize, if we're in the middle of a bin, we do a normal vector iterator increment. If we're on the last element of a bin, we need to return a vector iterator to the first element of the next bin.

So, that's the "theory of operation" for our iterator. As it stands, it does allow usage of algorithms which require a forward iterator, eg std::for_each, but, I must say, it has a few drawbacks:

  • Since hash3 must compute the next map and vector iterator, the ForwardIterator has to have a pointer back to the container. That is kind of strange for an iterator.
  • The iterator cannot handle 0-sized bins, so the std::map must be "pruned" in hash3::begin(). That might be an expensive operation. The alternative is to not allow hash3 to create empty bins, and implement some more bookkeeping in some of the methods. Painful, but perhaps a better solution.
  • The iterator being quite complex is a little bit of bait and switch. People expect iterators to be simple and fast, or at minimum, predictable. This may not be the case here. For example, for a hash3 with many bins, but few T's in each bin, the iterator will look more like a map iterator, performance-wise. For a hash with many T's in each bin, the performance will be different in a "per-T" sense.

From here, I followed this Stack Overflow thread to fill in the rest of the forward iterator.

Next Steps

In this article, I discussed what a spatial hash is, as well as the reason for cooking one up. I went over some implementation details and difficulties for a generic C++ spatial hash. At this point, we have completed the "container", at least at a pre-production level :). In part 2, we'll look at some algorithms and operations.