At Double Negative, RnD has “hackathons”. A hackathon means you get two weeks to work on an idea you’ve had, or something you want to learn, or … The possibilities are endless, as long as the final result is broadly related to Dneg’s field, i.e. Visual Effects.
For the summer 2017 hackathon, I wanted to practice two things I have been teaching myself about for a while:
- Performance Engineering
- Working with the Houdini HDK
For the first one, I wanted to employ the strategies I learned about in this online open course by MIT, and also use intel TBB to parallelize the code. I had played around with TBB before but more practice never hurts.
For the second goal, I had two stages in mind: Initially I wanted to simply import and export geometry in the BGEO format to seed the algorithm. If I had time left, I wanted to try to get the code itself running as a custom SOP node in Houdini.
Obviously I needed a project to work with to achieve these goals. Given my interest in simulation algorithms, I chose to implement the curly hair paper presented by Pixar a while back for their show “Brave”. It is a mass spring model with a very small time-step. You can find the paper here. It looks something like this:
So, wrapping it all up, I had three goals:
- Implement the basics of the paper, with the main focus on the internal spring model.
- Analyse the performance of the code and improve it. Use TBB to get parallelism.
- Make use of the Houdini HDK to seed my simulation, either as a standalone or as a SOP running in houdini.
A pretty steep order for a two week project. But hey, what’s life without a challenge.
Obviously I had to start by implementing the hair model. At this stage I opted to not worry about optimization, in line with Donald Knuth’s famous quote!
The internal force model for Pixars curly hair consists of three different springs:
- A stretch spring (no surprises there)
- A rotation-free bending spring
- A 'core spring'
The stretch spring is a standard spring between the particles of the hair to make sure it doesn’t extend (too much), I won’t cover this as it is straightforward.
The bending spring is somewhat specialized since the authors want to avoid the hairs to twist around their own axis when extending. This is a bit hard to explain in words. The authors provide a video to compare it to the discrete elastic rod model (bergou et al.). Check out Pixars project page to see the video. Basically, when a hair extends, the natural thing for it to do is to uncurl (as in the elastic rod model). However, Pixar quotes artistic reasons for wanting to avoid this. They formulate their bending springs with this goal in mind. I’ll refer to the paper for the maths behind all of this. It is based on parallel transport of the root frame along a smoothed representation of the hair. Here’s a video of my implementation of the model showing this behaviour correctly:
As you can see, the hair doesn’t twist/uncurl as it extends.
In the previous video, we see the hair showing the wanted rotation-free behaviour, so the bending spring is working. It doesn’t look great, however, because the curl extends a lot, especially near the root. While this is again a natural thing for it to do, the Pixar artists wanted the curls to keep their shape, even when the hair is undergoing extreme acceleration (which typically does happen in feature animation). An obvious solution is to increase the stiffness of the bending springs, resulting in this behaviour:
The curl is now no longer extending, but it pretty much looks like it’s a steel wire or a solid object… also not great! To get flexible, slinky-like behaviour without the uncurling, the authors introduce a third spring: the core spring. This spring is meant to combat the uncurling of the hair without stiffening up the curls. It basically applies a force in the direction of a smoothed hair representation like the one used for the bending spring. Again, check out the maths for this in the paper itself if you’re interested in that sort of thing.
I managed to also implement the core spring succesfully. In my implementation, all three springs working together (with low stiffness on the bending springs), look something like this:
In order for the algorithm be non-trivial to parallelize, I added basic collision detection between hairs and between the hairs and animated collision geo. For this I implemented a simple hashgrid to avoid squared running time. The collision handling was penalty-force based. Here’s some videos of it in action:
Given the time restrictions I didn’t spend a lot of time on it. It added enough flavour to the algorithm for it to be interesting for the next step.
Getting it running was just step one, I wanted to get it running fast(er) as well. I am very keen to learn more about optimization and parallelisation, so this was really the most important part of the project to me. My strategy was simply to analyse the performance of my code using perf and callgrind (and vtune when parallelizing), look at the current bottleneck, and try to get rid of it/speed it up. Some of the things I did that gave performance improvements were:
- Re-order some of the spring calculations to make better use of locality in cache.
- Switched from Eigen's Vec3f to Vec4f, which can make use of vectorization. This also means using their allocator to avoid crashes.
- Getting rid of unnecessary function calls
- re-write the hashgrid using TBB's concurrent_unordered_map
- Using TBB's parallel_for construct to divide the main workload over the different cores of my machine
- re-ordering some of the operations to maximize available work and limit synchronization wait time/locking. Especially allowing the cores to keep computing the new velocities while still clearing out the hashgrid proved pretty vital. I accomplished this using a tbb taskgroup combined with a parallel_for for the hair updates.
Keep in mind I only spent 4 days on this, working with a trial-and-error strategy. I don’t have exact numbers for the speedup and efficiency as I worked with different testcases throughout the four days, and the testcases I used near the end would have taken days with the initial code. As an example, just the parallelisation brought one testcase consisting of 101k particles down from 44 seconds per frame to under 2 seconds per frame, on 40 cores (so Speedup above 22). The algorithm has really small timesteps so it’s just kinda slow in general. This is what the performance looked like in vtune:
Which I was happy enough with, given that there is collision detection happening every subframe (meaning communication and synchronization). I am very much looking to improve my skills in this field, and I think this has been a very worthwhile exercise in that direction.
This is probably the part I was least familiar with, and unfortunately the HDK isn’t extensively documented. Luckily the basics were easy to get together. For the most part, especially while doing the performance tests, I relied on a standalone application that used my little simulation library while reading in and writing out BGEOs. I was then able to visualize the results in houdini with a simple File SOP.
Obviously this wouldn’t make for a great artist workflow, so near the end of my hackathon I spent a day or two making a custom SOP node. Just two days with the HDK is not nearly enough to get really comfortable with it, but I did manage to get my simulation working properly (and controllable) in Houdini as a SOP, complete with caching and everything, which was much easier to work with:
In 2 weeks, I made a prototype of the curly hair algorithm by Pixar that runs in a somewhat efficient manner on multiple processors and is easy to use inside a Houdini scene. I found it to be a very rewarding two weeks. It’s only a proof of concept but for 2 weeks I think it’s quite okay.
One nice upside of having it working in Houdini is that you can use the Houdini hair tools to render something prettier than the particle representations above:
Let me know if you want to know more! This project is property of Double Negative as it was made as part of the Dneg hackathon, so I will not be able to give out the code.