Dev Diary #76 - Performance
Hello and welcome to this week's Victoria 3 dev diary. This time we will be talking a bit about performance and how the game works under the hood. It will get somewhat detailed along the way and if you are mostly interested in what has improved in 1.2 then you can find that towards the end. For those of you who don’t know me, my name is Emil and I’ve been at Paradox since 2018. I joined the Victoria 3 team as Tech Lead back in 2020 having previously been working in the same role on other projects.
What is performance
It’s hard to talk about performance without first having some understanding of what we mean by it. For many games it is mostly about how high fps you can get without having to turn the graphics settings down too far. But with simulation heavy games like the ones we make at PDS another aspect comes into play. Namely tick speed. This metric is not as consistently named across the games industry as fps is, but you might be familiar with the names Ticks Per Second or Updates Per Second from some other games. Here I will instead be using the inverse metric, or how long a tick takes on average to complete in either seconds or milliseconds. Some graphs will be from debug builds and some from release builds, so numbers might not always be directly comparable. What exactly a tick means in terms of in game time varies a bit. In CK3 and EU4 a tick is a single day, while on HOI4 it's just one hour. For Victoria 3 a tick is six hours, or a quarter of a day. Not all ticks are equal though. Some work might not need to happen as often as others, so we divide the ticks into categories. On Victoria 3 we have yearly, monthly, weekly, daily, and (regular) ticks.
If you thought 1.1 was slow you should have seen the game a year before release…
Content of a tick
Victoria 3 is very simulation driven and as such there is a lot of work that needs to happen in the tick. To keep the code organized we have our tick broken down into what we call tick tasks. A tick task is a distinct set of operations to perform on the gamestate along with information on how often it should happen and what other tick tasks it depends on before it is allowed to run.
An overview of some of the tick tasks in the game. Available with the console command TickTask.Graph.
Many of the tick tasks are small things just updating one or a few values. On the other hand some of them are quite massive. Depending on how often they run and what game objects they operate on their impact on the game speed will vary. One of the most expensive things in the game is the employment update, followed by the pop need cache update and the modifier update.
Top ten most expensive tick tasks in our nightly tests as of Feb 15. Numbers in seconds averaged from multiple runs using a debug build.
As you can see from the graph above many of our most expensive tick tasks are run on a weekly basis. This combined with the fact that a weekly tick also includes all daily and tickly tick tasks means it usually ends up taking quite long. So let’s dive a bit deeper into what’s going on during a weekly tick. To do this we can use a profiler. One of the profilers we use here at PDS is Optick which is an open source profiler targeted mainly at game development.
Optick capture of a weekly tick around 1890 in a release build.
There’s a lot going on in the screenshot above so let’s break it down a bit. On the left you see the name of the threads we are looking at. First you have the Network/Session thread which is the main thread for the game logic. It’s responsible for running the simulation and acting on player commands. Then we have the primary task threads. The number will vary from machine to machine as the engine will create a different number of task threads depending on how many cores your cpu has. Here I have artificially limited it to eight to make things more readable. Task threads are responsible for doing work that can be parallelized. Then we have the Main Thread. This is the initial thread created by the operating system when the game starts and it is responsible for handling the interface and graphics updates. Then we have the Render Thread which does the actual rendering, and finally we have the secondary task threads. These are similar to the primary ones, but are generally responsible for non game logic things like helping out with the graphics update or with saving the game.
All the colored boxes with text in them are different parts of the code that we’ve deemed interesting enough to have it show up in the profiler. If we want an even more in depth we could instead use a different profiler like Superluminal or VTune which would allow us to look directly at function level or even assembly.
The pink bars indicate a thread is waiting for something. For the task threads this usually means they are waiting for more work, while for the session thread it usually means it is blocked from modifying the game state because the interface or graphics updates need to read from it.
When looking at tick speed we are mostly interested in the session thread and the primary task threads. I’ve expanded the session thread here so we can see what is going on in the weekly tick. There are some things that stand out here.
First we have the commonly occurring red CScopedGameStateRelease blocks. These are when we need to take a break from updating to let the interface and graphics read the data it needs in order to keep rendering at as close to 60 fps as possible. This can’t happen anywhere though, it’s limited to in between tick tasks or between certain steps inside the tick tasks. This is in order to guarantee data consistency so the interface doesn’t fetch data when say just half the country budget has been updated.
The next thing that stands out is again the UpdateEmployment tick task just as seen in the graph above. Here we get a bit more information though. Just at a glance we can see it’s split into (at least) two parts. One parallel and one serial. Ideally we want all work to be done in parallel because that allows us to better utilize modern cpus. Unfortunately not all of the things going on during employment can be done in parallel because it needs to do global operations like creating and destroying pop objects and executing script. So we’ve broken out as much as possible into a parallel pre-step to reduce the serial part as much as possible. There is actually a third step in between here that can’t be seen because it’s too quick, but in order to avoid issues with parallel execution order causing out of syncs between game clients in multiplayer games we have a sorting step in between.
Closer look at the UpdateEmployment tick task.
Modifiers are slow
One concept that’s common throughout PDS games is modifiers and Victoria 3 is no exception. Quite the opposite. Compared to CK3 our modifier setup is about an order of magnitude more complex. In order to manage this we use a system similar to Stellaris which we call modifier nodes. In essence it’s a dependency management system that allows us to flag modifiers as dirty and only recalculate it and the other modifiers that depend on it. This is quite beneficial as recalculating a modifier is somewhat expensive.
However, this system used to be very single threaded which meant a large part of our tick was still spent updating modifiers. If you look at the graph at the top of this dev diary you can see that performance improved quite rapidly during early 2022. One of the main contributors to this was the parallelization of the modifier node calculations. Since we know which nodes depend on which we can make sure to divide the nodes into batches where each batch only depends on previous batches.
Closer look at the RecalculateModifierNodes tick task.
Countries come in all sizes
A lot of the work going on in a tick needs to be done for every country in the world. But with the massive difference in scale between a small country like Luxembourg and a large one like Russia some operations are going to sometimes take more than a hundred times as long for one country compared to another. When you do things serially this doesn’t really matter because all the work needs to happen and it doesn’t really matter which one you do first. But when we start parallelizing things we can run into an issue where too many of the larger countries end up on the same thread. This means that after all the threads are done with their work we still have to wait for this last thread to finish. In order to get around this we came up with a system where tick tasks can specify a heuristic cost for each part of the update. This then allows us to identify parts that stand out by checking the standard deviation of the expected computation time and schedule them separately.
One place where this makes a large difference is the country budget update. Not having say China, Russia, and Great Britain all update on the same thread significantly reduces the time needed for the budget update.
(And this is also why the game runs slower during your world conquest playthroughs!)
Closer look at the WeeklyCountryBudgetUpdateParallel tick task. Note the Expensive vs Affordable jobs.
Improvements in 1.2
I’m going to guess that this is the part most of you are interested in. There have been many improvements both large and small.
If you’ve paid attention to the open beta so far you might have noticed some interface changes relating to the construction queue. With how many people play the game the queue can end up quite large. Unfortunately the old interface here was using a widget type that needs to compute the size of all its elements to properly layout them. Including the elements not visible on screen.
New construction queue interface.
To compound this issue even further the queued constructions had a lot of dependencies on each other in order to compute things like time until completion and similar. This too has been addressed and should be available in today’s beta build.
Side by side comparison of old vs new construction queue.
One big improvement to tick speed is a consequence of changes we’ve done to our graphics update. Later in the game updating the map could sometimes end up taking a lot of time which then in turn led to the game logic having to wait a lot for the graphics update. There’s been both engine improvements and changes to our game side code here to reduce the time needed for the graphics update. Some things here include improving the threading of the map name update, optimizing the air entity update, and reducing the work needed to find out where buildings should show up in the city graphics.
Graphics update before/after optimization.
As we talked about above, the employment update has a significant impact on performance. This is very strongly correlated with the number of pops in the game. As in the number of objects, not the total population. Especially in late game you could end up with large amounts of tiny pops which would make the employment update extremely slow. To alleviate this design has tweaked how aggressively the game merges small pops which should improve late game performance. For modders this can be changed with the POP_MERGE_MAX_WORKFORCE and POP_MERGE_MIN_NUM_POPS_SAME_PROFESSION defines.
Another improvement we’ve done for 1.2 is replacing how we do memory allocation in Clausewitz. While we’ve always had dedicated allocators for special cases (pool allocators, game object “databases”, etc) there were still a lot of allocations ending up with the default allocator which just deferred to the operating system. And especially on Windows this can be slow. To solve this we now make use of a library called mimalloc. It’s a very performant memory allocator library and basically a drop in replacement for the functionality provided by the operating system. It’s already being used by other large engines such as Unreal Engine. While not as significant as the two things above, it did make the game run around 4% faster when measured over a year about two thirds into the timeline. And since it’s an engine improvement you can likely see it in CK3 as well some time in the future.
In addition to these larger changes there’s also been many small improvements that together add up to a lot. All in all the game should be noticeably faster in 1.2 compared to 1.1 as you can see in the graph below. Unfortunately the 1.1 overnight tests weren’t as stable as 1.2 so for the sake of clarity I cut the graph off at 1871, but in general the performance improvements in 1.2 are even more noticeable in the late game.
Year by year comparison of tick times between 1.1 and 1.2 with 1.2 being much faster. Numbers are yearly averages from multiple nightly tests over several weeks using debug builds.
That’s all from me for this week. Next week Nik will present the various improvements done to warfare mechanics in 1.2, including the new Strategic Objectives feature.