First things first: download the preprint of our paper par-gem5: Parallelizing gem5’s Atomic Mode here.

What is the paper about?
The gist of it is a parallelized version of gem5’s atomic mode. Note that this covers the atomic mode only! If you are interested in the timing mode, feel free to read our sequel parti-gem5: gem5’s Timing Mode Parallelised, which is available on arXiv.

How fast is par-gem5?
For completely parallel benchmarks we managed to reach speedups of ~25x when simulating a 128-core ARM system on a 128-core x64 host system. More realistic parallel benchmarks like the NAS Parallel Benchmarks (NPB) “only” attain speedups of up to ~12x. Since par-gem5 creates a thread for each simulated CPU core, the maximum attainable speedup depends on several factors: the number of available host threads, the number of simulated target CPUs, and the degree of parallelism in the executed benchmark. The latter is especially important. If you are looking to speed up the execution of a single-core benchmark like Dhrystone, par-gem5 is probably not the right tool for you!
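As a back-of-the-envelope sketch (my own rough model, not one taken from the paper), Amdahl’s law gives an upper bound on the speedup: the usable threads are capped by both the host threads and the simulated cores, and the serial part of the benchmark limits the rest. The helper `max_speedup` is hypothetical, and the model ignores synchronization cost at the quantum barriers, so measured speedups will typically be much lower than this bound.

```python
def max_speedup(host_threads: int, target_cpus: int, parallel_fraction: float) -> float:
    """Amdahl-style upper bound on par-gem5 speedup (illustrative only).

    Assumes one host thread per simulated CPU, perfect load balance and
    zero synchronization cost at the quantum barriers.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    # Cannot use more threads than the host offers or than there are
    # simulated CPUs (one thread per simulated CPU).
    threads = max(1, min(host_threads, target_cpus))
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


if __name__ == "__main__":
    # A benchmark without any parallelism (think Dhrystone) gains nothing.
    print(max_speedup(128, 128, 0.0))  # 1.0
    # Even 1 % serial work caps a 128-thread run well below 128x.
    print(round(max_speedup(128, 128, 0.99), 1))  # 56.4
```

The second call shows why the degree of parallelism in the benchmark matters so much: a small serial fraction already dominates once many threads are available.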

Is par-gem5 easy to use?
I would say it is fairly simple if you are already familiar with vanilla gem5. You only have to assign each CPU its own event queue and choose a reasonable quantum. Both can be done in the Python setup script with the following lines:

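import m5  # gem5's Python package, available inside gem5 config scripts

# Assumes `args`, `root` and `system` were set up earlier in the config script.
if args.parallel:
    print("gem5 going parallel")
    # Fix the tick frequency before converting seconds to ticks.
    m5.ticks.fixGlobalFrequency()
    # The quantum bounds how far the CPUs may drift apart between barriers.
    root.sim_quantum = m5.ticks.fromSeconds(m5.util.convert.anyToLatency("500us"))
    cpus = system.cpu_cluster[0].cpus
    # Note: child objects usually inherit the parent's event queue.
    # Event queue 0 stays with the rest of the system; each CPU gets its own queue.
    if len(cpus) > 1:
        first_cpu_eq = 1
        for idx, cpu in enumerate(cpus, first_cpu_eq):
            cpu.eventq_index = idx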
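gem5 measures time in ticks. Assuming the default global frequency of 1 THz (1 tick per picosecond) that `fixGlobalFrequency()` falls back to when nothing else is set, the 500µs quantum above corresponds to 5 × 10^8 ticks. Here’s a plain-Python sketch of that arithmetic that runs without gem5; `latency_to_ticks` is a hypothetical helper, not part of the m5 API, and it only accepts integer values.

```python
import re

# Ticks per unit, assuming gem5's default global frequency of 1 THz
# (1 tick == 1 ps). Hypothetical helper, not part of the m5 API.
_TICKS_PER_UNIT = {"s": 10**12, "ms": 10**9, "us": 10**6, "ns": 10**3, "ps": 1}


def latency_to_ticks(latency: str) -> int:
    """Convert an integer latency string such as '500us' to ticks."""
    match = re.fullmatch(r"\s*(\d+)\s*(s|ms|us|ns|ps)\s*", latency)
    if match is None:
        raise ValueError(f"cannot parse latency: {latency!r}")
    value, unit = match.groups()
    return int(value) * _TICKS_PER_UNIT[unit]


if __name__ == "__main__":
    print(latency_to_ticks("500us"))  # 500000000
    print(latency_to_ticks("1us"))    # 1000000
```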

How accurate and reliable is par-gem5?
The parallelization approach of par-gem5 is in many regards similar to the so-called temporal decoupling of SystemC TLM-2.0. That means, rather than having one global time as in vanilla gem5, each simulated CPU runs in its own local time and only occasionally synchronizes with the rest of the system at barrier points. The interval between barrier points is determined by the aforementioned quantum. For instance, if the quantum is set to 500µs, two CPUs can diverge by at most 500µs.
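The following toy model sketches quantum-based temporal decoupling with plain Python threads. It is not gem5 code, and `simulate` with all its parameters is hypothetical. Each simulated CPU advances its own local time in irregular steps but never past the next barrier, and all CPUs meet at a `threading.Barrier` before starting the next quantum. Because no CPU can enter quantum n+1 before every CPU has finished quantum n, the skew between the fastest and the slowest CPU never exceeds the quantum.

```python
import random
import threading


def simulate(num_cpus: int = 4, quantum: int = 500, end: int = 2000,
             seed: int = 0) -> tuple[list[int], int]:
    """Toy temporal-decoupling model; returns final local times and max skew."""
    barrier = threading.Barrier(num_cpus)
    lock = threading.Lock()
    local_time = [0] * num_cpus
    max_skew = 0

    def cpu(idx: int) -> None:
        nonlocal max_skew
        rng = random.Random(seed + idx)
        next_barrier = quantum
        while next_barrier <= end:
            # Run freely in local time, but never beyond the next barrier.
            while local_time[idx] < next_barrier:
                step = rng.randint(1, 50)
                with lock:
                    local_time[idx] = min(next_barrier, local_time[idx] + step)
                    max_skew = max(max_skew, max(local_time) - min(local_time))
            # Synchronize with all other CPUs before starting the next quantum.
            barrier.wait()
            next_barrier += quantum

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return local_time, max_skew


if __name__ == "__main__":
    times, skew = simulate()
    print(times)         # [2000, 2000, 2000, 2000]
    print(skew <= 500)   # True
```

Shrinking `quantum` tightens the skew bound but adds barrier waits, which is exactly the accuracy/speed trade-off of the real quantum.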

Surprisingly, the hardware and software of most modern general-purpose CPU systems are fairly resilient to a certain amount of time skew. As long as you do not crank the quantum up to values like 1 second, you can boot Linux and run arbitrary software workloads without encountering any problems. Nevertheless, we are changing the semantics of the simulation, and this has a non-negligible impact on several aspects, most notably the simulated timing and some of gem5’s hardware models.

For instance, if CPUs communicate with each other, certain messages may be postponed to a barrier point, which generally leads to longer simulated times (the time reported in the gem5 statistics, not the wall-clock time). As shown in the paper, a quantum of 1µs seems to keep inaccuracies in the single-digit percent range while still achieving significant speedups in most benchmarks.
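To get a feeling for why the quantum drives this inaccuracy, here’s a deliberately pessimistic toy model. It assumes that every cross-CPU message waits for the next barrier, which is a simplification and not par-gem5’s actual event handling. Two CPUs play ping-pong, each doing `work` ticks of computation before replying; the helper names are hypothetical, and ticks assume gem5’s default of 1 ps.

```python
def delivery_tick(send_tick: int, quantum: int) -> int:
    # Pessimistic assumption: a cross-CPU message always waits for the next barrier.
    return (send_tick // quantum + 1) * quantum


def ping_pong_error(hops: int, work: int, quantum: int) -> float:
    """Relative prolongation of simulated time for a toy ping-pong workload."""
    t = 0
    for _ in range(hops):
        t += work                       # the receiving CPU computes for `work` ticks
        t = delivery_tick(t, quantum)   # then its reply waits for the next barrier
    ideal = hops * work                 # simulated time without any postponement
    return (t - ideal) / ideal


if __name__ == "__main__":
    US = 10**6  # ticks per microsecond, assuming the default 1 ps tick
    print(ping_pong_error(10, 100 * US, 1 * US))    # 0.01 (1 % longer)
    print(ping_pong_error(10, 100 * US, 500 * US))  # 4.0 (400 % longer)
```

In this model each message picks up at most one quantum of extra latency, so the error shrinks as the quantum gets small relative to the work between messages. Real workloads communicate far less regularly, so these numbers only illustrate the trend.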

The different time domains are also a problem for some of gem5’s hardware models. For instance, the ARM timer model casts time differences to unsigned integers, which causes trouble when a delta is negative: instead of a small negative value, the result wraps around to a huge positive one. Here’s a snippet of the unfixed timer’s impact on the Linux boot timestamps.

gem5        par-gem5       Kernel message
[0.000385]  [0.000385]     Mount-cache hash table entries: 32768 [...]
[0.000396]  [0.000396]     Mountpoint-cache hash table entries: [...]
[0.024140]  [422.828066]   ASID allocator initialised with 128 entries
[0.032140]  [3495.801687]  Hierarchical SRCU implementation.
[0.048162]  [845.656091]   smp: Bringing up secondary CPUs ...
[0.080218]  [5877.941435]  Detected PIPT-Icache on CPU1

As you can see, at some point the timer blows up. That was a pain to debug, but we eventually found the error and fixed the timer model. After fixing some other issues as well, par-gem5 is now in a state that I would consider quite reliable. I would not launch a spacecraft with it, but it’s good enough for software development and design space exploration.
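To illustrate why a negative delta makes the timestamps explode, here’s what a C-style cast to a 64-bit unsigned integer does to it, emulated in Python with a bit mask. The function names are hypothetical and this is not the actual gem5 timer code; clamping to zero is just one possible guard against the wraparound, not necessarily the fix that went into par-gem5.

```python
UINT64_MASK = (1 << 64) - 1


def elapsed_unsigned(now: int, last: int) -> int:
    # Buggy pattern: the signed difference is reinterpreted as uint64_t.
    return (now - last) & UINT64_MASK


def elapsed_clamped(now: int, last: int) -> int:
    # Guarded pattern: a CPU whose local time lags behind the timer's last
    # update sees no elapsed time instead of an enormous positive delta.
    return max(0, now - last)


if __name__ == "__main__":
    # Local time is 1000 ticks *behind* the timer's last update.
    print(elapsed_unsigned(5_000, 6_000))  # 18446744073709550616
    print(elapsed_clamped(5_000, 6_000))   # 0
```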

Will par-gem5 be open source?
Since par-gem5 is the result of an industry project, the source code is not going to be disclosed.

Any questions?
Feel free to send me an email (see About).