
In many OR articles I have read, there is a numerical experiments section that describes the configuration used and, in particular, the number of CPUs available.

However, I am obliged to run my experiments on a shared remote server because this is the only place where the ILP solver is installed (the license is too expensive to install it on my PC).

I have genuinely tried to find a way to use a fixed number of CPUs, but I do not have root privileges. Would it be acceptable to report the mean CPU usage of my algorithms instead of the number of CPUs available?

For instance, writing the following:

We ran the numerical experiments on [specific machine] with 32 CPU cores shared with other users and restricted our algorithms to use at most 4 cores. The mean CPU usage, ranging from 0.73 to 3.8, is reported alongside each solved instance.


2 Answers


It sounds like you're using Slurm as a resource manager on a shared machine? If so, I suggest you describe both the hardware and the resources you requested. Something like, "We conducted our experiments on [your institution]'s shared research cluster. We requested [some number of] cores across [some number of] node(s) along with [some amount of] memory on machine(s) with [some model of CPU and some operating system]."

I review computationally minded papers like this all the time, and I would expect something like this in the experimental description. People know what shared resources are like, so don't sweat it if you can't use a standalone machine.
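
If the cluster does turn out to run Slurm (the comments below suggest the OP's server may not), a batch script along these lines would pin down exactly what was requested. This is only a minimal sketch; the job name, memory figure, and Julia entry point are illustrative placeholders, not anything taken from the question:

#!/bin/bash
# Minimal Slurm sketch; all concrete values below are illustrative assumptions.
#SBATCH --job-name=ilp-experiments
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G
# Launch the (hypothetical) Julia driver on the allocated cores.
srun julia --project=. run_experiments.jl

Whatever appears after the #SBATCH directives is precisely what you can report in the paper as the requested resources.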

  • I believe my server does not use SLURM, because running this code says so. Apart from that, your answer is great :)
    – JKHA
    Commented Jul 8 at 14:45
  • What command do you use to request resources? Is the machine running Linux?
    Commented Jul 8 at 15:12
  • I either use a .sh script that calls a Julia project, or the Julia code directly. Yes, the machine is running Linux.
    – JKHA
    Commented Jul 8 at 15:16
  • It seems like you may be sharing a large machine without a resource manager to make sure that compute jobs do not compete with one another. This is madness from the perspective of reproducible experiments because someone could always launch a job that slows yours down.
    Commented Jul 8 at 15:20
  • My best attempt would be to wait until the weekend, check that the server is not busy with top, and launch your job. You can also use taskset in Linux to restrict your job to some number of CPU cores. I hope it's installed by default with your Linux distribution so you won't need admin access.
    Commented Jul 8 at 15:23
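
Following up on the taskset suggestion in the last comment, a minimal sketch (the script name and core list are placeholders; core numbering depends on the machine's topology):

# Pin a new job to four specific cores (0-3); the script name is a placeholder.
taskset -c 0-3 ./run_experiments.sh

# Or restrict an already running process (here PID 12345) after the fact:
taskset -cp 0-3 12345

Note that taskset only restricts which cores your job may use; it does not prevent other users' processes from being scheduled on those same cores.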

General remarks, potentially less applicable to the OP's case (as admins control the available tooling)

Opinion: Modern Approach

I think the modern approach to resource control (without cooperating software) would be cgroups, as is done in the cloud all the time (e.g. CPU bandwidth control for CFS (pdf)).

Among other things, this allows you to limit CPU bandwidth (which is not the same as using taskset to restrict the assignable cores), but also memory, for example. I expect the latter to be very important in cases like yours, as those machines often have 1 TB of RAM, and presenting that as your bound is often useless.
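
As a concrete illustration (likely not directly usable in the OP's case, since writing to the cgroup hierarchy needs root or a delegated subtree), here is a rough sketch using the cgroup v2 filesystem; the group name, limits, and script are made up for the example:

# Assumes a unified cgroup v2 hierarchy and write access (root or a delegated subtree).
mkdir /sys/fs/cgroup/ilpbench
echo "+cpu +memory" > /sys/fs/cgroup/cgroup.subtree_control    # enable controllers if not already enabled
echo "400000 100000" > /sys/fs/cgroup/ilpbench/cpu.max         # 400 ms of CPU per 100 ms period = 4 cores' worth of bandwidth
echo "16G" > /sys/fs/cgroup/ilpbench/memory.max                # hard memory cap
echo $$ > /sys/fs/cgroup/ilpbench/cgroup.procs                 # move this shell (and its future children) into the group
./run_experiments.sh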

Example Tooling: benchexec

BenchExec seems to apply these techniques, and there are academic papers describing it.

"BenchExec is a framework for reliable benchmarking and resource measurement and provides a standalone solution for benchmarking that takes care of important low-level details for accurate, precise, and reproducible measurements as well as result handling and analysis for large sets of benchmark runs."

"Unlike other benchmarking frameworks, BenchExec is able to reliably measure and limit resource usage of the benchmarked tool even if the latter spawns subprocesses. In order to achieve this, it uses the cgroups feature of the Linux kernel to correctly handle groups of processes."

One-off example

On modern distributions the following might work:

systemd-run --scope -p CPUQuota=400% --user MY_PROGRAM

See docs.
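
Building on that one-off example, the memory point from above can be handled in the same call. MemoryMax is a standard systemd resource-control property, though on a shared server the user manager typically needs cgroup v2 delegation for these limits to actually apply (an assumption about the setup):

# Cap CPU bandwidth at 4 cores' worth and memory at 16 GB for one run; MY_PROGRAM is a placeholder.
systemd-run --user --scope -p CPUQuota=400% -p MemoryMax=16G MY_PROGRAM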

Remarks on sampling-based observation

Sampling-based observations like mean CPU usage, and especially memory usage, can be very wrong! It is not trivial to notice this way that your algorithm had a 0.01 s memory peak of 5000% of the next-highest peak.

One issue is certainly the sampling frequency. Another is process forking and other things that can happen between samples.

This would not happen if usage were measured and limited through cgroups.

"These groups can be hierarchical, meaning that each group inherits limits from its parent group."

General remarks: uncoordinated shared resources

There are a lot of red flags here, as already discussed, and while this might not be an issue for your paper's results (you will underestimate your performance), it might be an issue for your own understanding of that performance.

Uncoordinated shared resources, especially nowadays with thermal throttling, instruction-based throttling (too many AVX-512 instructions), shared caches, and so on, need careful statistics (and multiple runs) to be meaningful.

I claim that it is possible to slow down performance for everyone on a modern system by 30% just by inducing 1% of CPU load (via AVX-512-induced throttling, which also has some kind of hysteresis window).

"But those few % surely are irrelevant"

It depends on what you are measuring.

One of the less fun examples of discrete optimization on shared resources is side-load combined with time-based limits. In my experience this is more of a debugging / analysis topic, but performance analysis might be affected too:

  • Probably more robust: measuring the time to reach a MIP gap of X%
  • Probably less robust: running 100 experiments and computing aggregate statistics of the MIP gap under an active time limit (see the sketch after this list)
    • Interfering side-load might have led us to hit our time limit before reaching the new incumbent we would have seen without the side-load!
      • "We needed 3 more seconds to get our 0.3% gap instead of the 20% gap."

Some weakly related thoughts: Consistency in Solvers..

