
Linux, C, GCC, ARM (Allwinner H3). Is it possible to get the following equivalent functionality (pseudocode):

#pragma optimizations-stop
#pragma registers-flush

a=10;
while(*b != 0) c++;
if(a == 0) { do something; }

#pragma optimizations-start

with the result that the location of a is explicitly written with the value 10; then, on every loop cycle, the value at location b is read from memory and compared to 0, and if it is not zero, c is RMW'ed with +1; and when the loop breaks, the location of a is explicitly read from memory and compared to 0?

And, importantly, instructions are executed in the specific order the programmer laid them out in the program, not in whatever order the compiler or the CPU thinks is best.

Please refrain from discussing what is impossible; let's discuss how to achieve the required result.

If it is not possible at the C/GCC level, I would be glad for suggestions on implementing it as inline assembly.

8
  • 3
    It’s pretty straightforward by declaring a, b and c as volatile. Is that what you’re after? Commented Jul 9 at 6:57
  • Of course not. See the question I asked here stackoverflow.com/questions/78722496/…. In addition, I learned that the CPU itself (especially ARM) is going to decide for me how to execute MY program.
    – Anonymous
    Commented Jul 9 at 6:59
  • 4
    Hmm. As the comments there point out, volatile does not perform synchronisation, which is necessary for multithreading. Nevertheless, volatile does ensure that every variable is read and written to as specified in your program. The reason concurrent access isn’t well-defined is due to where these variables are stored on multiprocessor architectures (simplified). Commented Jul 9 at 7:06
  • Anyway: if your question is “how do I disable the CPU cache and prevent instruction prefetching, etc.?” then I think the answer is: you can’t. (Though I might be wrong) Commented Jul 9 at 7:08
  • There are 3 different issues one has to keep separated: 1) prevent compiler optimizations, 2) prevent race condition bugs and 3) prevent instruction re-ordering. volatile definitely does 1), it definitely does not do 2) and it may or may not do 3) depending on how literally one reads the C standard. The most sensible way to read the standard is that the implementation (compiler + system) is not allowed to perform re-ordering across a volatile access = memory barrier. However, some choose to read the standard as saying that a volatile access may only not be re-ordered in relation to another side effect.
    – Lundin
    Commented Jul 9 at 7:30

2 Answers

7

The "big hammer" is a GCC extension, asm __volatile__("" ::: "memory"). This is the compiler-level memory barrier used by the Linux kernel (its barrier() macro); it constrains the compiler but emits no CPU fence instructions.

Note that this does not do what you ask for in the title. It doesn't stop optimizations at all. It does however achieve the ordering that you describe.

Also note that c++ is a read-modify-write. It's not atomic, and there are no memory barriers between the parts. The compiler can reasonably keep c in a register, and do a read-modify-write on the register. You'd need something like volatile int* pc = &c; to avoid that.

Finally, to take your other post into account, the memory barrier on one thread won't magically make another thread cooperate. You'd need barriers on both threads. And that's why they're rare - you can usually use proper synchronization instead.

4
  • Could atomic_thread_fence(memory_order_seq_cst); (#include <stdatomic.h>) be used as the memory barrier?
    – Ian Abbott
    Commented Jul 9 at 9:57
  • "you'd need something like volatile int* pc = &c; to avoid that.": Even if the access is via volatile I do not think that there is any guarantee that an increment will be performed in one combined load/store. If I remember correctly, there was even implementation divergence on that between GCC and Clang. Commented Jul 9 at 10:52
  • @user17732522: Correct. You only prevent the "store in register while incrementing" part. It's definitely not atomic, nor does it synchronize with other threads.
    – MSalters
    Commented Jul 9 at 11:13
  • 1
    asm __volatile__("": : :"memory") seems to be only a compile-time reordering barrier. It doesn't add any instructions for memory synchronization (godbolt.org/z/3z6o3h9xb). So I am not sure whether this is enough to solve OP's problems. On x86 it will behave like acquire/release, but not sequential consistency. That's enough for OP, but e.g. on ARM I do not think it results in sufficient memory ordering guarantees. Commented Jul 9 at 19:47
1

Using _Atomic for all the variables in question will achieve this:

  1. Every read and write to an _Atomic variable will correspond to an actual load or store instruction being executed. (While this is not a formal guarantee of the C standard, in practice compilers do not try to optimize them out.)

  2. Those loads and stores will not be reordered by the compiler.

  3. Memory barrier instructions will be inserted where needed, to prevent any reordering by the CPU and ensure that they are observed in the same order by every other thread / CPU which also uses proper barriers. (The default memory ordering for atomic is sequential consistency. To avoid any of these memory barriers, you'd need to use functions like atomic_load_explicit).

  4. The c++ will be performed with an atomic read-modify-write instruction or sequence of instructions (e.g. ARM LDREX/STREX), so that no writes to c can be performed by any other thread in between the read and write.

  5. Accesses to non-_Atomic variables will not be reordered after a store, nor before a load. (You can't prevent reorderings in the other directions, except by making the other variables _Atomic as well.)

Note that using volatile, as suggested elsewhere, achieves 1 and 2, but typically not 3, 4, 5. Moreover, any unsynchronized concurrent access by multiple threads to a variable which is not _Atomic, even if it is volatile, causes undefined behavior for the entire program according to the C standard (a data race).

