Programming @beehaw.org aeon_flux @sh.itjust.works 1y ago

(C++) Does anyone know the difference between Critical sections and Reduction in OpenMP?

Hi,

I'm studying for an exam coming up soon and I'm trying to figure out what the functional difference is between these two approaches:

#pragma omp parallel for
	for(unsigned int i = 0; i < G.Out.bucket_count(); i++){
		for(auto itv = G.Out.begin(i); itv != G.Out.end(i); itv++){

			unsigned int v = itv->first;

			#pragma omp parallel for
			for(unsigned int j = 0; j < G.Out[v]._map.bucket_count(); j++){
				for(auto ite = G.Out[v]._map.begin(j); ite != G.Out[v]._map.end(j); ite++){

					unsigned int w = ite->first;

					if(o[v] > o[w])
					 #pragma omp critical
					 p[o[v]] = min(p[o[v]],o[w]);

				}
			}

		}
	}

and

	#pragma omp parallel for
	for(unsigned int i = 0; i < G.Out.bucket_count(); i++){
		for(auto itv = G.Out.begin(i); itv != G.Out.end(i); itv++){

			unsigned int v = itv->first;

			if(p[v] == v){

				unsigned int parent = v; //=p[v]

				#pragma omp parallel for reduction(min : parent)
				for(unsigned int j = 0; j < G.Out[v]._map.bucket_count(); j++){
					for(auto ite = G.Out[v]._map.begin(j); ite != G.Out[v]._map.end(j); ite++){
						unsigned int w = ite->first;
						if(v > w)
						 parent = min(parent,w);
					}
				}

				p[v] = parent;
				if(p[v] != v) changes = 1;

			}

		}
	}

In the first example, the minimum of the two values is calculated inside a critical section, while the second one uses a reduction. Both work and seem equivalent to me. When would one choose one over the other?

Thanks, and sorry if the question is too niche. Any other info about OpenMP is greatly appreciated :D

The critical section makes sure that only a single thread can execute the section at any one time. So when a thread wants to execute the section, it first has to make sure no other thread is executing it, and it may have to wait for the other threads to finish executing the section.
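To make that concrete, here is a minimal sketch of the critical-section pattern on a plain vector instead of the graph from the question. The function name min_with_critical and its parameters are made up for illustration. Every iteration has to pass through the critical section, so the threads end up taking turns on the update:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper (not from the original post): returns the minimum of
// `init` and every element of `values`, guarding the shared result with a
// critical section.
unsigned int min_with_critical(const std::vector<unsigned int>& values,
                               unsigned int init)
{
    unsigned int best = init;  // shared between all threads
    const long long n = static_cast<long long>(values.size());

    #pragma omp parallel for
    for (long long i = 0; i < n; ++i) {
        // Only one thread at a time may run this block; the others wait.
        #pragma omp critical
        {
            best = std::min(best, values[i]);
        }
    }
    return best;
}
```

Calling min_with_critical({7, 3, 9, 4}, 10) should give 3. The result is correct, but since the whole loop body sits inside the critical section, the loop gains little or nothing from running in parallel.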

Reductions, however, don't incur this synchronization overhead. Instead, each thread works on its own independent copy of parent, and once the loop is done, the reduction operator is applied to merge all the per-thread values. The following is essentially what #pragma omp parallel for reduction(min : parent) is equivalent to (with 8 threads as an example):

unsigned int parents[8] = {v, v, v, v, v, v, v, v};

#pragma omp parallel for num_threads(8)
for(unsigned int j = 0; j < G.Out[v]._map.bucket_count(); j++){
	for(auto ite = G.Out[v]._map.begin(j); ite != G.Out[v]._map.end(j); ite++){
		unsigned int w = ite->first;
		if(v > w)
			parents[omp_get_thread_num()] = min(parents[omp_get_thread_num()],w);
	}
}

unsigned int parent = v;
for (unsigned int i = 0; i < 8; ++i) {
	parent = min(parent, parents[i]);
}
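The same idea can be sketched as a self-contained example on a plain vector. The names min_with_reduction and min_with_manual_merge are made up for illustration, and the second function is only one possible hand-written equivalent, not necessarily what a compiler generates. It keeps a thread-local variable rather than an array indexed by thread number, and merges once per thread at the end:

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: minimum of `init` and all elements, using the
// reduction clause. Each thread gets a private copy of `best`, and the
// copies are combined with min after the loop.
unsigned int min_with_reduction(const std::vector<unsigned int>& values,
                                unsigned int init)
{
    unsigned int best = init;
    const long long n = static_cast<long long>(values.size());

    #pragma omp parallel for reduction(min : best)
    for (long long i = 0; i < n; ++i) {
        best = std::min(best, values[i]);  // touches the private copy only
    }
    return best;
}

// Hypothetical hand-written version of the same pattern: no synchronization
// inside the loop, one critical section per thread for the final merge.
unsigned int min_with_manual_merge(const std::vector<unsigned int>& values,
                                   unsigned int init)
{
    unsigned int best = init;
    const long long n = static_cast<long long>(values.size());

    #pragma omp parallel
    {
        unsigned int local = init;  // private to each thread

        #pragma omp for nowait
        for (long long i = 0; i < n; ++i) {
            local = std::min(local, values[i]);
        }

        // Executed once per thread, not once per iteration.
        #pragma omp critical
        best = std::min(best, local);
    }
    return best;
}
```

Both functions should return 3 for the input {7, 3, 9, 4} with init 10. The difference from a critical section in the loop body is how often threads synchronize: once per thread here, versus once per iteration there.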