Parallel Computing Toolbox
We demonstrate three concepts in this demo:
To demonstrate these concepts we use the example of computing using the Monte Carlo method.
Given an area A of a square, and an area C of a circle inscribed in the square, we can compute from A and C via . The Monte Carlo method for computing is based on this relationship and assumes that when we throw a sufficiently large number of darts into the square, and count the number of darts that fall within a circle inscribed in this square; the ratio of darts that fall inside of the circle to the total number of darts thrown at the square approximates the ratio of C over A or . Thus the value of is computed as the ratio of darts which fall inside of the circle to the total number of darts thrown at the square times 4.
In this demo, for purposes of simplicity and symmetry, we focus only on the first quadrant of a unit circle inscribed in a square with side equal to 2. We assume that the X and Y coordinates of the darts are uniformly distributed between 0 and 1.
The figure above is a visual demonstration of how the Monte Carlo method works. To generate this figure, we throw 50 darts into the upper right quadrant of a square with side equal to 2. The o marks represent the darts which fell inside of the circle of radius 1, and the x marks represent the darts that fell outside of the circle. With 50 darts thrown we only get a crude approximation of . To approximate more accurately we need to throw more darts. Note that this method for approximating is not particularly effective. However, it serves to demonstrate how to convert for loop iterations into parfor and how to use parallel reduction variables.
In this section, we present an implementation of a sequential-loop Monte Carlo method for estimating . The method is implemented as a nested function SequentialMonteCarloPI. First, we set the number of darts that is thrown at the square and initialize the radius R to 1.
darts = 1e8; R = 1; tic
Call the sequential Monte Carlo method for computing .
myPI = sequentialMonteCarloPI(darts, R);
Measure the sequential elapsed time and display the results.
seqElapsedTime = toc; fprintf('The computed value of pi is %8.7f.\n',myPI);
The computed value of pi is 3.1419218.
This function uses a for-loop in place of a more optimized "vectorized" implementation to serve as a demonstration on how to use parfor and parfor reduction variables.
function myPI = sequentialMonteCarloPI(darts, R) count = 0; % Number of darts that fell inside of the circle. for i = 1:darts % Compute the X and Y coordinates of where the dart hit the % square using uniform distribution. x = R*rand(1); y = R*rand(1); if x^2 + y^2 <= R^2 % Increment the count of darts that fell inside of the % circle. count = count + 1; end end % Compute PI using the ratio of darts that fell inside of % the circle to the total number of darts thrown. myPI = 4*count/darts; end
In this section, we modify our sequential implementation of the Monte Carlo method for computing to use parallel for-loops (parfor), and reduction variables.
We start the process by analyzing the sequential-loop Monte Carlo method. The increment operations on the count variable appear to introduce data dependencies between iterations of the for-loop. However, the addition operation is associative, so the order of operations does not matter. Using this property, parfor can compute local copies of the count variable in parallel on each worker in the MATLAB pool, and then accumulate the partial results together at the end of the parfor loop. This accumulation of a result is called a reduction operation, and the count variable is called a reduction variable. To indicate that the count variable is a reduction variable, it is declared before the parfor code block, and is used after the parfor code block.
A MATLAB pool is a collection of MATLAB workers that can be used to offload work from the MATLAB client interactively and in parallel. To take full advantage of parfor to speed up computations a MATLAB pool has to be open. If the MATLAB pool is not open the parfor executes sequentially in the client MATLAB session.
Refer to the parfor documentation for more details. See the Advanced Topics section of the parfor documentation for more information on the classification of variables and reduction variables.
First, we verify that the MATLAB pool is open.
poolSize = matlabpool('size'); if poolSize == 0 error('parallel:demo:poolClosed', ... 'This demo needs an open MATLAB pool to run.'); end fprintf('This demo is running on %d MATLABPOOL workers.\n', ... matlabpool('size'));
This demo is running on 8 MATLABPOOL workers.
We scale the number of darts by the size of the MATLAB pool to demonstrate the parfor weak scaling capability.
parforDarts = darts * poolSize; R = 1; tic
Call the parallel Monte Carlo method for computing .
myPI = parallelMonteCarloPI(parforDarts,R);
Measure the parallel elapsed time and display the results.
parElapsedTime = toc; fprintf('The computed value of pi is %8.7f.\n',myPI);
The computed value of pi is 3.1417299.
function myPI = parallelMonteCarloPI(darts, R)
Each iteration is independent, and effectively only involves the use of a reduction variable count. Therefore we only have to change the for to parfor to execute loop iterations in parallel.
count = 0; parfor i = 1:darts % Compute the X and Y coordinates of where the dart hit the % square using Uniform distribution. x = R*rand(1); y = R*rand(1); if x^2 + y^2 <= R^2 % Increment the count of darts that fell inside of the % circle. count = count + 1; % Count is a reduction variable. end end
parfor automatically accumulates all the partial results of the reduction variable count.
% Compute pi myPI = 4*count/darts;
We compare the execution times of the sequential and parallel loop Monte Carlo method implementations. We observe that even though the execution times are comparable for both implementations, the parfor implementation executes a poolSize times larger number of loop iterations, and computes a more accurate value for . In addition, the code change required to accomplish this significant increase in performance involves just changing the for keyword to parfor in the main loop of the program.
fprintf('The sequential-loop Monte Carlo method executed in %8.2f',... seqElapsedTime); fprintf(' seconds\n and computed %4.2e darts per second.\n',... darts/seqElapsedTime); fprintf('The parallel-loop Monte Carlo method executed in %8.2f',... parElapsedTime); fprintf(' seconds\n and computed %4.2e darts per second.\n',... parforDarts/parElapsedTime);
The sequential-loop Monte Carlo method executed in 12.38 seconds and computed 8.07e+06 darts per second. The parallel-loop Monte Carlo method executed in 13.29 seconds and computed 6.02e+07 darts per second.