Thread Subject: Preallocation of cell arrays?

 Subject: Preallocation of cell arrays? From: the cyclist Date: 22 Sep, 2010 21:33:05 Message: 1 of 16 For a long, long time I have had it stuck in the back of my head that for cell arrays, preallocation of memory was not important, because they need not be stored in contiguous memory. But I was nagged by the fact that MATLAB does give warnings about it when you don't. So, I compared the following two code snippets: ===================== % Don't preallocate clear x; N=100000; tic; for n=1:N,   x{n} = rand(100,1); end; toc % Typically about 11 seconds ===================== ===================== % Do preallocate clear x; N=100000; tic; x = cell(1,N); for n=1:N,   x{n} = rand(100,1); end; toc % Typically about 0.3 seconds ===================== I was quite surprised by the timing results. I don't understand this at all, particularly because in the preallocation step, I am not even telling MATLAB anything about the contents of the cells, so it can't know how much memory to reserve. Any help in understanding this behavior would be appreciated.
 Subject: Preallocation of cell arrays? From: Walter Roberson Date: 22 Sep, 2010 21:39:02 Message: 2 of 16 On 10-09-22 04:33 PM, the cyclist wrote: > For a long, long time I have had it stuck in the back of my head that > for cell arrays, preallocation of memory was not important, because they > need not be stored in contiguous memory. But I was nagged by the fact > that MATLAB does give warnings about it when you don't. A cell array's data block is a list of pointers. When you change the number of items in the cell array, the list of pointers has to be copied. Pre-allocating 100000 cells and writing to them only involves one search for a memory block, plus writing one pointer value per iteration, for a total of 100000 pointers written. Not preallocating and expanding a cell to length 100000 involves writing 1 pointer on iteration #1, copying that 1 pointer and writing a new one on iteration #2, copying those 2 points and writing a new one on iteration #3, and so on, which makes for 100000 * 100001 / 2 = 5000050000 pointer writes _plus_ 100000 searches for memory blocks. Not surprising that it takes longer. The advantage of cell arrays is that the _contents_ of the cells do not have to be copied around when the array is expanded.
 Subject: Preallocation of cell arrays? From: Sean Date: 22 Sep, 2010 21:45:23 Message: 3 of 16 "the cyclist" wrote in message ... > For a long, long time I have had it stuck in the back of my head that for cell arrays, preallocation of memory was not important, because they need not be stored in contiguous memory. But I was nagged by the fact that MATLAB does give warnings about it when you don't. > > So, I compared the following two code snippets: > > ===================== > % Don't preallocate > clear x; > N=100000; > tic; > for n=1:N, > x{n} = rand(100,1); > end; > toc > % Typically about 11 seconds > ===================== > > ===================== > % Do preallocate > clear x; > N=100000; > tic; > x = cell(1,N); > for n=1:N, > x{n} = rand(100,1); > end; > toc > % Typically about 0.3 seconds > ===================== > > I was quite surprised by the timing results. I don't understand this at all, particularly because in the preallocation step, I am not even telling MATLAB anything about the contents of the cells, so it can't know how much memory to reserve. > > Any help in understanding this behavior would be appreciated. A quote from John D'Errico: "A problem in MATLAB is dynamic growth of arrays, when you do not know the final size of that array. Concatenation forces re-allocation of memory for the entire array at each step. For large arrays this becomes very inefficient." (http://www.mathworks.com/matlabcentral/fileexchange/7490) Thus it would imply that the cell only needs to reallocate memory for the current cell and not the whole array.
 Subject: Preallocation of cell arrays? From: Cris Luengo Date: 23 Sep, 2010 07:53:04 Message: 4 of 16 "Sean " wrote in message ... > "the cyclist" wrote in message ... > > For a long, long time I have had it stuck in the back of my head that for cell arrays, preallocation of memory was not important, because they need not be stored in contiguous memory. But I was nagged by the fact that MATLAB does give warnings about it when you don't. > > A quote from John D'Errico: > "A problem in MATLAB is dynamic growth of arrays, when you do not know the final size of that array. Concatenation forces re-allocation of memory for the entire array at each step. For large arrays this becomes very inefficient." > (http://www.mathworks.com/matlabcentral/fileexchange/7490) > > Thus it would imply that the cell only needs to reallocate memory for the current cell and not the whole array. There's a misunderstanding here about how cell arrays work. A cell array is just like a numeric array, except the values stored in the array are references (pointers) to other arrays. The cell array does not physically contain other arrays. The arrays contained within the cell array are not stored in contiguous memory. But the cell array itself, the list of pointers to arrays, does need to be stored in contiguous memory. Thus, it will be reallocated when the size of the cell array changes; but when you change the contents of a cell, the pointer is just overwritten with a new pointer. Just as when you change one value in a double array. Cheers, Cris.
 Subject: Preallocation of cell arrays? From: Oleg Komarov Date: 23 Sep, 2010 09:04:07 Message: 5 of 16 I think Cris and Walter made it clear. What about structures, is there any difference among scalar structures and non scalar? a.a = []; b(1).a = []; b(2).a = []; Should I preallocate, say: b(100).a = []; Oleg
 Subject: Preallocation of cell arrays? From: Bruno Luong Date: 23 Sep, 2010 09:33:07 Message: 6 of 16 "the cyclist" wrote in message ... > For a long, long time I have had it stuck in the back of my head that for cell arrays, preallocation of memory was not important, because they need not be stored in contiguous memory. No you confuse between two different things: preallocate the *contents* of the cell and preallocate the cell itself. We still have to preallocate the cell, but there is no advantage to preallocate the contents (each element under each). http://www.mathworks.com/matlabcentral/newsreader/view_thread/258931 Bruno
 Subject: Preallocation of cell arrays? From: Matt J Date: 23 Sep, 2010 09:37:04 Message: 7 of 16 "Oleg Komarov" wrote in message ... > I think Cris and Walter made it clear. > > What about structures, is there any difference among scalar structures and non scalar? > Should I preallocate, say: > b(100).a = []; =============== Yep, as the following timing test shows (modified from the OP). Reallocation may even be a bit worse for structs because structs also have to carry around extra user-defined characteristics like fieldnames. % Don't preallocate clear x; N=100000; d=rand(100,1); tic; for n=1:N,   x(n).field = d; end; toc % Elapsed time is 15.306710 seconds. % Do preallocate clear x; N=100000; tic; x(N).field=[]; for n=1:N,   x(n).field = d; end; toc % Elapsed time is 0.022479 seconds.
 Subject: Preallocation of cell arrays? From: Bruno Luong Date: 23 Sep, 2010 10:30:31 Message: 8 of 16 "Oleg Komarov" wrote in message ... > I think Cris and Walter made it clear. > > What about structures, is there any difference among scalar structures and non scalar? > > a.a = []; > > b(1).a = []; > b(2).a = []; > > Should I preallocate, say: > b(100).a = []; Again, any kind array (regular, cell or struct) need to be preallocated at the root level. However there is no advantage to preallocate the subfields or subelements if they will be erased later (assigned without subindex on the lhs). Hope it's clear. Bruno
 Subject: Preallocation of cell arrays? From: Jan Simon Date: 23 Sep, 2010 10:47:05 Message: 9 of 16 Dear readers, Internally structs and cells are almost equal except for the field name. This is the cause for the very fast STRUCT2CELL and CELL2STRUCT functions. In consequence, appending fields to a struct dynamically is equivalent to a growing cell:   tic;   for i = 1:10000     S.(sprintf('field%d', i)) = 5;   end   toc; Unfortunately you cannot pre-allocate the number of fields. (Inside a Mex this is possible, btw.) Therefore it should be faster to create a cell at first:   tic;   Field = cell(10000, 1);   Data = cell(10000, 1);   for i = 1:10000     Field{i} = sprintf('field%d', i);     Data{i} = 5;   end   S = cell2struct(Data, Field);   toc; But what happens??? The dynamic field name method is remarkably faster!   Growing struct: 0.63 sec   Pre-allocated cell: 2.0 sec   (a slow Pentium-M 1.5GHz, Matlab 2009a, code inside a function)   (can somebody confirm this? My computer had some strage timimgs in the past) Let's examine the effects of the fieldname length: With 'f%d' instead of 'field%d':   Growing struct: 0.42 sec   Pre-allocated cell: 1.5 sec 25% faster - interesting. This could be caused by 3 factors:   - Conversion from a Unicode mxChar to a C-string field name   - Checking for unique fieldnames needs more time for longer strings   - Checking for valid fieldnames needs more time for longer strings My conclusion: Do not create structs with thousands of fields. With substructs, struct arrays and cells such objects are handled more efficiently. Kind regards, Jan
 Subject: Preallocation of cell arrays? From: the cyclist Date: 23 Sep, 2010 12:31:19 Message: 10 of 16 Walter Roberson wrote in message ... > On 10-09-22 04:33 PM, the cyclist wrote: > > For a long, long time ... Thanks to everyone who took the time to reply, and especially to Walter for his clear, concise, and rapid-fire response.
 Subject: Preallocation of cell arrays? From: Matt J Date: 23 Sep, 2010 13:14:05 Message: 11 of 16 "Jan Simon" wrote in message ... > (can somebody confirm this? My computer had some strage timimgs in the past) ====== I'm seeing similar trends, but have no idea why. I'm more puzzled, though, over the following. If I add these 3 lines of code to yours, it makes it look like struct indexing is much faster than cell indexing.  N=1e6;     tic; for ii=1:N; z=S.field5000; end ;toc;   % Elapsed time is 0.002997 seconds.   tic; for ii=1:N; z=Data{5000}; end ;toc;  %Elapsed time is 0.194040 seconds. On the other hand, if I include an additional line on top of this, the first two timing results are affected significantly  N=1e6;     tic; for ii=1:N; z=S.field5000; end ;toc;   %Elapsed time is 0.513582 seconds.      tic; for ii=1:N; z=Data{5000}; end ;toc;   %Elapsed time is 0.504524 seconds.      tic; for ii=1:N; z=S.field5000(1); end ;toc;   %Elapsed time is 0.340295 seconds.
 Subject: Preallocation of cell arrays? From: Jan Simon Date: 23 Sep, 2010 15:35:09 Message: 12 of 16 Dear Matt J, > I'm more puzzled, though, over the following. If I add these 3 lines of code to yours, it makes it look like struct indexing is much faster than cell indexing. > N=1e6; > > tic; for ii=1:N; z=S.field5000; end ;toc; > % Elapsed time is 0.002997 seconds. > > tic; for ii=1:N; z=Data{5000}; end ;toc; > %Elapsed time is 0.194040 seconds. The JIT-acceleration has such effects. [S.field5000] is obviously converted to a fixed pointer access, while Data{5000} is determined in each loop again. I include a CLEAR in each time measurement to reduce the influence of the JIT:    tic; for ii=1:N; z=S.field5000; clear('z'); end ;toc;    % Elapsed time is 1.17 sec      tic; for ii=1:N; z=Data{5000}; clear('z'); end ;toc;   %Elapsed time is 1.15 dec (Matlab 2009a, 1.5GHz PentiumM, WinXP32) If I split the lines such, that each line contains one command, the JIT has even more power. And finally moving this into a function would cause real-world timings only. However, the growing of a struct seems to be less susceptive compared to a growing cell. Funny. Kind regards, Jan
 Subject: Preallocation of cell arrays? From: Matt J Date: 23 Sep, 2010 15:53:05 Message: 13 of 16 "Jan Simon" wrote in message ... > The JIT-acceleration has such effects. [S.field5000] is obviously converted to a fixed pointer access, while Data{5000} is determined in each loop again. ==== OK > However, the growing of a struct seems to be less susceptive compared to a growing cell. Funny. ====== My only guess was that new fieldnames are added as if in a linked list, and thus are not memory continguous??? I would think that would slow down field access though...
 Subject: Preallocation of cell arrays? From: the cyclist Date: 23 Sep, 2010 16:11:08 Message: 14 of 16 "Bruno Luong" wrote in message ... > "the cyclist" wrote in message ... > > For a long, long time I have had it stuck in the back of my head that for cell arrays, preallocation of memory was not important, because they need not be stored in contiguous memory. > > No you confuse between two different things: preallocate the *contents* of the cell and preallocate the cell itself. > > We still have to preallocate the cell, but there is no advantage to preallocate the contents (each element under each). > http://www.mathworks.com/matlabcentral/newsreader/view_thread/258931 > > Bruno That was an interesting thread. I am not sure why I didn't find it when I searched (before starting this thread). I ended up writing an expanded version of your preallocation testing function (pasted below) in which I compare five different ways of preallocating and then writing data to the cell array: -- No preallocation -- Preallocate the cell, but not the contents -- Three different methods of preallocating the contents (from your older post) You had included timing results from the data-writing, but not from the preallocation itself. This function shows both. In this small example, preallocating the cell but not the contents seems to be a pretty clear winner, when one accounts for both preallocation and data-writing time. the cyclist %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% function [] = preallocationOfCellArraysTest() N = 1000; disp(' ') disp('Preallocation timings') % No preallocation tic toc % Preallocate cell, but not contents tic W = cell(N,N); toc % Jan Simon's method tic X = cell(N,N); X(:) = {[0 0 0; 0 0 0; 0 0 0]}; toc % Matt Fig tic Y(1:N,1:N) = {[0 0 0; 0 0 0; 0 0 0]}; toc % Jim Rockford tic Z = cell(N,N); for ii = 1:N     for jj = 1:N         Z{ii,jj} = zeros(3); % force caling a function     end end toc clear ii jj % Bruno's change such that only the reallocation during copy-on-write % better stands out disp(' ') disp('Data-writing timings') tic for ii = 1:N     for jj = 1:N         V{ii,jj}(3,3) = 1;     end end toc tic for ii = 1:N     for jj = 1:N % W{ii,jj}(3,3) = 1;         W{ii,jj} = [0 0 0; 0 0 0; 0 0 1];     end end toc tic for ii = 1:N     for jj = 1:N         X{ii,jj}(3,3) = 1;     end end toc tic for ii = 1:N     for jj = 1:N         Y{ii,jj}(3,3) = 1;     end end toc tic for ii = 1:N     for jj = 1:N         Z{ii,jj}(3,3) = 1;     end end toc
 Subject: Preallocation of cell arrays? From: Jan Simon Date: 23 Sep, 2010 16:30:48 Message: 15 of 16 Dear Matt, > My only guess was that new fieldnames are added as if in a linked list, and thus are not memory continguous??? I would think that would slow down field access though... I'm not sure. I've examined the C-string-pointers replied by mxGetFieldNameByNumber(). For 10 different test structs, the field names had a constant distance of 64 bytes, which would match the 63 characters limit and the trailing \0. Then the fieldnames could be stored in contiguos block of memory such that searching for a specific name can be implemented fast. But then I stopped to dig deeper, because I decided not to rely on this undocumented detail, e.g. for the RenameField-Mex. Although the linked list is a good argument for the speed of a growing struct, the growing cell is a linked list also... Kind regards, Jan
 Subject: Preallocation of cell arrays? From: Matt J Date: 23 Sep, 2010 16:45:05 Message: 16 of 16 "Jan Simon" wrote in message ... > Although the linked list is a good argument for the speed of a growing struct, the growing cell is a linked list also... ===== Don't follow. I thought the conclusion of previous posts was that cells are pointed to be contiguous arrays of pointers. Why would you need to maintain cell pointers as a linked list if they are to be continguously stored?

Everyone's Tags:

Separated by commas
Ex.: root locus, bode

What are tags?

A tag is like a keyword or category label associated with each thread. Tags make it easier for you to find threads of interest.

Anyone can tag a thread. Tags are public and visible to everyone.