Advantages of Using Categorical Arrays
Natural Representation of Categorical Data
categorical is a data type to store data with values from a finite set of discrete categories. One common alternative to using categorical arrays is to use character arrays or cell arrays of character vectors. To compare values in character arrays and cell arrays of character vectors, you must use strcmp, which can be cumbersome. With categorical arrays, you can use the logical operator eq (==) to compare elements in the same way that you compare numeric arrays. The other common alternative to using categorical arrays is to store categorical data using integers in numeric arrays. Using numeric arrays loses all the useful descriptive information from the category names, and also tends to suggest that the integer values have their usual numeric meaning, which, for categorical data, they do not.
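As a minimal sketch of the comparison described above, the following contrasts strcmp on a cell array of character vectors with == on a categorical array. The variable names (sizesCell, sizesCat, and so on) and the sample values are illustrative assumptions, not part of the original example.

```matlab
% Hypothetical sample data: a cell array of character vectors.
sizesCell = {'small';'large';'medium';'small'};

% Cell arrays of character vectors need strcmp for an element-wise match.
isSmallCell = strcmp(sizesCell,'small');

% The same data as a categorical array can be compared with ==.
sizesCat = categorical(sizesCell);
isSmallCat = (sizesCat == 'small');

% Both approaches produce the same logical mask.
sameResult = isequal(isSmallCell,isSmallCat);
```

Both approaches yield a logical array that can be used for indexing; the categorical form reads like a numeric comparison.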
Mathematical Ordering for Character Vectors
Categorical arrays are convenient and memory efficient containers
for nonnumeric data with values from a finite set of discrete categories.
They are especially useful when the categories have a meaningful mathematical
ordering, such as an array with entries from the discrete set of categories {'small','medium','large'} where small < medium < large.
Character arrays and cell arrays of character vectors do not support an ordering other than alphabetical order, so inequality comparisons such as greater than and less than cannot express a category ordering like this one. With categorical arrays, you can use relational operations to test for equality and to perform element-wise comparisons that follow a meaningful mathematical ordering.
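The ordering small < medium < large can be sketched with an ordinal categorical array, as below. The sample data and variable names are assumptions made for illustration; the 'Ordinal' name-value argument tells categorical that the order of the listed categories is meaningful.

```matlab
% Hypothetical sample data with an explicit category order.
valueset = {'small','medium','large'};
sizes = categorical({'medium';'small';'large';'small'}, ...
    valueset,'Ordinal',true);

% Relational operators follow the category order, not alphabetical order.
isBiggerThanSmall = sizes > 'small';

% Sorting also follows the category order.
sortedSizes = sort(sizes);
```

Without 'Ordinal',true the array is still categorical, but inequality operators such as > are not supported for it.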
Reduce Memory Requirements
This example shows how to compare the memory required to store data as a cell array of character vectors versus a categorical array. Categorical arrays have categories that are defined as character vectors, which can be costly to store and manipulate in a cell array of character vectors or char array. Categorical arrays store only one copy of each category name, often reducing the amount of memory required to store the array.
Create a sample cell array of character vectors.
state = [repmat({'MA'},25,1);repmat({'NY'},25,1);...
    repmat({'CA'},50,1);...
    repmat({'MA'},25,1);repmat({'NY'},25,1)];
Display information about the variable state.
whos state
  Name         Size            Bytes  Class    Attributes

  state      150x1             16200  cell
The variable state is a cell array of character vectors requiring 16,200 bytes of memory.
Convert state to a categorical array.
state = categorical(state);
Display the discrete categories in the variable state.
categories(state)
ans = 3x1 cell
{'CA'}
{'MA'}
{'NY'}
state contains 150 elements, but only three distinct categories.
Display information about the variable state.
whos state
  Name         Size            Bytes  Class          Attributes

  state      150x1               476  categorical
There is a significant reduction in the memory required to store the variable.
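To compare the two sizes programmatically rather than by reading the displayed output, one option is to capture the output of whos as a structure. The following is a sketch under that assumption; the variable names (stateCell, stateCat, infoCell, infoCat, ratio) are hypothetical, and the exact byte counts can differ between MATLAB releases and platforms.

```matlab
% Rebuild the sample data under separate names so both versions coexist.
stateCell = [repmat({'MA'},25,1);repmat({'NY'},25,1);...
    repmat({'CA'},50,1);...
    repmat({'MA'},25,1);repmat({'NY'},25,1)];
stateCat = categorical(stateCell);

% whos returns a structure whose bytes field holds the storage size.
infoCell = whos('stateCell');
infoCat = whos('stateCat');

% Ratio of cell array storage to categorical storage.
ratio = infoCell.bytes / infoCat.bytes;
fprintf('cell: %d bytes, categorical: %d bytes, ratio: %.1f\n', ...
    infoCell.bytes,infoCat.bytes,ratio);
```

The saving grows with the number of repeated values, because the categorical array stores each category name only once.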
See Also
Related Examples
- Create Categorical Arrays
- Convert Text in Table Variables to Categorical
- Compare Categorical Array Elements
- Access Data Using Categorical Arrays