Language: C/C++
(Pre-requisites: moderate understanding of the C 99 and ISO C++ languages)
So, Thearis jumped in IRC a little while ago asking if it was possible to create a 2D array (of some sort) that was dynamic in one dimension but fixed in the other. Before I get to that, I need to go over exactly what an array is, and how it is accessed.
Back before the advent of containers, vectors, objects, and classes, there where only a few ways of storing your stuff in ANSI-C, a single variable, an array, or a structure.
Variables and their implementation in Memory
Variables store a single value in memory, and how much memory said variable actually takes up depends on the system and the nature of the variable's data type. To avoid a somewhat lengthy explanation of integer sizes vs. system types, I'm going to use the following C++x11 datatypes in the discussions:
int8_t // 8-bit, 1 byte integral type
int16_t // 16 bit, 2 yte integral type
int32_t // 32 bit, 4 byte integral type
Arrays and their implementation in Memory
In memory, arrays store their values
right next to each other.
int16_t element[6]; // an array of 6 elements
// In memory, this is the same as:
int16_t element_0;
int16_t element_1;
int16_t element_2;
int16_t element_3;
int16_t element_4;
int16_t element_5;
If you wanted to access an element in the array, you could do one of two things:
int16_t element[6];
element[ element_number ] = 0;
// Or
int16_t *element_pointer = starting_address_of_array;
*(element_pointer + ( element_number * int_16t_size_in_bytes )) = 0;
The first notation should be very familiar to people who mess with C/C++, as it's very clean and quick to access.
The second notation reveals a nasty method of accessing the elements: pointer arithmetic. The pointer is first assigned the memory address of the first element in the array. Each individual element is then accessed by multiplying the element's index number by the size, in bytes, of the data type of the array. This method works because each integer value is butted up next to each other in memory, and each integer value has a constant (and known) size in bytes.
Now, for something
completely different:
int16_t array[6];
int16_t* element = array;
Yes, that's right, "array" is, in actuality, a pointer, which means:
int16_t array[6];
int16_t* element = array;
array[5] = 6;
return *(element + ( 5 * int16_t_size) );
The return value would not only be equal to 6, but it would be from the exact same memory location. These leads us to conclude that the [] operators are doing the pointer arithmetic behind the scenes,
and everything you thought you knew was wrong.
Vectors
In laymens terms, a vector can be considered as an array that automatically takes care of its size requirements. While this is somewhat true, std::vector's are actually objects that stores their data in a private array. Memory allocation/management is "automatically" done whenever an item is being added to the vector or when the vector is being resized. How it does this is fairly simple:
- First, it checks to see that it's internal array has enough space for the element(s) it is adding, or if doing a resize checks to see if the array is already that size. If it large enough for the new element(s), then the elements are added. Similarly, if the array is already the same size as the new requested size, then nothing is really done.
- However, if the array is too small for either case, a "hidden" function is called to create a new array on the stack that is the required size.
- If there is enough room on the stack for a newer, larger array, all current contents of the vector are copied over and any new elements are appened to the end.
- Finally, the old array is trashed/returned to the stack heap, and the internal array pointer is set to the address of the new array
Any future access to the vector will access the "new" array, instead of the "old" one. Vectors thus stay dynamic, because they're usually able to access enough space on the stack.
Vector Arrays
I now come to the topic of Vector arrays (or, arrays of vectors). This can be thought of as a 2 dimensional array with one dimension as a dynamic vector and the other a "fixed" array. An example of a vector array would be:
vector <int16_t> array[6];
Here, you can see already that the [] operator is going to cause mass confusion and headache when trying to manipulate the data you want.
Remember what I said earlier about the [] operator masking the pointer arithmetic? well, that's what happens here:
array[3] = some_vector;
What this does is access the
forth vector in the array and set it equal to another vector. Moving on:
array[3][1] = 5;
What this does is access the
second element in the
forth vector and set it equal to the number 5. The reason this works is because:
- array is a vector pointer, pointing at the first vector in a vector array.
- array[vector_index] is a vector in the array, and
- array[vector_index][element_index] is the element in that vector.
2D Vectors
The last thing I'd like to talk about is multi-dimensional vectors. These are very nasty, and should be avoided when possible. Read: get a friggin matrix library already.
Multi-dimensional vectors are achieved by making a vector of a vector. i.e.
vector< vector< int_16t > > data;
In this case, access to each individual element is done by liberal usage of the [] operator, as with multidimensional arrays. However, this can lead to problems because each vector is dynamic, which means it's wholly possible to end up with a "matrix" that has non-uniform rows and columns (like, a size of 5 entries on one row, a size of 2 entries on the next, and a size of 100 entries on the last.)
Initialization of the 2D vector would be like so:
vector< vector< int_16t > > data;
data.resize( first_dim_size ); // Make the first dimension the desired size
for( int_16t i = 0; i < first_dim_size; i++ ) // Make the second dimension the desired size on all of the first dimensional elements
{
data[i].resize( second_dim_size, 0 );
}
// As an illustration to the usage of 2D vectors:
for( int i = 0; i < first_dim_size; i++)
{
for( int j = 0; j < second_dim_size; j ++ )
{
data[i][j] = 0;
}
}
Yep, it's a big headache. Try to avoid such nonsense if you can.