Author Topic: Arrays, Vectors, and combinations thereof (Read 2147 times)

z64555 · **on:** July 21, 2012, 01:36:39 am

Language: C/C++
(Pre-requisites: moderate understanding of the C 99 and ISO C++ languages)

So, Thearis jumped in IRC a little while ago asking if it was possible to create a 2D array (of some sort) that was dynamic in one dimension but fixed in the other. Before I get to that, I need to go over exactly what an array is, and how it is accessed.

Back before the advent of containers, vectors, objects, and classes, there where only a few ways of storing your stuff in ANSI-C, a single variable, an array, or a structure.

Variables and their implementation in Memory
Variables store a single value in memory, and how much memory said variable actually takes up depends on the system and the nature of the variable's data type. To avoid a somewhat lengthy explanation of integer sizes vs. system types, I'm going to use the following C++x11 datatypes in the discussions:

Code: [Select]

int8_t     // 8-bit, 1 byte integral type
int16_t    // 16 bit, 2 yte integral type
int32_t    // 32 bit, 4 byte integral type

Arrays and their implementation in Memory
In memory, arrays store their values right next to each other.

Code: [Select]

int16_t element[6];  // an array of 6 elements

// In memory, this is the same as:
int16_t element_0;
int16_t element_1;
int16_t element_2;
int16_t element_3;
int16_t element_4;
int16_t element_5;

If you wanted to access an element in the array, you could do one of two things:

Code: [Select]

int16_t element[6];

element[ element_number ] = 0;

// Or

int16_t *element_pointer = starting_address_of_array;

*(element_pointer + ( element_number * int_16t_size_in_bytes )) = 0;

The first notation should be very familiar to people who mess with C/C++, as it's very clean and quick to access.

The second notation reveals a nasty method of accessing the elements: pointer arithmetic. The pointer is first assigned the memory address of the first element in the array. Each individual element is then accessed by multiplying the element's index number by the size, in bytes, of the data type of the array. This method works because each integer value is butted up next to each other in memory, and each integer value has a constant (and known) size in bytes.

Now, for something ~~completely~~ different:

Code: [Select]

int16_t array[6];
int16_t* element = array;

Yes, that's right, "array" is, in actuality, a pointer, which means:

Code: [Select]

int16_t array[6];
int16_t* element = array;

array[5] = 6;
return *(element + ( 5 * int16_t_size) );

The return value would not only be equal to 6, but it would be from the exact same memory location. These leads us to conclude that the [] operators are doing the pointer arithmetic behind the scenes, ~~and everything you thought you knew was wrong~~.

Vectors
In laymens terms, a vector can be considered as an array that automatically takes care of its size requirements. While this is somewhat true, std::vector's are actually objects that stores their data in a private array. Memory allocation/management is "automatically" done whenever an item is being added to the vector or when the vector is being resized. How it does this is fairly simple:

First, it checks to see that it's internal array has enough space for the element(s) it is adding, or if doing a resize checks to see if the array is already that size. If it large enough for the new element(s), then the elements are added. Similarly, if the array is already the same size as the new requested size, then nothing is really done.
However, if the array is too small for either case, a "hidden" function is called to create a new array on the stack that is the required size.
If there is enough room on the stack for a newer, larger array, all current contents of the vector are copied over and any new elements are appened to the end.
Finally, the old array is trashed/returned to the stack heap, and the internal array pointer is set to the address of the new array

Any future access to the vector will access the "new" array, instead of the "old" one. Vectors thus stay dynamic, because they're usually able to access enough space on the stack.

Vector Arrays
I now come to the topic of Vector arrays (or, arrays of vectors). This can be thought of as a 2 dimensional array with one dimension as a dynamic vector and the other a "fixed" array. An example of a vector array would be:

Code: [Select]

vector <int16_t> array[6];
Here, you can see already that the [] operator is going to cause mass confusion and headache when trying to manipulate the data you want.

Remember what I said earlier about the [] operator masking the pointer arithmetic? well, that's what happens here:

Code: [Select]

array[3] = some_vector;
What this does is access the forth vector in the array and set it equal to another vector. Moving on:

Code: [Select]

array[3][1] = 5;
What this does is access the second element in the forth vector and set it equal to the number 5. The reason this works is because:

array is a vector pointer, pointing at the first vector in a vector array.
array[vector_index] is a vector in the array, and
array[vector_index][element_index] is the element in that vector.

2D Vectors
The last thing I'd like to talk about is multi-dimensional vectors. These are very nasty, and should be avoided when possible. Read: get a friggin matrix library already.

Multi-dimensional vectors are achieved by making a vector of a vector. i.e.

Code: [Select]

vector< vector< int_16t > > data;
In this case, access to each individual element is done by liberal usage of the [] operator, as with multidimensional arrays. However, this can lead to problems because each vector is dynamic, which means it's wholly possible to end up with a "matrix" that has non-uniform rows and columns (like, a size of 5 entries on one row, a size of 2 entries on the next, and a size of 100 entries on the last.)

Initialization of the 2D vector would be like so:

Code: [Select]

 vector< vector< int_16t > > data;

data.resize( first_dim_size );  // Make the first dimension the desired size

for( int_16t i = 0; i < first_dim_size; i++ )    // Make the second dimension the desired size on all of the first dimensional elements
{
    data[i].resize( second_dim_size, 0 );
}

// As an illustration to the usage of 2D vectors:

for( int i = 0; i < first_dim_size; i++)
{
    for( int j = 0; j < second_dim_size; j ++ )
    {
        data[i][j] = 0;
    }
}

Yep, it's a big headache. Try to avoid such nonsense if you can.

The E · **Reply #1 on:** July 21, 2012, 02:21:44 am

Little question. If you're using C++11, why aren't you using std::array and iterators?

What's the difference here, you may ask? First of all, std::array is just syntactic sugar that adds iterator types for arrays. It is a lot like std::vector, but with a fixed size.

Iterators are special pointers to vector/array members; the important part for them is that the ++ operator is overloaded to mean "move this iterator to the next element in the array/vector", and there's a special std::(array|vector)::end() iterator that signals the end of a container.

Usage is like this:

Code: [Select]


#include <array>
#include <vector>

using namespace std; //So we don't have to prefix everything with std

void main() {
array<int_16t, 6> element16; //Declares a 6-element array

//Let's declare a multidimensional array with a dynamic second dimension.
array<vector<int_16t>, 6> multidim;

//use iterators to iterate through the array to set the size of the vectors to some predetermined sane value.
//Do note that I use ::reserve() here; which unlike ::resize() does not actually change the vector size,
//but rather tells the vector to reserve a block of memory to hold a vector of the requested size. By doing this, I can be sure that the vector has enough space to
//accommodate n elements without having to do an expensive malloc()/memmove() operation.

for(auto arrit = multidim.begin(); arrit != multidim.end(); ++arrit) {
//Several notes: "auto"  is a redefined keyword in C++11. It tells the compiler to make the variable that follows
//into a variable of the type returned by the next assignment; this is particularly handy with template classes and
//iterators, since it saves me the hassle of having to declare it explicitly as "array<vector<int_16t>, 6>::iterator arrit"
//thus making it easier to later change the type of multidim
//I used a preincrement here, since preincrement is faster and less memory intensive than postincrement when using
//iterators.
    arrit->reserve(32);
}

//Member access can be done through [x][y], however this only works when you know that neither x nor y are 
//out of bounds. For safety, use of the std::(array|vector)::at() function is recommended, which will throw an
//exception if out-of-bounds access is attempted.
//It is much safer to use iterators here as well.

//In this example, let us assume we want to access an element at [4][16].
size_t x = 4;
size_t y = 16;
//size_t is typedef'ed as an unsigned integer, and is the type returned and expected as input by all array/vector
//accessors

//The simplest and fastest way to access it is using the [][] operators.
int_16t result = multidim[x][y];

//A bit more complex, and a bit more safe, is using the ::at() function; however, its safety depends on you implementing
//an exception handler
int_16t result2 =multidim.at(x).at(y);

//Finally, we can use iterators, if we want absolute safety and don't mind the cost of having to iterate
//over the array

int_16t result3 = 0;
for (auto arrit = multidim.begin(); arrit != multidim.end(); ++arrit) {
   if (arrit - multidim.begin() == x) {
      for (auto vecit = arrit->begin(); vecit != arrit->end(); ++vecit) {
          if (vecit - arrit->begin() == y) {
//NOTE: This kind of pointer arithmetic is only valid as long as the iterators are valid. Iterators become invalid
//whenever the vector's size is changed.
             result3 = *vecit;
          }
      }
   }
}

}

z64555 · **Reply #2 on:** July 21, 2012, 01:59:40 pm

Sorry for the confusion, E, but this isn't using the C++11 language. Yes, the C++11 stdint notation is used, but that was for a demonstration so as to prevent going into the whole "32-bit systems have 32 bit integers, and 64-bit systems have 64 bit integers" discussion.

The concept of int16_t, int32_t and the like have been around. The JSF AV rules, (and probably MISRA rules too) have a requirement that a standardtypes.h be defined for all projects... which essentially typedef the uint16, int16, etc. types to be used within the projects so as to allow cross-system development.

I view the std::(array | vector)::at() function to be unnecessary when accessing an array within the for loops, because 1. You know both the beginning and the end of the array | vector, and 2. You know that you can access the next element in the array | vector by simply incrementing/decrementing it (depending on where you start).

That being said, I believe the std::(array | vector)::at() function should only be used in cases where it is unknown if the iterator or index is valid, like how you did in your example:

Code: [Select]

size_t x = 4;
size_t y = 16;

int16_t result = multidim[x][y];   // Error prone, It's currently unknown if x or y will be valid indices
int16_t result = multidim.at( x ).at( y );  // Far safer, since std::array::at( size_t index )  verifies if the index is legitimate

I also don't think iterators should be used with array and vector types, since really they're pointers with special type sensitive increment/decrement operators.

News:

Author Topic: Arrays, Vectors, and combinations thereof (Read 2147 times)

z64555

Arrays, Vectors, and combinations thereof

The E

Re: Arrays, Vectors, and combinations thereof

z64555

Re: Arrays, Vectors, and combinations thereof