Hard Light Productions Forums

Off-Topic Discussion => Programming => Topic started by: z64555 on January 23, 2017, 10:29:49 am

Title: Vectorized Lists
Post by: z64555 on January 23, 2017, 10:29:49 am
Prereqs: Watch this talk by Google's Chandler Carruth (http://channel9.msdn.com/Events/CPP/C-PP-Con-2014/Efficiency-with-Algorithms-Performance-with-Data-Structures)

What I call a "vectorized list" is a list that's implemented within a vector or array container, and thus can be cached on the CPU if the list is small. I first heard of the idea from that very video, and was excited when I figured out how to implement it. (excited enough to write this post, at least)

As a quick refresher, lists operate by having their elements contain a reference (or link) to another element that's next in line. Doubly linked lists have both a forward reference (next in line) and a reverse (previous in line) reference. In order to access a specific element, you must iterate through each of the members. Thus, random access is slow, but you can insert/remove an element anywhere within the list far quicker than it would be to do so in an array or vector. Another disadvantage lists have is that the elements can potentially span the entire memory space, disrupting CPU caches whenever the next list element is not within the cache. A worst-case scenario with lists is when each iteration between elements is a cache miss.

Cache misses with lists can be mitigated if they are restrained within a contiguous memory block, such as when they are in an array or vector container. Even though cache misses are still possible with large lists, the issue is less problematic than it would be if the list elements were scattered in memory since there is a higher probability of a cache hit. Should the data be significantly huge, the links could be held in a separate array/vector for faster iteration/seeking at the possible cost of a slower access time.

FreeSpace already has an example of a vectorized list with a fixed size, just take a look at object.cpp. The object class has two pointers in it that point to other objects, and all object instances are contained with the Objects[] array. There are three notable object instances that are not within the array, which are obj_free_list, obj_used_list, and obj_create_list.

These objects are in fact heads to lists within the Objects array, and have a specific function within the object management code.

For FSO's purposes, the links there serve to speed up execution time when it needs to do iterations across Object[] for its various tasks. For example, when doing physics and AI routines, you'd iterate over the obj_used_list and not worry about needing to check if the instance is used or not. Likewise, if you are creating a new object you'd use the first element on the obj_free_list, init a few things, then pass it to the obj_create_list for some potential parallel processing goodness. (As far as I'm aware, FSO doesn't do a parallel process here, it allocates objects first then proceeds to create them, but it is entirely possible for the two to run in separate threads).

To get a vectorized list with a flexible size, you'd use a vector instead of an array as the container, and then implement the various management functions. Of particular interest is the resize() function:

Title: Re: Vectorized Lists
Post by: z64555 on January 23, 2017, 01:12:34 pm
I was thinking it may be possible to upgrade bmpman to use this sort of structure.

Code: [Select]
struct bmp_slot {
bmp_slot* prev;    // For lists, link to previous element
bmp_slot* next;   // For lists, link to next element
size_t size;          // Nonzero size of the memory block, if this slot is the head. Otherwise zero.
// ... And all the rest of the data

Animated textures require a contigous block of bmpman slots, so instead of managing individual slots, the bmpman code will be working with blocks of them.

This would work like so:
The used_list and free_list (and create_list, if necessary) would address slots by the block, instead of the individual frame.

Initially, bmpman would start with one large empty block. As blocks are used up the initial free block becomes smaller. As blocks are freed, they are linked to the free_list and merged with other free blocks if they are adjacent.

Block merges may only occur on free memory.

Blocks can check for adjacency by comparing the distance between the block head's locations (pointer arithmetic, or ideally with std::distance) to see if it matches the size of the lowest slot in memory.

Merging blocks. The lowest block in memory becomes the head, with its "next" point set to the higher block's "next" value. The size is added together, and lastly the higher block's values are nullfied.

Splitting blocks. Bmpman advances to where the split will occur and makes a new head there. It references the old head as its previous, and the old head's next as its next, and its size is calculated as the difference between the distance to the old head and the old head's size. The old head's next is then set to the new head's location, and the old head's size is reduced accordingly. The old head can now be used.
Title: Re: Vectorized Lists
Post by: z64555 on February 25, 2021, 10:29:47 pm
Addendum: Reference Validation.

When I wrote this post some time ago, it was extrapolating on observations of a non-dynamic array and the possibility of vectorizing a list.  One crucial item that I failed to cover was validation of the points during resize operations.

The problem being, is that should a vector ever change in size or moved to a different location in memory, all of the pointers within the list will still be pointing at the memory locations they originally were with, and thus no longer valid.

Before getting into some possible workarounds to this, I'll mention several cases.

About Iterators
Iterators are in the same boat as pointers and can be invalidated according to certain situations, and usually mirror the behavior of pointers.

About other containers
Other standard containers may handle pointers and iterators differently, you'll have to consult their formal specification to see the differences.  But in general, whenever a container has to be moved or resized, the pointers/iterators will be invalidated.

Keeping References Valid
In many cases, the only sure method of keeping every reference to a valid location, and the correct location, is to somehow iterate through the list and re-validate the pointers to the new location.

Pointer arithmetic is an option in this case!  Given the location of the 0th element in the list, you can compute the distance to the next element in the list and transfer it over. Yes, its still pointer arithmetic here, so you have to step around it carefully like glass.

Iterators might be able to do arithmetic as well, but you have the problem of doing cross-referencing between the list and vector of what is where.

Finally, to avoid pointer arithmetic altogether you would use indexes in the list implementation instead of pointers.  These indexes are the location in the vector that the linked element is at.  Indices won't be invalidated during moves or copies of the vector, but will get invalidated when insertions or deletions occur in the vector.  Indices are not affected by changes in the list.
Title: Re: Vectorized Lists
Post by: katepalmer on March 12, 2021, 07:32:38 am
Would be better to find a software engineer for this task.

edit by ngld: removed spam link
Title: Re: Vectorized Lists
Post by: Phantom Hoover on March 12, 2021, 07:49:44 am
...what do you think SCP developers do for their day jobs?
Title: Re: Vectorized Lists
Post by: Strygon on March 12, 2021, 07:56:07 am
Develop SCP Foundation articles of course.  ;)
Title: Re: Vectorized Lists
Post by: The E on March 12, 2021, 08:48:03 am
Would be better to find a software engineer for this task.

hopefully one who's better at their job than whoever programmed this bot.

or at the very least more ethical.