Author Topic: A subtle bug for you all  (Read 12589 times)


A subtle bug for you all
Let's play spot the bug in this section of code:
Code: [Select]
// HACK - Create a dummy structure to pass to the XyzConnect
// function to avoid AV within the function.
int dummy = 0;
if ( NO_ERROR != XyzConnect( 0, L"", L"", (void**)&dummy ) )
{
    TRACE( L"XyzConnect failed." );
    return FALSE;
}


A hint: Whether or not a crash occurs is dependent on the order that variables are initialised in the function with that code.

If you think you've spotted the answer, click here to find out.

This is done all over the place in the FSO engine, and is one reason why it needs to be cleaned up if we ever want to get a 64-bit build working.
STRONGTEA. Why can't the x86 be sane?

 

Offline Mongoose

  • Rikki-Tikki-Tavi
  • Global Moderator
  • 212
  • This brain for rent.
    • Minecraft
    • Steam
    • Something
Re: A subtle bug for you all
So something that works just fine in a 32-bit environment can crash and burn in a 64-bit environment?  That seems foolishly shortsighted from a standards standpoint.

 
Re: A subtle bug for you all
It is actually spelt out in the standards :P

int doesn't have to be any particular size

From section 5.2.4.2.1 'Sizes of integer types' of ISO/IEC 9899:TC2
Quote
The values given below shall be replaced by constant expressions suitable for use in #if
preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the
following shall be replaced by expressions that have the same type as would an
expression that is an object of the corresponding type converted according to the integer
promotions. Their implementation-defined values shall be equal or greater in magnitude
(absolute value) to those shown, with the same sign.

Quote
— minimum value for an object of type int
INT_MIN -32767 // −(2^15 − 1)
— maximum value for an object of type int
INT_MAX +32767 // 2^15 − 1

i.e. int can be anything from 16 bits up.
STRONGTEA. Why can't the x86 be sane?

 

Offline Flipside

  • əp!sd!l£
  • 212
Re: A subtle bug for you all
Stuff like this is why I try to avoid re-casting in Java; with a bit of careful work and a lot of interfaces, a lot of traps can be avoided. Java seems, to me, particularly prone to falling into the casting trap if you don't think about things before you start.

One of the biggest problems I ever faced was a city-builder I was doing for my University project. I used an array of Building object references to hold the map, and there were multiple interfaces for buildings, such as Environment_Structure, Living_Structure etc. The awkward part was that you needed a separate class for each type of building, each extending its required interface(s) (a Factory would be an Industry_Structure, for example, but if you wanted a live-in Factory you'd have to create a new class that extended both Living_Structure and Industry_Structure). That wasn't ideal, but it did allow the game to keep simple lists of all in-game buildings that used a particular interface automatically, and because it used interfaces, you knew the Building object you were dealing with provided certain information; it didn't matter what exact kind of structure it was :)

Edit: Also, thanks to Java's lack of unsigned-byte handling, the bytemaps used to hold some data turned out to be much harder to work with than they needed to be; there was a lot of recasting byte to int and vice versa, which pleased me not a lot.

« Last Edit: October 24, 2009, 03:06:03 pm by Flipside »

 
Re: A subtle bug for you all
OK, so you folks got number 1.
Now let's try number 2 (to which I'll post the answer in about 12 hours):

Code: [Select]
#include <cstdio>
#include <process.h>
#include <windows.h>

struct Data
{
  int iterations;
  int timeofday;
};

HANDLE hThread;
unsigned threadID;

unsigned __stdcall ThreadedFunc( void* p )
{
    printf("Iterations: %d\nTime of Day: %d\n", ((Data*)p)->iterations, ((Data*)p)->timeofday );
    return 0;
}

void start_thread( )
{
    Data d;
    d.iterations = 100;
    d.timeofday = 100;
    hThread = (HANDLE)_beginthreadex( NULL, 0, &ThreadedFunc, &d, 0, &threadID );
}

int main()
{
    start_thread( );
    WaitForSingleObject( hThread, INFINITE );
    CloseHandle( hThread );
}

The above code crashes randomly. Sometimes it works, sometimes it doesn't.
Why? (and yes, I found this gem in an application)
STRONGTEA. Why can't the x86 be sane?

 

Offline Hery

  • 26
Re: A subtle bug for you all
Pointer + stack + threads + typing code without thinking = bug
(yep, "typing code without thinking" alone also equals "bug") :P
When start_thread() returns, the pointer to d is no longer valid, which is not what ThreadedFunc() expects. We don't know whether ThreadedFunc() executes before or after start_thread() returns (or at the same moment if CPU count > 1); that's why it crashes randomly.

 
Re: A subtle bug for you all
or Hery can post the answer 18 minutes after I post the question :P.

OK, (just to keep you on your toes) :P

Two more brief little puzzlers for you.

Example 1:
Code: [Select]
class Object
{
    int value;
    ... functions ...
};

class Ship : public Object
{
    std::string hello_world;
    ... functions ...
};


int main( int argc, char** argv )
{
    Object* sh = new Ship( );

    delete sh;
    return 0;
}

The above program compiles and runs fine (just imagine a scenario where you might want to do this outside of this extremely simplified example).
But it leaks memory. A lot of memory. Why, and how do you fix it?

Example 2:
Code: [Select]
#include <cstdio>

class Object
{
    int value;
public:
    ~Object( )
    {
        Cleanup( );
    }

    void Clean( )
    {
        Cleanup( );
    }

    virtual void Cleanup( )
    {
    }
};

class Ship : public Object
{
    void* awesome_data;
public:
    virtual void Cleanup( )
    {
        printf( "%p\n", awesome_data );
    }
};


int main( int argc, char** argv )
{
    Object* sh = new Ship( );
    sh->Clean( );

    delete sh;
    return 0;
}

Questions:
1) Why does calling sh->Clean( ); print out the current value of awesome_data?
2) Given that 1) works, why doesn't Cleanup( ) get called during the destruction phase?

(If you need a hint, Pete Isensee has a fantastic presentation on this called 'The Beauty of Destructors')

EDIT: I gave the references.
« Last Edit: October 26, 2009, 12:46:36 am by portej05 »
STRONGTEA. Why can't the x86 be sane?

 

Offline Spicious

  • Master Chief John-158
  • 210
Re: A subtle bug for you all
Non-virtual destructors = fail (at least when using inheritance). More specifically for the second one, I'm going to assume that like in constructors, virtual functions are a terrible idea during destructors. It'll probably end up calling the Object version of Cleanup().

 
Re: A subtle bug for you all
Spicious, that is indeed correct sir!

virtual functions are a terrible idea during destructors, even virtual destructors.
The virtual table gets destroyed pretty early in the destruction process.
Steer clear.

Destruction order is [Pete Isensee]:
1. Destructor body
2. Data members in reverse order of declaration
3. Direct non-virtual base classes in reverse order
4. Virtual base classes in reverse order

Note that this means that encapsulated objects do actually get destroyed.

Slides 26 and 27 of 'The Beauty of Destructors' are some good recommendations.
STRONGTEA. Why can't the x86 be sane?

 

Offline Mongoose

  • Rikki-Tikki-Tavi
  • Global Moderator
  • 212
  • This brain for rent.
    • Minecraft
    • Steam
    • Something
Re: A subtle bug for you all
Man, now I wish I remembered anything at all from my programming structures course.  All of the class stuff apparently went in one ear and out the other.  It didn't help that the professor was terrible, I guess. :p

Just to clarify, for the first one, is the issue that you're only deleting the pointer to the ship class, and not the allocated memory itself?

 
Re: A subtle bug for you all
Just to clarify, for the first one, is the issue that you're only deleting the pointer to the ship class, and not the allocated memory itself?

No, the issue is that the destructor is non-virtual.
If the destructor was virtual, the Ship destructor would be called.

As it is, only the non-virtual Object destructor is called.
STRONGTEA. Why can't the x86 be sane?

 

Offline Mongoose

  • Rikki-Tikki-Tavi
  • Global Moderator
  • 212
  • This brain for rent.
    • Minecraft
    • Steam
    • Something
Re: A subtle bug for you all
...huh.  Yeah, I'll definitely have to read that textbook again. :p

 
Re: A subtle bug for you all
Pop quiz:

1) What is the insertion complexity of (and why!):
Code: [Select]
std::vector, std::list, std::map
2) Which of the following has better caching characteristics (and why!):
Code: [Select]
std::vector, std::list, std::map
3) On which of the following is random access not an O(1) operation (and why!)?
Code: [Select]
int arr[500]; std::vector, std::list
4) Given the above information, why do iterators become invalid after a non-const operation on a data structure?

5) An std::vector with 500 elements in it does not use the same amount of memory (most of the time...) as sizeof(element)*500. Why!

(A basic quiz just to keep the programming section interesting for a while :P )
STRONGTEA. Why can't the x86 be sane?

 

Offline Flipside

  • əp!sd!l£
  • 212
Re: A subtle bug for you all
Well, I'd say for 3, that a list would be the answer, since, at least in my experience, lists are designed to be iterated through, not accessed non-sequentially.

True, there are situations where you can calculate the pointer position and pull directly from a list, but if you can do that, why not use an array?
« Last Edit: October 29, 2009, 03:24:20 pm by Flipside »

 

Offline Aardwolf

  • 211
  • Posts: 16,384
    • Minecraft
Re: A subtle bug for you all
1:

vector: O(n)
list: O(1)
map: O(1)

And I'm too lazy to answer why

2:

Idunno what that means :(

3:

list, because it has to iterate through to the element every time

4:

maybe you deleted the next element, or the previous one, that an iterator pointed to? or the current? something like that
or changed it by inserting a new element

5:

it also stores the size, and maybe some other stuff

 

Offline Flipside

  • əp!sd!l£
  • 212
Re: A subtle bug for you all
You know, thinking on the subject of Hierarchy in classes, I wonder if it reveals anything about the political position of most language designers? Or possibly the working conditions?

Look at it this way, a higher Class is completely ignorant to a connected lower Class, but that Class can do everything the higher class can do, and some more...

I know that's not related, but I'm really not certain it deserves a new thread...

 

Offline Iss Mneur

  • 210
  • TODO:
Re: A subtle bug for you all
Pop quiz:
Spoiler:
To start, this depends on the STL implementation being used; implementations are not required to use the same underlying data structures for these containers.

1) What is the insertion complexity of (and why!):
Code: [Select]
std::vector, std::list, std::map
Spoiler:
std::vector = O(n) - O(1) assuming std::vector is implemented as an array and the insertion location is empty, but normally O(n) because of the shifting of the data.
std::list = O(n) - Assuming std::list is implemented as a linked list, O(n), because the structure has to be scanned down to the insertion point. Inserting at the start or end of the list (assuming there is a tail pointer) is O(1), but that's an edge case; the general case is still O(n).
std::map = O(1) - Assuming std::map is implemented as a hashmap, and the hashing function distributes properly, insertions are always O(1). (A tree-based std::map, which is what most implementations use, would be O(log n).)
2) Which of the following has better caching characteristics (and why!):
Code: [Select]
std::vector, std::list, std::map
Spoiler:
Not clear on what you mean, but assuming you mean better for use as a caching system, it would be std::map, because if it were implemented as a hashmap all accesses would be O(1).
3) On which of the following is random access not an O(1) operation (and why!)?
Code: [Select]
int arr[500]; std::vector, std::list
Spoiler:
std::list, because std::list is normally implemented as a linked list, and as such you have to run down the list to get to your access location, resulting in O(n). The array and a std::vector implemented as an array would both be just pointer dereferencing, which is O(1).
4) Given the above information, why do iterators become invalid after a non-const operation on a data structure?
Spoiler:
Non-const operations can modify the data structure, which may move the data around in memory, causing the pointers that the iterators hold to point to the wrong location.
5) An std::vector with 500 elements in it does not use the same amount of memory (most of the time...) as sizeof(element)*500. Why!
Spoiler:
The data structures provided by STL implementations normally store metadata about what is in the structure. In std::vector's case, it needs to know how large its underlying array is so that it will not go out of bounds. Also, if the structure contains nullable elements - pointers that are allowed to be null, or basic types for which 0 or some other magic value is valid - the vector has no way of knowing how many elements are actually stored just by looking at the data.

Not to mention that scanning the structure to count the elements would be an O(n) operation, so most implementations of std::vector store the current capacity and the current number of elements in addition to the data.

You know, thinking on the subject of Hierarchy in classes, I wonder if it reveals anything about the political position of most language designers? Or possibly the working conditions?

Look at it this way, a higher Class is completely ignorant to a connected lower Class, but that Class can do everything the higher class can do, and some more...

I know that's not related, but I'm really not certain it deserves a new thread...

I think you are thinking into this too much.  The real problem is that scanning the function tables of all derived classes is an expensive operation so C++ does not do this by default but allows the programmer to use the 'virtual' keyword to selectively turn on this behavior. 
"I love deadlines. I like the whooshing sound they make as they fly by." -Douglas Adams
wxLauncher 0.9.4 public beta (now with no config file editing for FRED) | wxLauncher 2.0 Request for Comments

 
Re: A subtle bug for you all
IssMneur: ISO/IEC 14882:2003(E) specifies (page 489)
Quote
The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

So I should have said any conforming implementation :P

You might be surprised to learn that a vector supports amortised constant time inserts and erases at the end:
Quote
In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time.

(we'll come back to this for question 5)
I'm pretty sure IssMneur nailed it for question 1, except that inserting at the end of a vector is (amortised) constant time according to the standard.

2) By this I meant for regularly accessed data in a tight loop. Vector has significantly better data locality, so it will perform significantly better than the other data structures on hardware where a cache sits between the CPU and main memory and is significantly faster than main-memory accesses.

3) No surprises - it's the list!

4) IssMneur had it:
Quote
Non-const operations can modify the data structure which may move the data around in memory causing the pointers that the iterators have to point to the wrong location.

5) On this one you're both partially correct, but the housekeeping is only a small overhead.
Going back to question 1, the insert time for a vector is amortised constant.
This is achieved by OVERALLOCATING each time a reallocation and move is done.
So if you're pushing 100 1KB objects onto a vector, you might be surprised to learn that the final vector capacity might be something like 150 elements (again, it depends on the implementation).
Write yourself a little program comparing the return values from vector::capacity and vector::size and you'll see what I mean here.

I'm going to have to think to come up with something more challenging for you folks!
« Last Edit: October 30, 2009, 01:36:18 am by portej05 »
STRONGTEA. Why can't the x86 be sane?

 

Offline Flipside

  • əp!sd!l£
  • 212
Re: A subtle bug for you all
You know, thinking on the subject of Hierarchy in classes, I wonder if it reveals anything about the political position of most language designers? Or possibly the working conditions?

Look at it this way, a higher Class is completely ignorant to a connected lower Class, but that Class can do everything the higher class can do, and some more...

I know that's not related, but I'm really not certain it deserves a new thread...

I think you are thinking into this too much.  The real problem is that scanning the function tables of all derived classes is an expensive operation so C++ does not do this by default but allows the programmer to use the 'virtual' keyword to selectively turn on this behavior.  

To be honest, that was intended more as a humorous observation than a genuine attempt to look into the mentality behind language design ;) But, admittedly, I probably should have made that clear :)

 

Offline Iss Mneur

  • 210
  • TODO:
Re: A subtle bug for you all
You know, thinking on the subject of Hierarchy in classes, I wonder if it reveals anything about the political position of most language designers? Or possibly the working conditions?

Look at it this way, a higher Class is completely ignorant to a connected lower Class, but that Class can do everything the higher class can do, and some more...

I know that's not related, but I'm really not certain it deserves a new thread...

I think you are thinking into this too much.  The real problem is that scanning the function tables of all derived classes is an expensive operation so C++ does not do this by default but allows the programmer to use the 'virtual' keyword to selectively turn on this behavior.  

To be honest, that was intended more as a humorous observation than a genuine attempt to look into the mentality behind language design ;) But, admittedly, I probably should have made that clear :)
I wasn't sure if you were being serious or not, but because that was not the first time I had seen that particular comment (not from you, just in general) I just figured that I would clarify it for everyone.

portej05: I know some of this we already talked about last night on IRC, but I figured I should clarify a few things.
IssMneur: ISO/IEC 14882:2003(E) specifies (page 489)
Quote
The elements of a vector are stored contiguously, meaning that if v is a vector<T, Allocator> where T is some type other than bool, then it obeys the identity &v[n] == &v[0] + n for all 0 <= n < v.size().

So I should have said any conforming implementation :P

You might be surprised to learn that a vector supports amortised constant time inserts and erases at the end:
Quote
In addition, it supports (amortized) constant time insert and erase operations at the end; insert and erase in the middle take linear time.

(we'll come back to this for question 5)
I'm pretty sure IssMneur nailed it for question 1, except that inserting at the end of a vector is (amortised) constant time according to the standard.
Like I said at the top of my reply, it depends on the STL that you are using.

To be honest, I have never read the C++ spec and wasn't aware that it even specified what the STL was required to do, except possibly the interface. I actually thought the STL was just a de facto standard, not formalized by the ISO, based on the STL's history.

That being said, because you didn't say where the insertions are, you cannot depend on the amortized cost, because it only comes into effect when appending to the end of the vector. That is what I noted in my response: "O(1) assuming that std::vector is implemented as an array and the insertion location is empty. But normally O(n) because of the shifting of the data." I was aware of the amortization, which is what I was trying to get at when I said the insertion location is empty. The insertion location can be empty for two reasons: over-allocation, or gaps in the vector's array; though only over-allocation will make the insertion location empty when using the insert() call. By gaps I mean that you, the coder, know you can just overwrite a specific location because you made what was there invalid in some way - that is, just setting at an index: vector[index] = something.

2) By this I meant for regularly accessed data in a tight loop. Vector has significantly better data locality, so will perform significantly better than other data structures where a caching system is provided that is significantly faster than accesses into main memory.
The CPU's cache! I was definitely not expecting you to be talking about that level, but as I said in my response, I was not clear on what you were getting at, as caching can mean a lot of things at a lot of levels.

5) This one you're both partially correct, but it's only a small overhead required for housekeeping.
Going back to question 1, the insert time for a vector is amortised constant.
This is achieved by OVERALLOCATING each time a reallocation and move is done.
So if you're pushing a 100 1KB objects onto a vector, you might be surprised to learn that the final vector size might be something like 150 elements (again, depends on the implementation).
Write yourself a little program comparing the return values from vector::capacity and vector::size and you'll see what I mean here.
Okay, I took it at face value - that is, why the memory usage of 500 elements in an array and 500 elements in a std::vector is not exactly equal.
"I love deadlines. I like the whooshing sound they make as they fly by." -Douglas Adams
wxLauncher 0.9.4 public beta (now with no config file editing for FRED) | wxLauncher 2.0 Request for Comments