Hard Light Productions Forums

Off-Topic Discussion => Programming => Topic started by: chief1983 on February 08, 2010, 03:30:17 pm

Title: Algorithms Quiz (anyone can give it a shot, please!)
Post by: chief1983 on February 08, 2010, 03:30:17 pm
I'm not an algorithms guy really, so this isn't exactly my forte.  But I'm supposed to be evaluating some answers to these quiz questions tomorrow, and if I could see some ideas of what you guys come up with before then, I'll be better prepared tomorrow.  So here's the questions.

Code: [Select]
1) Given a prime number, N, write an algorithm to get the next prime number.
2) Given an array of red, black and yellow balls, sketch an algorithm to produce 3 arrays of balls, each containing all balls of a single color.
3) Given a file of 50 million strings, sketch an algorithm to find a string that appears at least twice.

Anyone is welcome to give it a shot, whether you think it's a very good answer or not.  I'll be judging all sorts of answers tomorrow so the broader range of solutions I can be aware of the better.  Remember, no actual code required, pseudo code or just verbal explanation is fine, correctness proofs preferred.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Colonol Dekker on February 08, 2010, 03:44:32 pm
1.

if x= not x/1 or x/x :nervous:

I have no idea how to write these things down...
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Herra Tohtori on February 08, 2010, 04:06:41 pm
Does "next" imply that N1<N2? :drevil: Sequences can go to either direction, so I would say the question is a bit ambiguous if it doesn't define the direction (next doesn't really cover it, it just says "adjacent").

How large primes are we talking here? I mean you can bruteforce check whether N is a multiple of M (M<N/2), and if you get negative answer you do N++ and re-check. Can't bother to figure out how to do it with programming.

There are super computers looking for the next prime all the time. The biggest one they have determined to be a prime so far is N=(2^43112609)-1 so you can understand if checking if a large natural number is a prime can take a bit long. And brute forcing them is hopeless so I think there are advanced algorithms to determine prime candidates, but I haven't really ever had any passing interest in raw mathemathics like this. Aside from getting an array of random numbers, what use is knowing primes really, aside from just knowing that a really big number can only be divided by one and itself...

You can't really even put 2^43112609 - 1 into a comprehensible form so it's not even useable for that. Well, if you have a huge array of hard disk drives that can store the number, then I guess you could use it that way but really, there are better ways to gain random numbers. :p

The others are fundamentally easier. Abstractizing the properties (colour) of the balls and then sorting them by that property would not be that hard.

The final one is a bit trickier... 50 million strings takes a while to go through. Bruteforcing would be possible, I guess. Save the strings into an array, take the first string (string.0) and check if it appears again in the file (if string1==string2 etc.), if not, take the second string (string.1) and check match for all the string.(>1). It would, at least, work. But whether or not I could write it in formal logic language or any given programming language is a different thing, I hate being limited by syntax. :blah:
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: The E on February 08, 2010, 04:26:09 pm
The first one is deceptively simple; finding a next number is, after all, as easy as just incrementing n. Checking whether or not that number is prime, however, is expensive.

Problem 2:
Iterate through the source array. Based on what is found at the current index, copy its contents into one of three vectors. Once you're done, destroy the source array an return the three vectors.

Problem 3:
Use a sorting algorithm on the array. Iterate through the sorted array. If this.element == next.element, return this.element. Or do whatever else you want to do with the elements.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Sushi on February 08, 2010, 04:32:40 pm
IMO #3 is the only really interesting one. :)

For #3, just create a really big hashtable. Iterate through the list of strings once, checking to see if each string is in the hashtable already. If it isn't there, add it to the hashtable. If it is there, then you've found a duplicate. Complexity: O(N)... but you need a lot of memory to hold that hashtable.

Or, alternately, use a binary search tree and do the same thing. It has much less up-front memory cost, but the worst-case complexity is now O(NlogN).

For #2: The problem description leaves a lot to be desired: what property does a "ball" have for us to deal with? Assuming it has a unique ID and a color, though, you can just iterate over the original array, stuff each ball into a list corresponding to its color, then read out the lists into fresh color-specific arrays.

#1, as The E pointed out, is tricky because it's hard to make it computationally feasible. Naive solution is just to increment & check until you find a prime. More efficient solutions exist, but I don't remember what they are and I don't know enough theoretical math to figure it out on my own. :p
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: chief1983 on February 08, 2010, 04:48:29 pm
I found some random functions for the prime stuff online, first is C++ and second is Javascript.  I'm pretty sure they're naive though at first glance but I haven't looked closely.

Code: [Select]
int NextPrime( int n)
{
if( n == 0)
return 2;
else if (n == 2)
return 3;

n++;

for(int d=2; d < n; d++)
{
if( n%d == 0)
{
n++;
d=2;
}
}

return n;
}

function checkPrime(n)
{
var dummy = 0
for (var i = 2; i <= Math.sqrt(n); i++)
{
if ( Math.round(n/i) == n/i)
{
var result = n + " = " + i + "*" +n/i
//alert(result)
theForm.Output.value = result
dummy = 1
break
}
}
if (dummy == 0)
{
var result = n + " is prime."
theForm.Output.value = result
}
}
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: portej05 on February 08, 2010, 11:24:10 pm
I'll think about the others, but I've got a good one for number 3

If all you wanted to know was whether a string repeated (for number 3), this might work (depending on how long your strings are going to be):
Create a tree structure:
Code: [Select]
structure
  child_nodes
  end_node
  character
endstructure

Add string characters into the tree, where the character doesn't exist, create the child node.
At the end of each string, see if you've hit an end node.

EDIT: That last part doesn't work right. Damn, don't algorithm while tired :P
EDIT2: Although, if you mark a node as an end node, you can be sure that the above would work.
EDIT3: It works, and is quite quick too :)

Would probably be faster if I didn't use a vector and didn't use std::string, but it's proof of concept.
Code: [Select]
#include <iostream>
#include <fstream>
#include <string>
#include <vector>

#include "stdio.h"
#include "stdlib.h"

void challenge3( bool data );
void generate_data( const char* filename );
int rand_range( int max );

struct c3_node
{
std::vector< c3_node > children;
bool end;
char c;
};

const char alphabet[] = {"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"};

int rand_range( int max )
{
   int r = (int)((((double)rand( ))/((double)RAND_MAX+1))*(double)max);
   if ( r == max )
  r--;
   return r;
}


void generate_data( const char* filename )
{
std::ofstream f( filename );
/* Generate 100000 random phrases */
for ( size_t n = 0; n < 100000; n++ )
{
/* Random length for this phrase */
int length = rand_range( 200 );
if ( length == 0 )
length++;
std::string s;
s.reserve( length );
for ( int l = 0; l < length; l++ )
{
s += alphabet[ rand_range( sizeof( alphabet ) / sizeof( alphabet[ 0 ] ) - 1 ) ];
}
f << s << std::endl;
}
}

void challenge3( bool data )
{
if ( data )
{
generate_data( "data.txt" );
return;
}

c3_node root;
std::ifstream f( "data.txt" );
bool oldeof = false;
size_t count = 0;
while ( true )
{
count++;
std::string line;
std::getline( f, line );
if ( f.eof( ) && ( oldeof || line.length( ) == 0 ) )
break;

/* Process this string */
c3_node* cur = &root;
for ( size_t i = 0; i < line.length( ); i++ )
{
/* Find the current letter */
bool found = false;
for ( size_t curr_letter = 0; curr_letter < cur->children.size( ); curr_letter++ )
{
if ( cur->children[ curr_letter ].c == line[ i ] )
{
cur = &cur->children[ curr_letter ];
found = true;
break;
}
}

/* Insert unfound node */
if ( !found )
{
c3_node n;
n.end = false;
n.c = line[ i ];
cur->children.push_back( n );
cur = &cur->children[ cur->children.size( ) - 1 ];
}

if ( i == line.length( ) - 1 )
{
/* Are we at an end node? */
if ( cur->end )
std::cout << "Duplicate: " << line << " @ " << count << std::endl;
else
cur->end = true;
}
}
}

std::cout << "END" << std::endl;
}
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: portej05 on February 09, 2010, 01:50:00 am
Without further information on #2, the best I can do is to overallocate the 3 destination arrays and you end up with an O(N) 1-pass algorithm.

#1 is interesting - how big is N going to be?
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: karajorma on February 09, 2010, 01:53:07 am
For the prime number question you only need to check if the number is perfectly divisible by all the preceding prime numbers up to half it's value.

So what I'd do is create a vector and push the number 2 onto it.
I'd then start a loop.
Every time you find a number that isn't divisible by a numbers in the vector you add it to the vector.
Keep doing that until you get to N/2.

Now test the numbers after N against the vector.


Then I'd save the contents of the vector for next time. :p
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Spicious on February 09, 2010, 04:51:30 am
Probably Miller-Rabin test.

Trie with a count/bool on each node or a hash table.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Bobboau on February 09, 2010, 08:40:38 am
what use is knowing primes really, aside from just knowing that a really big number can only be divided by one and itself...

if you can quickly generate huge primes you can break RSA encryption and thereby win the internet, undoing such things as secure online banking or bill paying.

For the prime number question you only need to check if the number is perfectly divisible by all the preceding prime numbers up to half it's value.

square root of it's value.


as for the third question, as others have said use a hash table or a binary search tree.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: The E on February 09, 2010, 08:48:54 am
Errm. If you can quickly do factorization on large composites (the product of two primes multiplied), that's when you win the internet. That's somewhat harder.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Bobboau on February 09, 2010, 08:52:42 am
making a huge list of primes is a good way to do that.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Sushi on February 09, 2010, 10:20:40 am
I like Portej's #3 solution, that's pretty clever.

Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: portej05 on February 09, 2010, 10:49:03 am
Sushi - can't claim full credit, I saw a note somewhere in a book I was reading during ACM training
It's also a total monster on the memory use depending on string length :P (potentially 26^averagestringlength - I think for suitably random strings)
The hash table approach could be better for limited memory systems,
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: castor on February 09, 2010, 02:08:56 pm
Problem 3:
Use a sorting algorithm on the array. Iterate through the sorted array. If this.element == next.element, return this.element. Or do whatever else you want to do with the elements.
That's what I'd do too. Maybe adding another (preceding) sorting phase by string length.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: The E on February 09, 2010, 02:13:41 pm
Yeah, but Sushi's (and especially portej's) solutions are much better. In terms of runtime, portej's will probably win (if not in terms of memory usage, it will balloon very quickly for very large strings). Sushi's solution is a close second, since it has the potential to deliver false positives, but checking for those adds only a negligible overhead to the whole function.

Personally, I prefer portej's approach due to its elegance.

Oh, and sorting for length will lengthen your runtime for no real benefits.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: chief1983 on February 09, 2010, 07:58:43 pm
#3 was changed to 250 million 200 character strings, apparently the point was that loading all of it into memory wouldn't be feasible.  We did get some solutions, a lot of brute force attempts but one guy in particular had some very interesting solutions for most of the problems.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Spicious on February 10, 2010, 01:32:31 am
making a huge list of primes is a good way to do that.
Good luck storing them all.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: castor on February 10, 2010, 01:50:48 pm
Oh, and sorting for length will lengthen your runtime for no real benefits.
Well, to be more specific, I meant splitting the input to multiple arrays that can be sorted separately and in parallel.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: The E on February 10, 2010, 05:48:04 pm
Ah, but remember this post:
#3 was changed to 250 million 200 character strings, apparently the point was that loading all of it into memory wouldn't be feasible.  We did get some solutions, a lot of brute force attempts but one guy in particular had some very interesting solutions for most of the problems.

I am not quite certain what cutting the problem up into discrete packages gives you, aside from the risk of never finding your duplicate.

Under the altered conditions, some measure must be taken to reduce memory usage. Portej's solution requires, in a pessimistic case, 9.87*10^282 bytes of memory, if not more due to oerhead. This is at best impractical.

I would really like to see the winning solution, though.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: chief1983 on February 10, 2010, 09:32:36 pm
Turns out the winning one might not have been optimal, however it was one of the best approaches to solving it with an algorithm that we saw.  I'll post the PDF from my laptop tomorrow probably.

Also, storing primes for cracking security would essentially be making a primes rainbow table.  Sure it's massive but it's what rainbow tables are all about.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Spicious on February 11, 2010, 02:41:43 am
Also, storing primes for cracking security would essentially be making a primes rainbow table.  Sure it's massive but it's what rainbow tables are all about.
It's just say a few hundred orders of magnitude bigger.

You could always partition the words by starting letter, distribute each somewhere and repeat on subsequent letters until lists are small enough to work with in memory. Sounds like it could work nicely with a language like Erlang or Go.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Bobboau on February 11, 2010, 04:22:10 am
I have a hard drive that can hold one trillion bytes of information, that should get me at least a decent sized list of primes. remember I don't need all primes up to N to factor N, I just need all primes up to sqrt(N), if I have a huge list of primes in a file on my hard drive, I just load one prime try to divide N with it and then move on until I find one that divides evenly.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: castor on February 11, 2010, 11:44:13 am
I am not quite certain what cutting the problem up into discrete packages gives you, aside from the risk of never finding your duplicate.
The point is that if there is a cheap way to tell which of the strings can't be duplicates in any case, we don't have to spend time comparing them against each other (other assumed benefits are the possibility for parallel processing and better cache utilization due to smaller working set). I admit it's not that easy in practice, since managing the memory usage would be quite tough even with the original setting. And if the input would need to be split to separate physical files it'd get slow as hell.  

edit: I quess you could also use an incremental version of the sorting idea; read the first n characters of every string, along with their offsets in the file. Sort, throw away unique lines. Append the next n characters to the remaining strings, sort again, discard unique strings. And so on..
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Spicious on February 11, 2010, 03:08:00 pm
I have a hard drive that can hold one trillion bytes of information, that should get me at least a decent sized list of primes. remember I don't need all primes up to N to factor N, I just need all primes up to sqrt(N), if I have a huge list of primes in a file on my hard drive, I just load one prime try to divide N with it and then move on until I find one that divides evenly.
Keep in mind that for RSA these primes would be on the order of at least 2^512, if not 2^2048. Now, the number of primes less than n is ~n/log n - numbers that make a trillion puny by comparison.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: Bobboau on February 11, 2010, 11:04:55 pm
well lets see here a 2^512 number would need, somewhere round 1.5*10^75 primes. so, yeah I guess your right.
Title: Re: Algorithms Quiz (anyone can give it a shot, please!)
Post by: CP5670 on February 12, 2010, 11:11:20 pm
There is no efficient way to do problem 1 for huge numbers. It amounts to computing prime gaps, and as far as I know the only ways to do that are essentially brute force checks as people brought up earlier. There are some results giving bounds on prime gaps or describing them asymptotically, so you could use them to get a starting estimate or to eliminate some choices, but they are not sharp at all.

If you just want to find large primes, then that's comparatively easier.