There's was a stackoverflow question to optimized this code from a computation puzzle:
lli exchange(lli n)
{
if(n<12) return n;
else
{
lli sum=0;
sum+=max(n/2,exchange(n/2));
sum+=max(n/3,exchange(n/3));
sum+=max(n/4,exchange(n/4));
return sum;
}
}
I worked out that you could unroll two levels of the recursion, combining like terms. The max function turned out to never be needed: Exchange n is always at least n. After that I unrolled three levels of recursion. Then i said fukkit and wrote a recursive function that outputs the unrolled function for n levels of recursion. The function it outputs is thousands of lines long, but it's really fast.
BTW here's the code for outputing the function to speed up the above:
std::vector<std::map<lli, lli>> map1;
map1.push_back(std::map<lli, lli>());
map1[0][2] = 1;
map1[0][3] = 1;
map1[0][4] = 1;
const int unrolled_levels = 20;
for (int level = 1; level < unrolled_levels; ++level)
{
map1.push_back(std::map<lli, lli>());
for (auto i = map1[level - 1].begin(); i != map1[level - 1].end(); ++i)
{
map1[level][(*i).first * 2] += map1[level - 1][(*i).first];
map1[level][(*i).first * 3] += map1[level - 1][(*i).first];
map1[level][(*i).first * 4] += map1[level - 1][(*i).first];
}
}
int level = unrolled_levels - 1;
std::cout << "\tlli exchange_opt(lli n) // unroll" << level << "\n\t{\n\n";
for (int inner_level = level; inner_level >= 0; --inner_level)
{
lli mult = 12;
std::cout << "\t\tif (n >= 12LL ";
for (auto i = 0; i < inner_level; ++i)
{
std::cout << " * 4LL";
mult *= 4LL;
}
std::cout << ") // " << (mult) << "\n\t\t{\n";
std::cout << "\t\t\tlli sum = 0;\n";
for (auto i = map1[inner_level].begin(); i != map1[inner_level].end(); ++i)
{
std::cout << "\t\t\tsum += exchange_opt(n/" << (*i).first << "LL)";
if ((*i).second > 1) std::cout << " * " << (*i).second << "LL";
std::cout << "; \n";
}
std::cout << "\t\t\treturn sum;\n";
std::cout << "\t\t}\n";
}
std::cout << "\t\treturn n;\n";
std::cout << "\n\t}\n\n";
Basically you run this, then cut and paste the output into your code, and it gives you a lot of speedup. For example if the original number is 1 million, you get 7000 times speedup, for 1 billion, you get 40000 times speed up. Just be aware: the optimized function is almost 2000 lines of code, and is optimized for long int numbers up to about 4 trillion. (I'm trying 1 trillion now, but I don't know if that would finish in a reasonable time).