An approximate solution for the Subset Sum problem

Ben Fulton

We described Greedy in Part 1.  It’s very fast and often quite accurate, but not always accurate enough.  So how can we improve on it?

Remember, we can get perfect accuracy by enumerating all of the possible subsets of the numbers, but the number of subsets gets out of hand fast.  But look at this list:

{ 1, 3, 4, 6, 9 }

While this list has 32 distinct subsets, it does not have 32 distinct subset sums.  The subset { 1, 6 } sums to 7, and the subset { 3, 4 } also adds up to 7.  How can we take advantage of this?
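To see how much overlap there is, here’s a quick standalone check (my own illustration, not from the post) that brute-forces every subset of that list and counts the distinct sums:

    using System;
    using System.Collections.Generic;

    class DistinctSumCount
    {
        static void Main()
        {
            var list = new List<int> { 1, 3, 4, 6, 9 };
            var sums = new HashSet<int>();

            // Every bitmask from 0 to 2^n - 1 selects one subset of the list.
            for (int mask = 0; mask < (1 << list.Count); mask++)
            {
                int total = 0;
                for (int i = 0; i < list.Count; i++)
                    if ((mask & (1 << i)) != 0)
                        total += list[i];
                sums.Add(total);
            }

            // Prints: 32 subsets, 22 distinct sums
            Console.WriteLine("{0} subsets, {1} distinct sums", 1 << list.Count, sums.Count);
        }
    }

That overlap is what the rest of the post sets out to exploit.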

Computing all subsets

The LINQ algorithm I posted to calculate subsets is nice, but we need a more iterative way to generate the subsets so we can poke at them in between iterations.  Here’s another solution:

        public override List<int> Find(List<int> list, int sum)
        {
            // Start with just the empty subset.
            var subsets = new List<IEnumerable<int>> { new List<int>() };
            for (int i = 0; i < list.Count; i++)
            {
                // Extend every existing subset with the next number in the list.
                var T = subsets.Select(y => y.Concat(new[] { list[i] }));
                subsets.AddRange(T.ToList());

                // Stop as soon as any subset hits the target exactly.
                var result = subsets.Find(s => s.Sum() == sum);
                if (result != null)
                    return result.ToList();
            }
            return null;
        }

The idea is to start with just the empty subset, then on each iteration take the current list of subsets and add the next number in the list to each one, creating an additional batch of subsets.  So if you had the list { 1, 2, 3 }, after the first iteration you’d have { } and { 1 }.  After the next iteration you’d also have { 2 } and { 1, 2 }, and on the third iteration you’d add 3 to all of those sets, giving { 3 }, { 1, 3 }, { 2, 3 }, and { 1, 2, 3 }.
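Here’s a small standalone trace (my own illustration, not part of the post’s code) that prints the subsets after each iteration for { 1, 2, 3 }, showing the list doubling each time:

    using System;
    using System.Collections.Generic;
    using System.Linq;

    class SubsetGrowthTrace
    {
        static void Main()
        {
            var list = new List<int> { 1, 2, 3 };
            var subsets = new List<IEnumerable<int>> { new List<int>() };

            for (int i = 0; i < list.Count; i++)
            {
                // Extend every existing subset with the next number, then add the new ones.
                var extended = subsets.Select(y => y.Concat(new[] { list[i] })).ToList();
                subsets.AddRange(extended);

                Console.WriteLine("After {0}: {1}", list[i],
                    string.Join(" ", subsets.Select(s => "{" + string.Join(",", s) + "}")));
            }
            // After 1: {} {1}
            // After 2: {} {1} {2} {1,2}
            // After 3: {} {1} {2} {1,2} {3} {1,3} {2,3} {1,2,3}
        }
    }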

Getting rid of duplicates

Let’s take the idea even further: instead of only eliminating subsets that sum to the same amount, we’ll also get rid of subsets whose sums are merely close to each other.

We’ll start by getting the subsets ordered by sum:

        public override List<int> Find(List<int> list, int sum)
        {
            var subsets = new List<IEnumerable<int>> { new List<int>() };
            for (int i = 0; i < list.Count; i++)
            {
                // Extend every existing subset with the next number, then keep the whole
                // collection ordered by sum so that similar sums sit next to each other.
                var T = subsets.Select(y => y.Concat(new[] { list[i] }));
                var U = subsets.Union(T).OrderBy(k => k.Sum()).ToList();

Next, we’ll eliminate subsets whose sums fall within a delta of one another.  We’ll also eliminate any subsets that go over the target sum.

                subsets = RemoveCloseSums(U, _delta).Where( s => s.Sum() <= sum).ToList();

 

Since we’re providing a delta anyway, we’ll go ahead and return any subset that gets us within delta of our target (for example, with a target of 5000 and a delta of 0.05, any subset summing to more than 4750 but no more than 5000 is good enough).  Note that this is what Greedy does as well, although in Greedy’s case it simply stops looking when it’s out of ideas.

                // A subset within delta of the target is good enough; return it.
                var result = subsets.Find(s => s.Sum() <= sum && s.Sum() > (1 - _delta) * sum);
                if (result != null)
                    return result.ToList();
            }
            return null;
        }
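One thing the listing doesn’t show is RemoveCloseSums; the real implementation is in the GitHub repository.  As a rough sketch of the idea, assuming the input is already ordered by sum (which U is), it can walk the sorted subsets and keep just one representative out of each run of sums that lie within a delta fraction of each other:

        private List<IEnumerable<int>> RemoveCloseSums(List<IEnumerable<int>> sortedSubsets, double delta)
        {
            var trimmed = new List<IEnumerable<int>>();
            double lastKeptSum = -1;
            foreach (var subset in sortedSubsets)
            {
                int s = subset.Sum();
                // Keep a subset only if its sum is more than a delta fraction bigger than
                // the last sum we kept; anything closer is treated as a duplicate.
                if (trimmed.Count == 0 || s > lastKeptSum * (1 + delta))
                {
                    trimmed.Add(subset);
                    lastKeptSum = s;
                }
            }
            return trimmed;
        }

This is the classic trimming step from the textbook approximation scheme for subset sum: any sum that gets dropped still has a kept representative within a factor of (1 + delta), which is what keeps the overall error under control.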

And that’s it.

Is it any good?

Well, yes and no.  Greedy does a pretty bang-up job in most cases that I’ve tried.  If you have a lot of little jobs that can be poked in near the end after the big ones are finished, Greedy will find those and slot them in when appropriate.  But you can fool it.  One of the test lists I used was 100 jobs with times ranging from 200 to 5000, and Greedy had a hard time getting very close to the target, while the approximating algorithm came up with a set just as quickly.  Pruning the subsets that went over the target helped a lot with that as well.  On the other hand, Greedy didn’t much care whether my target value was 5000 or 50000; it chose its solution and returned.  The approximating algorithm took a lot longer to compute the 50000 set.

Still, I didn’t put a lot of effort into optimizing the C# code.  Look at all those Sum() calls up there – I’m guessing the compiler isn’t managing to optimize those away, and if not, they’re really going to hurt performance.  The next step in speeding up the algorithm would probably be to eliminate those, and then you might be able to compete with Greedy again.
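As a sketch of what that might look like (the SummedSubset class and this standalone method are my own scaffolding, not the post’s code), you could carry each subset’s running sum alongside it, so extending a subset costs one addition instead of a full re-sum:

    using System.Collections.Generic;
    using System.Linq;

    class SummedSubset
    {
        public List<int> Items;
        public int Sum;
    }

    static class CachedSumFinder
    {
        public static List<int> Find(List<int> list, int sum, double delta)
        {
            var subsets = new List<SummedSubset> { new SummedSubset { Items = new List<int>(), Sum = 0 } };
            for (int i = 0; i < list.Count; i++)
            {
                // Extend every existing subset with list[i]; the new sum is a single
                // addition rather than a re-enumeration of the whole subset.
                var extended = subsets.Select(s => new SummedSubset
                {
                    Items = s.Items.Concat(new[] { list[i] }).ToList(),
                    Sum = s.Sum + list[i]
                }).ToList();
                subsets.AddRange(extended);

                // Drop anything over the target; delta trimming would slot in here too,
                // comparing the cached sums instead of calling Sum() over and over.
                subsets = subsets.Where(s => s.Sum <= sum).ToList();

                var hit = subsets.Find(s => s.Sum > (1 - delta) * sum);
                if (hit != null)
                    return hit.Items;
            }
            return null;
        }
    }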

But that is left as an exercise for the reader.  Once again, the code is available on GitHub.  Happy coding!

posted on Monday, May 31, 2010 9:04 AM