Tuesday, August 26, 2008

Basic Set operations in Java

How do we work with Sets in Java when it comes to performing the basic operations: Union, Intersection, Subset, Complement, Cartesian Product (for which there is no standard Java method in the Set Interface)? Do we have some kind of a collection that maps this mathematical concept into Java? Yes we do.

The Set Interface models the mathematical concept of a set and it is a Collection that contains no duplicate objects. Before talking about the specific methods that we will use, you must be aware of the three most used Set implementations: HashSet (uses a hash table to store its elements), LinkedHashSet (HashSet that also maintains a doubly-linked list running through all of its entries), and TreeSet (a sorted set).

Assume we have two sets s1 and s2. Going back to the basic set operations, we have:
- Union: the union of sets s1 and s2 is a set whose elements are contained in either set s1 or s2.

Java: s1.addAll(s2) - adds all elements from s2
in s1 if they are not already present.

- Intersection: the intersection of two sets s1 and s2 is a set whose elements are in both s1 and s2.
 
Java: s1.retainAll(s2) - s1 becomes the intersection of
s1 and s2.

- Subset: s2 is a subset of s1 if all elements of s2 are also elements of s1.

Java: s1.containsAll(s2) - returns true if s1 contains
all of the elements of s2.

- Complement: the complement of s2 in s1, denoted s1 - s2, is a set of all elements that are members of s1 but not of s2.

Java: s1.removeAll(s2) - removes from s1 all the
elements that are also contained in s2.

- Cartesian Products: the Cartesian product of two sets s1 and s2 is the set of all ordered pairs (x, y), where x is an element in s1, and y is an element in s2.

Java: public static Set computeCartesianProduct(Set s1,
Set s2) {
Set
result = null;
if (s1 == null || s2 == null) {
throw new IllegalArgumentException();
}
if (s1.isEmpty()) {
return s2;
}
if (s2.isEmpty()) {
return s1;
}

result = new HashSet
();
for (Object obj1 : s1) {
for (Object obj2 : s2) {
result.add("(" + obj1 + "," + obj2 + ")");
}
}

return result;
}


UPDATE (04/30/09): For the Cartesian Product operation above, I have used a raw type for unknown set element type. As of Java 1.5, you should consider using generics. For example, you should use an unbounded wildcard type for the input parameters of the computeCartesianProduct method, making the above code type safe and more flexible. This implies that instead of using the declaration Set s1, you should actually have Set<?> s1.

5 comments:

Kenny Yu said...
This comment has been removed by the author.
Kenny Yu said...

Excellent exposition of using java api for set operations.
If both sets are ordered, either by hash or via a comparator in the case of treeset, the operations should be fast. They cold even avoid using for-loops if they make use of the ranges. For example, if set A has 1-100 and set B has 90-200, you can start looping at 90 to get the union.

IlPuccio said...

Gud article.
I need a way to hanlde ranges and perform the operation you described in this post.
Can you give me some suggestions ? Are there classes which implement ranges ?

Mihai Fonoage said...

@IlPuccio: Look at the Range-view Operations section of Sun's SortedSet Interface tutorial here.

Kenny Yu said...

java.util.Collection
provides methods for set operations:
addAll() union
removeAll() difference
retainAll() intersect

SortedSet.subSet() give you a range.