For batch tasks it is quite common to need to browse a full table. Depending on the table, this can be done in memory without much thought, or the table can be too big and require pagination.
A common solution was to use some kind of PageResult object representing the current page and its index, and to let the client/caller iterate over PageResults.
With Java 8 streams the API can be more concise and efficient.
NOTE: the solution in this article works as long as the caller does not recreate a Collection from the Stream. Doing so would likely be a misuse of the API, but it is not forbidden by default; you would need to write your own Stream wrapper to prevent it.
The idea behind providing a Stream representing the full set of data is to compute the number of pages and then flatMap each page to its corresponding stream, which gives us a single aggregated Stream.
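Before moving to JPA, the idea can be illustrated against a plain in-memory list. In this sketch, DATA and fetchPage() are hypothetical stand-ins for the database and for a paginated query:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class PagedStreamSketch {
    // Hypothetical in-memory "datastore" standing in for the database table.
    static final List<Integer> DATA =
            IntStream.range(0, 10).boxed().collect(Collectors.toList());

    // Stands in for a paginated query: returns one page of results.
    static List<Integer> fetchPage(int pageIdx, int pageSize) {
        int from = pageIdx * pageSize;
        int to = Math.min(from + pageSize, DATA.size());
        return DATA.subList(from, to);
    }

    static Stream<Integer> pagedStream(int pageSize) {
        int total = DATA.size();                       // the "countAll" step
        int pages = (total + pageSize - 1) / pageSize; // ceiling division
        return IntStream.range(0, pages)
                .mapToObj(pageIdx -> fetchPage(pageIdx, pageSize))
                .flatMap(List::stream);                // aggregate pages into one Stream
    }

    public static void main(String[] args) {
        System.out.println(pagedStream(3).count()); // all 10 elements, fetched 3 at a time
    }
}
```

The same shape (count, derive the page count, map each index to a page, flatMap) is what the JPA version below follows.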
If we have a Person entity and we assume the obviously named queries “Person.countAll” and “Person.findAll”, then we can write our Stream this way:
final int total = entityManager.createNamedQuery("Person.countAll", Number.class)
        .getSingleResult()
        .intValue();
final int pages = (total + pageSize - 1) / pageSize; // ceiling division
return IntStream.range(0, pages)
        .mapToObj(pageIdx -> entityManager.createNamedQuery("Person.findAll", Person.class)
                .setFirstResult(pageIdx * pageSize)
                .setMaxResults(pageSize))
        .flatMap(q -> q.getResultList().stream());
We start by computing the number of pages and iterate over them thanks to an IntStream (the equivalent of for (int i = 0; i < maxIteration; i++)). Then we convert each page index to the findAll query corresponding to that page using the JPA pagination API (setFirstResult() and setMaxResults()). Finally we execute the queries and aggregate their results into a single Stream.
NOTE: calling the previous code will not execute any query, since the Stream has no terminal operation yet.
If you add a forEach() at the end of this code you will see that you browse the full dataset, and if you add some debug logs you will see that you browse it page by page (i.e. query, then iteration over each element of that page, then the next query, etc.) rather than loading the full dataset in memory, which was our objective :).
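This page-by-page behavior can be observed without a database at all, since it comes from flatMap() itself: the outer stream only pulls the next page when the previous page's elements have been consumed. In this standalone sketch, fetchPage() is a hypothetical stand-in for the paginated query and LOG records the interleaving:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class LazyPagingDemo {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical page fetch (pageSize = 2) that logs when it is executed.
    static List<Integer> fetchPage(int pageIdx) {
        LOG.add("query page " + pageIdx);
        return Arrays.asList(pageIdx * 2, pageIdx * 2 + 1);
    }

    public static void main(String[] args) {
        IntStream.range(0, 2)
                .mapToObj(LazyPagingDemo::fetchPage)
                .flatMap(List::stream)
                .forEach(e -> LOG.add("element " + e));
        // Pages are fetched lazily, interleaved with element consumption:
        System.out.println(String.join(", ", LOG));
    }
}
```

The log shows "query page 0" followed by its two elements, then "query page 1" followed by its elements: each page is only loaded when the stream reaches it.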
TIP: if you want to edit these entities, remember to wrap the query in a transaction *per page*; otherwise you will use a single global transaction spanning the whole dataset, which can have disastrous effects.
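The transaction-per-page pattern can be sketched as follows. This is a minimal standalone simulation: begin() and commit() are hypothetical stand-ins for the real transaction API (e.g. entityManager.getTransaction().begin()/commit() in a resource-local setup), and fetchPage() stands in for one page of the paginated query:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.IntStream;

public class PerPageTransactionSketch {
    static final List<String> LOG = new ArrayList<>();

    // Hypothetical transaction boundaries, logged so the pattern is visible.
    static void begin()  { LOG.add("begin"); }
    static void commit() { LOG.add("commit"); }

    // Stands in for one page of the paginated query (pageSize = 2).
    static List<String> fetchPage(int pageIdx) {
        return Arrays.asList("entity-" + (pageIdx * 2), "entity-" + (pageIdx * 2 + 1));
    }

    public static void main(String[] args) {
        int pages = 2;
        IntStream.range(0, pages).forEach(pageIdx -> {
            begin();                                        // one short transaction per page...
            fetchPage(pageIdx).forEach(e -> LOG.add("update " + e));
            commit();                                       // ...instead of one global transaction
        });
        System.out.println(String.join(", ", LOG));
    }
}
```

Each page is read and updated inside its own short-lived transaction, so no single transaction ever spans the whole dataset.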