Workaround for protobuf buffer size limitation

Protobuf has a data size constraint: it uses the `int` type for the buffer size, which limits a single serialized message to 2 GB.
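To make the arithmetic concrete: a Java `int` is a signed 32-bit value, so any size stored in one tops out at `Integer.MAX_VALUE` = 2,147,483,647 bytes (in protobuf-java, `getSerializedSize()` itself returns an `int`). The sketch below is plain Java with no protobuf dependency; the class name and the row count and row size are hypothetical, chosen only to show a payload that no longer fits and what happens when its size is forced into an `int`.

```java
// Shows why a size stored in a Java int tops out at about 2 GB.
class SizeLimit {
    // A Java int is a signed 32-bit value, so the largest byte count it can hold is 2^31 - 1.
    static final long MAX_INT_BYTES = Integer.MAX_VALUE;

    // True if a payload of this many bytes can still be described by an int.
    static boolean fitsInInt(long bytes) {
        return bytes >= 0 && bytes <= MAX_INT_BYTES;
    }

    public static void main(String[] args) {
        long rows = 3_000_000L;    // hypothetical row count
        long bytesPerRow = 1_024L; // hypothetical average serialized size per row
        long total = rows * bytesPerRow;

        System.out.println("int limit: " + MAX_INT_BYTES);     // 2147483647
        System.out.println("payload:   " + total);             // 3072000000
        System.out.println("fits:      " + fitsInInt(total));  // false
        // Squeezing the size into an int silently wraps around to a negative number.
        System.out.println("as int:    " + (int) total);       // -1222967296
    }
}
```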

There are two possible workarounds, both based on the same idea: serialize the data in batches so that no single message reaches the limit.

Flush to the same stream in batches

For example, flush every 1 million rows, or whenever the serialized size of the batch exceeds 2^28 bytes (about 268 MB).
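Checking the size with `models.build().getSerializedSize()` on every row (as in the commented-out condition in the loop below) builds a new message and measures the whole batch each time, so the cost grows with the batch. A cheaper option is to keep a running total as rows are added. The sketch below is a hypothetical helper (`BatchPolicy` is not part of protobuf): it assumes the caller passes each row message's own `getSerializedSize()`, and that the repeated field has a field number between 1 and 15, so each element costs a 1-byte tag, a varint length prefix and the payload.

```java
// Hypothetical helper: decides when to flush a batch, by row count or by estimated serialized size.
class BatchPolicy {
    private final int maxRows;
    private final long maxBytes;
    private int rows;
    private long bytes;

    BatchPolicy(int maxRows, long maxBytes) {
        this.maxRows = maxRows;
        this.maxBytes = maxBytes;
    }

    // Number of bytes needed to encode n as a base-128 varint (7 payload bits per byte).
    static int varintSize(long n) {
        int size = 1;
        while ((n >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Records one row whose own serialized size is rowBytes (e.g. row.getSerializedSize()).
    // Each element of a repeated message field costs tag + length prefix + payload;
    // the tag is assumed to be 1 byte (field number 1..15).
    // Returns true when the current batch should be flushed.
    boolean add(int rowBytes) {
        rows++;
        bytes += 1 + varintSize(rowBytes) + rowBytes;
        return rows >= maxRows || bytes > maxBytes;
    }

    // Call after flushing, alongside models.clear().
    void reset() {
        rows = 0;
        bytes = 0;
    }

    long bytes() {
        return bytes;
    }

    public static void main(String[] args) {
        BatchPolicy policy = new BatchPolicy(1_000_000, 1L << 28); // 1M rows or 2^28 bytes
        int added = 0;
        boolean flush = false;
        while (!flush) {
            flush = policy.add(300); // pretend every row serializes to 300 bytes
            added++;
        }
        // With 300-byte rows the size cap is reached before the row cap.
        System.out.println("flush after " + added + " rows, ~" + policy.bytes() + " bytes");
        policy.reset();
    }
}
```

In the loop, `if (policy.add(row.getSerializedSize())) { /* flush */ policy.reset(); }` would replace the `build()` call per row.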

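            int rowcount = 0;
            while (rs != null && rs.next()) {
                models.addModels(..newBuilder().set...(rs.getString("..")...)
                        .build());

                // flush every 1 million rows; to also cap the batch at 2^28 bytes (~268 MB), use
                // if (++rowcount >= 1_000_000 || models.build().getSerializedSize() > (1 << 28)) {
                // (build() + getSerializedSize() on every row processes the whole batch each time)
                if (++rowcount >= 1_000_000) {
                    rowcount = 0;

                    // append this batch to the same file
                    try (FileOutputStream fos = new FileOutputStream(Constants.MODEL_PB_FILE, true)) {
                        models.build().writeTo(fos);
                    } catch (IOException e) { // also covers FileNotFoundException
                        e.printStackTrace();
                    }
                    models.clear();
                }
            }

            // flush the last, partial batch as well
            if (rowcount > 0) {
                try (FileOutputStream fos = new FileOutputStream(Constants.MODEL_PB_FILE, true)) {
                    models.build().writeTo(fos);
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }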
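Appending several messages to one file with `writeTo` produces a plain concatenation, and protobuf parses concatenated messages as one merged message (repeated fields are appended). Reading the whole file back with `parseFrom` would therefore rebuild a single oversized message and likely hit the same 2 GB ceiling. One way to keep the batches separable is to prefix each with its length: protobuf-java offers this as `writeDelimitedTo(OutputStream)` on the write side and the generated `parseDelimitedFrom(InputStream)` on the read side, which returns `null` at end of stream. The sketch below reproduces that framing in plain Java on raw byte arrays standing in for serialized batches, so the mechanism is visible; `DelimitedFraming` and its methods are hypothetical, and with real messages you would call the protobuf methods instead.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Length-delimited framing: each batch is preceded by its byte count encoded as a varint,
// the same idea protobuf-java uses for writeDelimitedTo / parseDelimitedFrom.
class DelimitedFraming {

    // Base-128 varint: 7 bits per byte, high bit set on every byte except the last.
    static void writeVarint(OutputStream out, int n) throws IOException {
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80);
            n >>>= 7;
        }
        out.write(n);
    }

    // Returns the decoded value, or -1 if the stream ended before the first byte.
    static int readVarint(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 35; shift += 7) {
            int b = in.read();
            if (b == -1) {
                if (shift == 0) {
                    return -1; // clean end of stream: no more batches
                }
                throw new EOFException("truncated length prefix");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IOException("malformed length prefix");
    }

    static void writeDelimited(OutputStream out, byte[] batch) throws IOException {
        writeVarint(out, batch.length);
        out.write(batch);
    }

    // Returns the next batch, or null once every batch has been read.
    static byte[] readDelimited(InputStream in) throws IOException {
        int length = readVarint(in);
        if (length < 0) {
            return null;
        }
        byte[] batch = new byte[length];
        new DataInputStream(in).readFully(batch); // throws EOFException on a short read
        return batch;
    }

    public static void main(String[] args) throws IOException {
        // A byte array stands in for the file that every batch is appended to.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        writeDelimited(file, "first batch".getBytes(StandardCharsets.UTF_8));
        writeDelimited(file, new byte[300]);

        InputStream in = new ByteArrayInputStream(file.toByteArray());
        byte[] batch;
        while ((batch = readDelimited(in)) != null) {
            System.out.println("read a batch of " + batch.length + " bytes");
        }
    }
}
```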

Alternatively, each batch can be written to its own file (a separate stream per batch) and read back batch by batch:

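                if (++rowcount >= 1_000_000) {
                    rowcount = 0;

                    // write each batch to its own numbered file: CACHE_FILE + 0, CACHE_FILE + 1, ...
                    // (append = false, so a re-run overwrites old batch files instead of growing them)
                    try (FileOutputStream fos = new FileOutputStream(CACHE_FILE + currentFileIndex++, false)) {
                        models.build().writeTo(fos);
                    } catch (IOException e) { // also covers FileNotFoundException
                        e.printStackTrace();
                    }
                    models.clear();
                }
                // after the loop, write any remaining rows (rowcount > 0) to one more file the same way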

Reading works the same way, one file per batch, which also lets the files be parsed in parallel:

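            // Files.list opens a directory handle, so close the stream with try-with-resources
            try (Stream<Path> paths = Files.list(Paths.get(Constants.CACHE_FILE_DIR + File.separator + Constants.PB_FILE))) {
                paths.filter(Files::isRegularFile)
                        .map(Path::toFile)
                        .filter(file -> file.getName().startsWith(Constants.PB_FILE))
                        .parallel()
                        .map(file -> readFile(file))
                        .reduce(....);
            }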
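One detail worth handling in a pipeline like the one above: `Files.list` returns entries in no particular order, and sorting by name as text would put `...10` before `...2`. If the batches must be merged in the order they were written, sorting by the numeric suffix first keeps that order, and `reduce` respects encounter order even on a parallel stream. The sketch below is a plain-Java illustration under stated assumptions: `BatchFiles`, `readBatch`, `readAll` and the `models.pb` prefix are hypothetical, and each "batch" is a text file with one record per line standing in for a serialized protobuf batch (the author's `readFile` would parse the message instead).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Stream;

// Reads numbered batch files (prefix0, prefix1, ...) in parallel and merges them in index order.
class BatchFiles {

    // Hypothetical stand-in for readFile(file): here a batch is a text file with one record per line.
    static List<String> readBatch(Path file) {
        try {
            return Files.readAllLines(file, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // stream lambdas cannot throw checked exceptions
        }
    }

    // Associative merge used by reduce; builds a new list so the shared identity is never mutated.
    static List<String> concat(List<String> a, List<String> b) {
        List<String> merged = new ArrayList<>(a.size() + b.size());
        merged.addAll(a);
        merged.addAll(b);
        return merged;
    }

    static List<String> readAll(Path dir, String prefix) throws IOException {
        // Only names of the form <prefix><digits>; other files in the directory are skipped.
        Pattern batchName = Pattern.compile(Pattern.quote(prefix) + "(\\d+)");
        try (Stream<Path> paths = Files.list(dir)) {
            return paths.filter(Files::isRegularFile)
                    .filter(p -> batchName.matcher(p.getFileName().toString()).matches())
                    // order by the numeric suffix, not by directory order or text order
                    .sorted(Comparator.comparingInt(
                            (Path p) -> Integer.parseInt(p.getFileName().toString().substring(prefix.length()))))
                    .parallel()
                    .map(BatchFiles::readBatch)
                    .reduce(Collections.emptyList(), BatchFiles::concat);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("batches");
        for (int i = 0; i < 12; i++) {
            Files.write(dir.resolve("models.pb" + i), List.of("row " + (2 * i), "row " + (2 * i + 1)));
        }
        List<String> all = readAll(dir, "models.pb");
        System.out.println(all.size() + " rows, first: " + all.get(0) + ", last: " + all.get(all.size() - 1));
    }
}
```

If order does not matter, the `sorted` step can be dropped, and `flatMap(List::stream)` with a collector is a common alternative to `reduce` for merging lists.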
