Writing Avro to Parquet

Niels Basjes, 14 December 2015

Today I tried to write records created with Apache Avro to disk with Apache Parquet in Java. Although this seems like a trivial task, I ran into a big obstacle, caused primarily by a gap in my knowledge of how generics work in Java.

Because all constructors are either deprecated or package-private, the builder pattern is the way to go. So I simply tried this (helped along by the fact that IntelliJ ‘suggested’ it):

Path outputPath = new Path("/home/nbasjes/tmp/measurement.parquet");
ParquetWriter<Measurement> parquetWriter =
    AvroParquetWriter<Measurement>.builder(outputPath)
        .withSchema(Measurement.getClassSchema())
        .withCompressionCodec(CompressionCodecName.GZIP)
        .build();

This fails to compile with cryptic errors that give no understandable reason why. None of the unit tests in the Parquet code base use the builder pattern yet; they still use the deprecated methods.

Looking at the source of the class that does the work, you’ll see this:

public class AvroParquetWriter<T> extends ParquetWriter<T> {
  public static <T> Builder<T> builder(Path file) {
    return new Builder<T>(file);
  }
  // ... rest of the class omitted
}
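Note that the `<T>` in `public static <T> Builder<T> builder(Path file)` is not the class’s `T`: a static method cannot see the type parameter of its enclosing class, so it declares its own. The self-contained sketch below mimics that shape with hypothetical `Writer` and `Builder` classes (stand-ins I made up, not the Parquet API). When such a call stands on its own in an assignment, the compiler can infer `T` from the declared type of the variable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins that mimic the shape of AvroParquetWriter.builder();
// this is NOT the Parquet API.
public class StaticGenericMethodDemo {

    static class Writer<T> {
        private final List<T> records = new ArrayList<>();

        // Static context: the class-level T is not in scope here,
        // so the method declares its own type parameter <T>.
        static <T> Builder<T> builder(String path) {
            return new Builder<T>(path);
        }

        void write(T record) {
            records.add(record);
        }

        int count() {
            return records.size();
        }
    }

    static class Builder<T> {
        private final String path;

        Builder(String path) {
            this.path = path;
        }

        String path() {
            return path;
        }

        Writer<T> build() {
            return new Writer<T>();
        }
    }

    public static void main(String[] args) {
        // A standalone call: T is inferred from the assignment target (String).
        Builder<String> builder = Writer.builder("/tmp/demo");
        Writer<String> writer = builder.build();
        writer.write("hello");
        System.out.println(writer.count() + " record(s) for " + builder.path());
    }
}
```

Running this prints `1 record(s) for /tmp/demo`. The inference only works because the call is not part of a chain; that is exactly what the real builder code breaks.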

When I talked to a colleague about it, he pointed out that calling a generic static method with an explicit type argument uses a slightly different syntax. Apparently such a call requires the following (which works like a charm):

Path outputPath = new Path("/home/nbasjes/tmp/measurement.parquet");
ParquetWriter<Measurement> parquetWriter =
    AvroParquetWriter.<Measurement>builder(outputPath)
        .withSchema(Measurement.getClassSchema())
        .withCompressionCodec(CompressionCodecName.GZIP)
        .build();

Let me guess: you don’t see the difference? It is the position of the ‘.’: the type argument must come after the dot, directly in front of the method name, not after the class name.

So instead of this

AvroParquetWriter<Measurement>.builder(outputPath)

you must do this

AvroParquetWriter.<Measurement>builder(outputPath)
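Why not just drop the type argument and let the compiler infer it? In a chain, `builder(outputPath)` is only the receiver of `.withSchema(...)`, and Java does not push the assignment’s target type back through a chained call. So `T` is inferred as `Object`, and `build()` then returns a writer for `Object` rather than for `Measurement`. The sketch below shows this with hypothetical `Writer` and `Builder` classes (not the Parquet API; `withCodec` is made up); the commented-out lines are the ones that do not compile.

```java
// Hypothetical stand-ins, not the Parquet API.
public class TypeWitnessDemo {

    static class Writer<T> {
        static <T> Builder<T> builder(String path) {
            return new Builder<T>();
        }
    }

    static class Builder<T> {
        Builder<T> withCodec(String codec) {
            return this; // option ignored in this sketch
        }

        Writer<T> build() {
            return new Writer<T>();
        }
    }

    public static void main(String[] args) {
        // Explicit type argument: the '.' comes before <String>.
        Writer<String> ok = Writer.<String>builder("/tmp/demo")
                .withCodec("GZIP")
                .build();

        // Without it, T is inferred as Object because the call is chained.
        Writer<Object> inferred = Writer.builder("/tmp/demo")
                .withCodec("GZIP")
                .build();

        // Does not compile: Writer<Object> cannot be converted to Writer<String>.
        // Writer<String> bad = Writer.builder("/tmp/demo").withCodec("GZIP").build();

        // Does not compile either: the type argument may not follow the class name.
        // Writer<String> worse = Writer<String>.builder("/tmp/demo").build();

        System.out.println(ok != null && inferred != null);
    }
}
```

Running this prints `true`. Splitting the chain, for example by first assigning the result of `builder(outputPath)` to a variable of the builder type, would also let inference work, but the explicit type argument keeps the chain intact.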