In Haskell, one can present a streaming compression/decompression API with lazy bytestrings.

Transcoding

Suppose we want to transcode some data, say, bzip2 to lzip.

bzip2 and lzip both have a C API (via libbz2 and lzlib respectively), viz.

typedef struct {
  char *next_in;
  unsigned int avail_in;
  unsigned int total_in_lo32;
  unsigned int total_in_hi32;

  char *next_out;
  unsigned int avail_out;
  unsigned int total_out_lo32;
  unsigned int total_out_hi32;

  void *state;

  void *(*bzalloc)(void *,int,int);
  void (*bzfree)(void *,void *);
  void *opaque;

} bz_stream;

BZ_EXTERN int BZ_API(BZ2_bzDecompressInit) ( bz_stream *strm, int verbosity, int small );

BZ_EXTERN int BZ_API(BZ2_bzDecompress) ( bz_stream* strm );

BZ_EXTERN int BZ_API(BZ2_bzDecompressEnd) ( bz_stream *strm );

enum LZ_Errno { LZ_ok = 0, LZ_bad_argument, LZ_mem_error, LZ_sequence_error, LZ_header_error, LZ_unexpected_eof, LZ_data_error, LZ_library_error };

const char * LZ_strerror( const enum LZ_Errno lz_errno );

struct LZ_Encoder;

struct LZ_Encoder * LZ_compress_open( const int dictionary_size, const int match_len_limit, const unsigned long long member_size );
int LZ_compress_close( struct LZ_Encoder * const encoder );

int LZ_compress_finish( struct LZ_Encoder * const encoder );
int LZ_compress_restart_member( struct LZ_Encoder * const encoder, const unsigned long long member_size );
int LZ_compress_sync_flush( struct LZ_Encoder * const encoder );

int LZ_compress_read( struct LZ_Encoder * const encoder, uint8_t * const buffer, const int size );
int LZ_compress_write( struct LZ_Encoder * const encoder, const uint8_t * const buffer, const int size );
int LZ_compress_write_size( struct LZ_Encoder * const encoder );

enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const encoder );
int LZ_compress_finished( struct LZ_Encoder * const encoder );
int LZ_compress_member_finished( struct LZ_Encoder * const encoder );

unsigned long long LZ_compress_data_position( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_member_position( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const encoder );

One then shuffles data through a buffer, calling the decompressor and compressor functions above in turn. This becomes untenable: if we also wanted to transcode from bzip2 to lz4, we would have to write more glue code against the bzip2 and LZ4 APIs, so that for \( n \) compression formats we would have to write \( O(n^2) \) glue procedures, one per ordered pair of formats.

In Haskell, one can write

transcode :: Lazy.ByteString -> Lazy.ByteString
transcode = LZ.compress . BZ2.decompress

To handle all cases while writing a sensible amount of code:

data Codec = BZ2 | Zlib | LZ4 | ...

enc :: Codec -> Lazy.ByteString -> Lazy.ByteString
enc BZ2 = BZ2.encode
enc Zlib = Zlib.encode
enc LZ4 = LZ4.encode
...

dec :: Codec -> Lazy.ByteString -> Lazy.ByteString
dec BZ2 = BZ2.decode
dec Zlib = Zlib.decode
dec LZ4 = LZ4.decode
...

transcode :: Codec -> Codec -> Lazy.ByteString -> Lazy.ByteString
transcode from to = enc to . dec from
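
To make the counting concrete, here is a minimal self-contained sketch of the same shape that needs only the bytestring package. The three codecs (Xor, Rot, Raw) are toy stand-ins invented for illustration, not real compression formats; with real libraries each clause of enc and dec would instead point at that library's lazy compress or decompress function. There are 2n clauses for n codecs, yet transcode covers all n^2 ordered pairs.

```haskell
import Data.Bits (xor)
import qualified Data.ByteString.Lazy as Lazy

-- Toy codecs, assumed purely for illustration (none of them compresses).
data Codec = Xor | Rot | Raw
  deriving (Show, Eq, Enum, Bounded)

-- One encoder per codec.
enc :: Codec -> Lazy.ByteString -> Lazy.ByteString
enc Xor = Lazy.map (`xor` 0x5a) -- flip a fixed bit pattern in every byte
enc Rot = Lazy.map (+ 1)        -- add one to every byte, wrapping at 255
enc Raw = id                    -- leave the stream untouched

-- One decoder per codec, the inverse of the matching encoder.
dec :: Codec -> Lazy.ByteString -> Lazy.ByteString
dec Xor = Lazy.map (`xor` 0x5a)
dec Rot = Lazy.map (subtract 1)
dec Raw = id

-- Every ordered pair of codecs, obtained by composition alone.
transcode :: Codec -> Codec -> Lazy.ByteString -> Lazy.ByteString
transcode from to = enc to . dec from
```

Because each stage works chunk by chunk, an expression such as Lazy.take 4 (transcode Rot Xor (Lazy.cycle (Lazy.pack [1, 2, 3]))) terminates even though its input is infinite, which is the streaming behaviour the composition relies on.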

Thus streaming transcoding in Haskell is feasible. Interestingly, the standard way to transcode compressed files is via pipes in the shell, e.g.

zstd -dc src.tar.zst | lz4 - src.tar.lz4

which also streams lazily, but through a clumsier mechanism: separate system processes connected by a pipe.
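
For comparison, a function on lazy bytestrings can itself serve as one stage of such a pipeline. The sketch below assumes only the bytestring package; stage is a hypothetical placeholder (it upper-cases ASCII letters) standing in for a real transcoder such as LZ.compress . BZ2.decompress, and runStage is likewise a name chosen here for illustration.

```haskell
import qualified Data.ByteString.Lazy as Lazy
import Data.Word (Word8)

-- Placeholder byte transformation: upper-case ASCII letters,
-- leave every other byte unchanged.
upcase :: Word8 -> Word8
upcase b
  | b >= 97 && b <= 122 = b - 32
  | otherwise = b

-- Hypothetical stand-in for a real lazy transcoding function.
stage :: Lazy.ByteString -> Lazy.ByteString
stage = Lazy.map upcase

-- Read stdin lazily and write the transformed stream to stdout.
runStage :: IO ()
runStage = Lazy.interact stage
```

With main = runStage, the compiled program can sit between two shell commands like any other filter, while the transformation itself stays an ordinary pure function that composes with others inside one process.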

I think this unified interface to streaming is an underrated accomplishment. It has been around since 2006 and wrangles some subtleties of effects in Haskell that no strict language could showcase.