In Haskell, one can present a streaming compression/decompression API with lazy bytestrings.
Transcoding
Suppose we want to transcode some data, say, bzip2 to lzip.
bzip2 and lzip both have C APIs (via libbz2 and lzlib respectively), viz.
typedef
   struct {
      char *next_in;
      unsigned int avail_in;
      unsigned int total_in_lo32;
      unsigned int total_in_hi32;
      char *next_out;
      unsigned int avail_out;
      unsigned int total_out_lo32;
      unsigned int total_out_hi32;
      void *state;
      void *(*bzalloc)(void *,int,int);
      void (*bzfree)(void *,void *);
      void *opaque;
   }
   bz_stream;
BZ_EXTERN int BZ_API(BZ2_bzDecompressInit) (
      bz_stream *strm,
      int verbosity,
      int small
   );
BZ_EXTERN int BZ_API(BZ2_bzDecompress) (
      bz_stream* strm
   );
BZ_EXTERN int BZ_API(BZ2_bzDecompressEnd) (
      bz_stream *strm
   );
enum LZ_Errno { LZ_ok = 0, LZ_bad_argument, LZ_mem_error,
                LZ_sequence_error, LZ_header_error, LZ_unexpected_eof,
                LZ_data_error, LZ_library_error };
const char * LZ_strerror( const enum LZ_Errno lz_errno );
struct LZ_Encoder;
struct LZ_Encoder * LZ_compress_open( const int dictionary_size,
                                      const int match_len_limit,
                                      const unsigned long long member_size );
int LZ_compress_close( struct LZ_Encoder * const encoder );
int LZ_compress_finish( struct LZ_Encoder * const encoder );
int LZ_compress_restart_member( struct LZ_Encoder * const encoder,
                                const unsigned long long member_size );
int LZ_compress_sync_flush( struct LZ_Encoder * const encoder );
int LZ_compress_read( struct LZ_Encoder * const encoder,
                      uint8_t * const buffer, const int size );
int LZ_compress_write( struct LZ_Encoder * const encoder,
                       const uint8_t * const buffer, const int size );
int LZ_compress_write_size( struct LZ_Encoder * const encoder );
enum LZ_Errno LZ_compress_errno( struct LZ_Encoder * const encoder );
int LZ_compress_finished( struct LZ_Encoder * const encoder );
int LZ_compress_member_finished( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_data_position( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_member_position( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_total_in_size( struct LZ_Encoder * const encoder );
unsigned long long LZ_compress_total_out_size( struct LZ_Encoder * const encoder );
Then one shuffles data between buffers by hand, calling the decompressor and the compressor above in a loop. This approach does not scale: if we also wanted to transcode from bzip2 to lz4, we would have to write separate glue code against the bzip2 and LZ4 APIs, so that for \( n \) compression formats we would have to write \( O(n^2) \) glue procedures.
In Haskell, one can write
transcode :: Lazy.ByteString -> Lazy.ByteString
transcode = LZ.compress . BZ2.decompress
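To see why such a composition streams, here is a minimal sketch that needs only the bytestring package. The names toyDecompress, toyCompress, toyTranscode, endlessInput and firstBytes are hypothetical; the two toy stages are byte-wise transformations standing in for real codecs, and are assumed only to share their type, Lazy.ByteString -> Lazy.ByteString.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Bits (xor)
import Data.Word (Word8)

-- Hypothetical stand-ins for a decompressor and a compressor. They do no
-- real compression; they only have the same type as a real lazy codec.
toyDecompress :: BL.ByteString -> BL.ByteString
toyDecompress = BL.map (xor 0x2a)

toyCompress :: BL.ByteString -> BL.ByteString
toyCompress = BL.map (+ 1)

-- Same shape as the transcode function above: plain function composition.
toyTranscode :: BL.ByteString -> BL.ByteString
toyTranscode = toyCompress . toyDecompress

-- An unbounded input. Only the chunks that are demanded get processed.
endlessInput :: BL.ByteString
endlessInput = BL.cycle (BL.pack [0 .. 255])

-- Taking a finite prefix of the output terminates even though the input
-- never ends, because both stages work chunk by chunk.
firstBytes :: [Word8]
firstBytes = BL.unpack (BL.take 4 (toyTranscode endlessInput))
```

A strict pipeline would have to consume all of endlessInput before producing anything; here the consumer's demand drives both stages.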
To handle all cases while writing a sensible amount of code:
data Codec = BZ2 | Zlib | LZ4 | ...
enc :: Codec -> Lazy.ByteString -> Lazy.ByteString
enc BZ2 = BZ2.encode; enc Zlib = Zlib.encode; enc LZ4 = LZ4.encode; ...
dec :: Codec -> Lazy.ByteString -> Lazy.ByteString
dec BZ2 = BZ2.decode; dec Zlib = Zlib.decode; dec LZ4 = LZ4.decode; ...
transcode :: Codec -> Codec -> Lazy.ByteString -> Lazy.ByteString
transcode from to = enc to . dec from
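The same table can be made runnable with toy codecs. This is only a sketch: the constructors Flip, Shift and Mask, and the helpers allPairs and pairsAgree, are hypothetical and stand in for real compression formats. It illustrates that \( n \) encoders and \( n \) decoders are enough to cover all \( n^2 \) ordered pairs.

```haskell
import qualified Data.ByteString.Lazy as BL
import Data.Bits (complement, xor)

-- Hypothetical toy formats; each one is a trivially invertible byte map.
data Codec = Flip | Shift | Mask
  deriving (Show, Eq, Enum, Bounded)

-- One encoder per format.
enc :: Codec -> BL.ByteString -> BL.ByteString
enc Flip  = BL.map complement
enc Shift = BL.map (+ 7)
enc Mask  = BL.map (xor 0x5a)

-- One decoder per format, inverting the matching encoder.
dec :: Codec -> BL.ByteString -> BL.ByteString
dec Flip  = BL.map complement
dec Shift = BL.map (subtract 7)
dec Mask  = BL.map (xor 0x5a)

-- Every transcoder is a decoder followed by an encoder.
transcode :: Codec -> Codec -> BL.ByteString -> BL.ByteString
transcode from to = enc to . dec from

-- All ordered (from, to) pairs: 3 formats give 9 transcoders.
allPairs :: [(Codec, Codec)]
allPairs = [(a, b) | a <- [minBound .. maxBound], b <- [minBound .. maxBound]]

-- For every pair, transcoding data encoded as `a` must equal encoding the
-- plain data as `b` directly.
pairsAgree :: BL.ByteString -> Bool
pairsAgree plain =
  and [transcode a b (enc a plain) == enc b plain | (a, b) <- allPairs]
```

Adding a fourth format means one new constructor and one new clause each in enc and dec, after which seven more transcoding pairs come for free.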
Thus streaming transcoding in Haskell is feasible. Interestingly, the standard way to transcode compressed files is via pipes in the shell, e.g.
zstd -dc src.tar.zst | lz4 - src.tar.lz4
which is also lazy, but through a clumsier mechanism: separate system processes connected by a pipe.
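For comparison, the in-process analogue of such a pipeline might look like the sketch below. The helper name pipeThrough is hypothetical; it relies only on hGetContents and hPut from Data.ByteString.Lazy, and takes the transformation as an argument rather than assuming any particular compression library.

```haskell
import qualified Data.ByteString.Lazy as BL
import System.IO

-- Run a pure lazy transformation between two handles, for example stdin
-- and stdout. BL.hGetContents reads chunks on demand, so the whole input
-- is never resident in memory at once.
pipeThrough :: (BL.ByteString -> BL.ByteString) -> Handle -> Handle -> IO ()
pipeThrough f input output = do
  hSetBinaryMode input True    -- no newline or encoding translation
  hSetBinaryMode output True
  contents <- BL.hGetContents input
  BL.hPut output (f contents)
```

With a transcode function like the one above, a whole command-line filter would then be roughly pipeThrough (transcode from to) stdin stdout, with no second process involved.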
I think this unified interface to streaming is an underrated accomplishment. It has been around since 2006 and wrangles some subtleties of effects in Haskell that no strict language could showcase.