Best Data Serialization Formats for Communication Between Docker Containers

1. The need for better data serialization

I’m currently building web applications that run in Docker containers. Each container needs to interact with the others. So which data format is best for communication among them?

Nowadays most web APIs use JSON with REST over HTTP. JSON and XML are human-readable, which is a big advantage. But when it comes to data size, JSON and XML are not efficient: both are text formats that repeat the field names in every message.

So this time, I’d like to consider other ways to serialize data.
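
To make the size difference concrete, here is a minimal sketch using only the Python standard library. The record and its field names (sensor_id, temperature, ok) are made up for illustration, and the fixed binary layout is just the simplest possible binary encoding, not one of the formats discussed below.

```python
import json
import struct

# Hypothetical record, invented for this example.
reading = {"sensor_id": 42, "temperature": 21.5, "ok": True}

# Text encoding: field names and punctuation travel with every message.
as_json = json.dumps(reading).encode("utf-8")

# Binary encoding: both sides agree on the layout in advance, so only values are sent.
# "<" = little-endian without padding, "I" = uint32, "d" = float64, "?" = bool.
as_binary = struct.pack(
    "<Id?", reading["sensor_id"], reading["temperature"], reading["ok"]
)

print(len(as_json), "bytes as JSON")
print(len(as_binary), "bytes as packed binary")

# The receiver needs the same layout string to decode the values.
sensor_id, temperature, ok = struct.unpack("<Id?", as_binary)
```

The catch is that the binary payload is meaningless without the shared layout, which is exactly the job a schema does in the formats below.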

2. Data Serialization

Let’s look at each data serialization format.

2.1 Protocol Buffers

Protocol Buffers was developed by Google. It has been used internally there since 2001 and was publicly released in 2008. The serialized data is much smaller than JSON or XML. Many tech companies have adopted Protocol Buffers, for example Google, Square, Netflix, Docker, and Cisco.

Schema is intuitive

The schema file is called a .proto file. You define the interfaces in this file. The syntax is simple and intuitive, so it’s easier to understand than JSON.

After creating the .proto file, you compile it, and the compiler automatically generates the interface code. You can then serialize data using that code. It’s that simple.

gRPC expands its scope to web browsers

Furthermore, gRPC is an RPC framework built on Protocol Buffers. It was also developed by Google, which uses gRPC across its enormous infrastructure. gRPC-Web has also just been released. It enables web browsers to use gRPC with servers. I expect the use of gRPC to grow soon.
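
To give a feel for what the generated code does under the hood, here is a rough hand-written sketch of the Protocol Buffers wire encoding for a tiny message. The Sample message and its fields are hypothetical, and the helper functions are my own, not part of any protobuf library. In a real project the code generated by the compiler does this for you.

```python
import json

# Hypothetical schema this sketch encodes by hand (would live in a .proto file):
#
#   message Sample {
#     int32 id = 1;
#     string name = 2;
#   }

def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint (7 bits per byte)."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more bytes follow
        else:
            out.append(low7)         # last byte
            return bytes(out)

def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key is the field number and the wire type packed into one varint."""
    return encode_varint((field_number << 3) | wire_type)

def encode_int_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, 0) + encode_varint(value)  # wire type 0: varint

def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    # wire type 2: length-delimited
    return encode_key(field_number, 2) + encode_varint(len(data)) + data

payload = encode_int_field(1, 150) + encode_string_field(2, "testing")
as_json = json.dumps({"id": 150, "name": "testing"}).encode("utf-8")

print(payload.hex())  # 089601120774657374696e67
print(len(payload), "bytes vs", len(as_json), "bytes as JSON")
```

Only the field numbers go on the wire, not the field names. That is why the schema is required on both sides and why the payload stays small.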


2.2 Apache Thrift

Thrift is an interface definition language and binary communication protocol used for defining and creating services for numerous languages. (by: wikipedia)

Thrift was initially developed by Facebook and is now developed under the Apache Software Foundation. Facebook, Cassandra, Twitter, and Evernote use Thrift.

Like Protocol Buffers, Thrift defines its schema using an IDL (Interface Definition Language).

The performance is also good.

2.3 MessagePack

MessagePack is a computer data interchange format. It is a binary form for representing simple data structures like arrays and associative arrays. (by: wikipedia)

MessagePack data looks similar to JSON, but it is faster and smaller. It’s really easy to use, and many languages support it.

Redis, Fluentd, Treasure Data, and Pinterest use MessagePack for their data serialization.
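
As an illustration of "like JSON, but smaller", below is a minimal sketch of an encoder for a small subset of the MessagePack format (nil, booleans, integers 0 to 127, and short strings, lists, and maps). The pack function is my own simplified helper for this post. For real use you would reach for an existing library, for example the msgpack package in Python.

```python
import json

def pack(obj) -> bytes:
    """Encode a small subset of MessagePack; raises for anything outside it."""
    if obj is None:
        return b"\xc0"                      # nil
    if obj is True:
        return b"\xc3"                      # true
    if obj is False:
        return b"\xc2"                      # false
    if isinstance(obj, int):
        if 0 <= obj <= 0x7F:
            return bytes([obj])             # positive fixint: the value is the byte
        raise ValueError("sketch only supports integers 0..127")
    if isinstance(obj, str):
        data = obj.encode("utf-8")
        if len(data) > 31:
            raise ValueError("sketch only supports strings up to 31 bytes")
        return bytes([0xA0 | len(data)]) + data   # fixstr: length in the type byte
    if isinstance(obj, list):
        if len(obj) > 15:
            raise ValueError("sketch only supports lists up to 15 items")
        return bytes([0x90 | len(obj)]) + b"".join(pack(item) for item in obj)
    if isinstance(obj, dict):
        if len(obj) > 15:
            raise ValueError("sketch only supports maps up to 15 entries")
        out = bytes([0x80 | len(obj)])            # fixmap: entry count in the type byte
        for key, value in obj.items():
            out += pack(key) + pack(value)
        return out
    raise TypeError(f"unsupported type: {type(obj).__name__}")

document = {"compact": True, "schema": 0}
packed = pack(document)
as_json = json.dumps(document, separators=(",", ":")).encode("utf-8")

print(packed.hex())
print(len(packed), "bytes vs", len(as_json), "bytes as compact JSON")
```

Unlike Protocol Buffers, the keys are still sent with every message, so no schema is needed. The saving comes from dropping quotes, separators, and text-encoded numbers.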


2.4 FlatBuffers

FlatBuffers is a free software library implementing a serialization format similar to Protocol Buffers. (by: wikipedia)

FlatBuffers was also developed by Google and was released in 2014.

According to some websites, FlatBuffers performs better than the others. But as far as I know, development with FlatBuffers requires more implementation work. So it may be suitable when you need to handle quite large objects.
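
FlatBuffers is designed so that serialized data can be read without parsing or unpacking it first, which is why it fits large objects. The sketch below shows only that general idea with the Python standard library. The fixed-size record layout here is an assumption of mine and is NOT the actual FlatBuffers binary layout.

```python
import struct

# Hypothetical fixed-size record: uint32 id followed by float64 score (12 bytes).
RECORD = struct.Struct("<Id")
SCORE_OFFSET = 4  # the score starts right after the 4-byte id

def build_buffer(count: int) -> bytes:
    """Write `count` records back to back into one contiguous buffer."""
    buf = bytearray(RECORD.size * count)
    for i in range(count):
        RECORD.pack_into(buf, i * RECORD.size, i, i * 0.5)
    return bytes(buf)

def read_score(buf: bytes, index: int) -> float:
    """Read a single field in place; no other record is decoded."""
    (score,) = struct.unpack_from("<d", buf, index * RECORD.size + SCORE_OFFSET)
    return score

buffer = build_buffer(100_000)
print(len(buffer), "bytes in the buffer")
print(read_score(buffer, 99_999))
```

With JSON you would have to parse the whole document before reaching the last record. Here the cost of reading one field does not depend on the size of the buffer, at the price of managing offsets and layouts more carefully.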


3. Conclusion: Protocol Buffers / gRPC

Finally, I chose Protocol Buffers with gRPC.

First, it gives great performance. It can reduce the data size enormously compared to XML or JSON.

Second, the schema of Protocol Buffers is very intuitive, which makes it easy to figure out an interface quickly. When you come back to an API that you haven’t touched for more than three months, you need to check the schema. Protocol Buffers’ intuitive schema is a major advantage.

Furthermore, many tech companies have started to use Protocol Buffers and gRPC, and the documentation and examples are more plentiful than for the others.

Also, the newly released gRPC-Web reinforced my decision because it will expand the use cases further into web browsers and the front end.

In conclusion, my team and I decided to go with Protocol Buffers / gRPC. If I find anything interesting about Protocol Buffers, I’ll share it here👍

by @takp