Publishing Avro

Avro is a compact, binary data format widely used in the Big Data world. We recommend that all topics use this format: it reduces the volume of data sent to Kafka and makes the data easy to analyze once it is dumped from Kafka to Hadoop.

Conversion

A publisher can publish either JSON or Avro to an Avro topic. This allows legacy systems that might not have Avro support to publish data to Hermes. The message content type is read from the HTTP Content-Type header. If an incoming message has the application/json content type, it is treated as JSON and converted to Avro using the topic's schema.
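The Python sketch below builds, without sending, the two kinds of publish request described above, so the only difference (the Content-Type header) is easy to see. It assumes a hypothetical frontend address (http://localhost:8080), the POST /topics/{topicName} publish path, and the avro/binary content type for raw Avro payloads; check all three against your Hermes deployment.

```python
import json
import urllib.request

# Hypothetical frontend address; replace it with your Hermes frontend URL.
FRONTEND_URL = "http://localhost:8080"


def build_publish_request(topic: str, body: bytes, content_type: str) -> urllib.request.Request:
    """Build (without sending) a POST that publishes one message to a topic."""
    return urllib.request.Request(
        url=f"{FRONTEND_URL}/topics/{topic}",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )


message = {"id": "1", "content": "hello", "tags": ["a", "b"]}

# Legacy publisher: plain JSON, which Hermes converts to Avro using the topic schema.
json_request = build_publish_request(
    "tech.hermes.group.topic", json.dumps(message).encode("utf-8"), "application/json"
)

# Avro-aware publisher: already-encoded Avro bytes (placeholder payload here).
# "avro/binary" is an assumed content type; check the docs of your Hermes version.
avro_request = build_publish_request("tech.hermes.group.topic", b"...", "avro/binary")

# urllib.request.urlopen(json_request) would actually send the message.
print(json_request.get_method(), json_request.full_url, json_request.get_header("Content-type"))
```

The topic name tech.hermes.group.topic is simply the group and topic from the example schema in this section joined together.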

Validation

Each incoming message is validated against the schema. Any message that fails validation is dropped, and the publisher receives a 400 Bad Request status with validation error details.
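To picture what validation checks, here is a minimal Python sketch of matching a decoded JSON message against an Avro schema. It is not Hermes' implementation and its error messages are made up; it only covers the types used in this document (null, int, string, array, map, record and unions) and does not resolve named-type references. Roughly speaking, a non-empty error list corresponds to a message Hermes would reject with a 400 status.

```python
from typing import Any, List


def validate(schema: Any, value: Any, path: str = "$") -> List[str]:
    """Return validation errors for `value`; an empty list means it is valid."""
    if isinstance(schema, list):  # union: some branch must accept the value
        if any(not validate(branch, value, path) for branch in schema):
            return []
        return [f"{path}: no union branch matches"]
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return [] if value is None else [f"{path}: expected null"]
    if kind == "string":
        return [] if isinstance(value, str) else [f"{path}: expected string"]
    if kind == "int":
        ok = type(value) is int and -2**31 <= value < 2**31
        return [] if ok else [f"{path}: expected 32-bit int"]
    if kind == "array":
        if not isinstance(value, list):
            return [f"{path}: expected array"]
        return [e for i, item in enumerate(value)
                for e in validate(schema["items"], item, f"{path}[{i}]")]
    if kind == "map":
        if not isinstance(value, dict):
            return [f"{path}: expected map"]
        return [e for key, item in value.items()
                for e in validate(schema["values"], item, f"{path}.{key}")]
    if kind == "record":
        if not isinstance(value, dict):
            return [f"{path}: expected record"]
        errors: List[str] = []
        for field in schema["fields"]:
            if field["name"] in value:
                errors += validate(field["type"], value[field["name"]], f"{path}.{field['name']}")
            elif "default" not in field:
                errors.append(f"{path}.{field['name']}: missing required field")
        return errors  # keys not declared in the schema are ignored in this sketch
    return [f"{path}: unsupported type {kind!r}"]


SCHEMA = {
    "namespace": "tech.hermes.group",
    "name": "topic",
    "type": "record",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "content", "type": "string"},
        {"name": "tags", "type": {"type": "array", "items": "string"}},
        {"name": "__metadata", "type": ["null", {"type": "map", "values": "string"}], "default": None},
    ],
}
print(validate(SCHEMA, {"id": "1", "content": "hello", "tags": ["a"]}))  # []
print(validate(SCHEMA, {"id": 1, "content": "hello", "tags": ["a", 2]}))
# ['$.id: expected string', '$.tags[1]: expected string']
```

Note that the missing __metadata field causes no error, because the field declares a default.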

Schema

Each Avro topic must have an Avro schema defined.

Defining a good schema for complex data structures can be a bit of a trial-and-error process, so we prepared a tool that helps you validate the schema locally before uploading it to Hermes.

Create base schema

Create a schema for your data by following the Avro schema specification. The schema requires two special fields that identify it:

  • namespace: by convention, set to the Hermes group name
  • name: by convention, set to the Hermes topic name

Hermes appends metadata to each event, and this has to be reflected in the schema as well, so append the following field definition to every schema. The published JSON does not need to change, because the __metadata field defaults to null.

{
  "name": "__metadata",
  "type": [
    "null",
    {
      "type": "map",
      "values": "string"
    }
  ],
  "default": null,
  "doc": "Field used in Hermes internals to propagate metadata like hermes-id"
}
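If you generate schemas programmatically, appending this field can be automated. The sketch below uses a hypothetical helper, with_hermes_metadata, that returns a copy of a record schema with the __metadata definition above appended exactly once; it is an illustration, not part of any Hermes tooling.

```python
import copy
import json

# The __metadata field definition from this section, as a Python dict.
METADATA_FIELD = {
    "name": "__metadata",
    "type": ["null", {"type": "map", "values": "string"}],
    "default": None,
    "doc": "Field used in Hermes internals to propagate metadata like hermes-id",
}


def with_hermes_metadata(schema: dict) -> dict:
    """Return a copy of a record schema with __metadata appended if it is missing."""
    result = copy.deepcopy(schema)
    names = {field["name"] for field in result.get("fields", [])}
    if "__metadata" not in names:
        result.setdefault("fields", []).append(copy.deepcopy(METADATA_FIELD))
    return result


base = {
    "namespace": "tech.hermes.group",  # Hermes group name by convention
    "name": "topic",                   # Hermes topic name by convention
    "type": "record",
    "fields": [{"name": "id", "type": "string"}],
}
print(json.dumps(with_hermes_metadata(base), indent=2))
```

Calling the helper again on its own output leaves the schema unchanged, so it is safe to run on schemas that already contain the field.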

Example schema:

{
  "namespace": "tech.hermes.group",
  "name": "topic",
  "type": "record",
  "doc": "This is a sample schema definition for some Hermes message",
  "fields": [
    {
      "name": "id",
      "type": "string",
      "doc": "Message id"
    },
    {
      "name": "content",
      "type": "string",
      "doc": "Message content"
    },
    {
      "name": "tags",
      "type": { "type": "array", "items": "string" },
      "doc": "Message tags"
    },
    {
      "name": "__metadata",
      "type": [
        "null",
        {
          "type": "map",
          "values": "string"
        }
      ],
      "default": null,
      "doc": "Field used in Hermes internals to propagate metadata like hermes-id"
    }
  ]
}

Validating schema

To validate the schema you created, use the Avro schema validator. A ready-to-use JAR file can be downloaded from the current release.

java -jar validator/build/libs/json2avro-validator.jar -s sample-schema.avsc -i sample-message.json -m json2avro2json

This command validates the given message against the schema, then converts it from JSON to Avro and back from Avro to JSON, so you can check whether the formatters got everything right.
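To show what such a round trip involves, here is a Python sketch of JSON to Avro binary and back, written against the binary encoding rules of the Avro specification (zig-zag varints, length-prefixed strings, block-encoded arrays and maps, index-prefixed unions). It is not the validator's code; it supports only null, int/long, string, array, map, record and unions of the form ["null", T], and json_to_avro_to_json is a hypothetical name.

```python
import io
import json
from typing import Any

# Round-trip sketch; supports null, int/long, string, array, map, record, ["null", T].

def write_long(buf: io.BytesIO, n: int) -> None:
    """Avro int/long: zig-zag encode, then write as a base-128 varint."""
    n = (n << 1) ^ (n >> 63)
    while n & ~0x7F:
        buf.write(bytes([(n & 0x7F) | 0x80]))
        n >>= 7
    buf.write(bytes([n]))

def read_long(buf: io.BytesIO) -> int:
    """Read a base-128 varint and undo the zig-zag encoding."""
    shift = result = 0
    while True:
        byte = buf.read(1)[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7

def encode(schema: Any, value: Any, buf: io.BytesIO) -> None:
    if isinstance(schema, list):  # union, assumed to be ["null", T]
        index = 0 if value is None else 1
        write_long(buf, index)  # the branch index precedes the value
        encode(schema[index], value, buf)
        return
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind in ("int", "long"):
        write_long(buf, value)
    elif kind == "string":
        data = value.encode("utf-8")
        write_long(buf, len(data))
        buf.write(data)
    elif kind == "array":
        if value:  # one block holding all items
            write_long(buf, len(value))
            for item in value:
                encode(schema["items"], item, buf)
        write_long(buf, 0)  # a zero count ends the array
    elif kind == "map":
        if value:
            write_long(buf, len(value))
            for key, item in value.items():
                encode("string", key, buf)
                encode(schema["values"], item, buf)
        write_long(buf, 0)
    elif kind == "record":  # fields in schema order; missing ones use the default
        for field in schema["fields"]:
            encode(field["type"], value.get(field["name"], field.get("default")), buf)
    elif kind != "null":  # null is encoded as zero bytes
        raise ValueError(f"unsupported type: {kind}")

def decode(schema: Any, buf: io.BytesIO) -> Any:
    if isinstance(schema, list):  # read the branch index, then the value
        return decode(schema[read_long(buf)], buf)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind == "null":
        return None
    if kind in ("int", "long"):
        return read_long(buf)
    if kind == "string":
        return buf.read(read_long(buf)).decode("utf-8")
    if kind == "record":
        return {f["name"]: decode(f["type"], buf) for f in schema["fields"]}
    if kind in ("array", "map"):
        items, entries = [], {}
        count = read_long(buf)
        while count != 0:
            if count < 0:  # a negative count is followed by the block size in bytes
                count = -count
                read_long(buf)
            for _ in range(count):
                if kind == "array":
                    items.append(decode(schema["items"], buf))
                else:
                    key = decode("string", buf)
                    entries[key] = decode(schema["values"], buf)
            count = read_long(buf)
        return items if kind == "array" else entries
    raise ValueError(f"unsupported type: {kind}")

def json_to_avro_to_json(schema: dict, message: str) -> str:
    buf = io.BytesIO()
    encode(schema, json.loads(message), buf)
    return json.dumps(decode(schema, io.BytesIO(buf.getvalue())))

SCHEMA = {"type": "record", "name": "topic", "namespace": "tech.hermes.group", "fields": [
    {"name": "id", "type": "string"}, {"name": "content", "type": "string"},
    {"name": "tags", "type": {"type": "array", "items": "string"}},
    {"name": "__metadata", "type": ["null", {"type": "map", "values": "string"}], "default": None}]}
print(json_to_avro_to_json(SCHEMA, '{"id": "1", "content": "hello", "tags": ["a", "b"]}'))
# {"id": "1", "content": "hello", "tags": ["a", "b"], "__metadata": null}
```

The encoder assumes the message was validated first: a missing field without a default would fail during encoding.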

Setting topic schema

To use Avro schemas in Hermes, you first need a schema repository; see the schema repository configuration section. Once the schema repository is up and running, send a POST request with the topic's schema to the Hermes management endpoint:

curl -X POST -H "Content-Type: application/json" --data '{schema}' /topics/{topicName}/schema
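The same call can be prepared from Python. This sketch only builds the request; the management address (http://localhost:8090) is a hypothetical placeholder, build_schema_request is an illustrative name, and the schema is abbreviated to keep the example short.

```python
import json
import urllib.request

# Hypothetical management address; replace it with your Hermes management URL.
MANAGEMENT_URL = "http://localhost:8090"


def build_schema_request(topic_name: str, schema: dict) -> urllib.request.Request:
    """Build (without sending) the POST that sets a topic's Avro schema."""
    return urllib.request.Request(
        url=f"{MANAGEMENT_URL}/topics/{topic_name}/schema",
        data=json.dumps(schema).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Abbreviated schema; a real one also needs the __metadata field.
schema = {
    "namespace": "tech.hermes.group",
    "name": "topic",
    "type": "record",
    "fields": [{"name": "id", "type": "string"}],
}
request = build_schema_request("tech.hermes.group.topic", schema)
# urllib.request.urlopen(request) sends it and raises HTTPError on a 4xx/5xx reply.
print(request.get_method(), request.full_url)
```

Serializing the schema with json.dumps avoids the shell quoting issues that come with pasting a multi-line schema into a curl command.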

Schema versioning

Schema backward compatibility is a very important property of the Avro standard. Each update to a schema MUST be backward compatible. The schema repository may hold multiple versions of the schema for a given topic; Hermes always uses the latest version.

If you need to break backward compatibility, create a new topic that defines the new schema and plan the deprecation of the old topic, including migrating subscribers from one topic to the other.
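Backward compatibility here means that a reader using the new schema can still decode data written with the old one. The Python sketch below approximates that rule for top-level record fields only: a field added in the new version needs a default, and any type change is treated as breaking (it ignores the type promotions, such as int to long, that the Avro specification allows). The function name and the priority field are hypothetical; rely on your schema repository's own compatibility check for real decisions.

```python
from typing import Dict, List


def backward_compatibility_errors(old: dict, new: dict) -> List[str]:
    """List reasons why `new` may be unable to read data written with `old`."""
    old_fields: Dict[str, dict] = {f["name"]: f for f in old["fields"]}
    errors = []
    for field in new["fields"]:
        previous = old_fields.get(field["name"])
        if previous is None and "default" not in field:
            # Old data has no value for this field, so the reader needs a default.
            errors.append(f"added field {field['name']!r} has no default")
        elif previous is not None and previous["type"] != field["type"]:
            errors.append(f"field {field['name']!r} changed type")
    # Fields removed in `new` are fine: the reader simply skips them.
    return errors


v1 = {"type": "record", "name": "topic", "fields": [
    {"name": "id", "type": "string"},
    {"name": "content", "type": "string"},
]}
v2_ok = {"type": "record", "name": "topic", "fields": [
    {"name": "id", "type": "string"},
    {"name": "content", "type": "string"},
    {"name": "priority", "type": ["null", "int"], "default": None},
]}
v2_broken = {"type": "record", "name": "topic", "fields": [
    {"name": "id", "type": "int"},
    {"name": "content", "type": "string"},
    {"name": "priority", "type": "int"},
]}
print(backward_compatibility_errors(v1, v2_ok))  # []
print(backward_compatibility_errors(v1, v2_broken))
# ["field 'id' changed type", "added field 'priority' has no default"]
```

This is why new fields are usually declared as ["null", T] unions with a null default, in the same way as the __metadata field.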

Complex schema example

Writing a good Avro schema can be challenging for complex messages. Below is an example schema that uses nested records, optional fields, maps and arrays.

{
    "namespace": "tech.hermes.group",
    "name": "topic",
    "type": "record",
    "doc": "This is a sample complex message schema",
    "fields": [
        {
            "name": "optionalField",
            "default": null,
            "type": [
                "null", {
                    "name": "optionalField_type",
                    "type": "record",
                    "fields": [
                        {
                            "name": "someField",
                            "type": "string"
                        }
                    ]
                }
            ]
        },
        {
            "name": "mapField",
            "type": {
                "type": "map",
                "values": "string"
            }
        },
        {
            "name": "complexMapField",
            "type": {
                "type": "map",
                "values": {
                    "name": "complexMapField_type",
                    "type": "record",
                    "fields": [
                        {
                            "name": "stringField",
                            "type": "string"
                        },
                        {
                            "name": "intField",
                            "type": "int"
                        }
                    ]
                }
            }
        },
        {
            "name": "arrayField",
            "type": {
                "type": "array",
                "items": "int"
            }
        },
        {
            "name": "complexArrayField",
            "type": {
                "type": "array",
                "items": {
                    "name": "complexArrayField_type",
                    "type": "record",
                    "fields": [
                        {
                            "name": "stringField",
                            "type": "string"
                        },
                        {
                            "name": "intField",
                            "type": "int"
                        }
                    ]
                }
            }
        },
        {
            "name": "__metadata",
            "type": [
                "null",
                {
                    "type": "map",
                    "values": "string"
                }
            ],
            "default": null,
            "doc": "Field used in Hermes internals to propagate metadata like hermes-id"
        }
    ]
}