Data validation in Ruby with ClassyHash

In this post I will briefly demonstrate the ClassyHash gem, which is a fast data validation gem written in pure Ruby. I recently released version 0.2.0 of ClassyHash, with some cool PRs integrated and quite a few new features. Here’s a quick example:

require 'set'
require 'classy_hash'

schema = {
  query: Set.new(['repo', 'user']), # Match either 'repo' or 'user'
  values: [[String]], # Match an array of zero or more Strings
}
data = {
  query: 'repo',
  values: ['classy_hash', 'hashformer'],
}
CH.validate(data, schema) # Returns true if the data is valid, raises an error if not

Continue reading to learn about ClassyHash’s history and see a basic ClassyHash validation tutorial.

When to use ClassyHash

ClassyHash is versatile, thanks to its speed. You can use it as your sole validation mechanism, or as a quick sanity check before passing data to something like ActiveRecord. If you are writing an API or message queue that needs to process a lot of requests, ClassyHash can be your front-line defense against accidentally malformed requests and developer mistakes.

Since ClassyHash schemas are plain Ruby Hashes, it’s easy to create a private gem to share schemas across internal codebases. That way you can validate API calls and queued messages against the same schema in all of your projects.
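
As a rough sketch of that idea, a shared schema gem could be little more than a module of frozen constants. The module name AcmeSchemas and the order-related schemas below are hypothetical, chosen only for illustration.

```ruby
# Hypothetical contents of a private gem, e.g. lib/acme_schemas.rb.
# The module name and both schemas are illustrative assumptions.
module AcmeSchemas
  # Schema for a queued "order shipped" message
  ORDER_SHIPPED = {
    order_id: Integer,
    tracking_number: String,
  }.freeze

  # Schema for an API call that looks up an order
  ORDER_LOOKUP = {
    order_id: Integer,
  }.freeze
end

# Every project that depends on the gem refers to the same constants,
# so producers and consumers validate against the same definition.
AcmeSchemas::ORDER_SHIPPED.keys # => [:order_id, :tracking_number]
```

Freezing the constants guards against one project accidentally mutating a schema that other code relies on.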

The ClassyHash origin story

At my previous employer I was responsible for integrating an e-commerce site running Solidus with the Oracle E-Business Suite used by the rest of the company.

The e-commerce site needed to pass a lot of order data into the Oracle system for processing and fulfillment, and shipment updates needed to be returned to the e-commerce site. To make it easier to develop the internal APIs and to prevent accidentally mismatched data formats, I wrote the ClassyHash gem.

Validation needed to be fast, so it could be used everywhere without worrying about overhead. ClassyHash achieves its speed by using built-in Ruby data structures and avoiding memory allocation where possible. Schemas are written in a Ruby DSL, with Hashes to describe Hashes, doubled-up Arrays to describe Arrays, etc.

Validation also needed to be reliable, so ClassyHash has 100% test coverage.
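
To make the "Hashes describe Hashes, doubled-up Arrays describe Arrays" idea concrete, here is a toy validator in plain Ruby. It is not ClassyHash's implementation and handles only a small subset of what the gem supports; the method name toy_valid? is made up for this sketch.

```ruby
# Illustrative only: a toy validator, NOT ClassyHash's implementation.
# It shows how plain Hashes, Arrays, and Classes can act as a schema.
# Single Arrays (multiple choice) and other constraints are not handled.
def toy_valid?(value, constraint)
  case constraint
  when Class
    # A Class constraint matches any instance of that class
    value.is_a?(constraint)
  when Hash
    # A Hash constraint describes a Hash, key by key
    value.is_a?(Hash) &&
      constraint.all? { |key, inner| toy_valid?(value[key], inner) }
  when Array
    # A doubled-up Array ([[String]]) describes an Array of matching elements
    inner_choices = constraint.first
    value.is_a?(Array) && inner_choices.is_a?(Array) &&
      value.all? { |v| inner_choices.any? { |c| toy_valid?(v, c) } }
  else
    false
  end
end

toy_schema = { name: String, tags: [[String]], owner: { id: Integer } }.freeze

good = { name: 'classy_hash', tags: ['ruby', 'validation'], owner: { id: 1 } }
bad  = { name: 'classy_hash', tags: ['ruby', 42], owner: { id: 1 } }

toy_valid?(good, toy_schema) # => true
toy_valid?(bad, toy_schema)  # => false
```

Because the schema is made of ordinary Ruby objects, it can be built without any setup step or parsing.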

Usage tutorial

To demonstrate writing a ClassyHash schema, we’ll extend the hypothetical example above to support mixed user and repo queries, then write a schema for replies to the extended API. You can find complete documentation of the ClassyHash DSL in the ClassyHash README.

This API design is exaggerated a bit to show off some ClassyHash features.

Request schema

First let’s define separate schemas for User and Repo queries, the two object types that can be queried in our extended API. To write a basic ClassyHash schema for a Hash, we create a Hash containing the keys we expect, with the Class of the values we expect.

# Expect a :user key with a String value
user_query_schema = { user: String }.freeze

# Expect a :repo key with a String value
repo_query_schema = { repo: String }.freeze

Next, we’ll define a schema for an individual query. We’ll use the multiple-choice feature of ClassyHash, represented as an Array of basic constraints, to allow querying either a User or a Repo. Also new here is the use of TrueClass to represent either true or false. If :verbose is true, we’ll add more data to our reply (defined below).

query_schema = {
  # Expect a :verbose key with either true or false
  verbose: TrueClass,

  # Expect a :query key with either a user query or a repo query
  query: [user_query_schema, repo_query_schema],
}.freeze
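
The TrueClass convention is worth a closer look, because in plain Ruby true and false are instances of different classes. The toy helpers below sketch how a validator might special-case booleans and how a multiple-choice check could work. The method names are hypothetical and this is not ClassyHash's own code.

```ruby
# Illustrative only: NOT ClassyHash's implementation.
# In plain Ruby, true and false are instances of different classes.
true.is_a?(TrueClass)   # => true
false.is_a?(TrueClass)  # => false

# A toy class check that treats TrueClass or FalseClass as "any boolean"
def toy_class_match?(value, klass)
  if klass == TrueClass || klass == FalseClass
    value == true || value == false
  else
    value.is_a?(klass)
  end
end

# A toy multiple-choice check: the value passes if any one choice matches
def toy_choice_match?(value, choices)
  choices.any? { |klass| toy_class_match?(value, klass) }
end

toy_class_match?(false, TrueClass)         # => true
toy_choice_match?(3, [String, Integer])    # => true
toy_choice_match?(3.5, [String, Integer])  # => false
```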

Finally, we will allow an array of queries to be made. We will use a ClassyHash generator to create an array length constraint, so API users can’t make too many queries at once.

request_schema = {
  # Expect a :queries key with an Array of 1 to 10 query objects
  queries: CH::G.array_length(1..10, query_schema)
}.freeze

With our schema complete, suppose we receive a JSON request to our API and want to validate it. We will pass true for the :strict parameter, so any keys not listed in the schema are treated as an error.

require 'json'

# Use :symbolize_names to match our schema's use of symbols
request = JSON.parse(body, symbolize_names: true)

# Raises a ClassyHash::SchemaViolationError if request is invalid
ClassyHash.validate(request, request_schema, strict: true)
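
The :symbolize_names option matters because JSON.parse returns String keys by default, while our schema uses Symbol keys. This short example uses only Ruby's standard JSON library, and the request body is a made-up sample.

```ruby
require 'json'

# Hypothetical request body, for illustration
body = '{"queries":[{"verbose":true,"query":{"user":"mike"}}]}'

string_keyed = JSON.parse(body)
symbol_keyed = JSON.parse(body, symbolize_names: true)

string_keyed.keys                         # => ["queries"]
symbol_keyed.keys                         # => [:queries]
symbol_keyed[:queries][0][:query][:user]  # => "mike"
string_keyed[:queries]                    # => nil, the key is the String "queries"
```

Only the keys are converted to Symbols. String values such as "mike" stay Strings, which is what the String constraints in the schema expect.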

Reply schema

It can also be useful to validate responses before sending them, to help catch implementation bugs. We’ll follow the same inside-out design process as our request schema to create a reply schema, starting with the definitions for our base objects.

This schema introduces the :optional flag on a multiple-choice constraint. We’ll mark the keys that only get sent for :verbose queries as :optional.

user_reply_schema = {
  id: Integer,
  username: String,

  repo_count: [:optional, Integer],
  url: [:optional, String],
}.freeze

repo_reply_schema = {
  id: Integer,
  reponame: String,

  commit_count: [:optional, Integer],
  head_commit: [:optional, String],
  url: [:optional, String],
}.freeze
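
On the producing side, the optional keys are simply left out unless the query asked for verbose output. Here is one way that could look in plain Ruby. The helper build_user_reply and the layout of the user Hash are assumptions for illustration, and only :repo_count is shown to keep the sketch short.

```ruby
# Hypothetical helper; the user Hash layout is an assumption for illustration.
def build_user_reply(user, verbose: false)
  # Keys that are always present in a user reply
  reply = { id: user[:id], username: user[:username] }

  # Optional key, only sent for verbose queries
  reply[:repo_count] = user[:repo_count] if verbose

  reply
end

user = { id: 1, username: 'Mike', repo_count: 12 }

terse_reply   = build_user_reply(user)
verbose_reply = build_user_reply(user, verbose: true)

terse_reply    # => {:id=>1, :username=>"Mike"}
verbose_reply  # => {:id=>1, :username=>"Mike", :repo_count=>12}
```

Both replies fit a schema in which :repo_count is marked :optional, since an optional key may be present or absent.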

Since we allow multiple queries, we also need to allow multiple replies.

reply_schema = {
  # Expect an Array of either user_reply or repo_reply objects
  replies: [[ user_reply_schema, repo_reply_schema ]],
}.freeze

Validation works the same way as it does for the request schema.

# An example reply
reply = {
  replies: [{ id: 1, username: 'Mike' }, { id: 456, reponame: 'classy_hash' }],
}

ClassyHash.validate(reply, reply_schema, strict: true)
# => true

Invalid replies will raise an error.

# An invalid reply
invalid_reply = {
  replies: [{ id: 1, username: 'Repo?', reponame: 'User?' }],
}

ClassyHash.validate(invalid_reply, reply_schema, strict: true)
# => ClassyHash::SchemaViolationError: :replies[0] is not valid: contains
# members not specified in schema, :replies[0] is not one of a Hash matching
# {schema with keys [:id, :username, :repo_count, :url]}, a Hash matching
# {schema with keys [:id, :reponame, :commit_count, :head_commit, :url]}

Getting ClassyHash

So that’s how you write validation schemas in the ClassyHash DSL. ClassyHash is packaged as a Ruby gem and distributed through rubygems.org. You can simply add ClassyHash to your Gemfile:

gem 'classy_hash', '~> 0.2.0'

Also check out the ClassyHash source on GitHub, with detailed documentation in the README. I no longer work for Deseret Book, but I still check in on Issues and PRs from time to time.