How I developed a crypto service

This is a project I joined halfway through and eventually got done.

What is this project about?

Our product is an on-premise platform. Customers can deploy it in their own data centers, or in AWS, Azure, or GCP.

There are two categories of passwords within our cluster:

  • User input passwords, which are entered by users

    • AWS key
    • SMTP passwords
    • Windows runas user password
  • System managed passwords, which are randomly generated by our system

    • JDBC password
    • Postgres admin password
    • ZooKeeper password

So the passwords I’m talking about are not the passwords for users’ server accounts. Those live in a table of the cluster’s PostgreSQL database and are already encrypted.

The current version of our system stores all the passwords in plain text in a configuration file, which is distributed among machines.

The new architecture also stores all passwords in plain text, but in a Znode in ZooKeeper.

The project is about developing a crypto service that encrypts passwords before persisting them, and decrypts them into memory when the server needs them, so that no clear-text secrets live in our system anymore. The new service should support both our old monolithic JRuby app and our new Java service-oriented architecture.

Why are we developing this feature?

Our product is for enterprise customers, and the top 3 things that enterprise IT cares about are SECURITY, SECURITY, SECURITY.

According to our sales team, keeping secrets in clear text has been one of the biggest sales blockers. Our customers have also repeatedly brought up either their serious concerns about potential password leaks, or the friction they experienced when buying our product: they needed to get an exception from IT, since our product was the only in-house app that doesn’t encrypt passwords.

Trade-offs

There’s no doubt that we are gonna build the crypto service for the new Java architecture, a service-oriented server architecture that will come out in 2017. The biggest trade-off we faced was whether to put effort into enabling this feature in our old monolithic JRuby server app.

The downsides of doing that are:

  • Lots of effort required. Our old server architecture is a monolithic JRuby app with messy logic (yeah, everyone agrees on it! It’s totally messed up because people have kept adding new stuff here and there for 9 years, since 2007). The crypto service will be written in Java, and incorporating it into the old server can be really hard.
  • Uncertain schedule. Because of the large amount of work expected, no one is certain whether we can finish this feature, fully test it, and ship it on time. If it comes out too late, say only three months before the new server ships, it’s probably much less valuable.
  • We are not sure it’s worth it. If our customers have been living with plain-text passwords for 9 years, they can probably wait another six months for the new server?
  • Developer frustration. Not only because it’s a messy job, but also because you know your code will be deleted a year later.

The upsides of doing that are:

  • Removes the sales blocker as soon as possible
  • Relieves customers pain and complaints as soon as possible
  • Or at least demonstrates to our customers that we have heard their requirements and are acting on them (this sounds very much like self-comfort, but it’s been brought up several times. Yeah, it’s all about ATTITUDE)

The team eventually decided to do it.

Design

Compared to the complicated integration of the crypto service with our Ruby app, the design of the crypto service itself is much more straightforward.

The core idea is:

Encrypt actual secrets with an encryption key, and encrypt the encryption key with a master key
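
To make the two-layer idea concrete, here is a minimal sketch in Java. It is an illustration under assumptions, not the service's actual code: the class name `EnvelopeSketch`, the choice of AES-GCM with 128-bit keys, and the "IV followed by ciphertext" layout are all mine.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Arrays;

// Hypothetical illustration of "encrypt secrets with an encryption key,
// encrypt the encryption key with a master key".
final class EnvelopeSketch {
    private static final int IV_LEN = 12;    // bytes; the usual GCM nonce size
    private static final int TAG_BITS = 128; // GCM authentication tag length

    static SecretKey newAesKey() throws Exception {
        KeyGenerator gen = KeyGenerator.getInstance("AES");
        gen.init(128);
        return gen.generateKey();
    }

    // Returns a fresh random IV followed by the ciphertext (which includes the tag).
    static byte[] encrypt(SecretKey key, byte[] plain) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ct = c.doFinal(plain);
        byte[] out = Arrays.copyOf(iv, iv.length + ct.length);
        System.arraycopy(ct, 0, out, iv.length, ct.length);
        return out;
    }

    // Reads the IV from the front of the blob and decrypts the rest.
    static byte[] decrypt(SecretKey key, byte[] blob) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, blob, 0, IV_LEN));
        return c.doFinal(blob, IV_LEN, blob.length - IV_LEN);
    }

    public static void main(String[] args) throws Exception {
        SecretKey masterKey = newAesKey();     // would live in the keystore
        SecretKey encryptionKey = newAesKey(); // would be stored only in encrypted form

        byte[] wrappedKey = encrypt(masterKey, encryptionKey.getEncoded());
        byte[] storedSecret = encrypt(encryptionKey, "smtp-pass".getBytes(StandardCharsets.UTF_8));

        // Reading a secret back: unwrap the encryption key first, then decrypt.
        SecretKey recovered = new SecretKeySpec(decrypt(masterKey, wrappedKey), "AES");
        String secret = new String(decrypt(recovered, storedSecret), StandardCharsets.UTF_8);
        System.out.println(secret.equals("smtp-pass"));
    }
}
```

The point of the extra layer is that only the small master key needs special protection; everything else, including the encrypted encryption key, can sit in an ordinary config file.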

Architecture

File System / Config Dir / Config File
------yml file
| inputs
| - master key algorithm
| - encryption key algorithm
| - keystore location
|
| generated values
| - master key id
| - encryption key
| - encryption key id
------

File System / Config Dir / Java KeyStore (JKS)
----------------
| Master key |
----------------

Bootstrap

  1. Provide the following three args:

    • master key algorithm
    • encryption key algorithm
    • keystore location
  2. Use the master key algorithm to create a master key, and put it in a Java KeyStore (JKS) at the keystore location. The JKS will be in the server’s config directory and will be guarded by the directory’s access permissions.

  3. Create an encryption key using the encryption key algorithm, encrypt it with the master key, and put the encrypted value in the config file.

  4. Encrypt all existing passwords in the config file
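
Steps 2 and 3 can be sketched roughly as below. This is a hedged illustration, not the code we shipped. The class `BootstrapSketch`, the property names, the `storePass` parameter, and the use of `java.util.Properties` in place of the yml file are all assumptions. It also assumes AES for both keys, since it wraps with the JDK's `AESWrap` cipher. One detail worth knowing: the JDK keystore type literally named `"JKS"` cannot hold symmetric keys, so the sketch uses `"JCEKS"`.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.KeyStore;
import java.util.Base64;
import java.util.Properties;
import java.util.UUID;

// Hypothetical bootstrap: names and file formats are illustrative only.
final class BootstrapSketch {

    static Properties bootstrap(String masterAlg, String encAlg,
                                Path keystorePath, char[] storePass) throws Exception {
        // Step 2: create the master key and store it in a keystore file.
        SecretKey master = KeyGenerator.getInstance(masterAlg).generateKey();
        String masterId = UUID.randomUUID().toString();
        KeyStore ks = KeyStore.getInstance("JCEKS"); // plain "JKS" cannot hold secret keys
        ks.load(null, storePass);                    // null stream = start with an empty keystore
        ks.setEntry(masterId, new KeyStore.SecretKeyEntry(master),
                new KeyStore.PasswordProtection(storePass));
        try (OutputStream out = Files.newOutputStream(keystorePath)) {
            ks.store(out, storePass);
        }

        // Step 3: create the encryption key and wrap it with the master key.
        SecretKey encKey = KeyGenerator.getInstance(encAlg).generateKey();
        Cipher wrap = Cipher.getInstance("AESWrap"); // assumes an AES master key
        wrap.init(Cipher.WRAP_MODE, master);

        // Values that would be written to the config file.
        Properties props = new Properties();
        props.setProperty("master.key.algorithm", masterAlg);
        props.setProperty("master.key.id", masterId);
        props.setProperty("encryption.key.algorithm", encAlg);
        props.setProperty("encryption.key.id", UUID.randomUUID().toString());
        props.setProperty("encryption.key", Base64.getEncoder().encodeToString(wrap.wrap(encKey)));
        return props;
    }

    // Reverse path: read the master key from the keystore and unwrap the encryption key.
    static SecretKey loadEncryptionKey(Properties props, Path keystorePath,
                                       char[] storePass) throws Exception {
        KeyStore ks = KeyStore.getInstance("JCEKS");
        try (InputStream in = Files.newInputStream(keystorePath)) {
            ks.load(in, storePass);
        }
        SecretKey master = (SecretKey) ks.getKey(props.getProperty("master.key.id"), storePass);
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, master);
        byte[] wrapped = Base64.getDecoder().decode(props.getProperty("encryption.key"));
        return (SecretKey) unwrap.unwrap(wrapped,
                props.getProperty("encryption.key.algorithm"), Cipher.SECRET_KEY);
    }
}
```

Step 4 is left out here; it amounts to running every existing password in the config file through the encryption key once.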

When encrypting passwords

Whenever a new key-value pair is to be put into the config file:

  1. Check if the key is in our encryption list. If it is, proceed; otherwise, stop here and store the pair as clear text in the config file
  2. Get the encrypted encryption key value, and decrypt it with the master key to get the actual encryption key. You need access to the JKS file to read the master key.
  3. Encrypt the password with the encryption key

When decrypting passwords

  1. Get the encrypted encryption key value, and decrypt it with the master key to get the actual encryption key. You need access to the JKS file to read the master key.
  2. Decrypt the password with the encryption key
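
The write path and the read path described above can be sketched together. Again, this is a minimal illustration under assumptions: `ConfigCryptoSketch` is a made-up class, an in-memory map stands in for the config file, the master key is passed in directly instead of being read from the keystore, and the stored format (Base64 IV, a colon, Base64 ciphertext) is invented for the example.

```java
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical config store that encrypts only the keys on the encryption list.
final class ConfigCryptoSketch {
    private final Map<String, String> config = new HashMap<>(); // stands in for the config file
    private final Set<String> encryptionList;                    // keys whose values are secrets
    private final SecretKey masterKey;                           // would be read from the keystore
    private final byte[] wrappedEncryptionKey;                   // would be read from the config file

    ConfigCryptoSketch(Set<String> encryptionList, SecretKey masterKey,
                       byte[] wrappedEncryptionKey) {
        this.encryptionList = encryptionList;
        this.masterKey = masterKey;
        this.wrappedEncryptionKey = wrappedEncryptionKey;
    }

    // Decrypt the encrypted encryption key with the master key.
    private SecretKey encryptionKey() throws Exception {
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, masterKey);
        return (SecretKey) unwrap.unwrap(wrappedEncryptionKey, "AES", Cipher.SECRET_KEY);
    }

    void put(String key, String value) throws Exception {
        if (!encryptionList.contains(key)) {
            config.put(key, value); // not a secret: store as clear text
            return;
        }
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, encryptionKey(), new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(value.getBytes(StandardCharsets.UTF_8));
        Base64.Encoder b64 = Base64.getEncoder();
        config.put(key, b64.encodeToString(iv) + ":" + b64.encodeToString(ct));
    }

    String get(String key) throws Exception {
        String stored = config.get(key);
        if (stored == null || !encryptionList.contains(key)) {
            return stored;
        }
        String[] parts = stored.split(":");
        byte[] iv = Base64.getDecoder().decode(parts[0]);
        byte[] ct = Base64.getDecoder().decode(parts[1]);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, encryptionKey(), new GCMParameterSpec(128, iv));
        return new String(c.doFinal(ct), StandardCharsets.UTF_8);
    }

    // What is actually persisted for a key, encrypted or not.
    String raw(String key) {
        return config.get(key);
    }
}
```

Unwrapping the encryption key on every call mirrors the steps as written; a real service might cache it in memory instead, which trades a little exposure for fewer keystore reads.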

Key-rolling

I made key rolling an atomic transaction, so that it can recover from any failure.

  1. Take a backup of the current config file and the JKS, and put them in a temp dir which shares the same access permissions as the config dir
  2. Roll all server-managed passwords, encrypt them with the existing crypto service, and persist the new values in the config file
  3. Decrypt all passwords (both server-managed and user-managed) in the config file, and hold them as clear text in memory
  4. Force the creation of a new JKS with a new master key
  5. Create a new encryption key
  6. Reinitialize the crypto service with the new crypto properties
  7. Use the new crypto service to encrypt all passwords held in memory
  8. Put all new crypto properties and all encrypted passwords into the config file
  9. If any error happens, force a restore of the config file and the JKS from the backup
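
The backup-and-restore wrapper around the roll (steps 1 and 9) might look like the sketch below. It is an assumption-laden illustration: `KeyRollSketch`, `RollSteps`, and `rollWithRestore` are hypothetical names, and steps 2 to 8 are passed in as a callback. Note that it only recovers from exceptions thrown inside the running process; surviving a crash in the middle of a roll would need an extra check at startup.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical wrapper that makes a key roll all-or-nothing for in-process failures.
final class KeyRollSketch {

    /** Stands in for steps 2 to 8 of the roll. */
    interface RollSteps {
        void run() throws Exception;
    }

    static void rollWithRestore(Path configFile, Path keystore, Path backupDir,
                                RollSteps steps) throws Exception {
        Path configBackup = backupDir.resolve(configFile.getFileName());
        Path keystoreBackup = backupDir.resolve(keystore.getFileName());

        // Step 1: back up the config file and the keystore.
        Files.copy(configFile, configBackup, StandardCopyOption.REPLACE_EXISTING);
        Files.copy(keystore, keystoreBackup, StandardCopyOption.REPLACE_EXISTING);

        try {
            steps.run();
        } catch (Exception e) {
            // Step 9: put both files back exactly as they were, then report the failure.
            Files.copy(configBackup, configFile, StandardCopyOption.REPLACE_EXISTING);
            Files.copy(keystoreBackup, keystore, StandardCopyOption.REPLACE_EXISTING);
            throw e;
        }

        // Reached only on success: the old key material is no longer needed.
        Files.deleteIfExists(configBackup);
        Files.deleteIfExists(keystoreBackup);
    }
}
```

Restoring the config file and the keystore together matters: a new keystore paired with an old config file (or the reverse) would leave every password undecryptable.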

When to roll keys

Here are some best practices I recommend:

  • When a worker machine is removed/decommissioned from your cluster
  • Every few months
  • Whenever the admin thinks it is necessary

My Role in this project

I played two major roles in this project

1. Fireman

I was not an original member of this project team. I joined to help put out the fire.

2. One of the Key Contributors

I am one of the main contributors, and I finished the following work:

  • crypto service bootstrapping in both the old JRuby app and the new service-oriented Java app
  • key rolling
  • most Java-Ruby integration
  • enabling unit and integration tests
  • all customer-facing commands

Most Important Stuff I learned from this project

Project Management:

  • Always keep an eye on the project schedule, and don’t be shy about alerting everybody when you cannot finish your work on time

Tech:

  • Learned quite a lot about crypto architecture and related knowledge
  • Learned a lot of Ruby and JRuby