Developing Robust Authentication Using Rails, Omniauth, and Okta
Alto Pharmacy aims to deliver a better pharmacy experience for everyone who needs medication. To achieve that, we need to enable three different types of users:
- Individual customers who need prescriptions
- Healthcare providers: the nurses, doctors, physician assistants, and other medical professionals who prescribe medications
- Operations users: the Altoids who intake prescriptions, bill insurance, fill prescriptions, and more
Each type of user has a different product or set of products we’ve built for them. From within each product, we collect, transmit, process, and store a tremendous amount of sensitive information. However, our biggest area of security, privacy, and compliance risk is our internal pharmacy management tool called Wunderbar that is used by our operations (ops) users every day to process prescriptions. Securing access to Wunderbar is mission-critical to protect our data and our business, and, of course, we need our super-secure implementation to be super-smooth to avoid slowing down our hard-working ops users.
What was the problem before?
We wanted to be able to change our Wunderbar session lifetimes without hampering either our security efforts or the usability of our product. Moreover, we wanted session lifetimes of other apps to be able to change appropriately with business risk. Before starting this project, we’d implemented a Google OAuth 2.0 authorization flow that allowed ops users to log into Wunderbar via Google and take advantage of transparent session refresh for a limited amount of time.
However, while Google Workspace was (and is) great for our email, documents, and calendars, it isn’t as configurable or extensible as a proper SSO solution. One painful side effect of this limitation was that, in order to ensure our refresh sessions were limited to twelve hours in Wunderbar, any user of Alto’s Gmail also had to log back in every twelve hours just to see email, even though there’s not nearly the same level of sensitive data in email.
Plus, our IT team had already picked the tool of the future: Okta. The goal was to enable single sign-on (SSO), and if Wunderbar still authenticated through Google while everything else went through Okta, it wouldn’t really be single sign-on!
It was clear we needed to move to Okta for authenticating in Wunderbar, too.
Okay: how do we authenticate with Okta?
Okta is a big company with many different authentication mechanisms. For us, the big question was really: do we want to authenticate via SAML or OpenID Connect (OIDC)? There are a lot of great resources comparing these two standards and what they’re good at; here’s a good chart comparing and contrasting parts of these two authentication flows:
From this Medium post, which is a great primer on the differences
For us, the most attractive thing about SAML is that it can be used to provision user accounts from your upstream IdP; for example, Okta could be configured to push username, email, group, role, and similar information to our platforms, which we could then use to provision user accounts automatically.
In the end, though, we decided the OIDC implementation would provide a user experience very similar to the one our operations users were already accustomed to, with the same capability to perform transparent token refresh.
How were we authenticating before?
This transition was pretty easy because we were moving from Google’s OAuth 2.0 flow to Okta’s OIDC flow. OAuth 2.0 and OIDC are very similar; just know that OAuth 2.0 is technically an authorization framework (once you have a user, what can they access?), while OIDC is an authentication layer built on top of it (actually identifying and validating the user). However, there were some odd things about our previous implementation, and I only had one person to blame for those choices: myself. It’s great to get the opportunity to re-implement our authentication system; in a fast-moving startup, you rarely get the chance to revisit problems so soon!
The biggest issue at hand with our previous Google flow was the way in which I’d chosen to re-authenticate users whose Wunderbar sessions had expired. In this case, OAuth provides a handy mechanism: a refresh token. Here’s how access tokens and refresh tokens are exchanged between the client and the authorization server in OAuth 2.0 flows:
From these WSO docs
At the time, we had session lifetimes of 60 minutes when accessing Wunderbar, and the session cookies also lasted 60 minutes. We faced a problem: if somebody walked away from their laptop for 61 minutes after authenticating, when they came back, how could we make it so they didn’t have to go back through the Google OAuth 2.0-initiated login flow from the beginning?
My solution at the time was to persist the refresh token client-side in a secure cookie with a longer expiration. Then, if the user didn’t have a valid access token, we’d simply present the refresh token back to the server and get a new access token. This worked well, but exposing secrets to the client always presents a risk of compromise. Refresh tokens are longer-lived, so they’re of even higher value to protect!
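For readers who haven’t seen it, here is roughly what “presenting the refresh token back to the server” looks like on the wire. This is a minimal sketch of the standard OAuth 2.0 refresh grant using only Ruby’s standard library, not our production code: the endpoint URL, client credentials, and the `build_refresh_request` helper are all placeholders we made up for illustration, and the request is built but never sent.

```ruby
require "net/http"
require "uri"

# Builds (but does not send) an OAuth 2.0 refresh-token grant request.
# Every value passed in below is a placeholder, not a real endpoint or secret.
def build_refresh_request(token_url:, client_id:, client_secret:, refresh_token:)
  uri = URI.parse(token_url)
  request = Net::HTTP::Post.new(uri)
  # Authenticate the client with HTTP Basic, one of the options OAuth 2.0 allows.
  request.basic_auth(client_id, client_secret)
  # The grant itself is a form-encoded body naming the grant type and the token.
  request.set_form_data(
    "grant_type" => "refresh_token",
    "refresh_token" => refresh_token
  )
  request
end

request = build_refresh_request(
  token_url: "https://idp.example.com/oauth2/token", # hypothetical endpoint
  client_id: "example-client-id",
  client_secret: "example-client-secret",
  refresh_token: "example-refresh-token"
)

# Sending it would look something like this (not executed here):
# Net::HTTP.start(request.uri.host, request.uri.port, use_ssl: true) do |http|
#   http.request(request)
# end
```

Whoever holds the refresh token can make this request, which is exactly why where that token lives matters so much.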
How was the implementation?
Okta’s documentation made it very easy to stand up a simple authentication server and set up a local development server. Okta has a great blog post that walks through the setup with tools we were already using (Rails and Omniauth, specifically) to get up and running. Beyond that, our biggest challenge was ensuring that we had a smooth rollout.
Testing and using this flow in our staging environments was harder, however. At the time we were working on this (early 2020), there were two issues:
- Our sandbox environment URLs were non-deterministic and varied depending on how an engineer named their cluster.
- Okta apps did not support wildcards in their OAuth redirect URLs. The redirect URL controls where Okta’s servers send the user (and the authentication payload) once they have done their part, so that your app can handle the meat of authentication.
For a time, our clever infrastructure engineering team worked around these issues with an intermediary Keycloak server. However, Okta now supports wildcard redirects, which we’re successfully using today.
The biggest functional change to our authentication flow was in our refresh flow. The biggest challenge in landing this change was in how we deployed it to end users in production.
How has our refresh flow changed?
We didn’t want to expose the refresh token because it presented too much risk. What do we do instead? Something that still carries a little risk, but that we preferred after threat modeling our scenarios. Security engineering is often about making tradeoffs given a number of constraints. Today, our session lifetimes are down to fifteen minutes, and our refresh lifetime for an active session (i.e., somebody making at least one request within this window) is 60 minutes. If somebody comes back after their access token expires, that’s fine; we’ve persisted the encrypted refresh token to the database, and we can use attributes from the expired but still cryptographically valid token to look that refresh token up.
This was crucial for Okta especially because Okta refresh tokens are valid for 100 days by default!
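To make the server-side lookup a bit more concrete, here is a simplified sketch of the idea, with several loud assumptions. It uses an in-memory hash in place of the database, an HMAC-signed demo token in place of a real Okta access token (which would be verified against the provider’s published signing keys instead), and the `sub` claim as the lookup attribute. The helper names are ours, invented for this example.

```ruby
require "openssl"
require "base64"
require "json"

SIGNING_KEY = "demo-signing-key" # stand-in for the identity provider's key
ENCRYPTION_KEY = OpenSSL::Digest::SHA256.digest("demo-encryption-key") # 32 bytes
REFRESH_WINDOW = 60 * 60 # seconds of allowed inactivity
STORE = {} # in-memory stand-in for the refresh token table

def b64(data)
  Base64.urlsafe_encode64(data, padding: false)
end

# Issues a demo HS256-signed token; in real life the identity provider does this.
def sign_token(claims)
  signing_input = [b64({ alg: "HS256", typ: "JWT" }.to_json), b64(claims.to_json)].join(".")
  "#{signing_input}.#{b64(OpenSSL::HMAC.digest("SHA256", SIGNING_KEY, signing_input))}"
end

# Checks the signature only. Expiry is deliberately not enforced, because the
# whole point is to accept an expired but still cryptographically valid token.
def verified_claims(token)
  header, payload, signature = token.split(".")
  return nil unless header && payload && signature
  expected = b64(OpenSSL::HMAC.digest("SHA256", SIGNING_KEY, "#{header}.#{payload}"))
  return nil unless OpenSSL.secure_compare(expected, signature)
  JSON.parse(Base64.urlsafe_decode64(payload))
end

def encrypt(plaintext)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = ENCRYPTION_KEY
  iv = cipher.random_iv
  data = cipher.update(plaintext) + cipher.final
  { iv: iv, tag: cipher.auth_tag, data: data }
end

def decrypt(record)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").decrypt
  cipher.key = ENCRYPTION_KEY
  cipher.iv = record[:iv]
  cipher.auth_tag = record[:tag]
  cipher.update(record[:data]) + cipher.final
end

# Stores the refresh token encrypted, along with the user's last activity time.
def remember_refresh_token(subject, refresh_token, last_seen)
  STORE[subject] = { encrypted: encrypt(refresh_token), last_seen: last_seen }
end

# Returns the decrypted refresh token if a refresh is allowed, otherwise nil.
def refresh_token_for(expired_token, now)
  claims = verified_claims(expired_token)
  return nil unless claims
  record = STORE[claims["sub"]]
  return nil unless record && now - record[:last_seen] <= REFRESH_WINDOW
  decrypt(record[:encrypted])
end

now = Time.now.to_i
token = sign_token("sub" => "ops-user-1", "exp" => now - 60) # already expired
remember_refresh_token("ops-user-1", "demo-refresh-token", now - 20 * 60)
found = refresh_token_for(token, now) # the stored token, decrypted server-side
```

The important property is that the browser only ever holds the short-lived access token. The long-lived refresh token never leaves the server, and a tampered token or a session that has been idle too long gets nothing back.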
How did we roll out this change?
Disruptive change is hard. We didn’t want to disrupt the workday of our hard-working ops users who are caring for our many customers and provider partners. Our main goal was to keep the old authentication flow available as a safety-blanket fallback until we were quite confident in the new one. This broke down into a multi-phase approach:
1. Dark-launch the new Okta flow hidden behind a URL and with a small allow-list of users (mostly just the Security team) assigned to the new application in Okta.
2. Ask for volunteers! These volunteers would go about their normal work day in a variety of functional areas (customer care, pharmacy, fulfillment, engineering) using Okta instead of Google.
3. Change the default login flow to use Okta, but keep Google working just fine as a secondary action hidden behind a URL.
4. Eventually, remove Google entirely.
This allowed us to ensure that we never disrupted people’s days too much!
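One way to picture the four phases is as a single decision function that picks the identity provider for each login attempt. This is an illustrative sketch rather than the code we shipped: the phase names, the `opted_in` flag (standing in for the Okta allow-list), and the `via_hidden_url` flag are all hypothetical.

```ruby
# Hypothetical phase names mirroring the four rollout steps above.
PHASES = %i[dark_launch volunteers okta_default google_removed].freeze

# Picks the identity provider for one login attempt.
#   opted_in:       the user is assigned to the new Okta application
#   via_hidden_url: the user arrived through the non-default, hidden login URL
def login_provider(phase:, opted_in: false, via_hidden_url: false)
  raise ArgumentError, "unknown phase: #{phase.inspect}" unless PHASES.include?(phase)

  case phase
  when :dark_launch, :volunteers
    # Okta is reachable only through the hidden URL, and only for opted-in users.
    via_hidden_url && opted_in ? :okta : :google
  when :okta_default
    # Okta is the default; Google remains available behind the hidden URL.
    via_hidden_url ? :google : :okta
  when :google_removed
    # The fallback is gone; everyone authenticates with Okta.
    :okta
  end
end

default_login = login_provider(phase: :okta_default)
fallback_login = login_provider(phase: :okta_default, via_hidden_url: true)
```

Keeping the choice in one place like this means moving between phases is a one-line configuration change, and rolling back is just as cheap.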
What did we learn?
We achieved our goals and now, nearly two years later, we’re successfully using Okta’s OIDC flow within our core pharmacy management app. What did we learn in the process?
- Not every API is necessarily fully implemented in every client. At the time of implementation, Okta’s Ruby client wasn’t fully supported. In order to support our refresh flow, we needed to upstream a small change into the client library, which required a little investigation.
- Exposing secrets involves risk! In our case there wasn’t a glaring need to expose refresh tokens client-side at all.
- Slow rollouts are great, especially with disruptive changes! A big thank-you to our team members who were willing to be part of the experiment group last year when this rolled out.