How We Used Kafka to Build an Event-Driven Automation Pipeline and Scale Our Pharmacy
Imagine this: You're at the doctor’s office, and your doctor just wrote you a new prescription. Minutes later, before you even check out with the receptionist, you already have a notification from Alto Pharmacy on your phone that your prescription is ready. All you have to do is pick the date and time that works best for your free courier delivery.
This is the future we’re building at Alto: a future where a prescription can be transcribed, billed, and verified by a pharmacist in five minutes, flat. To get there without dramatically increasing Alto’s head count, we need to automate much of the process, starting with the first step: transcribing, or intaking, a prescription.
What is intake?
As a pharmacy, we receive prescriptions in many different forms. Some are sent electronically, in a structured format, but others are handwritten, phoned in, or even faxed. To represent all of these forms in a single table, we previously required every incoming prescription to be transcribed by an Alto team member, who would answer questions like:
- Whom is this prescription for? Is this customer already in our system?
- Where did this prescription come from? Who wrote it?
- What is the relevant prescription information (medication to dispense, quantity, days supply, directions, etc.)?
This manual process is both time-consuming and prone to human error, which, if uncaught, could impact our ability to process the prescription downstream. Furthermore, at our busiest times, it could take up to two hours for a team member to get through the pending intake queue — unacceptable if we hope to have prescriptions ready to schedule before people even leave their doctor’s office.
Automating intake required close collaboration between three teams:
- Data Science, who built the decision-making algorithms
- Data Engineering, who served up the results of those algorithms
- Product Engineering, who used those results to automatically intake prescriptions
To allow the teams to operate in parallel without stepping on each other's toes — and in the language most suitable for each (the Data Science and Data Engineering teams prefer Python, while product engineers work mainly in Ruby) — we separated the code into two environments:
1. Data Prod Interface (DPI), where the code for generating automation results lives
2. Wunderbar (our proprietary pharmacy operations platform), where the code for requesting and using those automation results lives
The key question: to establish robust communication between these two environments, should we use an API-based approach, or an event-driven one? Each has its pros and cons.
API-based approach
- Pros: Legacy approach everyone at Alto is familiar with; existing enforcement of strong types and data
- Cons: Difficult to scale with increased data and requests; requires ongoing authorization/authentication

Event-driven approach (Kafka)
- Pros: Infrastructure relevant to other business areas; out-of-the-box AWS solution; automatic retries, reprocessing, and queueing
- Cons: New to us, with a learning curve; longer to implement
Our existing infrastructure relies heavily on an API-based structure, so there was a natural tendency to go down that path. But we knew that beginning to migrate services toward an event-driven architecture with Kafka at its heart would help us in the long term.
Breaking new ground with Kafka
Kafka is a distributed event streaming platform originally developed at LinkedIn and now maintained as an open-source project by the Apache Software Foundation. It aims to provide a unified, high-throughput, low-latency platform for handling real-time data feeds.
Alto uses Amazon Web Services as our cloud provider, and luckily, AWS offers a managed Kafka solution: Amazon Managed Streaming for Apache Kafka (MSK). To integrate Kafka with our systems, our Infrastructure team worked with AWS to set up managed instances that comply with our business requirements. Our Infrastructure team will share more details about this work in an upcoming blog post.
Kafka offers many benefits. The asynchronous nature of the message system allows for retries, reprocessing, and queueing as well as easy recovery from outages. In addition, the infrastructure work done for setting up the managed cluster would be a one-off workstream. When complete, we’d be able to re-use the same setup and configurations across various business needs to streamline other projects that leverage Kafka.
Setting up for success: defining a strict contract
One element we had to get right from the start was the data contract. As we send messages between DPI and Wunderbar, we need to ensure messages are produced with the correct schema, so that downstream consumers know what to expect.
Protocol Buffers, or Protos, is an open-source, cross-platform data format used to serialize structured data. Developed by Google, Protos allow you to define how you structure your data once and then use generated code to easily write and read the structured data in a variety of different languages.
For this project, we turned again to Protos to define the schema for messages exchanged between DPI and Wunderbar. We defined IntakeAutomationRequest and IntakeAutomationResult messages and generated typed bindings (Go, Sorbet, Flow, etc.), which both DPI and Wunderbar now reference when producing or consuming messages.
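As an illustration, a Proto definition for these messages might look like the sketch below. The field names and types are hypothetical — Alto's actual schemas are not public — but the shape conveys the idea of a strict, language-neutral contract.

```protobuf
syntax = "proto3";

package intake_automation;

// Illustrative only; the real IntakeAutomationRequest/Result
// schemas at Alto will differ.
message IntakeAutomationRequest {
  string prescription_id = 1;   // identifies the prescription in Wunderbar
  string image_url = 2;         // e.g. a scanned fax or handwritten script
  int64 requested_at_unix = 3;  // when Wunderbar requested automation
}

message IntakeAutomationResult {
  string prescription_id = 1;
  string medication_name = 2;
  int32 quantity = 3;
  int32 days_supply = 4;
  string directions = 5;
}
```

From a definition like this, `protoc` generates read/write code for each language, so DPI (Python) and Wunderbar (Ruby) both work against the same contract.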
Producing and consuming messages
Because Wunderbar is already configured to run on AWS and houses our existing intake logic, it was easy enough to send a prescription to DPI for automation. Where we would normally create a task for an operations team member to manually transcribe the prescription, we instead produce a message to our “request automation” Kafka topic using the ruby-kafka library.
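A minimal sketch of what that producer step might look like with ruby-kafka is below. The function names, field names, and topic name are illustrative, and JSON stands in for the actual serialized Protobuf message to keep the sketch self-contained.

```ruby
require "json"
require "time"

# Build the "request automation" payload. Field names are illustrative;
# in production this would be a serialized IntakeAutomationRequest proto.
def build_intake_request(prescription)
  {
    prescription_id: prescription.fetch(:id),
    image_url: prescription[:image_url],
    requested_at: Time.now.utc.iso8601
  }.to_json
end

# `kafka` is a ruby-kafka client, e.g.:
#   kafka = Kafka.new(ENV.fetch("KAFKA_BROKERS").split(","), client_id: "wunderbar")
def request_automation(kafka, prescription, topic: "request-automation")
  # deliver_message synchronously writes one message to the topic
  kafka.deliver_message(build_intake_request(prescription), topic: topic)
end
```

For higher throughput, ruby-kafka also offers buffered and async producers, but the single-message form keeps the flow easy to follow.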
Our consumer, instantiated in an executable that runs on a dedicated worker, consumes messages from a “receive automation” Kafka topic. First, the consumer verifies that a given message corresponds to a prescription we previously sent to DPI. Then, it persists any automation results included in that message. If the automation results are sufficient, we skip intake altogether. Otherwise, we create an intake task to be handled by an operations team member.
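The consumer's decision step can be sketched as a pure function, which makes it easy to test in isolation. The sufficiency rule and field names below are hypothetical stand-ins for Alto's real validation logic.

```ruby
# Fields an automation result must fill in for us to skip manual intake
# (illustrative; the real rules are more involved).
REQUIRED_FIELDS = %i[medication_name quantity days_supply directions].freeze

def sufficient?(result)
  REQUIRED_FIELDS.all? { |f| !result[f].nil? && result[f] != "" }
end

# known_ids: prescription ids we previously sent to DPI for automation.
def handle_result(result, known_ids)
  # Ignore messages that don't correspond to a request we made.
  return :unknown_prescription unless known_ids.include?(result[:prescription_id])
  # persist_result(result)  # store the automation results (not shown)
  sufficient?(result) ? :skip_intake : :create_intake_task
end
```

In the real consumer, this function would be called from a ruby-kafka `each_message` loop on the "receive automation" topic, with the return value dispatching to the appropriate side effect.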
Handling the worst case scenario
Because this project was among our early use cases for Kafka, we wanted to be prepared for the worst-case scenario: we produce a message requesting automation results, and for whatever reason — maybe DPI is down — we never hear back. We defined a reasonable response window as fifteen minutes, to avoid building up a large backlog of work that would have to be offloaded back onto our operations team.
To this end, we store metadata about the automation lifecycle in an intake_automation_results table, which includes timestamps representing when we sent a message to DPI and when we received a response back. A cron job runs every minute to check that table for rows where received_at is blank, and sent_at was more than fifteen minutes ago. If this occurs, we flag the prescription to be manually intaken by an operations team member.
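The heart of that cron job is a simple staleness check. Here is a sketch, assuming rows shaped like the intake_automation_results table described above (column and function names are illustrative):

```ruby
TIMEOUT_SECONDS = 15 * 60 # fifteen-minute response window

# rows: hashes mirroring intake_automation_results, with :sent_at and
# :received_at as Time values (:received_at is nil until DPI responds).
# Returns the prescription ids that should fall back to manual intake.
def stale_requests(rows, now: Time.now)
  rows.select { |r| r[:received_at].nil? && now - r[:sent_at] > TIMEOUT_SECONDS }
      .map { |r| r[:prescription_id] }
end
```

In production this would be a single SQL query (`WHERE received_at IS NULL AND sent_at < now() - interval '15 minutes'`), but the logic is the same.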
The diagram below illustrates our intake automation architecture.
Current state and results
Intake automation is now live at Alto. We’re seeing prescriptions fly through in just a few minutes — during which time our automation creates a user profile if one doesn’t already exist, links the new prescription to that profile, and finds and bills the user’s insurance; a pharmacist reviews the prescription for clinical accuracy; and the user receives a text message to schedule their delivery. Prior to intake automation, this single flow could take up to four hours to complete!
Going from zero to one is quite a feat, and it wouldn’t have been possible without an event-based approach. Now that intake automation has laid the groundwork, we know that Kafka will only continue to provide exponential value to Alto as we grow and migrate more of our services to this event-based architecture.
Interested in learning more about engineering at Alto? Follow us on LinkedIn.