Ingesting HL7v2 with ❄️ Snowflake Part 1

Ingesting HL7v2 Feeds Into Snowflake

Snowflake is a cloud-based data warehouse system that enables organizations to securely store, query, and analyze their vast amounts of data. It has powerful capabilities for managing complex workloads and quickly adapting to changing business needs.

By leveraging Snowflake’s scalability and performance, healthcare organizations can easily integrate HL7v2-formatted EHR data into their data warehouse. That all sounds awesome, but how does data actually get into Snowflake? If you haven’t read my primer on HL7v2, check out this post here.

Our goal is to set up tooling that will listen for HL7v2 events from our EHR and add the ones we need to Snowflake. For this ETL (extract, transform, load) workflow we are after patients who are moved from outpatient to inpatient (ADT A06 event), with a filter on the diagnosis code (DG1.3 equal to A41.9, Sepsis); we will then add the filtered records to Snowflake for analytics. This will give us a table of patients who went to the hospital with an infection that had a complication.
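To make the filter concrete, here is a rough sketch of that rule in Python. This is an illustration of the logic only, not Retrohook’s actual implementation: keep ADT^A06 messages whose DG1.3 diagnosis code is A41.9.

```python
# Hypothetical sketch of the workflow's filter: keep only ADT^A06
# messages whose DG1.3 diagnosis code is A41.9 (sepsis).
SEGMENT_SEP = "\r"  # HL7v2 segments are carriage-return delimited
FIELD_SEP = "|"

def should_ingest(raw_message: str) -> bool:
    segments: dict[str, list[list[str]]] = {}
    for seg in raw_message.split(SEGMENT_SEP):
        if seg:
            fields = seg.split(FIELD_SEP)
            segments.setdefault(fields[0], []).append(fields)

    msh = segments.get("MSH", [[]])[0]
    # MSH-9 (index 8 after splitting on |): message type, e.g. "ADT^A06"
    if len(msh) < 9 or not msh[8].startswith("ADT^A06"):
        return False

    # DG1-3: coded diagnosis, e.g. "A41.9^Sepsis, unspecified^I10"
    for dg1 in segments.get("DG1", []):
        if len(dg1) > 3 and dg1[3].split("^")[0] == "A41.9":
            return True
    return False
```

In Retrohook this logic is configured in the UI rather than written by hand, but the effect is the same: anything that isn’t an A06 with a sepsis diagnosis is dropped before it ever reaches Snowflake.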

Known Nuances

Infrastructure - HL7v2 is most commonly sent over a TCP/IP connection using MLLP (Minimal Lower Layer Protocol). While some workflows involve HTTP(S), many still rely on an MLLP server and some networking. Thus, much of what we have come to rely on for rapid development, such as AWS Lambda functions, only works with HTTP(S).
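MLLP itself is very thin: it frames each HL7v2 message between a vertical-tab start byte (0x0B) and a file-separator plus carriage-return end block (0x1C 0x0D). A minimal sketch of the framing:

```python
# MLLP frames each HL7v2 message between a vertical-tab (0x0b) start
# block and a file-separator + carriage-return (0x1c 0x0d) end block.
MLLP_START = b"\x0b"
MLLP_END = b"\x1c\x0d"

def mllp_wrap(message: str) -> bytes:
    """Frame an HL7v2 message for transmission over MLLP."""
    return MLLP_START + message.encode("utf-8") + MLLP_END

def mllp_unwrap(frame: bytes) -> str:
    """Extract the HL7v2 message from an MLLP frame."""
    start = frame.index(MLLP_START) + 1
    end = frame.index(MLLP_END)
    return frame[start:end].decode("utf-8")
```

The framing is trivial; the hard part is everything around it (a long-lived TCP listener, acknowledgments, and the networking to reach it), which is exactly why request/response-only services like Lambda don’t fit naturally.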

Security - The MLLP TCP/IP transport I just mentioned doesn’t have built-in security like TLS; HL7v2 predates the internet! Common practice has been IPsec with IKEv2, which has its own headache overhead that I’ll try to write a follow-up on.

Message Routing - Message routing is a subtle nuance that can often be overlooked. With HL7v2, messages can have varying structures and information based on the system that is sending or receiving them. This means that without a clear understanding of both the sending and receiving systems, it's easy to misroute a message or miss essential information. Modern tools try to simplify this process, but the myriad of different structures means you'll often have to create custom routes and parsers to ensure data gets where it needs to go without any loss or misinterpretation.

💊 Integration Engines

Integration engines like Mirth Connect, Cloverleaf, and Rhapsody have existed for years to accommodate the complexities of HL7v2. These tools allow for message transformation, routing, and monitoring, giving developers a platform for managing HL7v2 integrations.

💊 Cloud-Based Solutions

Cloud providers like AWS, Google Cloud, and Azure have started offering healthcare-specific solutions that cater to HL7v2 integrations. These services often come with built-in tools for security, scaling, and monitoring, reducing the overhead of managing and maintaining an integration environment.

For today, I’m going to leverage a couple of cloud-based solutions to handle our use case: Retrohook to handle the HL7v2 bits, and Tray.io for the last mile. I advise you to pick the right tools for your team.

We have highlighted a few popular tools and some possible reasons why one may fit your business better.

Mirth Connect

  • Your team has familiarity with Mirth, and/or Java.
  • Your team has DevOps resources capable of IPsec VPN tunneling from your cloud environment.

Mulesoft & Patient-360

  • Your business is already using products from Salesforce.
  • Your business has Salesforce development resources available.

Google Healthcare API & MLLP Adapter

  • Your business runs on Google Cloud Infrastructure.
  • Your team has Pub/Sub and TCP/IP knowledge.

Retrohook (beta)

  • The tool requires little to no technical know-how.
  • The tool handles the most difficult parts of HL7v2.
  • We are not glued to the complexity and overhead of FHIR; the tool offers a simple JSON webhook.

Setting up Retrohook HL7v2 tooling

We are going to use Retrohook to transmit our HL7v2 data over a REST API. If you already have messages hitting your cloud infrastructure, you may skip the Retrohook part below; otherwise, let’s start here.

The ease of the tool is probably its strongest selling point. Technical knowledge is not required for users to leverage their healthcare data feeds. The downside is that the tool is a beta version and is limited in capabilities. After logging in (or signing up), we will create a fresh workspace for this workflow, but you could use the default.

Create a new workspace

Create a new Environment and connect to your AWS account

This will set up a new environment and secure connection.

  • To create a new environment, we need to give it a name, and provide credentials to a user with CloudFormation access.
  • Once we create the environment we can manage the version and deploy new updates when they are available.

🚀 Deploy the 3 modules into your AWS account

We then deploy the modules into your default VPC in the region specified. If you have a VPC you would like to use, you can specify it in the advanced settings. The modules are kept separate to allow for easier management of the resources.

You can think of the infrastructure as the time machine from Bill & Ted: it’s a phone booth that will take your HL7v2 messages and send them to the future (your cloud infrastructure).

Note: You will see other environments (depending on your workspace billing level) that can be used instead of deploying your own.

  1. Development environment: Doesn’t require network security, doesn’t support PHI
  2. Production environment: Supports and requires network security, supports PHI

Configuring security

With Retrohook, the security tooling is pretty simple, and if we were using the multi-tenant environment this would be our first step.

We simply provide the IP addresses of our HL7v2 MLLP server, and the tool will generate a connection agent for us, which installs and connects to our environment over IPsec with IKEv2.

Generating a new tunnel

To create a new tunnel agent, we simply provide the private and public IP addresses of our message server, give it a name, and click Generate. The process took about 4-5 minutes to deploy; it creates a secure site-to-site tunnel for MLLP leveraging IPsec and IKEv2.

Connecting to the tunnel

For a quick setup, just run the quick-connect script on the system where you would like to configure the tunnel.

  sudo sh quick-connect.sh 
  ...
  # backs up any existing settings and adds new connection settings for Retrohook
  Making a backup of current ipsec.conf - ipsec.conf.bak
  Created new ipsec.conf to connect to Retrohook VPN, now creating new ipsec.secrets
  Created new ipsec.secrets to connect to Retrohook VPN, now restarting ipsec
  Redirecting to: systemctl restart ipsec.service
  ...
  000 algorithm IKE DH Key Exchange: name=DH24, bits=2048
  000 "RetrohookPrimaryTunnel": 192.168.2.151<192.168.2.151>[44.193.173.135]...52.86.74.145<52.86.74.145>===172.31.0.0/16; erouted; eroute owner: #2
  ...
  000 "RetrohookSecondaryTunnel":   IKE algorithms: AES_CBC_128-HMAC_SHA1-MODP1024
  000 "RetrohookSecondaryTunnel":   ESP algorithms: AES_CBC_128-HMAC_SHA1_96-MODP2048
  000 Total IPsec connections: loaded 2, active 1
  000 State Information: DDoS cookies not required, Accepting new IKE connections
  000 IKE SAs: total(1), half-open(0), open(0), authenticated(1), anonymous(0)
  000 IPsec SAs: total(1), authenticated(1), anonymous(0)
  000  
  000 1: "RetrohookPrimaryTunnel":4500 STATE_PARENT_I3 (PARENT SA established); EVENT_SA_REPLACE in 9979s; newest ISAKMP; idle; import:admin initiate
  000 2: "RetrohookPrimaryTunnel":4500 STATE_V2_IPSEC_I (IPsec SA established); EVENT_SA_REPLACE in 9769s; newest IPSEC; eroute owner; isakmp#1; idle; import:admin initiate

🛡️ Now we have connected our messaging infrastructure with a site-to-site VPN and have a backup tunnel ready for failover! 🎉

Creating A Workflow

With Retrohook (and many other integration tools), the business logic is contained inside a “workflow”, which for Retrohook will run over the tunnel we just created.

Workflows are the glue that connects HL7 messages from your EHR to modern modalities.

High level configuration
a. Event Adapter should be set to MLLP
b. Let’s use the Webhook Action Adapter
c. Providing a sample message allows us to define our next steps

Message Routing/Filtering

Message event filtering happens when we define the capture formula:
a. We define the parameters to match on; let’s select a few key parameters to filter out unnecessary messages.
b. We can leverage wildcards: ? for a single-character match and * for a multi-character match.

Note: If you don’t see the fields you are after, make sure the All toggle is set; I toggled to Selected after setting my matches.
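These wildcards follow familiar glob semantics, which we can sanity-check with Python’s `fnmatch`. This is an analogy for how the patterns behave, not Retrohook’s actual matcher:

```python
from fnmatch import fnmatch

# Glob-style matching: ? = exactly one character, * = any run of
# characters. This mirrors the documented wildcard behavior.
def matches(value: str, pattern: str) -> bool:
    return fnmatch(value, pattern)

print(matches("A06", "A0?"))     # True: ? matches the single trailing character
print(matches("A41.9", "A41*"))  # True: * matches any remainder
print(matches("A06", "A0"))      # False: no wildcard, no partial match
```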

Message Field Mapping

We perform a similar operation for setting the field mapping of our workflow. Let’s map out some basic demographics.

  1. When we map, we are providing the payload key; the Actions tab provides a live example.
  2. Use the cog icon to configure custom transformations or lookups for each field (optional).
  3. The system handles the |, ^, and & delimiters and repeating segments using arrays, and arrays of objects, to ensure the integrity of the message is maintained. The system provides a sample in the Actions tab.
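To see how a repeating, component-delimited field becomes a JSON array, here is a simplified sketch of the idea. The component names are illustrative (the real mapping is configured in the workflow):

```python
# Simplified sketch of how an HL7v2 field with repeats (~) and
# components (^) becomes an array of JSON objects. Component names
# here are made up for illustration.
REPEAT_SEP = "~"
COMPONENT_SEP = "^"

def parse_repeating_field(field: str, component_names: list[str]) -> list[dict]:
    repeats = []
    for repeat in field.split(REPEAT_SEP):
        components = repeat.split(COMPONENT_SEP)
        # Keep only populated components, paired with their names
        entry = {name: value
                 for name, value in zip(component_names, components)
                 if value}
        repeats.append(entry)
    return repeats

# e.g. a PID-13-style field: a phone repeat followed by an email repeat
pid13 = "(706)5558907^P^Mobile~ADUNN@EXAMPLE.COM^E"
names = ["value", "telecomType", "equipmentType"]
```

This is why the sample payload shows `PID.13` and `PID.5` as arrays of objects: each `~`-separated repeat becomes one object.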

Any message that hits our entry point and matches our capture formula will trigger an HTTP POST webhook with a payload similar to our sample:

{
  "MSH": {
    "hl7MessageType": "ADT",
    "hl7EventType": "A04",
    "messageDateTime": "20230426133527",
    "sendingFacility": "ACME HOSPITAL",
    "environmentFlag": "T"
  },
  "PID": {
    "patientAddressType": "L",
    "patientState": "TN",
    "patientCity": "NASHVILLE",
    "patientCountry": "US",
    "patientZIP": "37205",
    "patientStreetLine2": "APT 3507",
    "patientStreetLine1": "83 COLUMBIA AVE",
    "PID.13": [
      {
        "patientPhoneAreaCode": "706",
        "patientTelecomType": "P",
        "patientPhone": "5558907",
        "patientTelecomEquipmentType": "Mobile"
      },
      {
        "patientTelecomType": "E",
        "patientEmailAddress": "ADUNN@EXAMPLE.COM"
      }
    ],
    "patientMaritalStatus": "M",
    "patientSsn": "000-00-0384",
    "PID.5": [
      {
        "patientNameType": "L",
        "patientLastName": "DUNN",
        "patientFirstName": "ANTONIA"
      },
      {
        "patientNameType": "P",
        "patientLastName": "DUNN",
        "patientFirstName": "TONI"
      }
    ],
    "patientDeceasedFlag": "false",
    "patientRace": "BLACK",
    "PID.3": [
      {
        "identifier": "4029",
        "identifierAssigningAuthority": "EHR",
        "identifierType": "DATABASE"
      },
      {
        "identifier": "203716",
        "identifierAssigningAuthority": "HL7EHR",
        "identifierType": "MRN"
      }
    ],
    "patientPrimaryLanguage": "EN-US",
    "patientMothersMaidenName": "HARRIS",
    "patientSex": "F",
    "patientDob": "19720210"
  },
  "EVN": {
    "test": [
      "ADMINISTRATOR",
      "SYSTEM"
    ],
    "eventCode": "A04",
    "eventDateTime": "20230426133527"
  }
}
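On the receiving end, any HTTPS endpoint that accepts JSON will do. Here is a minimal sketch of a receiver using only the Python standard library; the port, field choices, and `extract_demographics` helper are illustrative, and a real deployment would also authenticate the request before trusting it:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def extract_demographics(payload: dict) -> dict:
    """Flatten a few mapped fields from the webhook payload,
    e.g. for loading into a warehouse table later."""
    pid = payload.get("PID", {})
    # PID.3 is an array of identifiers; pick the MRN
    mrn = next((i["identifier"] for i in pid.get("PID.3", [])
                if i.get("identifierType") == "MRN"), None)
    return {"mrn": mrn,
            "dob": pid.get("patientDob"),
            "sex": pid.get("patientSex")}

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        print(extract_demographics(payload))
        self.send_response(200)
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8080), WebhookHandler).serve_forever()
```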

🎉 Now we have our HL7v2 messages being securely transmitted via an HTTPS webhook as JSON.

Stay tuned: Part 2 will cover the Tray.io portion of the workflow, where we will take the JSON payload and add it to Snowflake.