Continuous Delivery (CD) and Infrastructure as Code (IaC) mean that apps, clusters, and environments in your business are constantly changing.
Drift occurs when an app, microservice, or infrastructure ‘drifts’ out of its intended configuration or approved operating boundaries.
Drift is difficult to detect, and it introduces risk that isn’t seen or managed until something serious happens (e.g., an outage, incident, or breach).
Drift can happen in a variety of places:
- Container Orchestration
- Application Run-time
- Business Logic
- Data Flow
Drift Isn’t Just About Infrastructure
A common perception is that drift is an infrastructure problem, where IaC scripts (e.g., Terraform or CloudFormation) get out of sync with what is running in your environments.
For example, a dev team might use a CloudFormation script to provision a new environment that declares all EC2 instances should be “t2.small”. Meanwhile, an engineer decides to manually add a “c4.large” instance to the same environment.
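To make the example concrete, here is a minimal sketch of the comparison a drift check performs. It assumes you already have an inventory of running instances from somewhere; the instance IDs and the `find_instance_type_drift` helper are hypothetical and not part of any AWS tooling.

```python
def find_instance_type_drift(declared_type, running_instances):
    """Return the instances whose type differs from the declared type.

    running_instances maps an instance ID to its observed instance type.
    """
    return {
        instance_id: actual_type
        for instance_id, actual_type in running_instances.items()
        if actual_type != declared_type
    }


# Hypothetical inventory: two instances match the template,
# one was added by hand with a different type.
running = {
    "i-aaa": "t2.small",
    "i-bbb": "t2.small",
    "i-ccc": "c4.large",
}

drifted = find_instance_type_drift("t2.small", running)
print(drifted)
```

Anything the function returns is an instance that no longer matches what the template declares.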
With GitOps, we live with the possibility that the state of a production cluster *could* one day be out of sync with the Kubernetes manifest sitting in our code repo. God forbid, we could end up with pods running the wrong version of an app or microservice, or the wrong number of replicas, so we need a way to mitigate this risk.
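The same idea applies to a cluster. The sketch below compares a desired state (what the manifest in the repo says) with a live state (what the cluster reports) on the two fields mentioned above. The `DeploymentState` type, the image names, and the way the two states are obtained are all assumptions made for illustration; a real check would read them from the manifest and the cluster API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentState:
    """Hypothetical, simplified view of a deployment."""
    image: str
    replicas: int


def diff_deployment(desired, live):
    """Return {field: (desired_value, live_value)} for each field that differs."""
    drift = {}
    for field_name in ("image", "replicas"):
        want = getattr(desired, field_name)
        have = getattr(live, field_name)
        if want != have:
            drift[field_name] = (want, have)
    return drift


# Desired state from the repo vs. what is (hypothetically) running.
desired = DeploymentState(image="shop/api:1.4.2", replicas=3)
live = DeploymentState(image="shop/api:1.4.1", replicas=2)

deployment_drift = diff_deployment(desired, live)
print(deployment_drift)
```

An empty result means the cluster matches the manifest for these fields; anything else is drift worth investigating.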
So yes, infrastructure drift can happen, but something else can drift more frequently than you might imagine.
Business Logic and Data Flow Can Drift Too
Continuous Delivery (CD) means code, business logic, architecture, and data flow can change hourly in your environments.
Depending on the level of automation and guardrails in your CI/CD pipelines, engineers might deploy on demand or be required to follow a review process (e.g., fill in this Word doc with 180 questions) when a change is significant.
Consider this: a single code change can introduce new:
- 3rd Party Service calls
- Datastore or Database connections
- Data flow
- Risk you might not have considered or thought about
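One way to picture this is as a diff between two inventories of what an application talks to, taken before and after a change. The sketch below assumes such inventories already exist as simple dictionaries; the category names, hostnames, and the `find_new_dependencies` helper are hypothetical, and how you would actually build the inventories is left open.

```python
def find_new_dependencies(before, after):
    """Return, per category, the entries present after a change but not before."""
    added = {}
    for category, entries in after.items():
        new_entries = sorted(set(entries) - set(before.get(category, [])))
        if new_entries:
            added[category] = new_entries
    return added


# Hypothetical inventories of one service before and after a code change.
before = {
    "third_party_calls": ["payments.example.com"],
    "datastores": ["orders-db"],
}
after = {
    "third_party_calls": ["payments.example.com", "analytics.example.com"],
    "datastores": ["orders-db", "customers-db"],
}

added = find_new_dependencies(before, after)
print(added)
```

Here a single change quietly adds an outbound call to a third party and a connection to a second datastore, which is exactly the kind of change a questionnaire can miss.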
How Code Can Change Everything
A few weeks ago, a small code change resulted in a PII data exposure at an enterprise. The risk got into production because the engineer who committed the code change did not know that their code touched PII data and answered “no” in their change request questionnaire.
At Bionic, we detect and observe drift frequently when customers evaluate our software, and more often than not, that drift is related to business logic, architecture, and data flows.
We can’t eliminate all risks in applications or the business, but we can start to go beyond what we know and think differently about what could impact our business.
Applications are complex beasts to tame, encompassing hundreds or thousands of components and dependencies. Every code change introduces potential risk. The question is: do you see these risks, and do you know their potential impact?