Status: In Progress

Sponsor User: <todo>

Date of Submission: Jul 3, 2023

Submitted by: Rahul Jadhav

Affiliation(s): AccuKnox

<Please fill out the above fields, and the Overview, Design and User Experience sections below for an initial review of the proposed feature.>

Scope and Signoff: (to be filled out by Chair)


Overview

A mailing list for this sub-group has been created at https://lists.lfedge.org/g/OpenHorizonWorkloadSecurity and you can subscribe to the meeting calendar there, or by sending an email to OpenHorizonWorkloadSecurity+subscribe@lists.lfedge.org

OpenHorizon provides the ability to flexibly deploy edge workloads by providing its own orchestrating elements. For an edge service provider who uses OpenHorizon, this provides immense flexibility in deploying and managing edge operations.

However, this flexibility comes with a tradeoff: the workloads deployed on the edge might not have been created by security-savvy developers and might contain vulnerabilities. The impact of exploiting such a vulnerability can be immense, since it can bring the edge to a halt. More importantly, an attacker can leverage a security gap in one workload to target another workload on the same edge node, since they are colocated. Edge workloads may also contain sensitive user data and hence need to be protected.

Furthermore, it is important for edge administrators and service providers to have monitoring options for edge workloads. Such monitoring may also be required for compliance and regulatory purposes.

As an example, the IEC 62443 standard defines the following principles to be followed in the OT sector:

Principle of least privilege: Provide edge node components and external interfaces with only the required access and deny everything else.
Defense in Depth: Multi-layered defense techniques to delay or prevent a cyber attack in the industrial network.
Risk Analysis: Practice used to address risks related to production infrastructure, production capacity, etc.

Design

OH Anax and Edge workload security design

OpenHorizon-AnyLog Integration.drawio: the original draw.io file, in case any modifications are needed.

Deployment Design

TODO

User Experience

Edge Node Deployment (Day 1)

On the target edge node, native (non-containerized) anax and KubeArmor agents

Working assumptions

We are setting a precedent for the installation of optional third-party components.
To simplify the installation process, keep each operation atomic, and allow components to be installed in any order, all component installations will be decoupled from the base anax installation.
The process should also function if a person is "bringing their own already-installed component" and we are just integrating anax with a pre-existing KubeArmor installation.
The default case will be based on native applications, not containerized versions, although both options or a mix thereof should work.
The integration should also be easily reversible.
Workload deployment policies may optionally support this integration by specifying a security policy to deploy and activate along with the workload.
Node policies can specify a default security policy to apply to one or more workloads running on that node.
Deployment policies can override the node's default security policy due to greater specificity.
A node may have more than one anax agent running, but every anax agent beyond the first must be containerized.
The same should apply to the KubeArmor agent: the second and any subsequent agents must be containerized and should not protect the host.
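
The policy precedence described in the assumptions above (a deployment policy overrides the node's default security policy, and the integration as a whole is optional) could be sketched roughly as follows. This is only an illustration: the function and the policy names are hypothetical and do not correspond to any existing anax or KubeArmor API.

```python
from typing import Optional


def resolve_security_policy(
    node_default: Optional[str],
    deployment_policy: Optional[str],
) -> Optional[str]:
    """Pick the security policy to apply to a workload (hypothetical helper).

    The more specific deployment policy wins over the node default.
    None means that no security policy is applied, because the
    integration is optional.
    """
    if deployment_policy is not None:
        return deployment_policy
    return node_default


# Policy names below are made up for illustration.
print(resolve_security_policy("node-baseline", None))
print(resolve_security_policy("node-baseline", "homeassistant-strict"))
print(resolve_security_policy(None, None))
```

Keeping the resolution in one small function would make the override rule easy to test independently of how the policies are eventually stored.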

anax agent installation

Today, installing the agent on the target device involves running the "agent-install.sh" script as documented at Automated agent installation and registration. At this point, we are assuming that no signal needs to be sent to this installation script to notify it that KubeArmor should also be installed. This assumption allows us to decouple the installation of KubeArmor as an optional security component. If a signal does turn out to be necessary, we should consider a flag in the form of an installation argument or an environment variable.

KubeArmor agent installation

Instead of altering the "agent-install.sh" script to trigger the KubeArmor installation process, we are proposing that a completely separate script be created that will install a native KubeArmor application, and then signal to anax that it has been installed and is ready to use. This assumes that anax has already been installed and configured, but does not need to be registered with an exchange for KubeArmor to be installed. In fact, if we are proposing to create or modify the node policy file, it is better if the anax agent is not currently registered.
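
The ordering constraints in the paragraph above could be captured by a small planning step in the separate installer. The sketch below is hypothetical throughout: `NodeState`, `plan_kubearmor_install`, and the step strings are illustrative names, and how the node state is actually detected (and how anax is signaled) is still an open design question.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class NodeState:
    anax_installed: bool       # anax is installed and configured
    anax_registered: bool      # anax is registered with an exchange
    kubearmor_installed: bool  # a pre-existing KubeArmor was found


def plan_kubearmor_install(state: NodeState, modifies_node_policy: bool) -> List[str]:
    """Return the ordered steps the separate installer would take."""
    if not state.anax_installed:
        raise RuntimeError("anax must be installed and configured first")
    steps: List[str] = []
    if modifies_node_policy and state.anax_registered:
        # Assumption from the text: policy changes are safer while unregistered.
        steps.append("warn: node policy changes are safer while anax is unregistered")
    if not state.kubearmor_installed:
        steps.append("install native KubeArmor")
    # Runs in both cases, so a "bring your own" KubeArmor is still integrated.
    steps.append("signal anax that KubeArmor is ready")
    return steps


print(plan_kubearmor_install(NodeState(True, False, False), modifies_node_policy=True))
```

Because the install step is skipped when KubeArmor is already present, the same plan covers the case of integrating anax with a pre-existing KubeArmor installation.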

On the target cluster

Remote, zero-touch provisioning (FDO)

Deployment UX

Should we consider the k8s mode of deployment or the pure-containerized mode of deployment? KubeArmor works best with, and recommends, the k8s mode of deployment. Having said that, the previous integration/demo/POC done with OH was in pure-containerized mode.
How would the deployment of KubeArmor on the target edge node happen? Will it be deployed as a separate workload with its own control plane, or will it be integrated into the same control plane as that of OH?
1. There is value in keeping KubeArmor and its associated tooling decoupled from Anax and the OH Management Hub. This would allow independent updates, and security should essentially be treated as one more add-on from the service provider's side.
2. The real challenge here is how the OH framework would allow extensions to be built to integrate third-party tooling.
Ship the hardening policies along with the KubeArmor installation.

Day2 Operations UX

How would the policy add/delete/list/modify work?
How would the recommended policies be shown to the user?
How would SIEM tool integrations be done, and at what point?
How would upgrades of KubeArmor be handled?

Use-cases to consider

<TODO: Every security use-case could have a corresponding set of tags indicating the compliance control or attack framework (e.g., MITRE) control that it fulfills.>

Observability & Monitoring use-cases

Security Event Monitoring:

File Integrity Monitoring: Any changes to the system folders should be monitored/audited.
Reverse shell execution
Use of security-sensitive primitives: setuid(), setgid(), chmod(), chown()
Updates to the root certificates folder
Use of kubectl exec to gain shell access in the pod
Attempted privilege escalation
Monitor for external network access
Suspicious IP detection (e.g., using the Feodo blocked IP list)
Monitor for use of DGA (Domain Generation Algorithms) in the workload
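
As a rough illustration of the suspicious IP detection item above, observed destination addresses could be compared against a blocklist. The sketch assumes a blocklist with one IP address per line and `#` comment lines; the actual format of any given list should be checked. The function names are hypothetical and the addresses come from reserved documentation ranges.

```python
import ipaddress
from typing import Iterable, List, Set, Union

IPAddress = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def load_blocklist(lines: Iterable[str]) -> Set[IPAddress]:
    """Parse one IP per line, skipping blank lines and '#' comments."""
    blocked: Set[IPAddress] = set()
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            blocked.add(ipaddress.ip_address(line))
    return blocked


def suspicious_destinations(destinations: Iterable[str], blocked: Set[IPAddress]) -> List[str]:
    """Return the observed destination IPs that appear on the blocklist."""
    return [d for d in destinations if ipaddress.ip_address(d) in blocked]


# Illustrative data only (reserved documentation ranges).
blocklist = load_blocklist(["# example blocklist", "203.0.113.7", ""])
print(suspicious_destinations(["192.0.2.1", "203.0.113.7"], blocklist))
```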

Application Performance Monitoring:

Excessive CPU usage: >90% of CPU used consistently for > 2 mins
Excessive Memory usage: >80% of allocated memory used
...
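
The two thresholds listed above could be evaluated along the following lines. This is a minimal sketch that assumes periodic (timestamp, CPU percentage) samples are already being collected; the function names and the sampling approach are assumptions, not part of the proposal.

```python
from typing import Iterable, Optional, Tuple


def sustained_over(
    samples: Iterable[Tuple[float, float]],
    threshold: float,
    min_duration_s: float,
) -> bool:
    """Return True if the value stays above threshold for longer than min_duration_s.

    samples: (timestamp_seconds, value) pairs in chronological order.
    """
    run_start: Optional[float] = None
    for ts, value in samples:
        if value > threshold:
            if run_start is None:
                run_start = ts  # start of a new above-threshold run
            if ts - run_start > min_duration_s:
                return True
        else:
            run_start = None  # a dip below the threshold resets the run
    return False


def memory_excessive(used_bytes: int, allocated_bytes: int) -> bool:
    """Return True if more than 80% of the allocated memory is in use."""
    return allocated_bytes > 0 and used_bytes / allocated_bytes > 0.80


# CPU sampled every 30 s at 95%: flagged once the run exceeds 2 minutes.
cpu = [(t, 95.0) for t in range(0, 180, 30)]
print(sustained_over(cpu, threshold=90.0, min_duration_s=120))
```

Resetting the run on any dip is one possible reading of "consistently"; a tolerance for brief dips might be preferable in practice.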

Goals

Install and run Open Horizon all-in-one, then publish and deploy HomeAssistant and KubeArmor with a test security policy
Demonstrate how to monitor the listed events and access the results

Deliverables

Documentation allowing anyone to replicate the results of the goals listed above
Demo video showing the results

Components

Open Horizon - to deliver and manage running workloads
KubeArmor - to monitor and enforce security policy on host and workloads
HomeAssistant - example service

Protection: Hardening use-cases

Node Hardening:

Protect system folders: Do not allow updates to kernel modules on the host.
Prevent root certificate updates

Workload/Pod/Container Hardening:

Protecting workload Secrets. Secrets could be injected in the workloads using volume mounts, environment vars, etc. Provide clear guidelines and specific tooling to secure such secrets.
Protecting sensitive assets mounted using volume mount points

Protection: Enforcing principle of least privilege

Network Segmentation and enforcing least privilege network access
Enforce Process Whitelisting
Enforce least permissive access to sensitive assets. All volume mount points can be considered sensitive assets.
Enforce least permissive process-based network control. Only allow a specific set of processes to communicate over the network.
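
The process-based network control above amounts to an allowlist check with a default-deny outcome. The sketch below is purely illustrative: the function, the binary paths, and the audit-only switch are hypothetical, and real enforcement would be carried out by the security agent rather than by application code.

```python
from typing import FrozenSet


def network_action(process_path: str, allowed: FrozenSet[str], enforce: bool = True) -> str:
    """Decide what happens when a process attempts network communication.

    Allowlisted processes are allowed; everything else is blocked, or
    only audited when enforcement is switched off (e.g. during rollout).
    """
    if process_path in allowed:
        return "Allow"
    return "Block" if enforce else "Audit"


# Hypothetical allowlist for a workload.
allowed = frozenset({"/usr/local/bin/python3"})
print(network_action("/usr/local/bin/python3", allowed))
print(network_action("/usr/bin/curl", allowed))
print(network_action("/usr/bin/curl", allowed, enforce=False))
```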

Protection: Enforcing Network Protection

Enforce Ingress/Egress controls using CIDRSets, Domain names, Protocols/Ports
Auto Discover Network Protection rules.
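
To make the egress controls above more concrete, the following sketch evaluates a connection attempt against a set of CIDR/protocol/port rules with default deny. `EgressRule` and `egress_allowed` are hypothetical names, and domain-name matching is deliberately left out since it would require DNS-aware handling.

```python
import ipaddress
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class EgressRule:
    cidr: str                   # e.g. "10.0.0.0/8"
    protocol: str               # "tcp" or "udp"
    port: Optional[int] = None  # None matches any port

    def matches(self, dest_ip: str, protocol: str, port: int) -> bool:
        in_cidr = ipaddress.ip_address(dest_ip) in ipaddress.ip_network(self.cidr)
        return (
            in_cidr
            and protocol.lower() == self.protocol
            and (self.port is None or self.port == port)
        )


def egress_allowed(rules: Iterable[EgressRule], dest_ip: str, protocol: str, port: int) -> bool:
    """Default deny: traffic is allowed only if at least one rule matches."""
    return any(rule.matches(dest_ip, protocol, port) for rule in rules)


# Illustrative rule set: HTTPS to a private range, DNS over UDP to one resolver.
rules = [EgressRule("10.0.0.0/8", "tcp", 443), EgressRule("192.0.2.53/32", "udp", 53)]
print(egress_allowed(rules, "10.1.2.3", "tcp", 443))
print(egress_allowed(rules, "10.1.2.3", "tcp", 80))
```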

Workload Forensics

Workload Process Monitoring
Workload Sensitive Asset access
External Network exposure for workloads
Ability to query forensic details for a specified time window within the past X days.
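
One possible reading of the time-based query above is a window lookup that is bounded by a retention period of X days. The sketch below assumes events are simple (timestamp, description) pairs held in memory; `query_events` and the storage model are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple

Event = Tuple[datetime, str]  # (timestamp, description)


def query_events(
    events: Iterable[Event],
    start: datetime,
    end: datetime,
    retention_days: int,
    now: Optional[datetime] = None,
) -> List[Event]:
    """Return events with start <= timestamp < end.

    Raises ValueError if the window begins before the retention period,
    i.e. earlier than retention_days before now.
    """
    now = now or datetime.now(timezone.utc)
    oldest = now - timedelta(days=retention_days)
    if start < oldest:
        raise ValueError("window starts before the retention period")
    return [event for event in events if start <= event[0] < end]


# Illustrative usage with a fixed "now" and made-up events.
now = datetime(2023, 7, 10, tzinfo=timezone.utc)
events = [
    (now - timedelta(days=2), "process started"),
    (now - timedelta(hours=3), "sensitive file read"),
]
print(query_events(events, now - timedelta(days=1), now, retention_days=7, now=now))
```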

Command Line Interface

How can the Anax CLI be extended and integrated with the karmor CLI? Can we expect the user to have two CLIs? Does the Anax CLI offer pluggable interfaces?
Policy add/delete/update/list operations should be handled through this CLI.

External Components

<Describe any new or changed interactions with components that are not the agent or the management hub.>

Affected Components

Security

APIs

<Describe any new/changed/deprecated APIs, including before and after snippets for clarity. Include which components or users will use the APIs.>

Build, Install, Packaging

Documentation Notes

Test

<Summarize new automated tests that need to be added in support of this feature, and describe any special test requirements that you can foresee.>

OH Agent and Edge Workload Runtime Security