...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
...
Support for independent and autonomous deployment of AI and ML models was added to OpenHorizon a few years ago. Since then, support for edge nodes that manifest as Kubernetes clusters has also been added. However, when edge cluster support was added, there was not sufficient time or resources to support model deployment to edge clusters. This design addresses that gap by introducing model deployment to edge clusters.
...
- Enable policy-based deployment of models to edge clusters, with no changes to the existing model policy schema.
- Enable deployed applications to receive models and model updates using the same APIs used by applications that run on an edge device.
Receiving models
The ESS API is the means by which a service polls for new and updated models. On edge devices, when a service is started by the agent, it is provided with a URL, login credentials, and an SSL certificate for accessing the ESS API. The URL and SSL certificate are the same for every service started by a given agent. The login credentials are unique to each service instance, and are the means of identifying which models the service is able to receive. The URL is provided through OpenHorizon platform environment variables (HZN_ESS_API_PROTOCOL, HZN_ESS_API_ADDRESS, HZN_ESS_API_PORT); the login credentials and SSL cert are mounted into the service container at locations indicated by two other environment variables, HZN_ESS_AUTH and HZN_ESS_CERT. Please note that the SSL cert is a client-side cert and does not contain a private key; the only truly sensitive information is the login credentials.
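As an illustration of how these variables fit together, a service could assemble the ESS base URL from them before polling the API. This is a hypothetical sketch, not code shipped by OpenHorizon; the fallback defaults shown are illustrative only.

```shell
# Hypothetical sketch: a service container assembling the ESS base URL
# from the OpenHorizon-provided environment variables. The ":-" defaults
# are illustrative, matching the k8s service definition in this design.
ESS_URL="${HZN_ESS_API_PROTOCOL:-https}://${HZN_ESS_API_ADDRESS:-agent-service}:${HZN_ESS_API_PORT:-8443}"
echo "$ESS_URL"
# A real service would then poll the ESS API at $ESS_URL, presenting the
# login credentials and the client-side SSL cert referenced by
# HZN_ESS_AUTH and HZN_ESS_CERT.
```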
On edge clusters, the service that is deployed is actually a k8s operator (built by the service developer), and the operator is responsible for starting the real application containers. Because OpenHorizon has no visibility into the application containers, it is the responsibility of the OH-deployed operator to forward the HZN_ESS environment variables, login credentials, and SSL cert to the relevant application containers. An operator deployed as an OH service does not need to perform this forwarding if the application does not require model deployment.
There is a subtle but important difference in how the operator interacts with the HZN_ESS_AUTH and HZN_ESS_CERT environment variables. On edge clusters, these env vars contain the names of two k8s secrets: one holding the login credentials and one holding the SSL certificate. This differs from edge devices, where each env var contains the name of the folder where the corresponding file is mounted. The difference enables the operator to simply attach the secrets to any application containers that need them, in a way that is natural for k8s application developers. The OH agent creates these two secrets as part of deploying the operator.
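For example, the operator might attach the secrets to an application container as follows. This is an illustrative pod spec fragment: the container name and the secret names "ess-auth" and "ess-cert" are placeholders standing in for the values the operator reads from HZN_ESS_AUTH and HZN_ESS_CERT at runtime.

```yaml
# Illustrative fragment of a pod spec generated by the operator.
spec:
  containers:
    - name: my-app                 # hypothetical application container
      env:
        - name: HZN_ESS_API_PROTOCOL
          value: "https"
        - name: HZN_ESS_API_ADDRESS
          value: "agent-service"
        - name: HZN_ESS_API_PORT
          value: "8443"
      volumeMounts:
        - name: ess-auth
          mountPath: /ess-auth     # login credentials appear here as files
          readOnly: true
        - name: ess-cert
          mountPath: /ess-cert     # client-side SSL cert appears here
          readOnly: true
  volumes:
    - name: ess-auth
      secret:
        secretName: ess-auth       # from HZN_ESS_AUTH
    - name: ess-cert
      secret:
        secretName: ess-cert       # from HZN_ESS_CERT
```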
Enabling the ESS API
On edge devices, the ESS API endpoint is provided by the agent itself, and the device agent uses a Unix Domain Socket as the network address of the API. A Unix Domain Socket is not reachable from other pods, so this will not work for the edge cluster agent; instead, the agent's ESS API needs to be accessible over the cluster's internal network. This is accomplished in k8s by attaching a k8s service (the term "service" is now overloaded) to the agent's deployment. It is the responsibility of the agent install script to establish a k8s service for the agent's ESS API. The k8s service looks something like this:
```yaml
apiVersion: v1
kind: Service
metadata:
  name: agent-service
  namespace: openhorizon-agent
spec:
  selector:
    app: agent
  ports:
    - protocol: TCP
      port: 8443
```
This k8s service definition includes a few notable aspects:
- The API host name provided to the application is the metadata.name field; HZN_ESS_API_ADDRESS="agent-service"
- The API protocol is https; HZN_ESS_API_PROTOCOL="https"
- The API port is 8443; HZN_ESS_API_PORT=8443
- This k8s service is attached to an app called "agent" which is the app name given to the agent when it is installed.
- This service is defined in the "openhorizon-agent" k8s namespace.
To make the above settings configurable, i.e. set by the agent installer, the k8s service definition needs to be templated so that the install script can substitute the desired values before applying it to the edge cluster.
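One way to do this is to keep the definition as a template whose placeholders the install script substitutes (for example with envsubst or sed) before applying it. The placeholder names below are illustrative, not part of any existing install script:

```yaml
# service.tmpl.yaml - illustrative template; the ${...} placeholders are
# substituted by the agent install script before "kubectl apply".
apiVersion: v1
kind: Service
metadata:
  name: ${AGENT_SERVICE_NAME}      # becomes HZN_ESS_API_ADDRESS
  namespace: ${AGENT_NAMESPACE}
spec:
  selector:
    app: ${AGENT_APP_NAME}         # app name given to the agent at install
  ports:
    - protocol: TCP
      port: ${ESS_API_PORT}        # becomes HZN_ESS_API_PORT
```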
This k8s service allows the edge cluster agent to continue to be the ESS API provider, and enables application containers within the cluster to access the API, even if the agent is moved from one pod to another.
Model Deployment
There are numerous "node type" checks throughout the anax code for the agent and the agbot, some of which disable the deployment of models to edge clusters. These checks should be removed where appropriate to re-enable model deployment. Removing them will allow the ESS to be started in the edge cluster agent and allow the agbot to route models to edge cluster nodes. Aside from these minor code updates, model deployment should work exactly as it does for device nodes: when an agreement is formed, the agbot instructs the MMS to deploy models to the node, and since the ESS will be enabled in the edge cluster agent, the agbot's routing instructions will be performed by the MMS exactly as is done for agreements with device nodes.
Model Storage in the Agent
On edge devices, models deployed to an edge node are stored in root protected storage on the host. For edge clusters, the models are stored in a k8s persistent volume that is available when the agent is installed. A persistent volume is required in case the agent is moved from one pod to another. The persistent volume must be large enough to accommodate the expected number and size of models that will be needed by the edge cluster. Given the potential variability of storage requirements, the node owner must be able to provide this persistent volume to the agent install script. A default persistent volume can be created by the agent install script if one is not provided by the node owner, but that default is unlikely to meet the requirements of all use cases.
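A node-owner-provided claim for model storage might look like the following. This is an illustrative sketch: the claim name, namespace, and size are assumptions that the node owner would supply or the install script would default.

```yaml
# Illustrative PersistentVolumeClaim for agent model storage.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: openhorizon-agent-pvc      # hypothetical name
  namespace: openhorizon-agent
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                # sized for the expected number and size of models
```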
As an org admin, I want to write a model deployment policy that targets a service that is deployed to an edge cluster.
...
As a node owner, I want the agent installed and configured on an edge cluster to automatically support deployment of models to services that run on my edge cluster.
None
None
Agent (the k8s agent container)
Agent install
None
Edge cluster agent install
- Ensure that the ESS starts and stops (use unregister command) on the edge cluster agent.
- Ensure that models are removed from the edge cluster agent (including the storage) when deleted from the CSS or undeployed (e.g. model policy changed) from the node.