Problem Statement

Currently, in the EVE software, ZedAgent is responsible for top-level orchestration, baseos upgrade validation, and cloud connectivity for configuration and status.

During EVE node boot, ZedAgent and its associated modules are spawned only after network connectivity is established (through nim, waitfor address) and the device is registered (zedclient).

For baseos upgrade validation, this leaves a gap between node boot and the point at which zedagent invokes the baseos upgrade transition process. Any failure in between, that is, during boot before zedagent starts, may leave the device stuck in an indefinite state and render it non-functional.


Proposal

The zedagent module will be split up. Baseos validation, overall connectivity, and device health will be managed by DevAgent. DevAgent will be one of the first modules to be spawned, along with ledmanager, and will persist for the whole lifetime of the EVE node. ZedAgent will be responsible only for cloud connectivity, configuration parsing, and status/metrics publication. Baseos upgrade validation will be handled by the DevAgent module, covering all intermediate states of device boot.

EVE Node Health Monitor Function

EVE node health check functionality consists of the following:

 Pillar agent(s) run state and responsiveness

          Each agent's health is monitored through a watchdog timer.

 Controller connectivity

The controller connectivity of the EVE node is evaluated as follows:

Reset Time

In normal operation, when controller connectivity is lost, the EVE node is rebooted after the reset timer interval.

 Fallback Time

During the validation phase of a baseos upgrade, when controller connectivity is lost, the EVE node falls back to the fallback image after the fallback time interval.
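The two health checks described above can be illustrated with a minimal Go sketch. It is not the EVE implementation: the names `Watchdog`, `Timers`, `Decide`, and the `Action` values are hypothetical, the interval values in `main` are made-up placeholders, and time is expressed as node uptime only to keep the example deterministic.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Watchdog (hypothetical) records when each agent last reported in.
type Watchdog struct {
	timeout  time.Duration
	lastSeen map[string]time.Duration // agent name -> uptime of last touch
}

func NewWatchdog(timeout time.Duration) *Watchdog {
	return &Watchdog{timeout: timeout, lastSeen: make(map[string]time.Duration)}
}

// Touch records that the agent was responsive at the given uptime.
func (w *Watchdog) Touch(agent string, uptime time.Duration) {
	w.lastSeen[agent] = uptime
}

// Stale returns, in sorted order, the agents whose last touch is older
// than the watchdog timeout.
func (w *Watchdog) Stale(uptime time.Duration) []string {
	var stale []string
	for agent, seen := range w.lastSeen {
		if uptime-seen > w.timeout {
			stale = append(stale, agent)
		}
	}
	sort.Strings(stale)
	return stale
}

// Action is the recovery step taken after controller connectivity loss.
type Action int

const (
	NoAction Action = iota
	Reboot          // normal operation: reset timer expired
	Fallback        // upgrade validation: fallback timer expired
)

// Timers holds the two intervals named in the text (values are assumptions).
type Timers struct {
	ResetInterval    time.Duration
	FallbackInterval time.Duration
}

// Decide picks the action for a given time without controller contact.
func Decide(t Timers, inValidation bool, sinceContact time.Duration) Action {
	if inValidation {
		if sinceContact >= t.FallbackInterval {
			return Fallback
		}
		return NoAction
	}
	if sinceContact >= t.ResetInterval {
		return Reboot
	}
	return NoAction
}

func main() {
	w := NewWatchdog(30 * time.Second)
	w.Touch("zedagent", 0)
	w.Touch("nim", 20*time.Second)
	fmt.Println(w.Stale(40 * time.Second)) // only zedagent has exceeded 30s

	t := Timers{ResetInterval: 24 * time.Hour, FallbackInterval: 5 * time.Minute}
	fmt.Println(Decide(t, true, 10*time.Minute) == Fallback)
	fmt.Println(Decide(t, false, 10*time.Minute) == NoAction)
}
```

Keeping the decision in a pure function such as `Decide` makes it independent of which module owns the timers, which is the property the refactoring relies on.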

Current Implementation

The EVE node reset and fallback timer functionality is currently part of the ZedAgent module.


Proposal for Refactoring

Baseosmgr Module


ZedAgent Module


DevAgent Module


DevAgent will listen to the following:

   - ledBlinker Status, for EVE node registration and controller connectivity change events

   - Zboot Status

   - Zedagent Status

DevAgent will publish the following:

    - Zboot Config

    - DevAgent Status

ZedAgent will additionally listen to the following:

    - DevAgent Status
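The publish/listen relationships listed above can be sketched in Go as follows. This is only an illustration under stated assumptions: `Bus`, `Wire`, and the topic constants are hypothetical names for a tiny in-memory, synchronous hub, not the EVE pubsub API, and DevAgent's reaction (republishing its status on every input) is an assumed behavior chosen to show the data flow.

```go
package main

import "fmt"

// Bus is a minimal in-memory, synchronous publish/subscribe hub (hypothetical).
type Bus struct {
	handlers map[string][]func(payload string)
}

func NewBus() *Bus {
	return &Bus{handlers: make(map[string][]func(payload string))}
}

// Subscribe registers a handler for a topic.
func (b *Bus) Subscribe(topic string, h func(payload string)) {
	b.handlers[topic] = append(b.handlers[topic], h)
}

// Publish delivers the payload to every handler of the topic, in order.
func (b *Bus) Publish(topic, payload string) {
	for _, h := range b.handlers[topic] {
		h(payload)
	}
}

// Topic names mirror the lists above; the string values are assumptions.
const (
	LedBlinkerStatus = "LedBlinkerStatus"
	ZbootStatus      = "ZbootStatus"
	ZedAgentStatus   = "ZedAgentStatus"
	ZbootConfig      = "ZbootConfig" // published by DevAgent; unused in this sketch
	DevAgentStatus   = "DevAgentStatus"
)

// Wire connects DevAgent and ZedAgent as described and appends a line to
// log for every message received.
func Wire(b *Bus, log *[]string) {
	// DevAgent listens to ledBlinker, Zboot, and ZedAgent status.
	for _, topic := range []string{LedBlinkerStatus, ZbootStatus, ZedAgentStatus} {
		topic := topic // capture per iteration for older Go versions
		b.Subscribe(topic, func(p string) {
			*log = append(*log, "devagent got "+topic+": "+p)
			// Assumed behavior: DevAgent republishes its own status.
			b.Publish(DevAgentStatus, "updated after "+topic)
		})
	}
	// ZedAgent additionally listens to DevAgent status.
	b.Subscribe(DevAgentStatus, func(p string) {
		*log = append(*log, "zedagent got "+DevAgentStatus+": "+p)
	})
}

func main() {
	var log []string
	b := NewBus()
	Wire(b, &log)
	b.Publish(ZbootStatus, "partition state changed")
	for _, line := range log {
		fmt.Println(line)
	}
}
```

Because ZedAgent only consumes DevAgent Status in this wiring and does not republish from that handler, a single input event cannot loop between the two agents.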





PS. 

Currently, the scope of device health, as defined above, does not include the following:

            - cpu usage health

            - disk space usage health

            - network usage health

            - each agent's basic functionality check

