Introduction
Sometimes a BaseOS upgrade fails because of transient error conditions. In general, EVE provides eventual consistency: it retries operations after a failure in case the failure condition has gone away. However, a BaseOS update involves a reboot, which is quite disruptive, and repeating it in a loop would be even more so. Hence it makes sense to require some user intervention before a failed BaseOS upgrade is retried.
Currently, the device configuration API has no retry mechanism. The controller has to remove the BaseOS configuration, wait for the device to sync up, and then configure the BaseOS again. This is not very user-friendly.
This document describes support for retrying a failed BaseOS upgrade.
Proposed Solution
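Introduce a new device command, baseos_upgrade_retry, in EdgeDevConfig, and a matching baseOs_upgrade_retry_counter in the ZInfoDevice Info message, through which EVE reports the last retry request it has processed.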
EVE API
diff --git a/api/proto/config/devconfig.proto b/api/proto/config/devconfig.proto
index c58376ab7..7dc9c59e2 100644
--- a/api/proto/config/devconfig.proto
+++ b/api/proto/config/devconfig.proto
@@ -83,6 +83,19 @@ message EdgeDevConfig {
// if we set new epoch, EVE sends all info messages to controller
// it captures when a new controller takes over and needs all the info be resent
int64 controller_epoch = 25;
+
+ // Retry the BaseOS upgrade of the configured image, but ONLY if that
+ // upgrade has failed: if the configured image is in the FAILED state in the
+ // other partition, retry the upgrade. Otherwise do nothing except report the
+ // new counter value in ZInfoDevice.baseOs_upgrade_retry_counter.
+ DeviceOpsCmd baseos_upgrade_retry = 26;
}
diff --git a/api/proto/info/info.proto b/api/proto/info/info.proto
index 7bead8777..230452ac1 100644
--- a/api/proto/info/info.proto
+++ b/api/proto/info/info.proto
@@ -344,6 +344,13 @@ message ZInfoDevice {
// Are we in the process of rebooting EVE?
bool reboot_inprogress = 41;
+ // BaseOS upgrade retry counter: the last config.baseos_upgrade_retry.counter processed.
+ // if status_baseOs_upgrade_retry_counter != config.baseos_upgrade_retry.counter {
+ //   if configured_version_partition.State == ERROR { trigger_upgrade() }
+ //   status_baseOs_upgrade_retry_counter = config.baseos_upgrade_retry.counter
+ //   schedule_info_msg_to_be_sent()
+ // }
+ uint32 baseOs_upgrade_retry_counter = 42;
}
Note: Even when the retry is a no-op, the device sends an Info message to the controller with the updated baseOs_upgrade_retry_counter.
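On the controller side, the counter scheme can be sketched as follows. This is a minimal illustration, not the controller's actual code: deviceRecord, requestBaseOsRetry, and retryAcknowledged are hypothetical names, and DeviceOpsCmd is reduced to its counter field instead of using the generated protobuf types. A retry is requested by changing the counter, and because the device echoes the counter even on a no-op, equal values mean only that the device has processed the latest request, not that an upgrade was retried.

```go
package main

import "fmt"

// DeviceOpsCmd mirrors only the counter field of the DeviceOpsCmd message.
// A real controller would use the generated protobuf types instead.
type DeviceOpsCmd struct {
	Counter uint32
}

// deviceRecord is a hypothetical controller-side record for one device.
type deviceRecord struct {
	BaseosUpgradeRetry   DeviceOpsCmd // sent as EdgeDevConfig.baseos_upgrade_retry
	ReportedRetryCounter uint32       // last ZInfoDevice.baseOs_upgrade_retry_counter received
}

// requestBaseOsRetry asks the device to retry by bumping the counter;
// any change of value is treated by the device as a new request.
func (d *deviceRecord) requestBaseOsRetry() {
	d.BaseosUpgradeRetry.Counter++
}

// retryAcknowledged reports whether the device has processed the latest
// request. It does not say whether an upgrade was actually retried.
func (d *deviceRecord) retryAcknowledged() bool {
	return d.ReportedRetryCounter == d.BaseosUpgradeRetry.Counter
}

func main() {
	d := &deviceRecord{}
	d.requestBaseOsRetry()
	fmt.Println(d.retryAcknowledged()) // false: request still pending

	d.ReportedRetryCounter = 1 // simulate an Info message echoing the counter
	fmt.Println(d.retryAcknowledged()) // true: device processed the request
}
```

Using a counter rather than a boolean lets the controller issue repeated retry requests that the device can tell apart.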
EVE Support
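When EVE receives a baseos_upgrade_retry counter that differs from the baseOs_upgrade_retry_counter it last reported:
- If the currently configured image is in the FAILED state in the other partition, retry the image upgrade (the intended use case).
- Otherwise, do nothing to the image.
In either case, EVE records the new counter value in baseOs_upgrade_retry_counter and sends an Info message to the controller.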
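The device-side decision can be sketched in Go as below. This is a hedged illustration of the rules above, not EVE's implementation: retryTracker and handleBaseOsRetry are hypothetical names, and the configuredImageFailed flag stands in for however EVE determines that the configured image is in the FAILED state in the other partition.

```go
package main

import "fmt"

// retryTracker is a hypothetical holder for the value EVE reports in
// ZInfoDevice.baseOs_upgrade_retry_counter.
type retryTracker struct {
	reportedCounter uint32
}

// handleBaseOsRetry applies one received baseos_upgrade_retry counter.
// configuredImageFailed says whether the configured image is in the FAILED
// state in the other partition (determining that is outside this sketch).
// It returns whether to re-trigger the upgrade and whether to send Info.
func (t *retryTracker) handleBaseOsRetry(configCounter uint32, configuredImageFailed bool) (triggerUpgrade, sendInfo bool) {
	if configCounter == t.reportedCounter {
		// Not a new request: nothing to do.
		return false, false
	}
	// New request: retry only if the configured image failed.
	triggerUpgrade = configuredImageFailed
	// Either way, record the counter and report it, even for a no-op.
	t.reportedCounter = configCounter
	return triggerUpgrade, true
}

func main() {
	t := &retryTracker{}
	fmt.Println(t.handleBaseOsRetry(1, true))  // true true: retry a failed upgrade
	fmt.Println(t.handleBaseOsRetry(1, true))  // false false: same counter, ignored
	fmt.Println(t.handleBaseOsRetry(2, false)) // false true: no-op, counter still reported
}
```

The sketch keeps the counter in memory only; a real implementation would presumably need it to survive the upgrade reboot, so that the same request is not processed twice.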