maas.py: Extend wait_for states with timeout, add wait_for_* attempts arg
maas.py: Extend wait_for states with timeout param
Extend the wait_for states with a timeout parameter.
The timeout value is taken from reclass pillar data if
defined. Otherwise, the states use the default value.
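A minimal sketch of the lookup-with-default behaviour described above,
not the actual maas.py code. The pillar key path (maas:timeouts:<state>),
the DEFAULT_TIMEOUT value and both function names are assumptions made
for illustration only.

```python
import time

# Hypothetical default in seconds; the real default lives in maas.py.
DEFAULT_TIMEOUT = 600


def resolve_timeout(pillar, state_name, default=DEFAULT_TIMEOUT):
    """Return the timeout from pillar data if defined, else the default."""
    # Assumed pillar layout: {"maas": {"timeouts": {"<state>": <seconds>}}}
    value = pillar.get("maas", {}).get("timeouts", {}).get(state_name)
    return default if value is None else int(value)


def wait_for(check, timeout, poll_interval=1.0,
             clock=time.monotonic, sleep=time.sleep):
    """Poll check() until it returns True or the timeout expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)
```

The clock and sleep arguments are injectable only so the loop can be
exercised without real delays.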
Based on Ting's PR [1], slightly refactored.
[1] https://github.com/salt-formulas/salt-formula-maas/pull/34
Signed-off-by: ting wu <ting.wu@enea.com>
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
maas.py: wait_for_*: Add attempts arg
Introduce a new parameter that allows a maximum number of automatic
recovery attempts for the common failures w/ machine operations.
If not present in pillar data, it defaults to 0 (OFF).
Common error states, possible cause and automatic recovery pattern:
* New
- usually indicates issues with BMC connectivity (no network route);
on rare occasions it is caused by the MaaS API being flaky;
- fix: delete the machine, (re)process machine definitions;
* Failed commissioning
- various causes, usually a simple retry works;
- fix: delete the machine, (re)process machine definitions;
* Failed testing
- incompatible hardware, missing drivers etc.;
- usually consistent and board-specific;
- fix: override failed testing;
* Allocated
- on rare occasions nodes get stuck in this state instead of 'Deploy';
- fix: mark-broken, mark-fixed; if it failed at least once before,
perform a fio test (fixes another unrelated spurious issue with
encrypted disks from previous deployments); (re)deploy machines;
* Failed deployment
- various causes, usually a simple retry works;
- fix: same as for nodes stuck in 'Allocated';
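The recovery pattern above can be sketched as a lookup table keyed by
machine status. This is an illustration under stated assumptions, not
the code in maas.py: the action labels, RECOVERY_ACTIONS and
plan_recovery are hypothetical names, and the real module performs the
corresponding MaaS operations rather than returning strings.

```python
# Hypothetical action labels mirroring the fixes listed in this message.
RECOVERY_ACTIONS = {
    "New": ["delete", "process_machines"],
    "Failed commissioning": ["delete", "process_machines"],
    "Failed testing": ["override_failed_testing"],
    "Allocated": ["mark_broken", "mark_fixed", "deploy"],
    "Failed deployment": ["mark_broken", "mark_fixed", "deploy"],
}


def plan_recovery(status, attempts_used, max_attempts=0):
    """Return the ordered recovery actions for a machine, or [] if none."""
    # attempts defaults to 0, which switches automatic recovery OFF.
    if max_attempts <= 0 or attempts_used >= max_attempts:
        return []
    actions = list(RECOVERY_ACTIONS.get(status, []))
    # After at least one earlier failure, run a fio test before redeploying.
    if status in ("Allocated", "Failed deployment") and attempts_used >= 1:
        actions.insert(actions.index("deploy"), "fio_test")
    return actions
```

For example, plan_recovery("New", 0) returns an empty list because
recovery is off by default, while plan_recovery("New", 0, max_attempts=3)
returns the delete and re-process steps.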
Related: PROD-28390(PROD:28390)
Change-Id: Ifb7dd9f8fcfbbed557e47d8fdffb1f963604fb15
Signed-off-by: Alexandru Avadanii <Alexandru.Avadanii@enea.com>
6 files changed