https://argoproj.github.io/argo-rollouts/features/analysis/
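Analysis is a setting that measures and evaluates user-specified metrics while pods of a new version are being deployed, then decides whether to continue the rollout or roll back based on the result.
The metrics used for measurement and evaluation can come from several sources, such as Prometheus queries, Kubernetes Jobs, and web (URL calls).
Combining analysis with traffic control in a canary deployment makes for an effective deployment strategy.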
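The examples in this post all use the Prometheus provider. For comparison, a minimal sketch of the web and job providers might look like the following. The template name, URL, JSON field, image, and arguments are all hypothetical and would need to be adapted to a real environment.

```yaml
# Sketch only: the template name, URL, JSON field, image, and args are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: provider-examples
spec:
  args:
  - name: service-name
  metrics:
  # web provider: call an HTTP endpoint and evaluate a field of the JSON response
  - name: health-endpoint
    successCondition: result == "ok"
    provider:
      web:
        url: "http://health.example.com/api/v1/status?service={{args.service-name}}"
        timeoutSeconds: 20
        jsonPath: "{$.status}"
  # job provider: run a Kubernetes Job; the measurement succeeds if the Job completes
  - name: smoke-test
    provider:
      job:
        spec:
          backoffLimit: 1
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: smoke-test
                image: my-smoke-test:latest
                args: ["--target", "{{args.service-name}}"]
```

With the job provider there is no `successCondition`; the outcome of the Job itself decides the measurement.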
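# Background analysis
-> Measurement and evaluation run repeatedly from the step set in startingStep until the whole rollout completes
-> After the first measurement, later measurements run at the interval set in interval
-> If the number of failures exceeds failureLimit, the rollout is aborted
Below is a background analysis example that queries Envoy Gateway metrics through Prometheus; a measurement succeeds when the ratio of successful (2xx) responses is at least 95%. Unless it fails, a background analysis keeps evaluating the metric until the deployment finishes.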
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-flask-app
spec:
  progressDeadlineSeconds: 300
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: my-flask-app
  strategy:
    canary:
      analysis:
        templates:
        - templateName: success-rate
        startingStep: 2 # (0 -> 1 -> 2) delay starting analysis run until setWeight: 40%
        args:
        - name: backend-namespace
          value: joel-test
        - name: backend-name
          value: backend-eg
      steps:
      - setWeight: 20
      - pause: {duration: 10m}
      - setWeight: 40
      - pause: {duration: 10m}
      - setWeight: 60
      - pause: {duration: 10m}
      - setWeight: 80
      - pause: {duration: 10m}
  template:
    metadata:
      labels:
        app: my-flask-app
    spec:
      terminationGracePeriodSeconds: 30
      serviceAccount: my-flask-app
      containers:
      - name: my-flask-app
        image: my-flask-app:2b557100a0c22b7c1ec79408e2bba826b94f04aa
        imagePullPolicy: Always
        ...
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: backend-namespace
  - name: backend-name
  metrics:
  - name: success-rate
    interval: 1m
    failureLimit: 3
    successCondition: len(result) > 0 && result[0] >= 0.95
    provider:
      prometheus:
        address: https://prometheus.example.com
        query: |
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*", envoy_response_code_class="2"}[1m]
          )) /
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*"}[1m]
          ))
(Reference) The AnalysisRun created by the analysis
-> The first measurement failed, but the later ones succeeded, so the rollout continued without being aborted
➜ [~] git:(master) ✗ k get ar
NAME STATUS AGE
my-flask-app-6f66db4bd4-2 Successful 38m
➜ [~] git:(master) ✗ k describe ar my-flask-app-6f66db4bd4-2
Name: my-flask-app-6f66db4bd4-2
Namespace: joel-test
Labels: app=my-flask-app
rollout-type=Background
rollouts-pod-template-hash=6f66db4bd4
Annotations: rollout.argoproj.io/revision: 2
API Version: argoproj.io/v1alpha1
Kind: AnalysisRun
Metadata:
Creation Timestamp: 2025-12-08T01:05:20Z
Generation: 12
Owner References:
API Version: argoproj.io/v1alpha1
Block Owner Deletion: true
Controller: true
Kind: Rollout
Name: my-flask-app
UID: ef82b969-ae2f-46f0-a2e0-0cf9ebb4f74f
Resource Version: 3284104
UID: 8a8333c5-c51d-4fe7-aa25-56143e45a850
Spec:
Args:
Name: backend-namespace
Value: joel-test
Name: backend-name
Value: backend-eg
...
Status:
Completed At: 2025-12-08T01:30:28Z
Dry Run Summary:
Message: Run Terminated
Metric Results:
Consecutive Success: 8
Count: 9
Failed: 1
Measurements:
Finished At: 2025-12-08T01:05:20Z
Phase: Failed
Started At: 2025-12-08T01:05:20Z
Value: [0]
Finished At: 2025-12-08T01:06:20Z
Phase: Successful
Started At: 2025-12-08T01:06:20Z
Value: [1]
...
Metadata:
Resolved Prometheus Query: sum(irate(
envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/joel-test/backend-eg/.*", envoy_response_code_class="2"}[1m]
)) /
sum(irate(
envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/joel-test/backend-eg/.*"}[1m]
))
Name: success-rate
Phase: Successful
Successful: 8
Phase: Successful
Run Summary:
Count: 1
Successful: 1
Started At: 2025-12-08T01:05:20Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal MetricSuccessful 13m rollouts-controller Metric 'success-rate' Completed. Result: Successful
Normal AnalysisRunSuccessful 13m rollouts-controller Analysis Completed. Result: Successful
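# Inline analysis
-> The analysis is part of the rollout steps, so it runs only at a specific step, and its result decides whether the rollout proceeds or is aborted
-> By default it measures and evaluates only once and proceeds or aborts based on that single result; adding the count and interval settings makes it measure several times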
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-flask-app
spec:
  progressDeadlineSeconds: 300
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: my-flask-app
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: success-rate
          args:
          - name: backend-namespace
            value: joel-test
          - name: backend-name
            value: backend-eg
  template:
    metadata:
      labels:
        app: my-flask-app
    spec:
      terminationGracePeriodSeconds: 30
      serviceAccount: my-flask-app
      containers:
      - name: my-flask-app
        image: my-flask-app:f52ac3c36b364c83101284ad8913057c1868868e
        imagePullPolicy: Always
        ...
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: backend-namespace
  - name: backend-name
  metrics:
  - name: success-rate
    successCondition: len(result) > 0 && result[0] >= 0.95
    provider:
      prometheus:
        address: https://prometheus.example.com
        query: |
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*", envoy_response_code_class="2"}[1m]
          )) /
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*"}[1m]
          ))
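The template above sets neither count nor interval, so the inline analysis measures once and a single failed measurement fails the run. A variant that measures several times could be sketched as follows; the specific values (5, 1m, 1) are illustrative assumptions, not values from the original setup.

```yaml
# Sketch only: the count, interval, and failureLimit values are illustrative.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: backend-namespace
  - name: backend-name
  metrics:
  - name: success-rate
    count: 5          # take five measurements in total
    interval: 1m      # wait one minute between measurements
    failureLimit: 1   # tolerate one failed measurement; a second failure fails the run
    successCondition: len(result) > 0 && result[0] >= 0.95
    provider:
      prometheus:
        address: https://prometheus.example.com
        query: |
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*", envoy_response_code_class="2"}[1m]
          )) /
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*"}[1m]
          ))
```

With these settings the analysis step would take roughly four minutes, and the rollout moves to the next step only after the run finishes successfully.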
(Reference) The AnalysisRun created by the analysis
-> The measurement failed, so the rollout was aborted
➜ [~] git:(master) ✗ k describe ar my-flask-app-7c78979668-3-2
Name: my-flask-app-7c78979668-3-2
Namespace: joel-test
Labels: app=my-flask-app
rollout-type=Step
rollouts-pod-template-hash=7c78979668
step-index=2
Annotations: rollout.argoproj.io/revision: 3
API Version: argoproj.io/v1alpha1
Kind: AnalysisRun
Metadata:
Creation Timestamp: 2025-12-08T01:47:59Z
Generation: 2
Owner References:
API Version: argoproj.io/v1alpha1
Block Owner Deletion: true
Controller: true
Kind: Rollout
Name: my-flask-app
UID: ef82b969-ae2f-46f0-a2e0-0cf9ebb4f74f
Resource Version: 3291357
UID: a6268725-3c0d-4e46-bf11-cdbc7390a6cb
Spec:
Args:
Name: backend-namespace
Value: joel-test
Name: backend-name
Value: backend-eg
...
Status:
Completed At: 2025-12-08T01:47:59Z
Dry Run Summary:
Message: Metric "success-rate" assessed Failed due to failed (1) > failureLimit (0)
Metric Results:
Count: 1
Failed: 1
Measurements:
Finished At: 2025-12-08T01:47:59Z
Phase: Failed
Started At: 2025-12-08T01:47:59Z
Value: [0]
Metadata:
Resolved Prometheus Query: sum(irate(
envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/joel-test/backend-eg/.*", envoy_response_code_class="2"}[1m]
)) /
sum(irate(
envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/joel-test/backend-eg/.*"}[1m]
))
Name: success-rate
Phase: Failed
Phase: Failed
Run Summary:
Count: 1
Failed: 1
Started At: 2025-12-08T01:47:59Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning MetricFailed 40s rollouts-controller Metric 'success-rate' Completed. Result: Failed
Warning AnalysisRunFailed 40s rollouts-controller Analysis Completed. Result: Failed
➜ [~] kubectl argo rollouts get rollout my-flask-app -w
Name: my-flask-app
Namespace: joel-test
Status: ✖ Degraded
Message: RolloutAborted: Rollout aborted update to revision 3: Metric "success-rate" assessed Failed due to failed (1) > failureLimit (0)
Strategy: Canary
Step: 0/3
SetWeight: 0
ActualWeight: 0
Images: my-flask-app:2b557100a0c22b7c1ec79408e2bba826b94f04aa (stable)
Replicas:
Desired: 3
Current: 3
Updated: 0
Ready: 3
Available: 3
NAME KIND STATUS AGE INFO
⟳ my-flask-app Rollout ✖ Degraded 57m
├──# revision:3
│ ├──⧉ my-flask-app-7c78979668 ReplicaSet • ScaledDown 5m29s canary
│ └──α my-flask-app-7c78979668-3-2 AnalysisRun ✖ Failed 18s ✖ 1
├──# revision:2
│ ├──⧉ my-flask-app-6f66db4bd4 ReplicaSet ✔ Healthy 45m stable
│ │ ├──□ my-flask-app-6f66db4bd4-lwhl4 Pod ✔ Running 45m ready:1/1
│ │ ├──□ my-flask-app-6f66db4bd4-v28r7 Pod ✔ Running 38m ready:1/1
│ │ └──□ my-flask-app-6f66db4bd4-jk7zl Pod ✔ Running 27m ready:1/1
│ └──α my-flask-app-6f66db4bd4-2 AnalysisRun ✔ Successful 42m ✔ 8,✖ 1
└──# revision:1
└──⧉ my-flask-app-779f6988b5 ReplicaSet • ScaledDown 57m
# Cluster analysis template
-> Used when an analysis template should be shared across multiple namespaces
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: guestbook
spec:
  ...
  strategy:
    canary:
      steps:
      - setWeight: 20
      - pause: {duration: 5m}
      - analysis:
          templates:
          - templateName: success-rate
            clusterScope: true
          args:
          - name: service-name
            value: guestbook-svc.default.svc.cluster.local
---
apiVersion: argoproj.io/v1alpha1
kind: ClusterAnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: service-name
  - name: prometheus-port
    value: "9090" # default value; quoted because argument values are strings
  metrics:
  - name: success-rate
    successCondition: len(result) > 0 && result[0] >= 0.95
    provider:
      prometheus:
        address: "http://prometheus.example.com:{{args.prometheus-port}}"
        query: |
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}",response_code!~"5.*"}[5m]
          )) /
          sum(irate(
            istio_requests_total{reporter="source",destination_service=~"{{args.service-name}}"}[5m]
          ))
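# Argo Rollouts Experiment
-> A way to run analysis against additional ReplicaSets created just for testing
-> An Experiment resource can be defined on its own to run a test, or an experiment can be defined in the steps of a Rollout resource so that the test becomes part of the rollout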
https://argoproj.github.io/argo-rollouts/features/experiment/
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-flask-app
spec:
  progressDeadlineSeconds: 300
  replicas: 3
  revisionHistoryLimit: 3
  selector:
    matchLabels:
      app: my-flask-app
  strategy:
    canary:
      steps:
      - experiment:
          duration: 1h
          templates:
          - name: v1
            specRef: stable
          - name: v2
            specRef: canary
          analyses:
          - name: success-rate
            templateName: success-rate
            args:
            - name: backend-namespace
              value: joel-test
            - name: backend-name
              value: backend-eg
  template:
    metadata:
      labels:
        app: my-flask-app
    spec:
      terminationGracePeriodSeconds: 30
      serviceAccount: my-flask-app
      containers:
      - name: my-flask-app
        image: my-flask-app:f52ac3c36b364c83101284ad8913057c1868868e
        imagePullPolicy: Always
        ...
---
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
  - name: backend-namespace
  - name: backend-name
  metrics:
  - name: success-rate
    interval: 1m
    failureLimit: 3
    successCondition: len(result) > 0 && result[0] >= 0.95
    provider:
      prometheus:
        address: https://prometheus.example.com
        query: |
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*", envoy_response_code_class="2"}[1m]
          )) /
          sum(irate(
            envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/{{args.backend-namespace}}/{{args.backend-name}}/.*"}[1m]
          ))
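The manifest above embeds the experiment in a Rollout step. An Experiment can also be created on its own, in which case each template carries a full pod spec instead of a specRef. A minimal sketch, assuming the same success-rate AnalysisTemplate exists in the namespace; the Experiment name, the `experiment: candidate` label, and the 20m duration are hypothetical:

```yaml
# Sketch only: the resource name, extra label, and duration are hypothetical.
apiVersion: argoproj.io/v1alpha1
kind: Experiment
metadata:
  name: my-flask-app-experiment
spec:
  duration: 20m                  # how long the ReplicaSets run once they are ready
  progressDeadlineSeconds: 300
  templates:
  - name: candidate
    replicas: 1
    selector:
      matchLabels:
        app: my-flask-app
        experiment: candidate
    template:
      metadata:
        labels:
          app: my-flask-app
          experiment: candidate
      spec:
        containers:
        - name: my-flask-app
          image: my-flask-app:f52ac3c36b364c83101284ad8913057c1868868e
  analyses:
  - name: success-rate
    templateName: success-rate
    args:
    - name: backend-namespace
      value: joel-test
    - name: backend-name
      value: backend-eg
```

Because these pods carry the `app: my-flask-app` label, a Service selecting on that label alone would also route traffic to them; drop or change the label if that is not intended.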
(Reference) The experiment failed, so the rollout was aborted
➜ [~] git:(master) ✗ k describe exp my-flask-app-779f6988b5-8-0
Name: my-flask-app-779f6988b5-8-0
Namespace: joel-test
Labels: rollouts-pod-template-hash=779f6988b5
Annotations: rollout.argoproj.io/revision: 8
API Version: argoproj.io/v1alpha1
Kind: Experiment
Metadata:
Creation Timestamp: 2025-12-08T04:16:23Z
Generation: 15
Owner References:
API Version: argoproj.io/v1alpha1
Block Owner Deletion: true
Controller: true
Kind: Rollout
Name: my-flask-app
UID: ef82b969-ae2f-46f0-a2e0-0cf9ebb4f74f
Resource Version: 3357926
UID: 163f64be-c5c0-484f-b07f-c3fd88bc2336
Spec:
Analyses:
Args:
Name: backend-namespace
Value: joel-test
Name: backend-name
Value: backend-eg
Name: success-rate
Template Name: success-rate
Analysis Run Metadata:
Duration: 1h
Progress Deadline Seconds: 300
Templates:
Name: v1
Selector:
Match Labels:
App: my-flask-app
Rollouts - Pod - Template - Hash: 7c78979668
Template:
Metadata:
Labels:
App: my-flask-app
Rollouts - Pod - Template - Hash: 7c78979668
Spec:
Containers:
Env:
Name: TYPE
Value: dev
...
Name: v2
Selector:
Match Labels:
App: my-flask-app
Rollouts - Pod - Template - Hash: 779f6988b5
Template:
Metadata:
Labels:
App: my-flask-app
Rollouts - Pod - Template - Hash: 779f6988b5
Spec:
Containers:
Env:
Name: TYPE
Value: dev
...
Status:
Analysis Runs:
Analysis Run: my-flask-app-779f6988b5-8-0-success-rate
Message: Metric "success-rate" assessed Failed due to failed (4) > failureLimit (3)
Name: success-rate
Phase: Failed
Available At: 2025-12-08T04:16:33Z
Conditions:
Last Transition Time: 2025-12-08T04:16:23Z
Last Update Time: 2025-12-08T04:26:04Z
Message: Experiment "my-flask-app-779f6988b5-8-0" is running.
Reason: NewReplicaSetAvailable
Status: True
Type: Progressing
Message: Metric "success-rate" assessed Failed due to failed (4) > failureLimit (3)
Phase: Failed
Template Statuses:
Available Replicas: 0
Last Transition Time: 2025-12-08T04:26:04Z
Name: v1
Ready Replicas: 0
Replicas: 0
Status: Successful
Updated Replicas: 0
Available Replicas: 0
Last Transition Time: 2025-12-08T04:26:04Z
Name: v2
Ready Replicas: 0
Replicas: 0
Status: Successful
Updated Replicas: 0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal TemplateProgressing 9m42s (x2 over 9m42s) rollouts-controller Template 'v1' transitioned from -> Progressing
Normal TemplateProgressing 9m42s (x2 over 9m42s) rollouts-controller Template 'v2' transitioned from -> Progressing
Normal ExperimentPending 9m42s (x2 over 9m42s) rollouts-controller Experiment transitioned from -> Pending
Normal ScalingReplicaSet 9m42s rollouts-controller Scaled up ReplicaSet my-flask-app-779f6988b5-8-0-v1 from 0 to 1
Normal ScalingReplicaSet 9m42s rollouts-controller Scaled up ReplicaSet my-flask-app-779f6988b5-8-0-v2 from 0 to 1
Normal TemplateRunning 9m32s rollouts-controller Template 'v1' transitioned from Progressing -> Running
Normal TemplateRunning 9m32s rollouts-controller Template 'v2' transitioned from Progressing -> Running
Normal ExperimentRunning 9m31s rollouts-controller Experiment transitioned from Pending -> Running
Normal AnalysisRunPending 9m31s rollouts-controller AnalysisRun 'success-rate' transitioned from -> Pending
Normal AnalysisRunRunning 9m31s rollouts-controller AnalysisRun 'success-rate' transitioned from -> Running
Warning AnalysisRunFailed 31s rollouts-controller AnalysisRun 'success-rate' transitioned from Running -> Failed: Metric "success-rate" assessed Failed due to failed (4) > failureLimit (3)
Warning ExperimentFailed 31s rollouts-controller Experiment transitioned from Running -> Failed
Normal TemplateSuccessful 31s rollouts-controller Template 'v1' transitioned from Running -> Successful
Normal TemplateSuccessful 31s rollouts-controller Template 'v2' transitioned from Running -> Successful
Normal ScalingReplicaSet 1s rollouts-controller Scaled down ReplicaSet my-flask-app-779f6988b5-8-0-v1 from 1 to 0
Normal ScalingReplicaSet 1s rollouts-controller Scaled down ReplicaSet my-flask-app-779f6988b5-8-0-v2 from 1 to 0
➜ [~] git:(master) ✗ k describe ar my-flask-app-779f6988b5-8-0-success-rate
Name: my-flask-app-779f6988b5-8-0-success-rate
Namespace: joel-test
Labels: <none>
Annotations: <none>
API Version: argoproj.io/v1alpha1
Kind: AnalysisRun
Metadata:
Creation Timestamp: 2025-12-08T04:16:34Z
Generation: 5
Owner References:
API Version: argoproj.io/v1alpha1
Block Owner Deletion: true
Controller: true
Kind: Experiment
Name: my-flask-app-779f6988b5-8-0
UID: 163f64be-c5c0-484f-b07f-c3fd88bc2336
Resource Version: 3357687
UID: 360e8840-b490-4fe2-8c29-25e4ce6b2667
Spec:
Args:
Name: backend-namespace
Value: joel-test
Name: backend-name
Value: backend-eg
...
Status:
Completed At: 2025-12-08T04:25:34Z
Dry Run Summary:
Message: Metric "success-rate" assessed Failed due to failed (4) > failureLimit (3)
Metric Results:
Count: 4
Failed: 4
Measurements:
Finished At: 2025-12-08T04:16:34Z
Phase: Failed
Started At: 2025-12-08T04:16:34Z
Value: [0]
Finished At: 2025-12-08T04:19:34Z
Phase: Failed
Started At: 2025-12-08T04:19:34Z
Value: [0]
Finished At: 2025-12-08T04:22:34Z
Phase: Failed
Started At: 2025-12-08T04:22:34Z
Value: [0]
Finished At: 2025-12-08T04:25:34Z
Phase: Failed
Started At: 2025-12-08T04:25:34Z
Value: [0]
Metadata:
Resolved Prometheus Query: sum(irate(
envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/joel-test/backend-eg/.*", envoy_response_code_class="2"}[1m]
)) /
sum(irate(
envoy_cluster_upstream_rq_xx{envoy_cluster_name=~"httproute/joel-test/backend-eg/.*"}[1m]
))
Name: success-rate
Phase: Failed
Phase: Failed
Run Summary:
Count: 1
Failed: 1
Started At: 2025-12-08T04:16:34Z
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning MetricFailed 14s rollouts-controller Metric 'success-rate' Completed. Result: Failed
Warning AnalysisRunFailed 14s rollouts-controller Analysis Completed. Result: Failed