概述
本文档提供了Envoy的实际使用示例和生产环境的最佳实践,帮助开发者和运维人员更好地理解和应用Envoy代理。
基础配置示例
1. 简单的HTTP代理
# envoy-simple-proxy.yaml
static_resources:
listeners:
- name: listener_0
address:
socket_address:
address: 0.0.0.0
port_value: 10000
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
codec_type: AUTO
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["*"]
routes:
- match:
prefix: "/"
route:
cluster: service_backend
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
clusters:
- name: service_backend
connect_timeout: 0.25s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
load_assignment:
cluster_name: service_backend
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: httpbin.org
port_value: 80
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 9901
功能说明:
- 在端口10000上监听HTTP请求
- 将所有请求转发到httpbin.org
- 提供管理接口在端口9901
启动命令:
envoy -c envoy-simple-proxy.yaml
2. HTTPS终止和重写
# envoy-tls-termination.yaml
static_resources:
listeners:
- name: listener_0
address:
socket_address:
address: 0.0.0.0
port_value: 10000
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
access_log:
- name: envoy.access_loggers.stdout
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
http_filters:
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
route_config:
name: local_route
virtual_hosts:
- name: local_service
domains: ["example.com"]
routes:
- match:
prefix: "/api/v1/"
route:
prefix_rewrite: "/v1/"
cluster: api_service
- match:
prefix: "/static/"
route:
cluster: static_service
- match:
prefix: "/"
route:
cluster: web_service
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
tls_certificates:
- certificate_chain:
filename: "/etc/ssl/certs/example.com.crt"
private_key:
filename: "/etc/ssl/private/example.com.key"
clusters:
- name: api_service
connect_timeout: 0.25s
type: STRICT_DNS
load_assignment:
cluster_name: api_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: api.internal
port_value: 8080
- name: static_service
connect_timeout: 0.25s
type: STRICT_DNS
load_assignment:
cluster_name: static_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: cdn.internal
port_value: 8080
- name: web_service
connect_timeout: 0.25s
type: STRICT_DNS
load_assignment:
cluster_name: web_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: web.internal
port_value: 8080
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 9901
功能说明:
- TLS/SSL终止
- 基于路径的路由分发
- URL重写功能
- 访问日志记录
高级配置示例
3. 微服务网关配置
# envoy-microservices-gateway.yaml
static_resources:
listeners:
- name: listener_0
address:
socket_address:
address: 0.0.0.0
port_value: 8080
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
stat_prefix: ingress_http
access_log:
- name: envoy.access_loggers.file
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: "/var/log/envoy/access.log"
format: |
[%START_TIME%] "%REQ(:METHOD)% %REQ(X-ENVOY-ORIGINAL-PATH?:PATH)% %PROTOCOL%"
%RESPONSE_CODE% %RESPONSE_FLAGS% %BYTES_RECEIVED% %BYTES_SENT%
%DURATION% %RESP(X-ENVOY-UPSTREAM-SERVICE-TIME)% "%REQ(X-FORWARDED-FOR)%"
"%REQ(USER-AGENT)%" "%REQ(X-REQUEST-ID)%" "%REQ(:AUTHORITY)%" "%UPSTREAM_HOST%"
http_filters:
# CORS过滤器
- name: envoy.filters.http.cors
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.cors.v3.Cors
# JWT认证过滤器
- name: envoy.filters.http.jwt_authn
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.jwt_authn.v3.JwtAuthentication
providers:
jwt_provider:
issuer: "https://auth.example.com"
audiences: ["api.example.com"]
remote_jwks:
http_uri:
uri: "https://auth.example.com/.well-known/jwks.json"
cluster: auth_service
timeout: 5s
cache_duration: 300s
rules:
- match:
prefix: "/api/"
requires:
provider_name: "jwt_provider"
# 限流过滤器
- name: envoy.filters.http.local_ratelimit
typed_config:
"@type": type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.extensions.filters.http.local_ratelimit.v3.LocalRateLimit
value:
stat_prefix: http_local_rate_limiter
token_bucket:
max_tokens: 10000
tokens_per_fill: 1000
fill_interval: 1s
filter_enabled:
runtime_key: local_rate_limit_enabled
default_value:
numerator: 100
denominator: HUNDRED
filter_enforced:
runtime_key: local_rate_limit_enforced
default_value:
numerator: 100
denominator: HUNDRED
# 路由过滤器
- name: envoy.filters.http.router
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
route_config:
name: local_route
virtual_hosts:
- name: api_gateway
domains: ["api.example.com", "localhost:8080"]
routes:
# 用户服务路由
- match:
prefix: "/api/users"
route:
cluster: user_service
timeout: 30s
retry_policy:
retry_on: "5xx,gateway-error,connect-failure,refused-stream"
num_retries: 3
per_try_timeout: 10s
# 订单服务路由
- match:
prefix: "/api/orders"
route:
cluster: order_service
timeout: 30s
retry_policy:
retry_on: "5xx,gateway-error,connect-failure,refused-stream"
num_retries: 3
# 支付服务路由
- match:
prefix: "/api/payments"
route:
cluster: payment_service
timeout: 60s
# 健康检查路由
- match:
path: "/health"
route:
cluster: health_check
cors:
allow_origin_string_match:
- prefix: "https://"
- exact: "http://localhost:3000"
allow_methods: "GET, POST, PUT, DELETE, OPTIONS"
allow_headers: "authorization,content-type,x-requested-with"
max_age: "86400"
clusters:
# 用户服务集群
- name: user_service
connect_timeout: 0.25s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: user_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: user-service
port_value: 8080
- endpoint:
address:
socket_address:
address: user-service-2
port_value: 8080
health_checks:
- timeout: 1s
interval: 5s
unhealthy_threshold: 3
healthy_threshold: 2
http_health_check:
path: "/health"
expected_statuses:
- start: 200
end: 300
# 订单服务集群
- name: order_service
connect_timeout: 0.25s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
lb_policy: LEAST_REQUEST
load_assignment:
cluster_name: order_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: order-service
port_value: 8080
# 支付服务集群
- name: payment_service
connect_timeout: 0.25s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
lb_policy: ROUND_ROBIN
load_assignment:
cluster_name: payment_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: payment-service
port_value: 8080
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 100
max_pending_requests: 100
max_requests: 200
max_retries: 3
# 认证服务集群
- name: auth_service
connect_timeout: 5s
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
load_assignment:
cluster_name: auth_service
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: auth.example.com
port_value: 443
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.UpstreamTlsContext
sni: auth.example.com
# 健康检查集群
- name: health_check
connect_timeout: 0.25s
type: STATIC
load_assignment:
cluster_name: health_check
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: 127.0.0.1
port_value: 8080
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 9901
功能说明:
- JWT认证保护API端点
- 基于路径的微服务路由
- 限流保护
- 健康检查和重试策略
- CORS支持
- 熔断器配置
4. 动态配置(xDS)示例
控制平面配置
# envoy-xds-bootstrap.yaml
node:
id: "envoy-node-1"
cluster: "production"
metadata:
version: "1.0"
dynamic_resources:
lds_config:
resource_api_version: V3
api_config_source:
api_type: GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
cds_config:
resource_api_version: V3
api_config_source:
api_type: GRPC
transport_api_version: V3
grpc_services:
- envoy_grpc:
cluster_name: xds_cluster
set_node_on_first_message_only: true
static_resources:
clusters:
- name: xds_cluster
connect_timeout: 0.25s
type: LOGICAL_DNS
http2_protocol_options: {}
dns_lookup_family: V4_ONLY
load_assignment:
cluster_name: xds_cluster
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: control-plane
port_value: 18000
admin:
address:
socket_address:
address: 127.0.0.1
port_value: 9901
Python控制平面示例
#!/usr/bin/env python3
"""
简单的xDS控制平面实现示例
"""
import grpc
from concurrent import futures
import time
import threading
import logging
# Envoy API imports
from envoy.service.discovery.v3 import discovery_service_pb2_grpc
from envoy.service.discovery.v3 import discovery_service_pb2
from envoy.config.listener.v3 import listener_pb2
from envoy.config.cluster.v3 import cluster_pb2
from envoy.config.core.v3 import config_core_pb2
from envoy.config.core.v3 import address_pb2
from envoy.extensions.filters.network.http_connection_manager.v3 import http_connection_manager_pb2
from envoy.extensions.filters.http.router.v3 import router_pb2
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)
class ControlPlaneServer:
"""xDS控制平面服务器实现"""
def __init__(self):
self.version = "1"
self.listeners_cache = {}
self.clusters_cache = {}
self.nonce = 0
self._build_initial_config()
def _build_initial_config(self):
"""构建初始配置"""
# 创建监听器配置
listener = listener_pb2.Listener()
listener.name = "listener_0"
listener.address.socket_address.address = "0.0.0.0"
listener.address.socket_address.port_value = 10000
# HTTP连接管理器配置
hcm = http_connection_manager_pb2.HttpConnectionManager()
hcm.stat_prefix = "ingress_http"
hcm.codec_type = http_connection_manager_pb2.HttpConnectionManager.AUTO
# 路由配置
route_config = hcm.route_config
route_config.name = "local_route"
vhost = route_config.virtual_hosts.add()
vhost.name = "service"
vhost.domains[:] = ["*"]
route = vhost.routes.add()
route.match.prefix = "/"
route.route.cluster = "service_backend"
# HTTP过滤器
router_filter = hcm.http_filters.add()
router_filter.name = "envoy.filters.http.router"
router_filter.typed_config.Pack(router_pb2.Router())
# 添加过滤器链
filter_chain = listener.filter_chains.add()
filter_config = filter_chain.filters.add()
filter_config.name = "envoy.filters.network.http_connection_manager"
filter_config.typed_config.Pack(hcm)
self.listeners_cache[listener.name] = listener
# 创建集群配置
cluster = cluster_pb2.Cluster()
cluster.name = "service_backend"
cluster.type = cluster_pb2.Cluster.LOGICAL_DNS
cluster.connect_timeout.seconds = 0
cluster.connect_timeout.nanos = 250000000
cluster.dns_lookup_family = cluster_pb2.Cluster.V4_ONLY
# 负载均衡配置
endpoint = cluster.load_assignment.endpoints.add()
lb_endpoint = endpoint.lb_endpoints.add()
lb_endpoint.endpoint.address.socket_address.address = "httpbin.org"
lb_endpoint.endpoint.address.socket_address.port_value = 80
cluster.load_assignment.cluster_name = cluster.name
self.clusters_cache[cluster.name] = cluster
def get_next_nonce(self):
"""获取下一个nonce"""
self.nonce += 1
return str(self.nonce)
class AggregatedDiscoveryService(discovery_service_pb2_grpc.AggregatedDiscoveryServiceServicer):
"""聚合发现服务实现"""
def __init__(self, control_plane):
self.control_plane = control_plane
def StreamAggregatedResources(self, request_iterator, context):
"""流式聚合资源发现"""
logger.info("新的ADS连接建立")
# 发送初始配置
for request in request_iterator:
logger.info(f"收到请求: {request.type_url}")
if request.type_url == "type.googleapis.com/envoy.config.listener.v3.Listener":
# 发送监听器配置
response = discovery_service_pb2.DiscoveryResponse()
response.version_info = self.control_plane.version
response.type_url = request.type_url
response.nonce = self.control_plane.get_next_nonce()
for listener in self.control_plane.listeners_cache.values():
resource = response.resources.add()
resource.Pack(listener)
logger.info(f"发送监听器配置,版本: {response.version_info}")
yield response
elif request.type_url == "type.googleapis.com/envoy.config.cluster.v3.Cluster":
# 发送集群配置
response = discovery_service_pb2.DiscoveryResponse()
response.version_info = self.control_plane.version
response.type_url = request.type_url
response.nonce = self.control_plane.get_next_nonce()
for cluster in self.control_plane.clusters_cache.values():
resource = response.resources.add()
resource.Pack(cluster)
logger.info(f"发送集群配置,版本: {response.version_info}")
yield response
def main():
"""主函数"""
# 创建控制平面
control_plane = ControlPlaneServer()
# 创建gRPC服务器
server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
# 注册服务
ads_service = AggregatedDiscoveryService(control_plane)
discovery_service_pb2_grpc.add_AggregatedDiscoveryServiceServicer_to_server(
ads_service, server
)
# 监听端口
listen_addr = '[::]:18000'
server.add_insecure_port(listen_addr)
logger.info(f"控制平面服务器启动,监听地址: {listen_addr}")
server.start()
try:
while True:
time.sleep(86400)
except KeyboardInterrupt:
logger.info("正在关闭服务器...")
server.stop(0)
if __name__ == "__main__":
main()
功能说明:
- 动态监听器发现(LDS)
- 动态集群发现(CDS)
- gRPC流式API
- 版本控制和增量更新
Docker化部署
5. Docker Compose配置
# docker-compose.yml
version: '3.8'
services:
envoy:
image: envoyproxy/envoy:v1.28-latest
ports:
- "8080:8080"
- "9901:9901"
volumes:
- ./envoy.yaml:/etc/envoy/envoy.yaml:ro
- ./certs:/etc/ssl/certs:ro
command: ["envoy", "-c", "/etc/envoy/envoy.yaml", "--log-level", "info"]
depends_on:
- web-service
- api-service
web-service:
image: nginx:alpine
ports:
- "8081:80"
volumes:
- ./web:/usr/share/nginx/html:ro
api-service:
image: httpbin/httpbin
ports:
- "8082:80"
control-plane:
build:
context: .
dockerfile: Dockerfile.control-plane
ports:
- "18000:18000"
environment:
- LOG_LEVEL=info
Dockerfile示例
# Dockerfile.control-plane
FROM python:3.9-slim
WORKDIR /app
# 安装依赖
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
# 复制源码
COPY control_plane.py .
COPY protos/ ./protos/
# 暴露端口
EXPOSE 18000
# 启动命令
CMD ["python", "control_plane.py"]
最佳实践
1. 性能优化
工作线程配置
# 基于CPU核数配置工作线程
static_resources:
listeners:
- name: listener_0
address:
socket_address:
address: 0.0.0.0
port_value: 8080
# 启用端口重用以提高性能
reuse_port: true
# 设置连接平衡配置
connection_balance_config:
exact_balance: {}
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
"@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
# 设置连接缓冲区限制
per_connection_buffer_limit_bytes: 32768
# 启用HTTP/2
http2_protocol_options:
max_concurrent_streams: 100
# 配置访问日志
access_log:
- name: envoy.access_loggers.file
typed_config:
"@type": type.googleapis.com/envoy.extensions.access_loggers.file.v3.FileAccessLog
path: "/var/log/envoy/access.log"
连接池优化
# 集群连接池配置
clusters:
- name: service_backend
connect_timeout: 0.25s
type: LOGICAL_DNS
# 连接池设置
circuit_breakers:
thresholds:
- priority: DEFAULT
max_connections: 1024
max_pending_requests: 1024
max_requests: 1024
max_retries: 3
- priority: HIGH
max_connections: 2048
max_pending_requests: 2048
max_requests: 2048
max_retries: 5
# HTTP连接池配置
typed_extension_protocol_options:
envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
"@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
explicit_http_config:
http2_protocol_options:
max_concurrent_streams: 100
initial_stream_window_size: 65536
initial_connection_window_size: 1048576
2. 安全配置
TLS最佳实践
# HTTPS监听器配置
listeners:
- name: https_listener
address:
socket_address:
address: 0.0.0.0
port_value: 443
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
# ... HTTP连接管理器配置 ...
transport_socket:
name: envoy.transport_sockets.tls
typed_config:
"@type": type.googleapis.com/envoy.extensions.transport_sockets.tls.v3.DownstreamTlsContext
common_tls_context:
# TLS证书配置
tls_certificates:
- certificate_chain:
filename: "/etc/ssl/certs/server.crt"
private_key:
filename: "/etc/ssl/private/server.key"
# TLS参数
tls_params:
tls_minimum_protocol_version: TLSv1_2
tls_maximum_protocol_version: TLSv1_3
cipher_suites:
- "ECDHE-ECDSA-AES256-GCM-SHA384"
- "ECDHE-RSA-AES256-GCM-SHA384"
- "ECDHE-ECDSA-CHACHA20-POLY1305"
- "ECDHE-RSA-CHACHA20-POLY1305"
- "ECDHE-ECDSA-AES128-GCM-SHA256"
- "ECDHE-RSA-AES128-GCM-SHA256"
# ALPN协议协商
alpn_protocols:
- "h2"
- "http/1.1"
安全头配置
# 安全HTTP头过滤器
http_filters:
- name: envoy.filters.http.local_response
typed_config:
"@type": type.googleapis.com/udpa.type.v1.TypedStruct
type_url: type.googleapis.com/envoy.extensions.filters.http.local_response.v3.LocalResponse
value:
mappers:
- filter:
status_code_filter:
comparison:
op: EQ
value:
default_value: 200
runtime_key: response_code
headers_to_add:
- header:
key: "X-Content-Type-Options"
value: "nosniff"
- header:
key: "X-Frame-Options"
value: "DENY"
- header:
key: "X-XSS-Protection"
value: "1; mode=block"
- header:
key: "Strict-Transport-Security"
value: "max-age=63072000; includeSubDomains; preload"
- header:
key: "Referrer-Policy"
value: "strict-origin-when-cross-origin"
3. 可观测性配置
统计指标配置
stats_config:
# 统计标签配置
stats_tags:
- tag_name: "method"
regex: "^http\\.(.+?)\\.downstream_rq_(.+?)$"
fixed_value: "\\1"
- tag_name: "status"
regex: "^http\\.(.+?)\\.downstream_rq_(\\d{3})$"
fixed_value: "\\2"
# 直方图桶配置
histogram_bucket_settings:
- match:
name: "http.downstream_rq_time"
buckets: [0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]
- match:
name: "http.upstream_rq_time"
buckets: [0.5, 1, 5, 10, 25, 50, 100, 250, 500, 1000, 2500, 5000, 10000]
# Prometheus统计接收器
stats_sinks:
- name: envoy.stat_sinks.statsd
typed_config:
"@type": type.googleapis.com/envoy.config.metrics.v3.StatsdSink
address:
socket_address:
address: 127.0.0.1
port_value: 8125
prefix: "envoy"
OpenTelemetry追踪配置
tracing:
http:
name: envoy.tracers.opentelemetry
typed_config:
"@type": type.googleapis.com/envoy.config.trace.v3.OpenTelemetryConfig
grpc_service:
envoy_grpc:
cluster_name: jaeger
timeout: 0.25s
service_name: "envoy-gateway"
# 追踪集群
clusters:
- name: jaeger
type: LOGICAL_DNS
dns_lookup_family: V4_ONLY
load_assignment:
cluster_name: jaeger
endpoints:
- lb_endpoints:
- endpoint:
address:
socket_address:
address: jaeger-collector
port_value: 14250
4. 容器化部署最佳实践
Kubernetes配置
# envoy-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: envoy-config
namespace: default
data:
envoy.yaml: |
# Envoy配置内容...
---
# envoy-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
name: envoy-gateway
namespace: default
spec:
replicas: 3
selector:
matchLabels:
app: envoy-gateway
template:
metadata:
labels:
app: envoy-gateway
spec:
containers:
- name: envoy
image: envoyproxy/envoy:v1.28-latest
ports:
- containerPort: 8080
name: http
- containerPort: 9901
name: admin
volumeMounts:
- name: config-volume
mountPath: /etc/envoy
readOnly: true
# 资源限制
resources:
requests:
cpu: "100m"
memory: "128Mi"
limits:
cpu: "500m"
memory: "512Mi"
# 健康检查
livenessProbe:
httpGet:
path: /ready
port: 9901
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
httpGet:
path: /ready
port: 9901
initialDelaySeconds: 5
periodSeconds: 5
# 优雅关闭
lifecycle:
preStop:
exec:
command:
- /bin/sh
- -c
- "wget -qO- http://localhost:9901/healthcheck/fail && sleep 15"
volumes:
- name: config-volume
configMap:
name: envoy-config
---
# envoy-service.yaml
apiVersion: v1
kind: Service
metadata:
name: envoy-gateway-service
namespace: default
spec:
type: LoadBalancer
ports:
- port: 80
targetPort: 8080
protocol: TCP
name: http
- port: 9901
targetPort: 9901
protocol: TCP
name: admin
selector:
app: envoy-gateway
5. 监控和告警
Prometheus监控配置
# prometheus.yml
global:
scrape_interval: 15s
scrape_configs:
- job_name: 'envoy'
static_configs:
- targets: ['envoy:9901']
metrics_path: '/stats/prometheus'
scrape_interval: 5s
- job_name: 'envoy-admin'
static_configs:
- targets: ['envoy:9901']
metrics_path: '/server_info'
scrape_interval: 30s
Grafana仪表板示例
{
"dashboard": {
"title": "Envoy Proxy Dashboard",
"panels": [
{
"title": "Request Rate",
"type": "graph",
"targets": [
{
"expr": "rate(envoy_http_downstream_rq_total[1m])",
"legendFormat": "{{method}} {{status}}"
}
]
},
{
"title": "Response Time",
"type": "graph",
"targets": [
{
"expr": "histogram_quantile(0.95, rate(envoy_http_downstream_rq_time_bucket[1m]))",
"legendFormat": "95th percentile"
},
{
"expr": "histogram_quantile(0.50, rate(envoy_http_downstream_rq_time_bucket[1m]))",
"legendFormat": "50th percentile"
}
]
},
{
"title": "Upstream Health",
"type": "stat",
"targets": [
{
"expr": "envoy_cluster_upstream_healthy",
"legendFormat": "{{cluster_name}}"
}
]
}
]
}
}
故障排查指南
1. 常见问题和解决方案
配置验证
# 验证Envoy配置
envoy --mode validate -c envoy.yaml
# 检查配置语法
envoy -c envoy.yaml --log-level debug --log-format '[%Y-%m-%d %T.%e][%t][%l][%n] %v'
日志分析
# 启用详细日志
envoy -c envoy.yaml --log-level trace --component-log-level upstream:debug,connection:debug
# 实时查看访问日志
tail -f /var/log/envoy/access.log | jq '.'
# 分析错误模式
grep -E "(5\d\d|ERROR|WARN)" /var/log/envoy/envoy.log | tail -20
管理接口调试
# 检查集群状态
curl http://localhost:9901/clusters
# 查看监听器状态
curl http://localhost:9901/listeners
# 检查统计信息
curl http://localhost:9901/stats | grep http
# 查看配置转储
curl http://localhost:9901/config_dump
# 检查证书信息
curl http://localhost:9901/certs
2. 性能调优指南
系统级优化
# 增加文件描述符限制
ulimit -n 65536
# 调整TCP参数
echo 'net.core.somaxconn = 8192' >> /etc/sysctl.conf
echo 'net.ipv4.tcp_max_syn_backlog = 8192' >> /etc/sysctl.conf
sysctl -p
Envoy配置优化
# 性能优化配置
static_resources:
listeners:
- name: optimized_listener
# 启用端口重用
reuse_port: true
# 设置TCP backlog
tcp_backlog_size: 8192
# 优化连接平衡
connection_balance_config:
exact_balance: {}
filter_chains:
- filters:
- name: envoy.filters.network.http_connection_manager
typed_config:
# 减少缓冲区
per_connection_buffer_limit_bytes: 1048576
# 优化HTTP/2设置
http2_protocol_options:
max_concurrent_streams: 1000
initial_stream_window_size: 268435456
initial_connection_window_size: 268435456
# 禁用不必要的功能
normalize_path: false
strip_any_host_port: false
总结
本文档提供了Envoy的实用配置示例和最佳实践,涵盖了:
- 基础到高级的配置示例: 从简单代理到微服务网关
- 动态配置: xDS控制平面实现
- 容器化部署: Docker和Kubernetes配置
- 安全配置: TLS、认证、授权最佳实践
- 可观测性: 监控、日志、追踪配置
- 性能优化: 系统和应用级别的调优
- 故障排查: 常见问题的诊断和解决方案
这些实践经验可以帮助您在生产环境中更好地部署和运维Envoy代理。