Kubernetes in Practice: Integrating and Optimizing the CSI Plugin for Alibaba Cloud OSS Object Storage

张开发
2026/7/1 9:08:19 15-minute read
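1. Why Use Alibaba Cloud OSS Storage in Kubernetes

Persistent storage has always been a headache in cloud-native application development. When I first started with Kubernetes, what confused me most was how to store data for stateful applications. Local disks are clearly unreliable: when a node goes down, the data goes with it. NFS raises concerns about performance bottlenecks. Alibaba Cloud OSS object storage is what finally solved the problem for me.

OSS has several advantages that suit Kubernetes particularly well:

- Near-unlimited scalability: unlike traditional storage, there is no capacity to plan in advance; OSS grows automatically.
- High reliability: data is stored in 3 replicas by default, with durability of 99.999999999%.
- Cost: object storage is far cheaper than block storage.
- Cross-region access: applications can read the same data across availability zones and even across regions.

Using OSS directly has its pain points, though. Every access has to go through SDK code, and you cannot work with it the way you work with a local file system. This is where the CSI plugin earns its keep: it turns OSS into an ordinary Kubernetes storage volume, so applications can use OSS just as they read and write local files.

2. Environment Preparation and Base Component Installation

2.1 Cluster and Account Preparation

Before you start, make sure you have:

- A working Kubernetes cluster (I am using version 1.18)
- An Alibaba Cloud account with the OSS service enabled
- The target Bucket already created (preferably in the same region as the cluster)

A small tip: if the cluster nodes and OSS are in the same region, you can use the internal Endpoint, which saves traffic fees and improves speed. For my cluster in the Shanghai region I use the internal address oss-cn-shanghai-internal.aliyuncs.com.

2.2 Installing the ossfs Tool

Although the CSI plugin mounts OSS for us, under the hood it still relies on ossfs, a FUSE tool. Installation is simple:

    # CentOS/RedHat
    wget http://gosspublic.alicdn.com/ossfs/ossfs_1.80.6_centos7.0_x86_64.rpm
    sudo yum install ossfs_1.80.6_centos7.0_x86_64.rpm

    # Ubuntu/Debian
    wget http://gosspublic.alicdn.com/ossfs/ossfs_1.80.6_ubuntu16.04_amd64.deb
    sudo apt-get install ./ossfs_1.80.6_ubuntu16.04_amd64.deb

After installation, run a basic test (as root):

    # Credentials file format: bucket:access-key-id:access-key-secret
    echo my-bucket:my-access-key-id:my-access-key-secret > /etc/passwd-ossfs
    chmod 640 /etc/passwd-ossfs
    # The mount point must exist before mounting
    mkdir -p /mnt/oss
    ossfs my-bucket /mnt/oss -o url=oss-cn-shanghai.aliyuncs.com

If you can see the Bucket contents under /mnt/oss, the base environment is ready.

3. CSI Plugin Deployment in Detail

3.1 RBAC Configuration

The CSI plugin needs a set of Kubernetes permissions to manage storage resources. The rbac.yaml below is a configuration I have verified in production:

    apiVersion: v1
    kind: ServiceAccount
    metadata:
      name: csi-admin
      namespace: kube-system
    ---
    kind: ClusterRole
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: csi-oss-role
    rules:
      # "" is the core API group
      - apiGroups: [""]
        resources: ["persistentvolumes"]
        verbs: ["get", "list", "watch", "update", "create", "delete"]
      - apiGroups: [""]
        resources: ["persistentvolumeclaims"]
        verbs: ["get", "list", "watch", "update"]
      # other rules omitted...
    ---
    kind: ClusterRoleBinding
    apiVersion: rbac.authorization.k8s.io/v1
    metadata:
      name: csi-oss-binding
    subjects:
      - kind: ServiceAccount
        name: csi-admin
        namespace: kube-system
    roleRef:
      kind: ClusterRole
      name: csi-oss-role
      apiGroup: rbac.authorization.k8s.io

Apply the configuration:

    kubectl apply -f rbac.yaml

3.2 Deploying the CSI Plugin DaemonSet

The CSI plugin has to run on every node as a DaemonSet. This is my tuned version of oss-plugin.yaml:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: csi-ossplugin
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: csi-ossplugin
      template:
        metadata:
          labels:
            app: csi-ossplugin
        spec:
          tolerations:
            - operator: Exists
          priorityClassName: system-node-critical
          serviceAccount: csi-admin
          hostNetwork: true
          containers:
            - name: driver-registrar
              image: registry.cn-hangzhou.aliyuncs.com/acs/csi-node-driver-registrar:v1.2.0
              args:
                - --v=5
                - --csi-address=$(ADDRESS)
                - --kubelet-registration-path=$(DRIVER_REG_SOCK_PATH)
              env:
                - name: ADDRESS
                  value: /var/lib/kubelet/plugins/ossplugin.csi.alibabacloud.com/csi.sock
                - name: DRIVER_REG_SOCK_PATH
                  value: /var/lib/kubelet/plugins/ossplugin.csi.alibabacloud.com/csi.sock
              volumeMounts:
                - name: plugin-dir
                  mountPath: /var/lib/kubelet/plugins
                - name: registration-dir
                  mountPath: /registration
            - name: oss-plugin
              image: registry.cn-hangzhou.aliyuncs.com/acs/csi-plugin:v1.16.9
              args:
                - --endpoint=$(CSI_ENDPOINT)
                - --v=5
                - --driver=ossplugin.csi.alibabacloud.com
              env:
                - name: CSI_ENDPOINT
                  value: unix://var/lib/kubelet/plugins/ossplugin.csi.alibabacloud.com/csi.sock
              securityContext:
                privileged: true
              volumeMounts:
                - name: plugin-dir
                  mountPath: /var/lib/kubelet/plugins
                  mountPropagation: Bidirectional
                - name: etc
                  mountPath: /host/etc
          volumes:
            - name: plugin-dir
              hostPath:
                path: /var/lib/kubelet/plugins
                type: DirectoryOrCreate
            - name: registration-dir
              hostPath:
                path: /var/lib/kubelet/plugins_registry
                type: Directory
            - name: etc
              hostPath:
                path: /etc

Key improvements:

- Uses a newer image version, v1.16.9
- Adds the mountPropagation setting so that mounts propagate correctly
- Tidies up the container startup arguments

Deploy it with:

    kubectl apply -f oss-plugin.yaml

4. Storage Claims and Hands-On Usage

4.1 Creating the PersistentVolume

A PV is a cluster-level storage resource. This is my production configuration template:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: oss-pv
      labels:
        alicloud-pvname: oss-pv   # matched by the PVC selector in 4.2
    spec:
      capacity:
        storage: 100Gi   # display only; does not actually limit OSS capacity
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      csi:
        driver: ossplugin.csi.alibabacloud.com
        volumeHandle: oss-pv
        volumeAttributes:
          bucket: my-prod-bucket
          url: oss-cn-shanghai-internal.aliyuncs.com
          akId: "${ACCESS_KEY_ID}"
          akSecret: "${ACCESS_KEY_SECRET}"
          otherOpts: "-o max_stat_cache_size=0 -o allow_other"

Security note: never write the AK directly into the YAML. Use a Secret instead:

    kubectl create secret generic oss-secret \
      --from-literal=akId=xxx \
      --from-literal=akSecret=xxx

Then reference it from the PV. Values under volumeAttributes must be plain strings, so the Secret is referenced through the standard CSI field nodePublishSecretRef, and akId and akSecret are dropped from volumeAttributes (check that your plugin version reads credentials from the Secret):

    csi:
      driver: ossplugin.csi.alibabacloud.com
      volumeHandle: oss-pv
      nodePublishSecretRef:
        name: oss-secret
        namespace: default   # namespace the Secret was created in
      volumeAttributes:
        bucket: my-prod-bucket
        url: oss-cn-shanghai-internal.aliyuncs.com
        otherOpts: "-o max_stat_cache_size=0 -o allow_other"

4.2 Creating the PersistentVolumeClaim

A PVC is an application's request for storage. Once it is matched to a PV, it can be used:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: oss-pvc
    spec:
      accessModes:
        - ReadWriteMany
      resources:
        requests:
          storage: 100Gi
      selector:
        matchLabels:
          alicloud-pvname: oss-pv

4.3 Using the PVC in a Deployment

Finally, mount it in the application deployment:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web-app
    spec:
      selector:            # required by apps/v1
        matchLabels:
          app: web-app
      template:
        metadata:
          labels:
            app: web-app
        spec:
          containers:
            - name: app
              image: nginx   # placeholder; use your application image
              volumeMounts:
                - name: oss-storage
                  mountPath: /data
          volumes:
            - name: oss-storage
              persistentVolumeClaim:
                claimName: oss-pvc

5. Performance Tuning and Troubleshooting

5.1 Common Tuning Parameters

The following parameters can be added to otherOpts in the PV to improve performance:

- -o max_stat_cache_size=100000: enlarges the metadata cache
- -o enable_noobj_cache: caches the state of files that do not exist
- -o multipart_size=10: adjusts the multipart chunk size (MB)

A combination I have tested:

    otherOpts: "-o max_stat_cache_size=100000 -o enable_noobj_cache -o multipart_size=20 -o allow_other"

5.2 Common Problems

Problem 1: the Pod reports "transport endpoint not connected" at startup

Solution: check whether the ossfs process is running normally. On the node, run:

    ps aux | grep ossfs

Problem 2: high latency on file operations

What to try:

- Make sure the internal Endpoint is used
- Add the cache parameters from 5.1

Problem 3: the CSI plugin Pod is in CrashLoopBackOff

Troubleshooting steps:

    kubectl logs -n kube-system <pod-name> -c oss-plugin

Then check /var/log/messages on the node for FUSE errors.

6. Production Best Practices

After several real projects, these are the lessons I have taken away:

- Bucket planning: split Buckets by business dimension; do not have every application share one.
- Access control: create a separate RAM sub-account for each application and restrict it to the minimum permissions it needs.
- Monitoring and alerting: configure CloudMonitor for OSS and watch the request count and traffic metrics.
- Backup strategy: OSS is very reliable, but important data should still have versioning enabled.
- Hot/cold separation: switch infrequently accessed data to the Infrequent Access storage class to cut costs.

A typical suggested directory structure:

    /my-bucket
      /app1-production
        /uploads
        /static
      /app2-staging
        /logs
        /backups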
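As a rough sketch of the directory convention above, the script below creates that layout under a bucket that is already mounted. It makes several assumptions: the bucket is mounted at a local path (for example /mnt/oss from section 2.2, or the /data mountPath inside a Pod), uploads and static belong to app1-production, and logs and backups belong to app2-staging. The function name create_bucket_layout is hypothetical; it is not part of ossfs or the CSI plugin.

```shell
#!/bin/sh
# Sketch only: pre-create the per-application directories on a mounted bucket.
# Assumption: the first argument is the local mount point of the bucket.
set -eu

create_bucket_layout() {
    root="$1"
    # Directory names follow the layout suggested in section 6 (assumed nesting).
    for d in \
        app1-production/uploads \
        app1-production/static \
        app2-staging/logs \
        app2-staging/backups
    do
        # mkdir -p is idempotent, so re-running the script is harmless.
        mkdir -p "$root/$d"
        echo "created $root/$d"
    done
}

# Run only when a mount point is passed on the command line.
if [ -n "${1:-}" ]; then
    create_bucket_layout "$1"
fi
```

Saved under a name of your choice, it could be run once per bucket, for example as `sh create-layout.sh /mnt/oss`. Keeping each application under its own prefix also makes it easier to scope a RAM policy to that prefix later.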
