add help url

2025-01-30 14:09:48 +08:00 · 2022-05-24 11:53:31 +08:00 · 2022-05-24 11:53:31 +08:00 · cf447feb24
commit cf447feb24
parent 4ab9308235
5 changed files with 279 additions and 103 deletions
--- a/docs/example/pic/README.md
+++ b/docs/example/pic/README.md
@@ -1,55 +0,0 @@
-Architecture diagram
-
-![image](https://user-images.githubusercontent.com/20157705/167534673-322f4784-e240-451e-875e-ada57f121418.png)
-
-
-Multi-cluster management
-
-![image](https://user-images.githubusercontent.com/20157705/167534695-d63b8239-e85e-42c4-bc7b-5999b9eff882.png)
-
-
-Distributed storage
-
-![image](https://user-images.githubusercontent.com/20157705/167534724-733ad796-745e-47e1-9224-9e749f918cf2.png)
-
-
-Online debug
-
-![image](https://user-images.githubusercontent.com/20157705/167534731-8d19cab9-1420-46cf-8a1d-a4c68823c63d.png)
-
-
-Pipeline orchestration
-
-![image](https://user-images.githubusercontent.com/20157705/167534748-9adf82ae-fd08-46f1-9ba6-a60b55bb8d3b.png)
-
-
-Job templates
-
-![image](https://user-images.githubusercontent.com/20157705/167534770-505ffce8-8172-49be-9506-b265cd6ed465.png)
-
-
-NNI hyperparameter search
-
-![image](https://user-images.githubusercontent.com/20157705/167534784-255f101a-3273-4eea-9254-f2df6879ddf1.png)
-
-
-Distributed frameworks
-
-![image](https://user-images.githubusercontent.com/20157705/167534807-ca9a847f-45dc-4acb-a124-099e5915d81f.png)
-
-
-Inference service
-
-![image](https://user-images.githubusercontent.com/20157705/167534820-9202851a-a97c-41f7-8d63-900d73e4c57e.png)
-
-
-Real-time large-model training
-
-![image](https://user-images.githubusercontent.com/20157705/167534836-418855cf-daef-45a5-85c9-3bb1b7135f4f.png)
-
-
-UI preview
-
-![image](https://user-images.githubusercontent.com/20157705/167534850-e7f40f1e-058d-4370-be01-8bbcaf80c3e0.png)
-
-
--- a/docs/example/readme.md
+++ b/docs/example/readme.md
@@ -79,7 +79,8 @@ __notebook__: start a jupyter-notebook with the personal working directory mounted automatically.

 ### jupyter example:

-![image](https://user-images.githubusercontent.com/20157705/167538488-cba41bf6-ba66-4150-b17e-f31f5cc5013d.png)
+<img width="70%" alt="167874734-5b1629e0-c3bb-41b0-871d-ffa43d914066" src="https://user-images.githubusercontent.com/20157705/167538488-cba41bf6-ba66-4150-b17e-f31f5cc5013d.png">
+

 ### vscode example:

@@ -100,37 +101,15 @@ __notebook__: start a jupyter-notebook with the personal working directory mounted automatically.

 ![image](https://user-images.githubusercontent.com/20157705/167538625-39c19c33-a63d-44fa-a16a-2aaa7b480190.png)

-### Common base images
-
-#### ubuntu
-
-cuda10.2-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.2-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.2-cudnn7-python3.7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.2-cudnn7-python3.8
-
-cuda10.1-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.1-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.1-cudnn7-python3.6
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.1-cudnn7-python3.7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.1-cudnn7-python3.8
-
-cuda10.0-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.0-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.0-cudnn7-python3.6
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.0-cudnn7-python3.7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.0-cudnn7-python3.8
-
-cuda9.1-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda9.1-cudnn7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda9.1-cudnn7-python3.6
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda9.1-cudnn7-python3.7
- ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda9.1-cudnn7-python3.8
-
-
-cuda10.1-cuda10.0-cuda9.0-cudnn7.6
- ai.tencentmusic.com/tme-public/gpu:ubuntu18.04-python3.6-cuda10.1-cuda10.0-cuda9.0-cudnn7.6-base
-
+Advanced configuration via the extension field:
+```bash
+{
+  "volume_mount":"kubeflow-user-workspace(pvc):/mnt,kubeflow-archives(pvc):/archives",
+  "resource_memory":"8G",
+  "resource_cpu": "4"
+}
+```
+[Reference for base images and packaging methods](https://github.com/tencentmusic/cube-studio/tree/master/images)

 # Configure/debug/schedule pipelines

@@ -187,7 +166,8 @@ pod result:

 Configure the schedule: pipeline editing page

-![image](https://user-images.githubusercontent.com/20157705/167538811-3644c420-5b00-4c13-af75-c672aef899b2.png)
+<img width="50%" alt="167874734-5b1629e0-c3bb-41b0-871d-ffa43d914066" src="https://user-images.githubusercontent.com/20157705/167538811-3644c420-5b00-4c13-af75-c672aef899b2.png">
+

 Where to view: Training - Scheduled run records

@@ -201,3 +181,257 @@ pod result:
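 	1. The platform decides whether to trigger a scheduled run based on the pipeline's configuration.
 	2. The status link shows the running state of the workflow launched by local scheduling.
 	3. The log link shows the logs produced by local scheduling.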
+
+# NNI hyperparameter search
+
+For the syntax, refer to the [NNI project site](https://github.com/microsoft/nni).
+
+## Search space
+Must be standard JSON. Example:
+```
+{
+    "batch_size": {"_type":"choice", "_value": [16, 32, 64, 128]},
+    "hidden_size":{"_type":"choice","_value":[128, 256, 512, 1024]},
+    "lr":{"_type":"choice","_value":[0.0001, 0.001, 0.01, 0.1]},
+    "momentum":{"_type":"uniform","_value":[0, 1]}
+}
+```
+Different tuning algorithms support different search-space types:
+
+| |choice |choice(nested) |randint |uniform |quniform |loguniform |qloguniform |normal |qnormal |lognormal |qlognormal |
+| :----- | :----- | :----- | :----- | :----- | :----- | :----- | :----- | :----- | :----- | :----- | :----- |
+|TPE Tuner |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |
+|Random Search Tuner |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |
+|Anneal Tuner |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |
+|Evolution Tuner |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |
+|SMAC Tuner |✓ | |✓ |✓ |✓ |✓ | | | | | |
+|Batch Tuner |✓ | | | | | | | | | | |
+|Grid Search Tuner |✓ | |✓ | |✓ | | | | | | |
+|Hyperband Advisor |✓ | |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |✓ |
+|Metis Tuner |✓ | |✓ |✓ |✓ | | | | | | |
+|GP Tuner |✓ | |✓ |✓ |✓ |✓ |✓ | | | | |
+  
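+## Code requirements
+
+### Receiving parameters
+When a hyperparameter search starts, the system picks candidate values according to the search algorithm the user configured and passes the chosen values to the user's container. For example, a search-space definition like the one above leads to arguments of the following form being passed when the user's docker container runs (the argument names follow the search space). Users therefore do not need to add these variables to the start command or arguments; the system adds them automatically. The business code only needs to receive these parameters and output a result based on them.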
+
+```
+--lr=0.021593113434583065 --num-layers=5 --optimizer=ftrl
+```
+
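+### Reporting results
+The user's container and code start, receive the hyperparameters, and run the iterative computation; the search advances through results that the code actively reports.
+As in the example below, user code must accept the candidate hyperparameter values as input arguments, report the result of each epoch with nni.report_intermediate_result, and report the final result of each trial instance with nni.report_final_result.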
+```
+import os
+import argparse
+import logging,random,time
+import nni
+from nni.utils import merge_parameter
+
+logger = logging.getLogger('mnist_AutoML')
+
+def main(args):
+    test_acc=random.randint(30,50)
+    for epoch in range(1, 11):
+        test_acc_epoch= random.randint(3,5)
+        time.sleep(3)
+        test_acc+=test_acc_epoch
+        # report the objective value of the current iteration
+        nni.report_intermediate_result(test_acc)
+    # report the final objective value
+    nni.report_final_result(test_acc)
+
+
+def get_params():
+    # hyperparameters must be accepted as input arguments
+    parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
+    parser.add_argument('--batch_size', type=int, default=64, help='input batch size for training (default: 64)')
+
+    args, _ = parser.parse_known_args()
+    return args
+
+
+if __name__ == '__main__':
+    try:
+        # get parameters from tuner
+        tuner_params = nni.get_next_parameter()
+        params = vars(merge_parameter(get_params(), tuner_params))
+        print(tuner_params,params)
+        main(params)
+    except Exception as exception:
+        logger.exception(exception)
+        raise
+```
+
+## Starting a search experiment from the web UI
+
+![image](https://user-images.githubusercontent.com/20157705/169943169-6fb72bdf-0913-4873-92be-6702b11084c7.png)
+
+## Viewing search results in the web UI
+
+See: https://nni.readthedocs.io/zh/stable/Tutorial/WebUI.html
+
+The overview page shows the experiment ID and the status of the currently running instances
+
+![image](https://user-images.githubusercontent.com/20157705/169943044-65efa03d-6023-4675-978e-e2b10570dc54.png)
+
+![image](https://user-images.githubusercontent.com/20157705/169943083-9eef65fd-dd1f-4a75-8100-c794be9a236b.png)
+
+You can view how each trial ran and the objective value it computed
+
+
+![image](https://user-images.githubusercontent.com/20157705/169943117-43a19fc7-7598-44d6-82bf-af32ca618d12.png)
+
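+You can also view the result obtained at each epoch within a given trial
+
+# Internal services
+
+## General services
+
+### Development and registration
+
+1. Build your service image and push it to the docker registry
+
+2. Register your service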
+
+![image](https://user-images.githubusercontent.com/20157705/169932303-0ec981cc-09ca-423c-96f9-da164ed309da.png)
+
+## mysql web service
+
+Image: ai.tencentmusic.com/tme-public/phpmyadmin
+
+Environment variables:
+```
+PMA_HOST=xx.xx.xx.xx
+PMA_PORT=xx
+PMA_USER=xx
+PMA_PASSWORD=xx
+```
+Port: 80
+
+## mongo web service
+Image: mongo-express:0.54.0
+
+Environment variables:
+```
+ME_CONFIG_MONGODB_SERVER=xx.xx.xx.xx
+ME_CONFIG_MONGODB_PORT=xx
+ME_CONFIG_MONGODB_ENABLE_ADMIN=true
+ME_CONFIG_MONGODB_ADMINUSERNAME=xx
+ME_CONFIG_MONGODB_ADMINPASSWORD=xx
+ME_CONFIG_MONGODB_AUTH_DATABASE=xx
+VCAP_APP_HOST=0.0.0.0
+VCAP_APP_PORT=8081
+ME_CONFIG_OPTIONS_EDITORTHEME=ambiance
+```
+Port: 8081
+
+## redis web
+Image: ai.tencentmusic.com/tme-public/patrikx3:latest
+
+Environment variables:
+```
+REDIS_NAME=xx
+REDIS_HOST=xx
+REDIS_PORT=xx
+REDIS_PASSWORD=xx
+```
+Port: 7843
+
+## neo4j graph database
+
+Image: ai.tencentmusic.com/tme-public/neo4j:4.4
+
+Environment variables:
+```
+NEO4J_AUTH=neo4j/admin
+```
+Ports: 7474,7687
+
+## jaeger distributed tracing
+
+Image: jaegertracing/all-in-one:1.29
+
+Ports: 5775,16686
+
+# Inference service
+
+## Relationship between version, domain, and pod
+`$service_name=$service_type-$model_name-$model_version (digits of the version only)`
+
+![image](https://user-images.githubusercontent.com/20157705/169943323-0849f8fd-b20e-4036-9ce5-33892a5bb643.png)
+
+`$k8s-deployment-name=$service_name`
+
+![image](https://user-images.githubusercontent.com/20157705/169943360-b7883e39-f070-4dbb-af16-caf021e3b7fa.png)
+
+`$k8s-hpa-name=$service_name`  
+
+An hpa is created when the minimum and maximum replica counts differ  
+
+![image](https://user-images.githubusercontent.com/20157705/169943401-6e7abef7-29e2-4986-a4c9-cb3d5da4a7f0.png)
+
+`$k8s-service-name=$service_name`  used for domain-based proxying  
+
+`$k8s-service-name=$service_name-external`   used for IP/L5 proxying  
+
+![image](https://user-images.githubusercontent.com/20157705/169943472-34b161c2-b487-4aab-a335-f45465bda33b.png)
+
+
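+## Built-in domains
+
+Automatic domain configuration requires wildcard-domain support. For example, the wildcard domain might be domain = *.kfserving.woa.com
+
+Production domain
+
+http://$service_name.service.$domain  
+
+Test-environment domains  
+
+http://test.$service_name.service.$domain  
+http://debug.$service_name.service.$domain  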
+
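+## Custom domains
+
+Users can set a service's access domain through the host field, but the domain must end with the wildcard domain
+
+Multiple services can be configured with the same domain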
+
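+## Traffic mirroring and splitting
+
+Multiple services (for the same model or for different models) are configured with the same domain  
+1. The traffic-splitting field controls how much traffic is routed to other services; the remaining traffic stays with the current service  
+2. The traffic-mirroring field controls how much traffic is copied to other services, but only the current service's response is returned to the client  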
+
+![image](https://user-images.githubusercontent.com/20157705/169944196-bd98064d-124f-4233-af24-5b226ab38831.png)
+
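+## Canary upgrades
+
+1. To canary-upgrade a single service, just modify its configuration and redeploy; the service rolls its pods automatically  
+2. For a canary upgrade across different services, for example between versions of the same model, have the services use the same domain; once the newly deployed service is up and healthy, the old service on that domain is taken offline automatically.  
+
+## Autoscaling
+
+Trigger conditions for autoscaling: custom metrics are supported, and you can use one metric or several, for example: cpu:50%,mem:50%,gpu:50%  
+
+## Environment variables
+
+Environment variables provided by the system: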
+```bash
+KUBEFLOW_ENV=test
+KUBEFLOW_MODEL_PATH=
+KUBEFLOW_MODEL_VERSION=
+KUBEFLOW_MODEL_IMAGES=
+KUBEFLOW_MODEL_NAME=
+KUBEFLOW_AREA=shanghai/guangzhou
+
+K8S_NODE_NAME=
+K8S_POD_NAMESPACE=
+K8S_POD_IP=
+K8S_HOST_IP=
+K8S_POD_NAME=
+```
+
+
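The naming rules added to the inference-service section above (service name, built-in production and test domains) can be sketched in Python. This is a minimal illustration under stated assumptions, not the platform's own code: the function names `service_name` and `builtin_domains` are hypothetical, the sample values `tfserving` and `resnet` are made up, `domain` is taken to be the wildcard domain without its leading `*.`, and "digits of the version only" is read as concatenating every digit in the version string.

```python
import re


def service_name(service_type: str, model_name: str, model_version: str) -> str:
    """Build $service_type-$model_name-$model_version, keeping only the version's digits."""
    # Assumption: every digit in the version string is kept, in order.
    version_digits = "".join(re.findall(r"\d", model_version))
    return f"{service_type}-{model_name}-{version_digits}"


def builtin_domains(name: str, domain: str) -> dict:
    """Return the production, test, and debug URLs that the readme lists."""
    return {
        "prod": f"http://{name}.service.{domain}",
        "test": f"http://test.{name}.service.{domain}",
        "debug": f"http://debug.{name}.service.{domain}",
    }


# Hypothetical sample values, for illustration only.
name = service_name("tfserving", "resnet", "v2022.05.24")
print(name)
print(builtin_domains(name, "kfserving.woa.com")["prod"])
```

Per the readme, the same name would also be used for the k8s deployment, the hpa, and the service, with an `-external` suffix for the IP/L5 service.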
--- a/docs/example/images/readme.md
+++ b/docs/example/images/readme.md
@@ -1,6 +1,6 @@
 # Building images online

-![](../pic/tapd_20424693_1630748567_87.png)
+![image](https://user-images.githubusercontent.com/20157705/167538625-39c19c33-a63d-44fa-a16a-2aaa7b480190.png)

 Advanced configuration via the extension field (example):
 ```
--- a/install/docker/config.py
+++ b/install/docker/config.py
@@ -707,19 +707,16 @@ NOTEBOOK_GPU_TYPE='NVIDIA'

 # Help documentation links for each model list page
 HELP_URL={
-    "pipeline":"http://xx.xx/xx",
-    "job_template":"http://xx.xx/xx",
-    "task":"http://xx.xx/xx",
-    "hp":"http://xx.xx/xx",
-    "nni":"http://xx.xx/xx",
-    "images":"http://xx.xx/xx",
-    "notebook":"http://xx.xx/xx",
-    "service":"http://xx.xx/xx",
-    "kfserving":"http://xx.xx/xx",
-    "inferenceservice":"http://xx.xx/xx",
-    "model":"http://xx.xx/xx",
-    "run":"http://xx.xx/xx",
-    "docker":"http://xx.xx/xx"
+    "pipeline":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "job_template":"https://github.com/tencentmusic/cube-studio/tree/master/job-template",
+    "task":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "nni":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "images":"https://github.com/tencentmusic/cube-studio/tree/master/images",
+    "notebook":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "service":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "inferenceservice":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "run":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example",
+    "docker":"https://github.com/tencentmusic/cube-studio/tree/master/docs/example"
 }

 # Names of templates that use the user's image directly instead of the image defined in the template
--- a/myapp/cli.py
+++ b/myapp/cli.py
@@ -161,7 +161,7 @@ def init():
                        "range": "",
                        "default": "ai.tencentmusic.com/tme-public/ubuntu-gpu:cuda10.1-cudnn7-python3.6",
                        "placeholder": "",
-                        "describe": "Image to debug, <a target='_blank' href='https://github.com/tencentmusic/cube-studio/tree/master/docs/example/images'>base image reference</a>",
+                        "describe": "Image to debug, <a target='_blank' href='https://github.com/tencentmusic/cube-studio/tree/master/images'>base image reference</a>",
                        "editable": 1,
                        "condition": "",
                        "sub_args": {}