Machine Learning Serving - How to Use BentoML

This post summarizes how to use BentoML, a machine learning serving library
- Keywords: BentoML Serving, Bentoml Tutorial, Bentoml ai, bentoml artifacts, bentoml github, bentoml serve, AI model serving, MLOps Serving

BentoML

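Models built by a data science team should be easy to test, deploy, and integrate
- To get there, data scientists need a tool that helps them build a prediction service, rather than uploading a model file or Protobuf file to a server
BentoML lets data scientists go all the way to a production service with very little code
- The CLI provides a large number of features
A quick glossary before we start (feel free to come back to it after reading the post)
- Bento: a Japanese boxed meal (Wikipedia). Here, a packaged machine learning model
- Yatai: a Japanese food stall that sells takoyaki, okonomiyaki, and the like (Wikipedia). Here, the component that manages machine learning models
- Pack: the act of boxing food into a Bento. Here, the process of saving a machine learning model
- In short, we prepare the food that goes into the Bento (the model artifact and code), ask BentoML to pack it, and BentoML takes care of the delivery (deploy). To inspect a Bento, go to Yatai, which lists the Bentos that have been packed or delivered
The code used in this post, along with sample code still in progress, is available in the Github repository
Machine Learning Serving
- Making a trained machine learning model usable in a production (real-world) environment
- Online Serving means exposing the model as an API, while Offline Serving means processing in batches
- For more, see "MLOps concepts for busy engineers: model serving"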

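Key Features

Online / Offline Serving
High performance: BentoML claims up to 100x the throughput of a Flask-based model server, which it attributes to its Adaptive Micro Batching mechanism
Applies DevOps best practices
Web dashboard for model management
Supported ML frameworks
- Supports a very wide range of frameworks, including nearly all major machine learning frameworks
Programmatic access => you can write a Python script and run it from Airflow and similar tools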

Usage

1) Train the model (in whatever way you are already used to)
2) Create a Prediction Service class
- Built with the Artifacts provided by BentoML
- It must define the model and an inference API that holds the serving logic
3) Save the trained model to the Prediction Service
- Unless BENTOML_HOME is set, it is saved under ~/bentoml/repository/{service_name}/{service_version}
4) Serving - Local
5) Prediction Request (Inference Job, etc.) - Local
6) Containerize the model API server
- Deploy the container to a cloud service

Getting Started with BentoML Code

Installation
```
  pip3 install bentoml
```
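If the BENTOML_HOME environment variable is set, the artifacts BentoML creates are stored under that path. Reference code
- I recommend leaving it unset at first, trying BentoML out, and changing it later
- If BENTOML_HOME is not set, everything is stored under ~/bentoml by default
- (Optional) Set BENTOML_HOME (add it to .zshrc or .bash_profile if needed)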
```
  export BENTOML_HOME='/path/of/your/choice'
```
- Creating bentoml.cfg in BENTOML_HOME lets you configure BentoML. Sample config file
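
To make the storage layout concrete, here is a minimal sketch of how the bundle path described above is put together. It only mirrors the rule stated in this post (BENTOML_HOME if set, otherwise ~/bentoml, then repository/{service_name}/{service_version}); the helper names `bentoml_home` and `bundle_dir` are made up for illustration and are not part of BentoML.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def bentoml_home(environ: Optional[Mapping[str, str]] = None) -> Path:
    """Return BENTOML_HOME if it is set, otherwise the default ~/bentoml."""
    env = os.environ if environ is None else environ
    configured = env.get("BENTOML_HOME")
    if configured:
        return Path(configured).expanduser()
    return Path.home() / "bentoml"


def bundle_dir(service_name: str, service_version: str,
               environ: Optional[Mapping[str, str]] = None) -> Path:
    """Build {home}/repository/{service_name}/{service_version}."""
    return bentoml_home(environ) / "repository" / service_name / service_version


# Hypothetical example: a custom home directory passed in explicitly
example = bundle_dir("IrisClassifier", "20210417211338_284825",
                     environ={"BENTOML_HOME": "/tmp/my_bentoml"})
print(example)
```

Passing the environment as an argument keeps the sketch easy to try without touching your real shell configuration.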

1) Model training code

Save as main.py

  from sklearn import svm
  from sklearn import datasets
	
  # Load training data
  iris = datasets.load_iris()
  X, y = iris.data, iris.target
	
  # Model Training
  clf = svm.SVC(gamma='scale')
  clf.fit(X, y)

2) Create a Prediction Service class
- Save as iris_classifier.py
```
  import pandas as pd
	
  from bentoml import env, artifacts, api, BentoService
  from bentoml.adapters import DataframeInput
  from bentoml.frameworks.sklearn import SklearnModelArtifact
	
  @env(infer_pip_packages=True)
  @artifacts([SklearnModelArtifact('model')])
  class IrisClassifier(BentoService):
      """
      A minimum prediction service exposing a Scikit-learn model
      """
	
      @api(input=DataframeInput(), batch=True)
      def predict(self, df: pd.DataFrame):
          """
          An inference API named `predict` with Dataframe input adapter, which codifies
          how HTTP requests or CSV files are converted to a pandas Dataframe object as the
          inference API function input
          """
          return self.artifacts.model.predict(df)
```
- The Prediction Service class is created by subclassing BentoService
- Decorating a function with @api turns it into an inference API later on
  - The decorator takes the API's input and output adapters and whether it is a batch API: @api(input=DataframeInput(), batch=True)
- Use an Artifact that BentoML already provides by applying the @artifacts decorator to the class. In @artifacts([SklearnModelArtifact('model')]), 'model' is the name used inside the Prediction Service class; self.artifacts.model.predict in the predict function refers to this model
- The environment can also be configured with the @env decorator. In this code, the required pip packages are inferred to generate requirements.txt; you can also pin versions explicitly or use conda/Docker

3) Save the trained model to the Prediction Service

Append the following code to the main.py created in step 1)

  # import the IrisClassifier class defined above
  from iris_classifier import IrisClassifier
	
  # Create a iris classifier service instance
  iris_classifier_service = IrisClassifier()
	
  # Pack the newly trained model artifact
  iris_classifier_service.pack('model', clf)
	
  # Save the prediction service to disk for model serving
  saved_path = iris_classifier_service.save()

Import the Prediction Service class and create an instance
Pack clf, the model trained in step 1), into iris_classifier_service (injected as 'model')
Then save the Prediction Service under BENTOML_HOME

Run the main.py script
```
  python main.py
```
- Running it prints a message like the following
```
  [2021-04-17 21:13:38,284] INFO - BentoService bundle 'IrisClassifier:20210418115149_73C784' saved to: /Users/byeon/bentoml/repository/IrisClassifier/20210417211338_284825
```
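- Now go to ~/bentoml and see what was created
  - The logs and repository folders are created under the bentoml folder
    - repository holds the trained model's information, and the Dockerfile, environment files, and so on are generated automatically (this is really impressive)
    - The model pickle file is also saved in the IrisClassifier folder
    - If you are curious about the code, you can check it on my Github
- Rename the IrisClassifier class to IrisClassifier1 and run main.py again
  - A folder named IrisClassifier1 now appears. In other words, the folder is named after the class
4) Serving - Local
- Serving is done from the CLI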
```
  bentoml serve IrisClassifier:latest
```
- Open localhost:5000 to see the Swagger UI
  - By default, the infra section lists feedback, healthz, metadata, and metrics, and the app section shows the predict API defined in the Prediction Service class

5) Prediction Request (Inference Job, etc.) - Local

  curl -i \
    --header "Content-Type: application/json" \
    --request POST \
    --data '[[5.1, 3.5, 1.4, 0.2]]' \
    http://localhost:5000/predict

The prediction log can be found in prediction.log
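
The same request can be prepared from Python with only the standard library. This is a minimal sketch that reuses the URL and payload from the curl command above; the helper name `build_predict_request` is made up for illustration, and the request is only built here, not sent.

```python
import json
import urllib.request


def build_predict_request(rows, url="http://localhost:5000/predict"):
    """Build (but do not send) a JSON POST request for the predict API."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_predict_request([[5.1, 3.5, 1.4, 0.2]])

# Sending requires the `bentoml serve` process from step 4) to be running:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Keeping the send step commented out means the snippet can be run without a local server; uncomment it once the API server is up.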

Running the Yatai server

Yatai is the Model Management Component; it shows the models saved in the repository and the models that have been deployed
It can also be started from the CLI
```
  bentoml yatai-service-start
```

To run it with Docker

  docker run \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -v ~/bentoml:/bentoml \
    -p 3000:3000 \
    -p 50051:50051 \
    bentoml/yatai-service:latest

Running the command prints "Usage in Python" and "Usage in CLI" sections; the commands shown there let you push, pull, retrieve, and delete Bentos on Yatai
- Look up the name of the saved IrisClassifier bundle and push it
```
  sudo bentoml push IrisClassifier:20210417211338_284825 --yatai-url=127.0.0.1:50051
```
- Open localhost:3000 to see the saved models. Several models show up here because I ran a number of tests in between
- Click Detail to see the details
6) Containerize the model API server
- Again, this is done from the CLI
```
  bentoml containerize IrisClassifier:latest -t iris-classifier
```
- When it finishes, the built image appears in docker images
Cloud deployment will be covered in a separate post
- For more, see the Deployment Guides in the official documentation

BentoML Core Concepts

1) bentoml.BentoService
2) Service Environment
3) Model Artifact Packaging
4) Model Management & Yatai
5) Model Artifact Metadata
6) API Function and Adapters
7) Model Serving
8) Labels
9) Retrieving BentoServices
10) Customizing the Web UI

1) bentoml.BentoService

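bentoml.BentoService is the base class for building a prediction service
- The @bentoml.artifacts decorator lets a service include multiple machine learning models
- The input argument of @bentoml.api accepts DataframeInput, JsonInput, ImageInput, TfTensorInput, and so on, and output accepts JsonOutput and others
  - Inside an API function, artifacts are accessed as self.artifacts.ARTIFACT_NAME. The code above used self.artifacts.model
- A BentoService cannot be defined in the __main__ module; it must always be saved to a file. In a notebook, use %writefile
Use the save method to save a BentoService
- Saves the model based on the ML framework and Artifact
- Automatically extracts the pip dependencies the BentoService class needs and writes them to requirements.txt
- Saves all Python code dependencies
- Writes the generated files to a specific directory
- Internally, save calls save_to_dir
A BentoML bundle is a file directory containing all the code, files, and configuration needed to run the prediction service
- A BentoML bundle can be thought of as something like a Docker container image or a binary. The bundle is produced during the training process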

2) Service Environment

PyPI Packages

@bentoml.env(infer_pip_packages=True) automatically infers the required libraries
You can also specify requirements_txt_file

  @bentoml.env(
    requirements_txt_file="./requirements.txt"
  )
  class ExamplePredictionService(bentoml.BentoService):
	
      @bentoml.api(input=DataframeInput(), batch=True)
      def predict(self, df):
          return self.artifacts.model.predict(df)

@bentoml.env(pip_packages=[]) saves the packages with the versions you specify

  @bentoml.env(
    pip_packages=[
      'scikit-learn==0.24.1',
      'pandas @https://github.com/pypa/pip/archive/1.3.1.zip',
    ]
  )
  class ExamplePredictionService(bentoml.BentoService):
	
      @bentoml.api(input=DataframeInput(), batch=True)
      def predict(self, df):
          return self.artifacts.model.predict(df)

@bentoml.env(conda_channels=[], conda_dependencies=[]) handles Conda package dependencies as well
- Note that Conda packages do not work on AWS Lambda (a platform limitation)

Custom Docker Image
- Set with @bentoml.env(docker_base_image="image_:v1")
- The BentoML slim base image is about 90MB, which can be handy (bentoml/model-server:0.12.0-slim-py37)
Init Bash Script
- A script that sets up the Docker container can be passed as an argument
- @bentoml.env(setup_sh="init_script.sh")

@bentoml.ver lets you specify a version

- Arguments: major, minor
- Documentation

  from bentoml import ver, artifacts, BentoService
  from bentoml.service.artifacts.common import PickleArtifact
	
  @ver(major=1, minor=4)
  @artifacts([PickleArtifact('model')])
  class MyMLService(BentoService):
      pass
	
  svc = MyMLService()
  svc.pack("model", trained_classifier)
  svc.set_version("2019-08.iteration20")
  svc.save()
	
  # The final produced BentoService bundle will have version:
  # "1.4.2019-08.iteration20"
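
The resulting version string can be read as "major.minor.custom version". The sketch below only reproduces the string shown in the comment above; `compose_bundle_version` is a made-up helper for illustration and not how BentoML builds the version internally.

```python
def compose_bundle_version(major: int, minor: int, custom_version: str) -> str:
    """Join major, minor, and the custom version string with dots."""
    return f"{major}.{minor}.{custom_version}"


# Same values as @ver(major=1, minor=4) and set_version("2019-08.iteration20")
version = compose_bundle_version(1, 4, "2019-08.iteration20")
print(version)  # 1.4.2019-08.iteration20
```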

3) Model Artifact Packaging

The Artifact API (@artifacts) lets you declare the models a service uses
- Model serialization and deserialization are handled automatically when the model is saved and loaded
- Multiple artifacts can be declared

import bentoml
from bentoml.adapters import DataframeInput
from bentoml.frameworks.sklearn import SklearnModelArtifact
from bentoml.frameworks.xgboost import XgboostModelArtifact
	
@bentoml.env(infer_pip_packages=True)
@bentoml.artifacts([
    SklearnModelArtifact("model_a"),
    XgboostModelArtifact("model_b")
])
class MyPredictionService(bentoml.BentoService):
	
    @bentoml.api(input=DataframeInput(), batch=True)
    def predict(self, df):
        # assume the output of model_a will be the input of model_b in this example:
        df = self.artifacts.model_a.predict(df)
	
        return self.artifacts.model_b.predict(df)

This code implements the case where the output of model a becomes the input of model b
Above, @bentoml.artifacts([ ]) declares two models, one sklearn and one XGBoost (named model_a and model_b)

svc = MyPredictionService()
svc.pack('model_a', my_sklearn_model_object)
svc.pack('model_b', my_xgboost_model_object)
svc.save()

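One model per prediction service is generally recommended; unrelated models should be split into separate services
- Bundle multiple models like this only when they depend on one another, as in the example above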

4) Model Management & Yatai

The save method of BentoService stores the bundle files in ~/bentoml/repository/{service name}/{service version}
- Metadata is stored in a local SQLite database (~/bentoml/storage.db)
List models
```
  bentoml list
```
Get information about a specific model
```
  bentoml get IrisClassifier
```
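Yatai
- BentoML's Model Management Component
  - The word means a Japanese food stall
  - Provides a CLI, a Web UI, and a Python API for creating BentoML bundles
  - A team can run its own Yatai server to manage all of its models and build CI/CD around it

YataiService

The component that manages the model repository and deployments
A local YataiService is used by default
The Model Repository can be customized
The host server for YataiService can be configured
Recommended setup
- Store data in a PostgreSQL DB and an S3 bucket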

  > docker run -p 3000:3000 -p 50051:50051 \
      -e AWS_SECRET_ACCESS_KEY=... -e AWS_ACCESS_KEY_ID=...  \
      bentoml/yatai-service \
      --db-url postgresql://scott:tiger@localhost:5432/bentomldb \
      --repo-base-url s3://my-bentoml-repo/
	
  * Starting BentoML YataiService gRPC Server
  * Debug mode: off
  * Web UI: running on http://127.0.0.1:3000
  * Running on 127.0.0.1:50051 (Press CTRL+C to quit)
  * Usage: `bentoml config set yatai_service.url=127.0.0.1:50051`
  * Help and instructions: https://docs.bentoml.org/en/latest/guides/yatai_service.html
  * Web server log can be found here: /Users/chaoyu/bentoml/logs/yatai_web_server.log

Note that YataiService does not provide authentication, so it is best to restrict access to within the same VPC

5) Model Artifact Metadata

You can store information that is meaningful to users, such as accuracy, the dataset used, and static information
To add metadata, pass it through the metadata argument when calling pack

# Using the example above.
svc = MyPredictionService()
svc.pack(
    'model_a',
    my_sklearn_model_object,
    metadata={
        'precision_score': 0.876,
        'created_by': 'joe'
    }
)
svc.pack(
    'model_b',
    my_xgboost_model_object,
    metadata={
        'precision_score': 0.792,
        'mean_absolute_error': 0.88
    }
)
svc.save()

Note that Model Artifact Metadata is immutable (it cannot be changed)

How to access the metadata

1) CLI

  bentoml get MyPredictionService:latest

2) REST API

  bentoml serve MyPredictionService:latest
  # or
  bentoml serve-gunicorn MyPredictionService:latest

  - Then request the /metadata path on the server URL

3) Access from Python

  from bentoml import load
	
  svc = load('path_to_bento_service')
  print(svc.artifacts['model'].metadata)

6) API Function and Adapters

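A BentoService API is the endpoint through which clients access the prediction service
An Adapter is an abstraction layer that defines the API callback function and handles prediction requests arriving in various formats
- Adapters
- Defined as the API handling function

An InputAdapter instance is passed to @bentoml.api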

  class ExamplePredictionService(bentoml.BentoService):
		
      @bentoml.api(input=DataframeInput(), batch=True)
      def predict(self, df):
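          assert type(df) == pandas.core.frame.DataFrame
          # Run the packed model first so that model_output is defined
          model_output = self.artifacts.model.predict(df)
          return postprocessing(model_output)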

The API function can also be used for data preprocessing and similar work

  from my_lib import preprocessing, postprocessing, fetch_user_profile_from_database
		
  class ExamplePredictionService(bentoml.BentoService):
		
      @bentoml.api(input=DataframeInput(), batch=True)
      def predict(self, df):
          user_profile_column = fetch_user_profile_from_database(df['user_id'])
          df['user_profile'] = user_profile_column
          model_input = preprocessing(df)
          model_output = self.artifacts.model.predict(model_input)
          return postprocessing(model_output)

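The input variable passed to a user-defined API function is a list of inference inputs. The function processes a batch of input data, which is what allows BentoML to perform Micro Batching

Configuring a Batch API
- With batch=True, the API function receives its input as a list
- With batch=False, it receives one input at a time

Data validation is also possible while processing a batch input

You can raise an error for specific cases in the input data
- That is, when the data is invalid or malformatted
The discard API is used to raise the error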

  from typing import List
  from bentoml import env, artifacts, api, BentoService
  from bentoml.adapters import JsonInput
  from bentoml.frameworks.sklearn import SklearnModelArtifact
  from bentoml.types import JsonSerializable, InferenceTask  # type annotations are optional

  @env(infer_pip_packages=True)
  @artifacts([SklearnModelArtifact('classifier')])
  class MyPredictionService(BentoService):

      @api(input=JsonInput(), batch=True)
      def predict_batch(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]):
          model_input = []
          for json, task in zip(parsed_json_list, tasks):
              if "text" in json:
                  model_input.append(json['text'])
              else:
                  # Reject only this item; the rest of the batch is still processed
                  task.discard(http_status=400, err_msg="input json must contain `text` field")

          # Call predict on the sklearn model; the model object itself is not callable
          results = self.artifacts.classifier.predict(model_input)

          return results
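
The validation loop can be tried without a running service. The following is a minimal sketch of the same filtering logic, assuming a stand-in task object: `StubTask` and `collect_valid_texts` are hypothetical names made up for this example and are not BentoML classes or functions.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class StubTask:
    """Hypothetical stand-in for an inference task; not a BentoML class."""
    is_discarded: bool = False
    http_status: Optional[int] = None
    err_msg: Optional[str] = None

    def discard(self, http_status: int, err_msg: str) -> None:
        # Mark this single request as rejected and remember why
        self.is_discarded = True
        self.http_status = http_status
        self.err_msg = err_msg


def collect_valid_texts(parsed_json_list: List[Any], tasks: List[StubTask]) -> List[str]:
    """Return the `text` values of valid items and discard the invalid ones."""
    model_input = []
    for item, task in zip(parsed_json_list, tasks):
        if isinstance(item, dict) and "text" in item:
            model_input.append(item["text"])
        else:
            task.discard(http_status=400, err_msg="input json must contain `text` field")
    return model_input


batch = [{"text": "good"}, {"body": "missing field"}, {"text": "also good"}]
tasks = [StubTask() for _ in batch]
valid_texts = collect_valid_texts(batch, tasks)
print(valid_texts)
print([task.is_discarded for task in tasks])
```

The point of the pattern is that one bad item does not fail the whole batch: only the valid inputs reach the model, and each rejected item carries its own status code and message.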

HTTP responses, CLI inference job output, and so on can be controlled in detail

  import bentoml
  from bentoml.adapters import JsonInput
  from bentoml.types import JsonSerializable, InferenceTask, InferenceResult, InferenceError  # type annotations are optional

  class MyService(bentoml.BentoService):

      @bentoml.api(input=JsonInput(), batch=False)
      def predict(self, parsed_json: JsonSerializable, task: InferenceTask) -> InferenceResult:
          if task.http_headers['Accept'] == "application/json":
              predictions = self.artifacts.model.predict([parsed_json])
              return InferenceResult(
                  data=predictions[0],
                  http_status=200,
                  http_headers={"Content-Type": "application/json"},
              )
          else:
              return InferenceError(err_msg="application/json output only", http_status=400)

For example, you can set http_status to 200, 400, and so on, or return only the first prediction as data
When batch is True
- Loop over the prediction results

  from typing import List

  import bentoml
  from bentoml.adapters import JsonInput
  from bentoml.types import JsonSerializable, InferenceTask, InferenceResult, InferenceError  # type annotations are optional

  class MyService(bentoml.BentoService):

      @bentoml.api(input=JsonInput(), batch=True)
      def predict(self, parsed_json_list: List[JsonSerializable], tasks: List[InferenceTask]) -> List[InferenceResult]:
          rv = []
          predictions = self.artifacts.model.predict(parsed_json_list)
          for task, prediction in zip(tasks, predictions):
              if task.http_headers['Accept'] == "application/json":
                  rv.append(
                      InferenceResult(
                          data=prediction,
                          http_status=200,
                          http_headers={"Content-Type": "application/json"},
                  ))
              else:
                  rv.append(InferenceError(err_msg="application/json output only", http_status=400))
                  # or task.discard(err_msg="application/json output only", http_status=400)
          return rv

Using multiple APIs

You can create two APIs so that DataFrame input goes to predict and JSON input goes to predict_json

  import bentoml
  import pandas
  from bentoml.adapters import DataframeInput, JsonInput
  from my_lib import process_custom_json_format

  class ExamplePredictionService(bentoml.BentoService):

      @bentoml.api(input=DataframeInput(), batch=True)
      def predict(self, df: pandas.DataFrame):
          return self.artifacts.model.predict(df)

      @bentoml.api(input=JsonInput(), batch=True)
      def predict_json(self, json_arr):
          # Convert the custom JSON payload to a DataFrame before predicting
          df = process_custom_json_format(json_arr)
          return self.artifacts.model.predict(df)

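Operational API
- Instead of handling inference requests, you can build an API that handles requests to update the prediction service config or retrains the model on newly arrived data
- It is still in Beta and not public yet; the docs ask you to contact the team by email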

7) Model Serving

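Once a BentoService is saved as a Bento, it can be deployed in various ways
Three approaches
- Online Serving: real-time prediction through an API endpoint
- Offline Batch Serving: process in batches and store the results in storage
- Edge Serving: deploy the model to mobile and IoT devices

Online API Serving

Simply saving a BentoService is enough to stand up a REST API server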

  bentoml serve IrisClassifier:latest

API Server Dockerization
- Saving a Bento generates a Dockerfile
- You can then run docker build

  saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)
	
	
  # Build docker image using saved_path directory as the build context, replace the
  # {username} below to your docker hub account name
  docker build -t {username}/iris_classifier_bento_service $saved_path
	
  # Run a container with the docker image built and expose port 5000
  docker run -p 5000:5000 {username}/iris_classifier_bento_service
	
  # Push the docker image to docker hub for deployment
  docker push {username}/iris_classifier_bento_service

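Adaptive Micro-Batching
- Enabled by default since 0.12.0
- Micro-batching groups prediction requests into small batches to gain the performance benefits of batch processing in model inference
- BentoML implemented its micro-batching layer with inspiration from Clipper
- The BentoML API is designed to work with micro-batching without any changes to user code
- For details, see Micro Batching in the official documentation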
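
To get a feel for the idea, here is a deliberately simplified sketch of micro-batching: pending requests are grouped into small batches and the model is called once per batch instead of once per request. It uses a fixed batch size and is not adaptive, so it is not BentoML's actual implementation; `predict_in_micro_batches` and `fake_batch_predict` are made-up names for illustration.

```python
from typing import Callable, List, Sequence


def predict_in_micro_batches(
    requests: Sequence[list],
    batch_predict: Callable[[List[list]], List[int]],
    max_batch_size: int = 4,
) -> List[int]:
    """Group single requests into small batches and call the model once per batch."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be at least 1")
    results: List[int] = []
    for start in range(0, len(requests), max_batch_size):
        batch = list(requests[start:start + max_batch_size])
        results.extend(batch_predict(batch))
    return results


calls = []


def fake_batch_predict(batch):
    # Hypothetical model: records each batch size and "predicts" the row length
    calls.append(len(batch))
    return [len(row) for row in batch]


pending = [[5.1, 3.5, 1.4, 0.2]] * 10
predictions = predict_in_micro_batches(pending, fake_batch_predict, max_batch_size=4)
print(calls)  # [4, 4, 2]
```

Ten requests turn into three model calls here. An adaptive version would additionally tune the batch size and waiting time based on observed latency, which this sketch leaves out.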

Python API(Programmatic Access)

1) Load a saved Bento

  import bentoml
	
  bento_service = bentoml.load(saved_path)
  result = bento_service.predict(input_data)

2) Install as a PyPI package

  saved_path=$(bentoml get IrisClassifier:latest --print-location --quiet)
	
  pip install $saved_path

3) Use from the command line

  # With BentoService name and version pair
  bentoml run IrisClassifier:latest predict --input '[[5.1, 3.5, 1.4, 0.2]]'
  bentoml run IrisClassifier:latest predict --input-file './iris_test_data.csv'
	
  # With BentoService's saved path
  bentoml run $saved_path predict --input '[[5.1, 3.5, 1.4, 0.2]]'
  bentoml run $saved_path predict --input-file './iris_test_data.csv'

If the bundle is already installed with pip, it can be invoked directly by its BentoService class name

  IrisClassifier run predict --input '[[5.1, 3.5, 1.4, 0.2]]'
  IrisClassifier run predict --input-file './iris_test_data.csv'

8) Labels

Labels can be up to 63 characters long and may contain dashes (-), underscores (_), dots (.), digits, and letters

Examples

  "cicd-status": "success"
  "data-cohort": "2020.9.10-2020.9.11"
  "created_by": "Tim_Apple"
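
The rule above is easy to check before saving a bundle. This is a minimal sketch that encodes only what is stated here (1 to 63 characters made of letters, digits, dash, underscore, and dot); applying the same rule to both keys and values is an assumption, and the helper names are made up for illustration.

```python
import re

# Letters, digits, dot, underscore, dash; between 1 and 63 characters
LABEL_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,63}")


def is_valid_label_part(text: str) -> bool:
    """Check a single label key or value against the rule above."""
    return LABEL_PATTERN.fullmatch(text) is not None


def invalid_labels(labels: dict) -> list:
    """Return the keys whose key or value breaks the rule."""
    return [
        key for key, value in labels.items()
        if not (is_valid_label_part(key) and is_valid_label_part(str(value)))
    ]


labels = {
    "cicd-status": "success",
    "data-cohort": "2020.9.10-2020.9.11",
    "created_by": "Tim_Apple",
}
print(invalid_labels(labels))  # []
```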

Labels can also be set when saving a Bento bundle

svc = MyBentoService()
svc.pack('model', model)
svc.save(labels={"framework": "xgboost"})

Setting labels for a deployment

As of April 2021, this can only be done through the CLI

  $ # In any of the deploy command, you can add labels via --label option
  $ bentoml azure-functions deploy my_deployment --bento service:name \
      --labels key1:value1,key2:value2

Label selector
- A label selector is provided. You can query in two ways: equality-based and set-based
- Equality-based requirements
  - Use = or !=
- Set-based requirements
  - In, NotIn, Exists, DoesNotExist
```
  bentoml get bento_name --labels "key1=value1, key2 In (value2, value3)"
```
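
To clarify what each operator means, here is a minimal sketch that evaluates selector requirements against a label dictionary. It works on an already-parsed list of requirements instead of the CLI string, and it assumes that a missing key satisfies != and NotIn; whether BentoML treats missing keys this way is an assumption. `matches` is a made-up helper, not a BentoML function.

```python
from typing import Dict, Sequence, Tuple

Requirement = Tuple[str, str, Sequence[str]]  # (key, operator, values)


def matches(labels: Dict[str, str], requirements: Sequence[Requirement]) -> bool:
    """Return True when the labels satisfy every requirement (logical AND)."""
    for key, operator, values in requirements:
        present = key in labels
        if operator == "=":
            ok = present and labels[key] == values[0]
        elif operator == "!=":
            # Assumption: a missing key counts as "not equal"
            ok = not present or labels[key] != values[0]
        elif operator == "In":
            ok = present and labels[key] in values
        elif operator == "NotIn":
            # Assumption: a missing key counts as "not in"
            ok = not present or labels[key] not in values
        elif operator == "Exists":
            ok = present
        elif operator == "DoesNotExist":
            ok = not present
        else:
            raise ValueError(f"unknown operator: {operator}")
        if not ok:
            return False
    return True


# Structured form of: "key1=value1, key2 In (value2, value3)"
selector = [("key1", "=", ["value1"]), ("key2", "In", ["value2", "value3"])]
bento_labels = {"key1": "value1", "key2": "value3"}
print(matches(bento_labels, selector))  # True
```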

9) Retrieving BentoServices

After saving a trained model, you can retrieve its artifact bundle

Use the --target_dir flag

  bentoml retrieve ModelServe --target_dir=~/bentoml_bundle/

10) Customizing the Web UI

@bentoml.web_static_content lets you add static content to the web frontend

Example (Github)

  @env(infer_pip_packages=True)
  @artifacts([SklearnModelArtifact('model')])
  @web_static_content('./static')
  class IrisClassifier(BentoService):
	
      @api(input=DataframeInput(), batch=True)
      def predict(self, df):
          return self.artifacts.model.predict(df)

bentoml commands

Run bentoml --help to see the available commands
They include config, containerize, delete, deployment, run, serve, yatai-service-start, azure, ec2, and more

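FAQ

The FAQ compares BentoML with a range of serving libraries and explains the differences clearly
- This document alone is very useful and well worth reading
- Tensorflow Serving
- Clipper
- AWS SageMaker
- MLFlow
- Cortex
- Seldon
It also answers questions such as whether BentoML does horizontal scaling. The serving space is crowded with competing tools right now, so I appreciated an FAQ that compares them one by one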

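Summary

Pros
- The convenience of having the API server generated automatically
- The CLI offers a really wide range of features
- Very good performance (this needs further verification)
Points to consider
- BentoML focuses on deploying code to production; Auto Scaling, A/B testing, MAB, and monitoring have to be set up separately (BentoML is reportedly working on these) => combining it with KFServing or similar looks promising
- Horizontal scaling should also be considered
- There are not many use cases yet, so the architecture needs some thought
Next steps
- Write posts on deploying to Kubernetes and the cloud
- Read through all of the Advanced Guides
- Write up Adaptive Micro Batching => run performance tests
- Apply it to a production model
- Go through the Gallery examples one by one
A PR adding deployment to GCP AI Platform has also been opened. I expect the ecosystem to keep improving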
