Vertex AI Summary: Google Cloud Computing

(1)

Training jobs can be:

- single-node training → one worker pool

- distributed training → multiple worker pools

Custom job:

It's the basic way to run your custom ML training code in Vertex AI.

Steps:

Create a custom container image → push it to a container registry → create the custom job

or use a prebuilt container instead of building a custom image.
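A minimal sketch of the last step (creating the custom job), assuming the google-cloud-aiplatform SDK is installed and credentials are configured. The helper names, display name, machine type, and image URI are hypothetical placeholders, not values from these notes.

```python
def build_worker_pool_spec(image_uri, machine_type="n1-standard-4", args=None):
    """Return one worker pool spec as a plain dict (single-node training)."""
    return {
        "machine_spec": {"machine_type": machine_type},
        "replica_count": 1,
        "container_spec": {"image_uri": image_uri, "args": list(args or [])},
    }


def run_custom_job(project, location, staging_bucket, image_uri):
    """Create and run a CustomJob from a container image already pushed to a registry."""
    # Imported lazily so the spec helper above works without the SDK installed.
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location, staging_bucket=staging_bucket)
    job = aiplatform.CustomJob(
        display_name="example-custom-job",  # hypothetical name
        worker_pool_specs=[build_worker_pool_spec(image_uri)],
    )
    job.run()  # blocks until the job finishes
    return job
```

The spec is built as a plain dict so it can be inspected before anything is submitted; run_custom_job is the only part that talks to Vertex AI.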

- Custom jobs can run on a persistent resource instead of creating new compute resources during job startup.

A training pipeline in Vertex AI orchestrates custom training jobs with additional steps, such as loading a dataset or uploading the model to Vertex AI after the training job is successfully completed.
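A rough sketch of such a training pipeline, using the SDK's CustomContainerTrainingJob class. The function name, display names, and machine type are hypothetical. The `ai` parameter is an assumption added only so a stand-in object can replace the SDK module when trying the function without cloud access.

```python
def run_training_pipeline(container_uri, serving_image_uri, dataset=None, ai=None):
    """Train with a custom container, then let Vertex AI upload the resulting model.

    Assumes aiplatform.init(...) was already called, or that project and
    location defaults come from the environment.
    """
    if ai is None:
        from google.cloud import aiplatform as ai  # real SDK unless a stand-in is passed

    job = ai.CustomContainerTrainingJob(
        display_name="example-training-pipeline",  # hypothetical name
        container_uri=container_uri,
        # The serving image is what allows the trained model to be uploaded.
        model_serving_container_image_uri=serving_image_uri,
    )
    # dataset is optional: a managed dataset to load before training starts.
    return job.run(
        dataset=dataset,
        model_display_name="example-model",  # hypothetical name
        replica_count=1,
        machine_type="n1-standard-4",
    )
```

Compared with a plain custom job, the extra steps (dataset loading, model upload) are handled by the pipeline rather than by the training code.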

Difference between aiplatform, aiplatform_v1 and aiplatform_v1beta1:

- aiplatform exposes the latest version of the Vertex AI API.

- aiplatform_v1 and aiplatform_v1beta1 are used to develop applications that target a specific version of the Vertex AI API (v1 or v1beta1).
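To make the difference concrete, here is a small sketch that lists custom jobs both ways. It assumes the google-cloud-aiplatform package and valid credentials. The function names are hypothetical, and the endpoint format follows the usual regional pattern.

```python
def regional_endpoint(location):
    """Regional API endpoint that the versioned clients are pointed at."""
    return f"{location}-aiplatform.googleapis.com"


def list_custom_jobs_high_level(project, location):
    """List custom jobs through the high-level aiplatform module."""
    from google.cloud import aiplatform

    aiplatform.init(project=project, location=location)
    return aiplatform.CustomJob.list()


def list_custom_jobs_v1(project, location):
    """List custom jobs through the versioned aiplatform_v1 client."""
    from google.cloud import aiplatform_v1

    client = aiplatform_v1.JobServiceClient(
        client_options={"api_endpoint": regional_endpoint(location)}
    )
    parent = f"projects/{project}/locations/{location}"
    return list(client.list_custom_jobs(parent=parent))
```

The high-level module hides resource paths and endpoints, while the versioned client needs both spelled out.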

Custom job Python API documentation link:

https://cloud.google.com/python/docs/reference/aiplatform/latest/google.cloud.aiplatform.CustomJob#google_cloud_aiplatform_CustomJob

Vertex AI Notebooks:

1- Vertex AI Workbench: a JupyterLab-based experience with advanced customization capabilities.

2- Colab Enterprise: a serverless, collaborative environment with AI-powered code assistance.

(2)

A Vertex AI TensorBoard instance is a managed service that monitors your machine learning (ML) training and experiments. It provides a graphical user interface (GUI) that you can use to visualize your ML metrics and logs.

#####

worker_pool_specs: one worker pool (WorkerPoolSpec) for single-node training, or multiple worker pools for distributed training

staging_bucket: Cloud Storage bucket for the artifacts produced by the custom job
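A small sketch of how the worker pool list could look in each case, written as plain dicts (the SDK also accepts WorkerPoolSpec messages). The function name, image URI, and machine type are hypothetical. The layout assumes the first pool holds the single primary replica and the second pool holds the extra workers.

```python
def build_worker_pool_specs(image_uri, worker_count=0, machine_type="n1-standard-4"):
    """Return a list with one pool (single-node) or two pools (distributed)."""

    def pool(replicas):
        return {
            "machine_spec": {"machine_type": machine_type},
            "replica_count": replicas,
            "container_spec": {"image_uri": image_uri},
        }

    specs = [pool(1)]  # pool 0: the primary replica, always exactly one
    if worker_count > 0:
        specs.append(pool(worker_count))  # pool 1: additional workers
    return specs


single_node = build_worker_pool_specs("example-image-uri")
distributed = build_worker_pool_specs("example-image-uri", worker_count=3)
```

Either list would then be passed as worker_pool_specs when creating the custom job, together with a staging_bucket such as a gs:// path.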

An experiment is a collection of training jobs that are designed to test a particular hypothesis about your data or your machine learning model.

An experiment run (experiment_run) is a single execution within an experiment, with its own parameters and metrics.
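A hedged sketch of recording one run inside an experiment with the SDK's init, start_run, log_params, log_metrics, and end_run calls. The function name and all argument values are hypothetical. The `ai` parameter is expected to be the google.cloud.aiplatform module and is passed in only so the flow can be tried with a stand-in object.

```python
def log_experiment_run(ai, project, location, experiment, run_name, params, metrics):
    """Record one experiment run; `ai` is expected to be google.cloud.aiplatform."""
    ai.init(project=project, location=location, experiment=experiment)
    ai.start_run(run_name)
    try:
        ai.log_params(params)    # e.g. hyperparameters of this run
        ai.log_metrics(metrics)  # e.g. final evaluation metrics
    finally:
        ai.end_run()  # close the run even if logging fails
```

Usage with the real SDK would look like: from google.cloud import aiplatform, then log_experiment_run(aiplatform, "my-project", "us-central1", "my-experiment", "run-1", {"lr": 0.01}, {"accuracy": 0.9}), where every value is a placeholder.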