Inference API v1alpha1


Introduction

Scaleway Inference at Scale is a flexible Machine Learning (ML) inference service that deploys a trained model on managed infrastructure with built-in scalability and makes it accessible through a REST API.

With Inference at Scale, you can deploy a trained model in a production environment with a single HTTP request and use it to make predictions.

You can make your model accessible from anywhere, whether you are building an app or sharing the model with other users, without worrying about availability or scalability.

Inference at Scale is built on top of Kapsule and Serverless.

The Scaleway Inference at Scale API is used to manage your models and can be reached at `https://api.scaleway.com/inference/v1alpha1`.

For each model in a ready state, an associated endpoint is generated and can be used for inference.

  1. Install curl
  2. (optional) Install jq
  3. Get your project ID from the credentials page
  4. Get your secret key from the credentials page
  5. Get a URL to a downloadable model. You can store your model in Scaleway Object Storage and get a public link to access it.

To call the Scaleway API, you need to add the X-Auth-Token header to your requests. The value for this header is your secret key.

If you have not used the inference product before, the expected response will be like this (otherwise you will see a list of all your existing models):
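As a quick check, here is a minimal sketch of that first call in Python, assuming the API is reachable at `https://api.scaleway.com` and that the requests library is installed; the environment variable name is only an example.

```python
import os
import requests

# Assumed base URL for the Scaleway Inference API.
API_URL = "https://api.scaleway.com/inference/v1alpha1"

# The secret key from the credentials page goes in the X-Auth-Token header.
headers = {"X-Auth-Token": os.environ["SCW_SECRET_KEY"]}

resp = requests.get(f"{API_URL}/models", headers=headers)
resp.raise_for_status()
print(resp.json())
# If you have never used the product, this prints something like:
# {"models": [], "total_count": 0}
```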

Export your model for serving

As of now, only Tensorflow Serving and ONNX are supported as serving backends. The recommended approach is to use models in the ONNX format, as it makes the supported machine learning frameworks interoperable.

  • To export your model to ONNX, look here for tutorials on how to proceed.
  • Be aware that not all models are compatible with ONNX. RNNs, for example, are not. Test your ONNX predictions locally before using this product.
  • To export your Tensorflow model to be TFServing-compatible, look here.
  • Compress the generated folder into a zip file; this is what you will upload in the next step (see the sketch after this list).
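As an illustration, here is a minimal sketch of exporting a PyTorch model to ONNX and zipping the result; the model, its input shape and the file names are placeholders, and you should verify the exported model's predictions locally before uploading it.

```python
import os
import shutil
import torch

# Placeholder model: a single linear layer taking a 4-feature input.
model = torch.nn.Linear(4, 1)
model.eval()

dummy_input = torch.randn(1, 4)  # example input with the shape the model expects

# Export to ONNX in a dedicated folder, then compress that folder into a zip file.
os.makedirs("export", exist_ok=True)
torch.onnx.export(model, dummy_input, "export/model.onnx")
shutil.make_archive("model", "zip", "export")  # produces model.zip
```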

To work, Inference at Scale needs to download your model from a URL.

To store your model and make it accessible, you can use Scaleway Object Storage.
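Since Scaleway Object Storage is S3-compatible, one way to upload the zip and get a public link is with boto3, as sketched below; the bucket name, region and credential values are placeholders to replace with your own.

```python
import boto3

# Placeholder values: replace with your own bucket, region and Object Storage credentials.
BUCKET = "my-models"
REGION = "fr-par"

s3 = boto3.client(
    "s3",
    endpoint_url=f"https://s3.{REGION}.scw.cloud",
    aws_access_key_id="SCW_ACCESS_KEY",
    aws_secret_access_key="SCW_SECRET_KEY",
)

# Upload the zip with a public-read ACL so Inference at Scale can download it.
s3.upload_file("model.zip", BUCKET, "model.zip", ExtraArgs={"ACL": "public-read"})

# The public URL typically looks like this:
print(f"https://{BUCKET}.s3.{REGION}.scw.cloud/model.zip")
```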

Serving Framework

Supported serving backends: Tensorflow Serving and ONNX.

Supported serving backend versions:

| Tensorflow Serving | ONNXRuntime |
| ------------------ | ----------- |
| latest             | latest      |
| 2.0.0              | 1.0.0       |
| 1.15.0             | 0.5.0       |
| 1.14.0             | 0.4.0       |
| 1.13.1             |             |
| 1.12.3             |             |

The serving backend must be specified when creating a new model.

If you don't need a specific version, you can use the latest version of each backend, but be careful about compatibility issues as these projects get updated.

Model lifecycle

For Inference at Scale, a model can be in one of the following states:

  • creating: first step of creating a new model;
  • converting: converts your model into a format compatible with the serving framework. For now, this step only checks that the model format matches the serving framework; later on, the conversion will be handled for you;
  • building: creates a Docker image using your model and the serving framework;
  • deploying: deploys containers of the built image on Serverless, making your model accessible for inference;
  • ready: the model is ready for use;
  • pausing: deletes the Serverless containers, making the model unavailable for inference;
  • paused: the model is paused, meaning that the endpoint is unavailable for prediction. To make predictions, you need to resume the model;
  • deleting: deletes the model and all associated resources;
  • locked: the model is locked and cannot be used. Resources can be locked in accordance with the Scaleway Terms of Services; for more information about a locked model, please contact Scaleway Support;
  • error: the model has encountered an error.

To make predictions (also known as inference), first you need to create a model.

Once the model is ready, you can get the associated endpoint using the GetModel method.
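Putting these steps together, the sketch below creates a model from the request body fields documented in the CreateModel section further down, polls GetModel until the model leaves the intermediate states, and reads its endpoint. The base URL and the `framework` value are assumptions to adapt to your setup.

```python
import os
import time
import requests

API_URL = "https://api.scaleway.com/inference/v1alpha1"  # assumed base URL
headers = {"X-Auth-Token": os.environ["SCW_SECRET_KEY"]}

# Body fields as documented in the CreateModel section below; the framework
# value follows the <backend>:<version> format and is only an example.
body = {
    "project_id": os.environ["SCW_PROJECT_ID"],
    "name": "my-model",
    "path": "https://my-models.s3.fr-par.scw.cloud/model.zip",
    "framework": "onnx:latest",
}

model = requests.post(f"{API_URL}/models", headers=headers, json=body).json()

# Poll GetModel until the model reaches a terminal state (ready or error).
while model["status"] not in ("ready", "error"):
    time.sleep(10)
    model = requests.get(f"{API_URL}/models/{model['id']}", headers=headers).json()

print(model["status"], model["endpoint"])
```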

The model endpoint offers two methods:

  • a GET returns the documentation of the API associated with the model;
  • a POST can be used for making predictions.

For the POST, you need to serialize your data according to the expected input of your model.

Prediction request

To make a prediction, you must send a JSON payload with data formatted in one of the following ways:

  • array data;
  • binary data (for images);
  • string data.
Array data

Use this format for any type of input, typically numpy arrays.
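As an illustration, a numpy array can be flattened into a JSON-serializable list as sketched below; the top-level key name is hypothetical, so check the documentation returned by a GET on your model endpoint for the exact schema.

```python
import json
import numpy as np

x = np.random.rand(1, 4).astype("float32")  # example input matching the model's shape

# Hypothetical payload layout: the key name may differ, see the GET documentation
# returned by your model endpoint for the exact schema.
payload = {"inputs": x.tolist()}
with open("payload.json", "w") as f:
    json.dump(payload, f)
```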

Binary data (images)

If you have large images for which decomposition into arrays would make the payload too big, you can encode them to base64 and send them in this format:

Some fields are used for interpreting the binary data, reconstructing the image and feeding it into the model (we use the Pillow module for this):

  • you can use RGB or BGR for the "img_mode" field;
  • you can use channels_last or channels_first for the "img_format" field;
  • the "img_size" field is the two-dimensional shape of the image expected by your model.
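As a sketch, the snippet below base64-encodes an image file and fills in the three documented fields; the key holding the encoded bytes is hypothetical, so confirm the exact schema via the GET documentation of your endpoint.

```python
import base64
import json

with open("cat.jpg", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

payload = {
    # Hypothetical key for the encoded bytes; check your endpoint's GET documentation.
    "image": encoded,
    "img_mode": "RGB",              # or "BGR"
    "img_format": "channels_last",  # or "channels_first"
    "img_size": [224, 224],         # two-dimensional shape expected by the model
}
with open("payload.json", "w") as f:
    json.dump(payload, f)
```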
Request example

Let's assume you have your payload in the right format saved in a JSON file payload.json. You can send it for prediction to your endpoint with a simple POST request:
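For instance, a minimal sketch of that call in Python, assuming the endpoint value returned by GetModel:

```python
import json
import requests

ENDPOINT = "https://example-endpoint.scw.cloud"  # placeholder: the model's `endpoint` field

with open("payload.json") as f:
    payload = json.load(f)

# Depending on your setup, the endpoint may also require the X-Auth-Token header.
resp = requests.post(ENDPOINT, json=payload)
print(resp.json())
```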

Prediction response

Note: For now, only array-shaped outputs are supported.

Response example:

Prediction errors

If something unexpected happens, the endpoint will return a different response, for example:

About

Service information

Returns general information about the service.

GET
/inference/v1alpha1
200 Response

name
string

description
string

version
string

documentation_url
nullable string
Response Example
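A minimal sketch of calling this route, assuming the standard Scaleway API base URL; add the X-Auth-Token header if your call is rejected.

```python
import requests

# Assumed base URL; the route returns the fields listed above.
info = requests.get("https://api.scaleway.com/inference/v1alpha1").json()
print(info.get("name"), info.get("version"), info.get("documentation_url"))
```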

List all models linked to the provided secret key.

GET
/inference/v1alpha1/models
Query Parameters

organization_id
nullable string

project_id
nullable string

page
number
Page number. The default value is 1.

page_size
number
Page size. The default value is 20.

order_by
string
Possible values are created_at_desc, created_at_asc, name_asc, name_desc, framework_asc and framework_desc. The default value is created_at_desc.
200 Response

models
array

total_count
number
Response Example
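A sketch of a paginated listing, assuming the standard Scaleway API base URL; all query parameters used here are the optional ones documented above.

```python
import os
import requests

API_URL = "https://api.scaleway.com/inference/v1alpha1"  # assumed base URL
headers = {"X-Auth-Token": os.environ["SCW_SECRET_KEY"]}

# Optional query parameters as documented above.
params = {"page": 1, "page_size": 20, "order_by": "created_at_desc"}
resp = requests.get(f"{API_URL}/models", headers=headers, params=params)
for model in resp.json()["models"]:
    print(model["id"], model["name"], model["status"])
```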

Create a new model and make it ready for inference. The whole process can take a few minutes. The status goes through the following stages: creating, converting, building, deploying, ready. For details about statuses, please refer to the Model lifecycle section above.

POST
/inference/v1alpha1/models
Body

project_id
string
Project id.

path
string
You can store your model in [Scaleway Object Storage](https://console.scaleway.com/object-storage/buckets) and get the public link to access it. The format of the model must match the `framework` field.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

config
map
A map of key/value pairs passed as environment variables in the running instance of your model. The environment variable `input_type` is used to reshape input data when making predictions using `numpy.reshape(x, y).astype(input_type)`. It must be a valid python type: see [python types](https://docs.scipy.org/doc/numpy-1.13.0/user/basics.types.html#data-types) for details. By default, it is set to `float32`.
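To illustrate the `input_type` mechanism described above, this is roughly what happens to your input server-side; a sketch assuming a model expecting a 1x4 float32 input.

```python
import numpy as np

# Input data received in the prediction payload, as a plain Python list.
x = [0.1, 0.2, 0.3, 0.4]

# With config = {"input_type": "float32"}, the service reshapes and casts the
# input roughly like this before feeding it to the model.
input_type = "float32"
arr = np.reshape(x, (1, 4)).astype(input_type)
print(arr.dtype, arr.shape)  # float32 (1, 4)
```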
Request Example
200 Response

path
string
URL of the original ML model used to create the Inference model.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

endpoint
nullable string
URL to the running Inference model.

error_message
nullable string
Details of the error, if any occurred when managing the model.

status
string
Current status of the model. Possible values are ready, paused, error, creating, converting, building, deploying, pausing, deleting and locked. The default value is ready.

config
map
A map of environment variables key/value pairs.

id
string
UUID of the model.

project_id
string
Project id.

created_at
string
Date at which the model has been created (RFC 3339 format).

updated_at
string
Date at which the model has been last updated (RFC 3339 format).

organization_id
string
Organization ID.
Response Example

Get the model for the given id.

GET
/inference/v1alpha1/models/{model_id}
Path Parameters

model_id
required string
200 Response

path
string
URL of the original ML model used to create the Inference model.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

endpoint
nullable string
URL to the running Inference model.

error_message
nullable string
Details of the error, if any occurred when managing the model.

status
string
Current status of the model. Possible values are ready, paused, error, creating, converting, building, deploying, pausing, deleting and locked. The default value is ready.

config
map
A map of environment variables key/value pairs.

id
string
UUID of the model.

project_id
string
Project id.

created_at
string
Date at which the model has been created (RFC 3339 format).

updated_at
string
Date at which the model has been last updated (RFC 3339 format).

organization_id
string
Organization ID.
Response Example

Update the model for the given id.

PATCH
/inference/v1alpha1/models/{model_id}
Path Parameters

model_id
required string
Body

name
nullable string
Request Example
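A minimal sketch of renaming a model with this route, assuming the standard Scaleway API base URL and a placeholder model UUID; `name` is the only field documented in the PATCH body.

```python
import os
import requests

API_URL = "https://api.scaleway.com/inference/v1alpha1"  # assumed base URL
headers = {"X-Auth-Token": os.environ["SCW_SECRET_KEY"]}
model_id = "11111111-1111-1111-1111-111111111111"  # placeholder UUID

resp = requests.patch(
    f"{API_URL}/models/{model_id}",
    headers=headers,
    json={"name": "renamed-model"},
)
print(resp.json()["name"])
```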
200 Response

path
string
URL of the original ML model used to create the Inference model.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

endpoint
nullable string
URL to the running Inference model.

error_message
nullable string
Details of the error, if any occurred when managing the model.

status
string
Current status of the model. Possible values are ready, paused, error, creating, converting, building, deploying, pausing, deleting and locked. The default value is ready.

config
map
A map of environment variables key/value pairs.

id
string
UUID of the model.

project_id
string
Project id.

created_at
string
Date at which the model has been created (RFC 3339 format).

updated_at
string
Date at which the model has been last updated (RFC 3339 format).

organization_id
string
Organization ID.
Response Example

Delete the model for the given id. The status goes through deleting. Once deleted, the model no longer exists.

DELETE
/inference/v1alpha1/models/{model_id}
Path Parameters

model_id
required string
200 Response

path
string
URL of the original ML model used to create the Inference model.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

endpoint
nullable string
URL to the running Inference model.

error_message
nullable string
Details of the error, if any occurred when managing the model.

status
string
Current status of the model. Possible values are ready, paused, error, creating, converting, building, deploying, pausing, deleting and locked. The default value is ready.

config
map
A map of environment variables key/value pairs.

id
string
UUID of the model.

project_id
string
Project id.

created_at
string
Date at which the model has been created (RFC 3339 format).

updated_at
string
Date at which the model has been last updated (RFC 3339 format).

organization_id
string
Organization ID.
Response Example

Pause the model for the given id. This disables the endpoint, which means you can no longer use it for inference. The status goes through the following stages: pausing, paused. For details about statuses, please refer to the Model lifecycle section above.

POST
/inference/v1alpha1/models/{model_id}/pause
Path Parameters

model_id
required string
Body

Request Example
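A minimal sketch of pausing a model (the resume call further down is identical with `/resume` in place of `/pause`), assuming the standard Scaleway API base URL and a placeholder model UUID.

```python
import os
import requests

API_URL = "https://api.scaleway.com/inference/v1alpha1"  # assumed base URL
headers = {"X-Auth-Token": os.environ["SCW_SECRET_KEY"]}
model_id = "11111111-1111-1111-1111-111111111111"  # placeholder UUID

# The body is empty; the model transitions through pausing to paused.
resp = requests.post(f"{API_URL}/models/{model_id}/pause", headers=headers, json={})
print(resp.json()["status"])
```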
200 Response

path
string
URL of the original ML model used to create the Inference model.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

endpoint
nullable string
URL to the running Inference model.

error_message
nullable string
Details of the error, if any occurred when managing the model.

status
string
Current status of the model. Possible values are ready, paused, error, creating, converting, building, deploying, pausing, deleting and locked. The default value is ready.

config
map
A map of environment variables key/value pairs.

id
string
UUID of the model.

project_id
string
Project id.

created_at
string
Date at which the model has been created (RFC 3339 format).

updated_at
string
Date at which the model has been last updated (RFC 3339 format).

organization_id
string
Organization ID.
Response Example

Resume the model for the given id. The model must be in the paused status. This deploys the model again, enabling the endpoint and making it ready for inference. The status goes through the following stages: deploying, ready. For details about statuses, please refer to the Model lifecycle section above.

POST
/inference/v1alpha1/models/{model_id}/resume
Path Parameters

model_id
required string
Body

Request Example
200 Response

path
string
URL of the original ML model used to create the Inference model.

name
string
Name of the model.

framework
nullable string
Format must be `<backend>:<version>`. For supported versions, please refer to the `Serving Framework` section above.

endpoint
nullable string
URL to the running Inference model.

error_message
nullable string
Details of the error, if any occurred when managing the model.

status
string
Current status of the model. Possible values are ready, paused, error, creating, converting, building, deploying, pausing, deleting and locked. The default value is ready.

config
map
A map of environment variables key/value pairs.

id
string
UUID of the model.

project_id
string
Project id.

created_at
string
Date at which the model has been created (RFC 3339 format).

updated_at
string
Date at which the model has been last updated (RFC 3339 format).

organization_id
string
Organization ID.
Response Example