How to Deploy a Machine Learning Model as a Streaming Application
Streaming-based machine learning predictions are especially useful for handling high traffic and for democratizing access to model predictions across an organization.
Let’s examine a use case where a streaming-based machine learning system is more efficient than both real-time (synchronous) predictions (RESTful in most cases) and batch predictions.
You’re leading the recommender systems team at an e-commerce company, and your goal is to develop a single model that will power product recommendation carousels in many parts of the website. After some data analysis, you conclude that recent user behaviour is a crucial signal for delivering high-quality product recommendations. Therefore, a batch-based recommender system is not the best option.
How about deploying the recommender model as a REST API? One naive way to architect this solution is to call the recommender model every time a user lands on a page of the website where a product recommendation carousel is served. The problem here is that this piece of client code will potentially need to calculate the required features and call the model through a REST API. Another option is to build a REST service that takes the user id as input and returns the recommended products. This REST service will need to query some database that keeps the state of the user in order to calculate the required features and call the model.
The problem is that you need to add the same client code to many parts of the website, which makes it very hard to maintain. It would also be challenging to deal with high traffic.
If you have already set up a clickstream infrastructure (usually Kafka or AWS Kinesis) that captures all user actions on your website, you can create a streaming application that subscribes to specific events, generates the appropriate features, calls the REST API with these features and forwards the predictions to some other topic. Then, your application engineers can subscribe to this topic and use the output on any part of the website. You could even set up a pipeline that pushes predictions from the latter topic to a NoSQL database (such as Cassandra or DynamoDB) and build an API in front of this database in order to provide multiple patterns of access to these predictions.
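The flow described above (subscribe to specific events, build features, call the model, publish predictions) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: in-memory queues stand in for the Kafka or Kinesis topics, `call_model` is a hypothetical stub for the REST call, and the event fields and the "recently viewed products" feature are invented for the example.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List

Event = Dict[str, str]

# Assumption: in-memory queues stand in for the clickstream and predictions topics.
clickstream: Deque[Event] = deque()
predictions_topic: Deque[dict] = deque()

# Per-user state: the most recently viewed product ids (hypothetical feature).
recent_views: Dict[str, List[str]] = defaultdict(list)
MAX_RECENT = 3


def build_features(event: Event) -> dict:
    """Update the user's state and return the features for the model."""
    views = recent_views[event["user_id"]]
    views.append(event["product_id"])
    del views[:-MAX_RECENT]  # keep only the last MAX_RECENT views
    return {"user_id": event["user_id"], "recent_product_ids": list(views)}


def call_model(features: dict) -> List[str]:
    """Hypothetical stand-in for the HTTP call to the model's REST API."""
    return [f"similar-to-{pid}" for pid in features["recent_product_ids"]]


def run_once(model: Callable[[dict], List[str]] = call_model) -> int:
    """Drain the clickstream, publishing one prediction per relevant event."""
    processed = 0
    while clickstream:
        event = clickstream.popleft()
        if event.get("type") != "product_view":  # subscribe to specific events only
            continue
        features = build_features(event)
        predictions_topic.append(
            {"user_id": features["user_id"], "recommendations": model(features)}
        )
        processed += 1
    return processed
```

In a real deployment the two queues would be replaced by a consumer and a producer for your streaming platform, and the per-user state would live in a store that survives restarts.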
Now let’s talk about democratizing model predictions in organizations! The approach just described, with model predictions available in a Kafka topic and in a NoSQL database, enables any product or engineering team to access them without calling your REST API service, which would come at the cost of integration work and put an unnecessary traffic burden on it.
So, how about having an MLOps tool where data scientists and ML scientists/engineers only have to specify the prediction logic of their trained model in a predict(input_json) Python function, and the tool then automatically deploys the model as a REST API and sets up a streaming pipeline around it?
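To make the idea concrete, here is a rough sketch of what such a function could look like. The input schema (`recent_product_ids`), the toy co-view table standing in for a trained model, and the output format are all assumptions made for illustration; the contract the tool actually expects may differ.

```python
import json
from typing import Dict, List, Union

# Assumption: a toy lookup table stands in for the trained recommender model.
CO_VIEWED: Dict[str, List[str]] = {
    "p1": ["p7", "p9"],
    "p2": ["p9", "p4"],
}


def predict(input_json: Union[str, dict]) -> dict:
    """Return recommended product ids for the features in input_json."""
    payload = json.loads(input_json) if isinstance(input_json, str) else input_json
    recommended: List[str] = []
    for product_id in payload.get("recent_product_ids", []):
        for candidate in CO_VIEWED.get(product_id, []):
            if candidate not in recommended:  # de-duplicate, keep first-seen order
                recommended.append(candidate)
    return {"recommendations": recommended}
```

The point of the design is that everything outside this function (serving, queues, scaling) is left to the tooling, so the model author only maintains the prediction logic.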
As you can see in the diagram below, 2 CLI commands are enough to create:
- The model REST API, i.e. deploy the trained model as a REST API
- The streaming based prediction pipeline: a) Features topic, b) Model Prediction Streaming App and c) Model Predictions topic
But how? Let’s assume that your trained model is stored at s3://my-bucket/recommender-model-outputs/output-1/model.tar.gz. The first step is to deploy this model as a REST API:
sagify cloud deploy -m s3://my-bucket/recommender-model-outputs/output-1/model.tar.gz -n 1 -e ml.m4.xlarge
This command will deploy your trained model on 1 instance of type ml.m4.xlarge and will return the endpoint name (let’s assume it’s my-recommender-endpoint-1, the name used in the next command).
The second step is to set up the streaming based prediction pipeline:
sagify cloud create-streaming-inference --name recommender-worker --endpoint-name my-recommender-endpoint-1 --input-topic-name features --output-topic-name model-predictions --type SQS
This command will create a Lambda function named recommender-worker that consumes features from the SQS queue features, calls the endpoint my-recommender-endpoint-1 for each features message and, finally, forwards the predictions to the SQS queue model-predictions.
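For intuition, the worker's behaviour can be approximated by a Lambda handler like the one below. This is a hand-written sketch, not the code that sagify generates. It assumes the endpoint is a SageMaker endpoint, that messages carry JSON, and that the output queue URL arrives in a hypothetical `OUTPUT_QUEUE_URL` environment variable; the helper names are also invented. It relies on the boto3 `invoke_endpoint` and `send_message` calls and on the usual shape of SQS-triggered Lambda events, where each record's `body` holds the message text.

```python
import os
from typing import Callable

ENDPOINT_NAME = os.environ.get("ENDPOINT_NAME", "my-recommender-endpoint-1")
# Assumption: the output queue URL is supplied via an environment variable.
OUTPUT_QUEUE_URL = os.environ.get("OUTPUT_QUEUE_URL", "")


def process_records(
    event: dict,
    invoke: Callable[[str], str],
    publish: Callable[[str], None],
) -> int:
    """Call the model once per SQS message and publish each prediction."""
    records = event.get("Records", [])
    for record in records:
        publish(invoke(record["body"]))  # body holds the features as a JSON string
    return len(records)


def lambda_handler(event: dict, context: object) -> dict:
    """Entry point for an SQS-triggered Lambda function."""
    import boto3  # imported here so process_records can be tested without AWS

    runtime = boto3.client("sagemaker-runtime")
    sqs = boto3.client("sqs")

    def invoke(features_json: str) -> str:
        response = runtime.invoke_endpoint(
            EndpointName=ENDPOINT_NAME,
            ContentType="application/json",
            Body=features_json,
        )
        return response["Body"].read().decode("utf-8")

    def publish(prediction_json: str) -> None:
        sqs.send_message(QueueUrl=OUTPUT_QUEUE_URL, MessageBody=prediction_json)

    return {"processed": process_records(event, invoke, publish)}
```

Keeping the per-message logic in `process_records` separates it from the AWS clients, so it can be exercised locally with plain functions in place of the endpoint and the queue.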
It’s that easy! Please check out the docs for more info!