The 12+ Factor Model
Modifying the 12-factor app to include machine learning operations principles.
The 12-factor app methodology has been widely discussed, offering a set of guidelines for building web applications that are easy to deploy, maintain, and scale. It tries to answer questions like: how can we make continuous deployment as smooth as possible by keeping applications portable across different cloud systems?
Low Coupling and High Cohesion in Systems
One of the wonderful aspects of the 12 factors lies in their ability to clearly outline what infrastructure and deployment teams expect and require from developers, particularly when teams are onboarding new applications and use cases. This clear alignment is helpful for a fast-growing start-up that needs to onboard people quickly, or for a large corporation that wants to keep coupling between teams low.
Low coupling is essential for creating flexible, maintainable systems. However, achieving it has always been the bane of machine learning systems. With the constant evolution of ideas, methods, and tools, it is challenging to maintain a clear separation between data, code, and infrastructure throughout the lifecycle of machine learning applications. This high degree of coupling is why small, cross-functional teams can excel in machine learning: they counteract high coupling with high cohesion, allowing them to adapt rapidly to new ideas and technologies.
As we explore the unique challenges of deploying machine learning models, we will also consider how the 12-factor app principles can be adapted to encourage low coupling in machine learning systems and teams. By doing so, we hope to create a more comprehensive framework for machine learning deployment, bridging the gap between development, infrastructure, and deployment teams.
Join us as we dive into the key questions and challenges that arise when adapting the 12-factor app principles to machine learning deployment, and contribute your thoughts and suggestions to help create a more effective and cohesive approach to machine learning operations.
Challenges in Machine Learning Deployment
The first challenge most newcomers to ML and MLOps face is data and model versioning. How do we version training and validation data sets, ensuring that the correct data is used during development and testing? How should we version machine learning models and maintain a history of model performance over time? Should there be a new factor focusing on model versioning and reproducibility?
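To make the versioning question concrete, here is a minimal sketch of one possible approach: content-address both the training data and the resulting weights, and log them together so every model can be traced back to the exact bytes that produced it. The file paths and registry layout are illustrative assumptions, not a standard; tools like DVC or MLflow offer more complete versions of the same idea.

```python
# A minimal sketch of content-addressed data and model versioning.
# Paths and registry layout are illustrative assumptions.
import hashlib
import json
from pathlib import Path

def content_hash(path: Path) -> str:
    """Hash a file's bytes so the same data always yields the same version ID."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()[:12]

def register_run(train_data: Path, model_weights: Path, metrics: dict) -> dict:
    """Record which exact data produced which exact weights, plus performance."""
    record = {
        "data_version": content_hash(train_data),
        "model_version": content_hash(model_weights),
        "metrics": metrics,
    }
    registry = Path("model_registry.jsonl")  # hypothetical append-only registry
    with registry.open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record
```

With a record like this per training run, a model's performance history is just a query over the registry, and reproducing a result starts from the hashes rather than from a guess.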
As we approach deployment, questions and challenges associated with hardware and infrastructure arise. Machine learning models often require specialized hardware, such as edge devices, GPUs, or Application-Specific Integrated Circuits (ASICs), which can complicate deployment. Additionally, the challenge of managing model weights and memory layouts may necessitate tools like TVM, OctoML, or NeuronSDK to create on-the-fly compilation pipelines. The original factors put us in a situation where we could potentially have multiple containers, each with the various libraries necessary to run our models on each edge device or ASIC.
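As a sketch of what that multiplicity looks like in practice, consider a simple dispatch table from deployment target to container image and compiler backend; every row is another artifact the build pipeline must produce, test, and keep in sync. The image names and backend labels below are hypothetical.

```python
# A hedged sketch of the "one image per target" problem: a dispatch table
# from deployment target to container image and compiler backend.
# All image names and backend labels are hypothetical.
RUNTIME_MATRIX = {
    "gpu-server":      {"image": "registry.local/model:cuda",   "backend": "tensorrt"},
    "edge-arm":        {"image": "registry.local/model:arm64",  "backend": "tvm"},
    "asic-inferentia": {"image": "registry.local/model:neuron", "backend": "neuron-sdk"},
}

def resolve_runtime(target: str) -> dict:
    """Look up which image and compilation backend a deployment target needs."""
    try:
        return RUNTIME_MATRIX[target]
    except KeyError:
        raise ValueError(f"No build pipeline defined for target '{target}'")
```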
Another question is how to combine trained weights with model code at deployment time. Should weights be stored separately from the containers and pulled during container ramp-up, or cached within the image? How do we scale this quickly? These decisions impact scalability, maintainability, and adherence to the original 12-factor app principles.
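If weights are kept out of the image, the ramp-up step might look like the sketch below: fetch a pinned weight version from object storage at startup, with a node-local cache so repeated container starts on the same machine do not re-download. The bucket name, key layout, and cache path are assumptions for illustration.

```python
# A minimal sketch of the "pull weights at ramp-up" option: fetch versioned
# weights from object storage on startup, with a local cache so restarts on
# the same node are fast. Bucket and key layout are assumptions.
import boto3
from pathlib import Path

CACHE_DIR = Path("/var/cache/models")  # hypothetical cache location

def fetch_weights(bucket: str, model_name: str, version: str) -> Path:
    """Download weights once per node; later containers reuse the cached copy."""
    local = CACHE_DIR / model_name / f"{version}.bin"
    if not local.exists():
        local.parent.mkdir(parents=True, exist_ok=True)
        s3 = boto3.client("s3")
        s3.download_file(bucket, f"{model_name}/{version}/weights.bin", str(local))
    return local

# Usage (hypothetical names):
# weights_path = fetch_weights("model-artifacts", "churn-classifier", "v42")
```

The trade-off is exactly the one the question raises: baking weights into the image gives immutable releases at the cost of enormous images per model version, while pulling at ramp-up keeps images generic but makes startup time and scale-out speed depend on the artifact store.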
Furthermore, logging requirements in machine learning deployments extend beyond typical messages and debug info to include metrics and large volumes of data. Metadata needs to be logged for efficient debugging of data-specific issues or for retraining.
Should we keep all the data seen during inference? Could metrics beyond DORA-style measures give us insight into how well a model performs?
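A sketch of what that extended logging might look like: each prediction emits a structured record carrying the model version, a hash of the input, and the model's own confidence, so data-specific bugs can be traced and retraining sets assembled without storing every raw payload. The field names are illustrative, not a schema proposal.

```python
# A hedged sketch of ML-specific structured logging: alongside the usual
# message, record the metadata needed to debug data issues or assemble a
# retraining set later. Field names are illustrative.
import hashlib
import json
import logging
import time

logger = logging.getLogger("inference")

def log_prediction(model_version: str, features: dict, prediction,
                   confidence: float, latency_ms: float) -> None:
    logger.info(json.dumps({
        "ts": time.time(),
        "model_version": model_version,
        # Hash rather than store raw inputs; keeps logs small and avoids PII.
        "input_hash": hashlib.sha256(
            json.dumps(features, sort_keys=True).encode()
        ).hexdigest()[:12],
        "prediction": prediction,
        "confidence": confidence,
        "latency_ms": latency_ms,
    }))
```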
Scaling strategies for real-time interactive apps or processing pipelines may also differ from those for machine learning models. While traditional apps can scale horizontally or vertically, models often require more powerful vertical scaling for optimal performance and return on investment (ROI). This raises questions about concurrency.
Stateless processes also present challenges in machine learning. While statelessness ensures repeatability, it may not be ideal for models that need to learn in real-time or retain contextual information across user interactions.
Indeed, if a 12-factor app is deployed as part of a real-time interactive app or a processing pipeline, the scaling requirements can differ. When processing data, apps can scale horizontally or vertically. Models, however, generally require us to utilize the full power of vertical scaling: for efficiency and ROI, an idle GPU costs the same as a busy one, so we want it fully saturated. Factor 8 (concurrency) is insufficient, and processes may not be first-class citizens, since scaling out process-wise can be prohibitively expensive. This may not always hold as costs go down, or as cheaper application-specific integrated circuits become more common.
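One concrete consequence of this vertical-scaling pressure is request micro-batching: rather than scaling out processes, concurrent requests are briefly accumulated and pushed through the model as a single batch to keep the accelerator busy. The sketch below is a minimal illustration under assumed batch-size and timeout values; production serving frameworks implement far more robust versions of the same idea.

```python
# A minimal sketch of request micro-batching, one common answer to the
# "idle GPU costs the same as a busy one" problem: accumulate concurrent
# requests briefly, then run them through the model as one batch.
# The model callable and timeout values are assumptions.
import queue
import threading

requests: "queue.Queue[tuple]" = queue.Queue()

def batching_loop(model, max_batch: int = 32, wait_s: float = 0.01) -> None:
    while True:
        batch = [requests.get()]  # block until at least one request arrives
        try:
            while len(batch) < max_batch:
                batch.append(requests.get(timeout=wait_s))
        except queue.Empty:
            pass  # batching window closed; run with what we have
        inputs, reply_slots = zip(*batch)
        outputs = model(list(inputs))  # one forward pass for the whole batch
        for slot, out in zip(reply_slots, outputs):
            slot.put(out)  # hand each caller its own result

# Usage: callers enqueue (input, their_own_reply_queue) and block on it.
# threading.Thread(target=batching_loop, args=(my_model,), daemon=True).start()
```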
Statefulness is fun in ML. Stateless processes sound great if you want repeatable results, but what if you want the model to learn in real time? While there are ways to do this for short contexts, imagine an LLM using the whole conversation up to that point as input to retain the context of previous talking points. We may also want the model to retain some information across users, so that later users benefit from previously gained knowledge.
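One common compromise, sketched below, keeps the serving process itself stateless (in the spirit of Factor 6) while the conversation context lives in an external backing store such as Redis: any replica can pick up the session, and the state problem moves out of the process. The key scheme and the generate() callable are assumptions for illustration.

```python
# A hedged sketch of externalized conversation state: the process stays
# stateless while context lives in Redis. Key names and the generate()
# callable are illustrative assumptions.
import json
import redis

store = redis.Redis(host="localhost", port=6379)

def chat_turn(session_id: str, user_message: str, generate) -> str:
    key = f"chat:{session_id}"
    # Rebuild the conversation so far from the backing store.
    history = [json.loads(m) for m in store.lrange(key, 0, -1)]
    reply = generate(history + [{"role": "user", "content": user_message}])
    # Persist both turns so any replica can serve the next request.
    store.rpush(key,
                json.dumps({"role": "user", "content": user_message}),
                json.dumps({"role": "assistant", "content": reply}))
    return reply
```

This answers per-session context, but not the harder case of knowledge shared across users, which pulls toward online learning or periodic retraining rather than a simple store.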
So what features or factors should we include to cover the issues above? Are there other factors we potentially have missed?
So, all great questions, none with clear answers; what a great post! Hey, at least we are being honest.
Machine learning operations face unique challenges that the original 12-factor app methodology does not directly address. These challenges include model and data versioning, hardware and infrastructure requirements, model weights and memory layout management, extended logging requirements, real-time learning and statefulness, and scalability. To accommodate these specific needs, it is essential to revisit the original 12 factors and propose amendments or additional factors tailored for machine learning operations. This should allow greater cohesion among large teams.
Summary
By examining each of the 12 factors and adapting them to address the challenges machine learning deployments face, we can better guide the development and deployment of machine learning models. The proposed changes and additional factors will ensure smoother deployment, maintainability, and scalability of machine learning models while preserving the core principles of the 12-factor app methodology. This approach will help bridge the gap between traditional app deployment and the unique requirements of machine learning operations.
Up Next:
In upcoming posts, we will address these challenges and show how to adapt the 12 factors while preserving their core principles. If you or your team has taken on this challenge, we would also love to hear from you and include your experience in future posts.