Book: Machine Learning System Design

2 Is there a problem?

This chapter covers

  • Problem space and solution space: which comes first?
  • Defining a problem as the most important step
  • Defining risks and limitations
  • Costs of a mistake

To succeed in machine learning (ML) system design, you need expertise in multiple fields, including project management, ML and deep learning, leadership, product management, and software engineering. However, stripped down to the bones, even the most complex and sophisticated ML solutions share the same framework and fundamentals as any other product.

The sheer variety and amount of knowledge gained in recent years give you unprecedented freedom to choose exactly the approach you want for your ML system, but no matter how refined your instruments of choice are, they’re no more than implementation mediums.

What are the business goals? How big is the budget? How flexible are the deadlines? Will the potential output cover and exceed overall costs? These are among the crucial questions that you need to ask yourself before scoping your ML project.

But before you start addressing these questions, there is a paramount action that lays the foundation for successful ML system design: finding and articulating the problem your solution will solve (or help solve). This point seems trivial, especially to skilled engineers, but based on our own experience in the area, skipping this step in your preliminary work is deceptively dangerous. It goes even further when we realize that some problems cannot be solved at a proper level, due to either the current state of available technologies or the aleatoric uncertainty of an ill-posed problem. While in the first case the problem can be a candidate for a future solution (e.g., today’s level of text generation would have seemed totally unachievable to an ML engineer in the early 2010s), in the second case the problem should not be tackled at all (e.g., one cannot build an algorithm that beats casino roulette).

In this chapter, we will cover the importance of knowing the problem before developing a solution; we will highlight risks and limitations you may face while defining a problem; and we will touch on what consequences can follow a mistakenly defined problem.

2.1 Problem space vs. solution space

I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail.
—Abraham Maslow, American psychologist

Imagine a boss coming to an engineer with an exciting new idea for a mind-blowing feature (we’ve all been there). For the sake of illustration, let’s make the example more specific. Steve works as an ML engineer at a growing SaaS company. Steve’s boss, Linda, just got back from a meeting with Jack, VP of sales, about a problem his team has been dealing with—too many customer leads and too few managers to handle them. Jack wonders if the ML team could come up with an AI solution that would automatically rank customer leads from best to worst based on potential profit for the company. This would help the sales team pick potential cash cows first and deal with the remaining leads as time allows. On paper, the feature looks stunning. It seems like a no-brainer!

Steve, a young but meticulous specialist, immediately has numerous questions regarding this project. What’s the due date for delivery? How big is the dataset of existing leads to build an ML model around? What’s the maximum time allowed to score a lead? What accuracy do we expect? What information do we have about each lead? How fast should the system be? What exactly does a “promising lead” imply? Which sales system do we integrate our solution with? After some back-and-forth Q&A, Steve knows the following:

Steve gets back to his desk and starts scoping the project. “Okay, this looks easy. We can frame it as a ranking or classification problem, craft some features, train a model, expose an API, integrate, and deploy—that should be it.” However, two things still bother him:

Three hours later, his browser is full of tabs with few-shot classification techniques and documentation on the CRM API. He wants to give his colleagues a precise estimate for project delivery, but he’ll have a hard time doing that because of one crucial mistake that may cost a lot at this early stage: while thinking and asking questions, he focused on the solution space, not the problem space.
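Steve’s initial framing can be sketched in a few lines. The snippet below is a hypothetical illustration, not a solution from the book: the features, labels, and data are all invented, and it assumes scikit-learn is available. It shows only how naturally the task maps onto binary classification—which is exactly why the solution space is so seductive before the problem space is understood.

```python
# Hypothetical sketch of "lead scoring as binary classification".
# All features, labels, and data are invented for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Toy historical leads: [company_size, num_emails, days_since_contact]
X = rng.normal(size=(200, 3))
# Pretend conversion correlates with the first two features.
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Score new leads: probability of conversion, usable as a ranking key.
new_leads = rng.normal(size=(5, 3))
scores = model.predict_proba(new_leads)[:, 1]
ranked = np.argsort(-scores)  # indices of leads, best first
print(ranked)
```

Nothing in this sketch answers what a “promising lead” means to the sales team or why ranking would help them—those answers live in the problem space.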

To Steve’s understanding, the information he received was more than enough to come up with a suitable solution, while in reality, it was just the tip of the iceberg. The remaining context could only be discovered by asking numerous specifying questions of multiple people involved in the project.

What are the problem space and solution space? These are two exploration paradigms that cover different perspectives of a problem. While both are crucial, the former should always precede the latter (figure 2.1).

Figure 2.1 An experienced engineer always handles the problem space first with specifying questions.

The problem space is usually explored with “what?” and “why?” questions, often with chains of them. There is even a popular technique named “Five Whys” that recommends stacking your “why?” questions on top of each other to dig down to the very origin of the problem you’re analyzing. Typical questions often look like this:

After exploration, you are expected to have an understanding of what you should build and why.

The “what?” part, in its turn, is about understanding the customer and functional attributes (figure 2.2)—for example, “A tool that annotates customer leads with a score showing how likely it is that the deal will happen; it should assign the scores before sales managers plan their work at a Monday weekly meeting.”

Figure 2.2 The questions you must ask before starting your project and the crucial difference between them

In some companies, asking these questions is a job done solely by product managers. However, it’s not very productive for engineers to exclude themselves from problem space analysis, as a proper understanding of problems affects the final result immensely.

The solution space is somewhat the opposite. It’s less about the problem and customer needs and more about the implementation. Here, we talk about frameworks and interfaces, discuss how things work under the hood, and consider technical risks. However, implementation should never be done before we reach a consistent understanding of a problem.

Reaching a solid understanding before thinking of a technical implementation allows you to consider various workarounds, some of which may significantly reduce the project scope. Maybe there is a third-party plugin for CRM that is designed to solve this problem. Maybe the cost of errors for the ML part of such a problem is not really that important despite Jack’s first answer (stakeholders often start with the statement they need accuracy close to 100%!). Maybe the data shows that 95% of empty leads can be filtered out with simple rule-based heuristics. All of these assumptions lie outside the story, but if proven, each of them is an essential part of the overall context. It is unveiling this context that will give you insight into the problem.
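The last assumption—that simple heuristics might shrink the ML scope—can be made concrete with a short sketch. The field names and rules below are invented for illustration; the point is only that a rule-based pre-filter may remove most of the work before any model is trained.

```python
# Hypothetical sketch: filtering obviously empty leads with rule-based
# heuristics before any ML model is involved. Field names are invented.
def is_empty_lead(lead: dict) -> bool:
    """A lead with no contact info and no recorded activity
    is not worth scoring at all."""
    has_contact = bool(lead.get("email") or lead.get("phone"))
    has_activity = lead.get("page_views", 0) > 0
    return not (has_contact or has_activity)

leads = [
    {"email": "cto@example.com", "page_views": 12},
    {"email": "", "phone": "", "page_views": 0},   # empty lead
    {"phone": "+1-555-0100", "page_views": 0},
]

# Only leads that survive the heuristic would go to an ML model.
to_score = [lead for lead in leads if not is_empty_lead(lead)]
print(len(to_score))
```

If data showed that such rules discard 95% of leads, the ML component would only need to rank the remainder—a much smaller problem.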

There are three reasons why we began the chapter with Steve’s story. First, it’s common and will most probably resonate with you in one way or another. Second, it is applicable to any scenario, be it building a new system, modifying an existing solution, or passing an interview at a tech company.

Third, and most important, the scale and effect of consequences that derive from this kind of approach can be damaging to varying degrees:

All these cases require understanding the problem first.

2.2 Finding the problem

Organizations which design systems (in the broad sense used here) are constrained to produce designs which are copies of the communication structures of these organizations.
—Melvin E. Conway

Some old-school enterprise companies still maintain a culture that encourages ordinary engineers to focus on low-level implementation (just coding) and leave the design (including problem understanding and decomposition) to architects and system analysts. From our experience, due to increasing flexibility requirements, this culture is disappearing rapidly, giving way to more horizontal structures, with more problem understanding delegated to individual contributors.

This means engineers don’t have to be solid experts in the domain (it can be too complicated for a person without a proper background). The reason is simple: it’s hard to learn the nuances of building a stock exchange or manufacturing quality control between meetings, code reviews, and training new state-of-the-art neural networks. But having a broad understanding is a must before starting an ML system design.

We encourage you to write down a problem statement using an inverted pyramid scheme, with a high-level understanding at its base and nuances at the top. It is a common and effective top-down approach that will help you gather as much general information as possible, determine what data is most valuable to your project, and then, using point-by-point leading questions, delve into the specifics of the problem (figure 2.3).

Figure 2.3 The inverted pyramid scheme is an approach we recommend for gathering data required for a successful project launch.

At the very top level, you can formulate a helicopter-view understanding of the problem. That’s the level understandable to any C-level officer of the organization, where people don’t care too much about ML algorithms or software architecture—for example:

Having such a statement at the start gives many opportunities for the next exploration steps. Just try to question every word in a given sentence to make sure you can explain it to a 10-year-old child. Who are fraudsters? How do they attack? What report gave the initial insight about excessive prices? What bothers our customers the most? Where is the most time wasted? How do we measure user engagement? How are recommendations related to this metric? Ask yourself or your colleagues questions until you’re ready to build the next, broader block of the pyramid that expands the initial one.

This next pyramid block requires more specific, well-thought-out questions. One of the successful techniques is looking for the origin of the previous-level answers. How do we decide this behavior was fraudulent? What kind of manual tuning do our customers have to perform? How are user engagement and recommendation engine performance currently correlated?

An even more powerful technique involves looking for inconsistencies in answers; people tend to group objects based on their similarity and distinguish objects based on their differences. There may be users who look alike: some are considered spammers and should be banned, while others are still legit, even though their behavior may look identical to a person outside the problem domain. For an uninformed observer, the same added margin for similar goods may or may not be acceptable, but what are the criteria? An engineer doesn’t have to find all the splitting criteria in a problem statement (they’re not decision trees), but this is a good field in which to catch crucial signals and generate insights. This can be summarized by the following statement: trying to understand what people want is important; trying to understand what they need is critical.

Be sure to involve all the interested parties in the process. It’s not only your boss or product manager who cares about the project; you’re likely to have multiple stakeholders (it is crucial to understand which of the stakeholders is responsible for budgets and will be the point of approval for a given component of the system). It is often recommended to chat with experts at different levels to capture both strategic and tactical perspectives. A high-level executive knows much about the goal of a given initiative. On the other hand, the individual contributors who currently compensate for the absence of the system being designed know tricks and details that may substantially affect the design.

Once you feel confident enough to explain the problem in simple terms, it’s time to wrap it up. We recommend writing down your problem understanding. Usually, it’s several paragraphs of text, but this text will eventually become the cornerstone of your design document. Don’t polish it too much for now; it’s just your first (although very important) step.

The importance of this step may vary, depending on the organization or environment. Sometimes the problem is easy to understand but very hard to solve—this is common in established, competitive markets. At the other end of the spectrum are startups that disrupt existing markets; there, the initial understanding of the disruption is rarely correct. One of the authors worked at a company where up to 50% of his time on a project was spent on defining goals and relevant context. Once the context was clear, the ML engineering part of the project was smooth and straightforward.

Once the problem statement is explicit enough, it’s time to think about what we, as ML engineers, can do with it.

2.2.1 How we can approximate a solution through an ML system

Inexperienced or just hasty engineers often first try to drag the problem directly into a Procrustean bed of well-known ML algorithm families like supervised or unsupervised learning or a classification or regression problem. We don’t think it’s the best way to start.

For an external observer, an ML model is like a magic oracle: a universal machine that can answer any properly formulated question. Your job as an ML engineer is to approximate its behavior—to build this oracle using ML algorithms—but before mimicking it, we need to find the right question and teach users to ask it. In less metaphorical terms, this is where we reframe a business problem into a software/ML problem.

Some questions may seem very straightforward:

Even with the metaphor of a magical oracle, we had to add multiple caveats that affect the potential answer. We’ll pay attention to similar details and remarks here and there throughout the book, but the key point is this: there may be no single simple answer to the problem, and your ML system design must account for that in advance.

In our pricing example, there may be a spectrum of goals, from maximizing profit right here right now to growing the company in the long run. A good ML system would be able to adapt to a specific point in this spectrum. In the following chapters, we will discuss the tech aspects of doing so.
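One simple way to express such a spectrum of goals is a scoring function parameterized by the business objective. The sketch below is a hypothetical illustration; the function name, weight, and component scores are all invented.

```python
# Hypothetical sketch: one pricing score parameterized by a business goal.
# alpha is a dial between short-term profit and long-term growth.
def price_score(immediate_profit: float, long_term_value: float,
                alpha: float) -> float:
    """alpha=1.0 -> maximize profit right now;
    alpha=0.0 -> optimize purely for long-term growth."""
    return alpha * immediate_profit + (1 - alpha) * long_term_value

print(price_score(10.0, 2.0, alpha=1.0))  # pure short-term view
print(price_score(10.0, 2.0, alpha=0.0))  # pure long-term view
```

A system designed around such a dial can move along the spectrum without being rebuilt when the business goal shifts.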

Many ML practitioners, including Andrew Ng, a renowned AI expert, professor at Stanford University, and founder of Landing AI, suggest using the heuristic of a human expert: let’s build a system that answers in the same manner as an expert in the area would. It works for many domains (health care is a great example) and sets an early bar for how solvable a problem is with AI approaches. Unfortunately, it comes with disadvantages as well: there are problems where machines perform better than people. Such problems usually arise in domains where data is represented as a log of events (often of human behavior), not something carefully labeled. It’s easy to find such cases in the ad tech and finance industries. So human-level performance may be a fair bar to reach, but it’s not always the right one.
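The human-expert bar can be made concrete with a toy comparison. The labels below are invented for illustration; the point is that the expert’s own accuracy against ground truth becomes the reference, and a model may legitimately land above it.

```python
# Hypothetical sketch: human-expert labels as a performance bar.
# All labels and predictions are invented toy data.
def accuracy(pred, truth):
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

ground_truth  = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
expert_labels = [1, 0, 1, 0, 0, 1, 0, 1, 1, 0]  # the expert errs too
model_preds   = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]

human_bar = accuracy(expert_labels, ground_truth)  # the bar to reach
model_acc = accuracy(model_preds, ground_truth)
print(human_bar, model_acc)
```

In event-log domains like ad tech, ground truth comes from observed outcomes rather than expert labels, which is exactly why the model can exceed the human bar.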

And only after the question is clear does it make sense to dig into the way of approximating it algorithmically and draft a model capable of doing so. It doesn’t have to be a single model: a pipeline of various models or algorithms is often a legitimate tradeoff. We will cover problem decomposition as part of the preliminary research covered in the next chapter.

2.3 Risks, limitations, and possible consequences

Imagine you’ve built a fraud detection system: it scores user activity and prevents malicious events by suspending risky accounts. It’s a precious thing—zero fraudsters have come through since its launch, and the customer success team is happy. But recently, the marketing team launched a big ad campaign, and your perfect fraud detector banned a fair share of new users based on their traffic source (it’s unknown and therefore somewhat suspicious, according to your algorithms). The negative effect on marketing could well have outweighed the benefit of detecting fraud.

You may find this example obvious and not worth attention. However, the reality is ruthless: situations like this often happen in companies where teams are misaligned, and that was one of the risks you should have kept in mind while designing the system. You shouldn’t think, “Our team is professional; a failure like that just can’t happen here.” So explicit thinking about risks is the way to go, as there’s a high chance of potential risks spreading beyond the project team or a single department.

With great power comes great responsibility—this popular adage is very applicable to ML software. ML is no doubt powerful. But besides the power, it has one more important and dangerous attribute: opacity to most observers, especially when the model under the hood is complicated. Thus, professional system designers should be aware of potential risks and existing limitations.

Software development classics suggest the idea of functional and nonfunctional requirements. In short, functional requirements are about the functionality of a new feature or system, its value, and its user flow, while nonfunctional requirements are about aspects like performance, security, portability, and so on. In other words, functional requirements determine what we should design, and nonfunctional requirements shape the understanding of how it should work under the hood. So when we talk about potential risks and limitations, we effectively gather nonfunctional requirements.

The cornerstone of any defensive strategy is a risk model. Simply put, it’s an answer to the “What are we protecting from?” question. What are the worst scenarios possible, and what should we avoid? Answers like “incorrect model prediction” are not informative at all. A detailed understanding aligned with all possible stakeholders is absolutely required.

Understanding the risks and limitations will affect many future decisions, and we will cover this later in chapters dedicated to datasets, metrics, reporting, and fallback. Before we do, though, we’d like to give a couple of examples displaying how considering (or ignoring) valuable data can affect your goal setting.

2.4 Costs of a mistake

When talking about the costs of a mistake, we’d like to quote Steve McConnell, who precisely defines the difference between robustness and correctness in his book Code Complete (2nd ed., Microsoft Press, 2004) using examples of building an X-ray machine and a video game:

As the video game and X-ray examples show us, the style of error processing that is most appropriate depends on the kind of software the error occurs in. These examples also illustrate that error processing generally favors more correctness or more robustness. Developers tend to use these terms informally, but, strictly speaking, these terms are at opposite ends of the scale from each other. Correctness means never returning an inaccurate result; returning no result is better than returning an inaccurate result. Robustness means always trying to do something that will allow the software to keep operating, even if that leads to results that are inaccurate sometimes.
Safety-critical applications tend to favor correctness over robustness. It is better to return no result than to return a wrong result. The radiation machine is a good example of this principle. Consumer applications tend to favor robustness to correctness. Any result whatsoever is usually better than the software shutting down. The word processor I’m using occasionally displays a fraction of a line of text at the bottom of the screen. If it detects that condition, do I want the word processor to shut down?

This concept is even more applicable to ML systems, as they tend to be obscure for both developers and end users. A set of if and while statements is easier to keep in mind compared to enormous sequences of matrix multiplication in modern deep neural networks.

Imagine you’re building an entertainment app like an AR mask for Snap or TikTok. In the worst case, the added effect will look ugly for a frame—not a big risk, so robustness is a proper approach here. The opposite case is an ML solution for medical or transport needs. Would you prefer a self-driving car that just moves forward when it’s not sure if there is a pedestrian nearby? Definitely not: that’s why you want to opt for correctness here.
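The two error-handling styles can be contrasted directly in code. This is a hypothetical sketch: the failing model stub, function names, and fallback value are all invented for illustration.

```python
# Hypothetical sketch: the same prediction step wrapped in two
# error-handling policies. The model stub simulates a failure.
def predict_confidence(frame) -> float:
    raise RuntimeError("sensor glitch")  # simulate a failing model

def robust_predict(frame, fallback: float = 0.0) -> float:
    """Robustness: keep operating, even if the result may be inaccurate.
    Fine for an AR mask -- one ugly frame is harmless."""
    try:
        return predict_confidence(frame)
    except RuntimeError:
        return fallback  # keep rendering with a default effect

def correct_predict(frame) -> float:
    """Correctness: no result is better than a wrong one.
    Required for a self-driving car -- stop rather than guess."""
    try:
        return predict_confidence(frame)
    except RuntimeError:
        raise SystemExit("halting: refusing to act on unknown input")
```

The prediction logic is identical; only the policy around failures encodes which end of the correctness-robustness scale the system sits on.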

We’ll talk more about this tradeoff and its practical aspects in the third part of the book. At this point, we should only mention that understanding the costs of mistakes is one of the critical points in gathering predesign information. This is effectively a quantitative development of the risk concept: we first define what can go wrong and what we want to avoid, and later try to assign numerical attributes to those scenarios. The numerical aspect may vary greatly depending on the problem and doesn’t have to be precise at this point, but it’s essential for shaping the landscape.

From our experience, people tend to think more about positive scenarios, while in reality, negative outcomes require more attention. The logic is simple: a system usually has one (or a few) success scenarios and many failure modes. Of course, the probability of each failure mode is usually far lower than the probability of a good outcome, but that may no longer hold once we compare expected values. Imagine a trading system that makes a few cents in 99% of deals and loses the whole capital with a probability of 0.1% or, to be more dramatic, a medical diagnostic system that saves 3 minutes per patient for highly paid doctors but misses a serious yet curable disease for every 1,000th patient.
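The trading example above reduces to a few lines of arithmetic. The gain per deal and the capital figure below are invented; only the probabilities mirror the example. The point is that a tiny failure probability can still dominate the expected value.

```python
# Hypothetical sketch: comparing outcomes by expected value, not by
# probability alone. Gain and capital figures are invented.
p_win, gain = 0.99, 0.05        # a few cents in 99% of deals
p_ruin, capital = 0.001, 100.0  # total loss with probability 0.1%

expected_value = p_win * gain - p_ruin * capital
print(expected_value)  # negative: the rare failure mode dominates
```

A system evaluated only on its win rate would look excellent here, which is exactly why negative scenarios deserve the extra attention.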

Some mistakes, though, can be harmless or even positive. Back in 2018, Arseny worked at a company making an AR application—a virtual try-on for footwear. The app allowed the user to see how a pair of shoes looked on their feet before purchasing it. One of the first versions of the app contained an underfitted model responsible for foot detection and tracking. As a result, shoes were often rendered not only on human feet but also on top of pet paws and even toys. Many of the early users found it hilarious, so the cost of such a mistake was not significant. As time went on, the effect disappeared once model performance was improved for more conventional user scenarios.

While estimating the cost of a mistake, you should also remember there may be second-order consequences. For example, your antifraud system might ban too many legitimate users today, and tomorrow they may spread negative word of mouth about your app (“Never use it; they banned me for nothing”), which may bury your growth potential. Or your recommendation system provides unrelated suggestions, and you later end up training a new model on logs of the rare clicks on those poor recommendations, thus falling into a negative feedback loop.

Another classic example of the cost of a mistake is credit risk scoring, a common task found in almost any bank. Before being accepted or rejected, a borrower’s application is usually processed by an ML-based system that outputs a risk score. The risk score can be binary (1/0, determined by a specified threshold) or continuous between 0 and 1.

Obviously, the cost of giving a loan to a client who defaults is not the same as the cost of denying a loan of the same amount to a customer who would have repaid it. How many successfully repaid loans does the bank need to outweigh one borrower who goes bankrupt? Should we count the number of loans issued or the amount of money lent? Do we expect this ratio to stay constant over time? Answering these questions and taking the information into account greatly increases the chance the project will be considered successful.
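These questions can be turned into a toy expected-profit calculation for picking a decision threshold. Everything below is an invented assumption for illustration: the cost ratio (one default assumed to cost as much as the margin on 20 repaid loans), the applicant scores, and the brute-force threshold search.

```python
# Hypothetical sketch: choosing a risk-score threshold under
# asymmetric costs. The cost ratio and data are invented.
COST_DEFAULT = 20.0  # relative cost of approving a loan that defaults
GAIN_REPAID = 1.0    # relative gain of approving a loan that is repaid

def expected_profit(threshold, scored_applicants):
    """scored_applicants: list of (risk_score, actually_defaulted)."""
    profit = 0.0
    for risk, defaulted in scored_applicants:
        if risk < threshold:  # approve the loan
            profit += -COST_DEFAULT if defaulted else GAIN_REPAID
    return profit

applicants = [(0.05, False), (0.10, False), (0.40, False),
              (0.45, True), (0.90, True)]

# Brute-force search over candidate thresholds.
best = max((t / 100 for t in range(101)),
           key=lambda t: expected_profit(t, applicants))
print(best, expected_profit(best, applicants))
```

With a symmetric cost assumption, the chosen threshold would sit much higher; making the asymmetry explicit is what turns "accuracy" into a business-aligned objective.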

What does it mean for a person designing an ML system? Identifying the risk landscape helps us understand what kind of problems are to be avoided. Some errors are almost harmless, some can greatly affect business, and some can be life-threatening. A proper understanding of the costs of a mistake with regard to the system being designed is critical for the next steps, as it shapes requirements for reliability and data gathering, suggests better metrics, and may affect other aspects of design.

Summary
