Book: Machine Learning System Design
Previous: 3 Preliminary research
Next: Part 2 Early stage

4 Design document

This chapter covers

  • The most common myths around the design document
  • Defining antigoals for an even sharper focus on core objectives
  • Drafting a design document based on the information available
  • Reviewing a design document
  • The evolution of a design document

Once you have defined the problem your system should solve, as well as a list of stakeholders, and have a rough understanding of what technologies and solutions would be most appropriate for the product, as described in chapter 3, it is time to prepare a design document.

It is worth noting here that there is no set-in-stone order of actions at the early stage of creating a machine learning (ML) system. You can start preparing a design document as soon as you’ve identified the problem and goals (especially if you work in a startup, where the speed of delivery is often more important than following processes). But since this book is presented as a checklist, the list of actions is also displayed in a traditional sequence.

As one of the authors’ managers once said, no fancy recommendation algorithm can beat a customer with a shopping list. Such customers have a goal and a plan for achieving it, and nothing can stop them.

If you think about it, writing code is just providing a specific set of instructions to achieve a particular goal. In a sense, a design document is a meta-algorithm for accomplishing a specific goal, one that involves many subalgorithms. Still, many people regularly dismiss the design document as either one of the horsemen of bureaucracy or a vestige kept alive by inertia.

In this chapter, we will examine the most common myths around the design document. We will introduce and define the concept of antigoals as additional guidelines that lead you toward the project’s objectives, and we will start the practical part of this book, represented by two design documents based on close-to-real-life scenarios.

4.1 Common myths surrounding the design document

Over the years, the design document has accumulated a number of false assumptions and misinterpretations that may prevent you from putting together a well-organized document appropriate for your project. Next, we’ll examine the most common misconceptions and explain why you shouldn’t dwell on them.

4.1.1 Myth #1. Design documents work only for big companies but not startups

You could argue that dedicating part of your workload to preparing design documents would only make sense for big companies. There is a fair premise in this counterpoint: mature organizations need to invest more time and resources in writing design documents than a startup with a dozen employees does. It doesn’t mean, though, that small companies should prepare no design documents at all: as a well-known quote says, “Plan is nothing; planning is everything.” The beauty of writing a design document lies in revealing blind spots in your vision, on both the product side and the technical side, which will save you a lot in the medium term, especially if you cut out irrelevant information. For the latter, we recommend applying a method we call “antigoals,” to which we dedicate a separate section later in this chapter.

When the book was in early access, our early readers often shared the same comment: “Well, this is good, but that is not how it works in startups.” While we agree that startups’ delivery cadence is different, we still stand by the idea that the design phase is necessary. It is true that cofounders and early engineers can reach consensus during a coffee break, whereas a massive corporation would waste 6 months on the same scope. We also agree that writing formal docs may be inefficient, but that is not what we advocate for. A simple note with a short description can be enough as long as you are sure it gets all collaborators on the same page. Ignoring good practices of software and ML engineering is fine while you’re hunting for a prize at a hackathon, but the hackathon style rarely works over longer distances.

4.1.2 Myth #2. Design documents are efficient only for complex projects

There’s a grain of truth to this statement if you look at a design document in its classic sense: a large, labor-intensive effort involving every detail of the final product, from general scope to risk validation upon deployment. After all, the compilation of such a document alone can take more time than the lifetime of the project itself!

Typically, such an argument comes either from someone who lacks flexibility or from an ardent opponent of the design document who is eager to use any argument in their favor.

Practice shows that even for smaller projects, a well-structured design document ensures early identification of potential risks, serves as a reference for future enhancements if the project eventually expands, and, most importantly, helps prevent scope creep when every other stakeholder is tempted to add just one more feature.

Even simple initiatives can benefit from a design document with a proportional level of detail.

4.1.3 Myth #3. Every design document should be based on a template

Many companies, especially well-established enterprises, maintain recommended templates with a strict, rigid structure, which can be useful given the scale of their businesses. However, we recommend against setting design doc templates in stone. In our experience, a template should never be a sacred dogma. Such templates may try to serve too many goals at once, becoming bloated and discouraging people from preparing and studying those documents. That is why we recommend keeping the core template minimalistic and extending parts here and there depending on system-specific requirements and context.

At first glance, the process of creating a design document may seem straightforward and simple. In reality, right from the start, you will encounter a whole load of factors that will interfere with the process and set you several steps back if ignored.

Remember: your task is not to create a draft document and convince everyone of its purity and correctness. Your task is to find as many weak points as possible (including motivating your stakeholders to find them) so that eventually, after a number of iterations, you have a document that allows you to start developing your ML system.

4.1.4 Myth #4. Every design document should lead to a deployed system

If you are an engineer and need to build a machine, you need to start with a blueprint. Other engineers will review it and provide feedback, which will probably lead to another iteration of blueprints—and another and another until your design is finally ready to be brought to life.

The same principle applies to designing ML systems. An ML system is a highly complex machine of interconnected domains that requires thorough preparation, with the design document going through multiple iterations before implementation. Still, more often than not, a good design document leads to no ML project at all.

This might sound absurd, but let’s imagine you’re set to choose between two options:

Realizing that 90% of results can be derived from two IF statements can be frustrating, but it is still much better than the first of the two options.

4.2 Goals and antigoals

One of the goals of a design document is to reduce uncertainty about a problem by setting cornerstones and boundaries. Before the document is drafted, the level of understanding of both the problem and the solution is low and inconsistent among all involved parties. A technique that can help address such a problem is using antigoals—inverse statements that can help us narrow down both the problem space and the solution space.

Each part of a design document can be viewed as an answer to multiple questions: what are the goals of a potential system, what are the key success criteria, what tech aspects should we focus on, how do we solve a given subproblem, etc. A rookie mistake would be to ignore tradeoffs and enumerate endless goals for the system: for example, it should do X, Y, and Z; have high performance; be precise; be easy to maintain and cheap to develop; and be intuitively understandable. Obviously, it’s impossible to fit all the good properties into one system, and you will need an approach to counterbalance this excess.

Setting antigoals allows us to strike out the aspects we don’t care much about and, in doing so, highlight those we see as crucial. Let’s say we’re building a system that will be used internally, and its output artifacts are various reports to be read by the executive team and analysts. We can assume right off the bat that processing time won’t be critical for such a system: it only needs to be fast enough for the reports to be ready by morning. Thus, “processing time” will be the first entry on the list of antigoals, so that we don’t have to worry about this parameter. Or imagine building a recommendation engine for a boutique store: you certainly won’t need to support millions of items if the current catalog contains only a three-digit number of goods (see figure 4.1), meaning that designing for massive throughput is a no-go for the end solution.

Figure 4.1 A shop with <1,000 goods for sale and low traffic should not aim for scalability when building a recommendation system, as almost any tech solution can handle its load these days.
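To make the claim in figure 4.1 more tangible, here is a minimal sketch, assuming each of the boutique store’s items is already described by a numeric vector. The random embeddings, the 32-dimension size, the cosine-similarity choice, and the `top_k_similar` helper are all hypothetical and are not part of any design described in this book. The point is that a brute-force, item-by-item similarity search over 1,000 goods needs no index, sharding, or distributed infrastructure: the full similarity matrix takes about 8 MB of memory.

```python
import numpy as np


def top_k_similar(item_vectors: np.ndarray, k: int = 5) -> np.ndarray:
    """Return, for every item, the indices of its k most similar items.

    Brute force: compute the full cosine-similarity matrix, no ANN index.
    """
    # Normalize rows so that a plain dot product equals cosine similarity.
    norms = np.linalg.norm(item_vectors, axis=1, keepdims=True)
    normalized = item_vectors / np.clip(norms, 1e-12, None)
    # Full item-by-item similarity matrix (1,000 x 1,000 float64 is ~8 MB).
    sims = normalized @ normalized.T
    # Exclude each item from its own recommendations.
    np.fill_diagonal(sims, -np.inf)
    # Sort each row by descending similarity and keep the first k columns.
    return np.argsort(-sims, axis=1)[:, :k]


# Hypothetical embeddings for a 1,000-item catalog (random, for illustration only).
rng = np.random.default_rng(0)
vectors = rng.normal(size=(1000, 32))
recs = top_k_similar(vectors, k=5)

print(recs.shape)  # (1000, 5)
print(vectors.shape[0] ** 2 * 8 / 1e6, "MB")  # 8.0 MB for the similarity matrix
```

For a catalog of this size, the naive approach is perfectly adequate, which is exactly why scalability belongs on the antigoals list rather than on the goals list.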

Antigoals like this help us focus only on important aspects and drop the ones that have no positive effect on reaching the main goal of the system.

The following example suggests what the lists of goals and antigoals would look like for a boutique store’s recommendation engine:

A similar logic applies to other blocks of a design document. If you came up with an implementation idea and later realized it had an intrinsic critical flaw, it would make sense to mention this issue in the document as a counterexample. Imagine you are designing a scalable system and planned to use cloud infrastructure intensively until you learned that, for privacy reasons, your biggest potential customer strictly requires everything to run on its own hardware. In this case, a single sentence like “Cloud solution X could be a good option for data storage, but it is not applicable here because of Y’s cloud privacy restrictions” can set important limitations and may spark ideas for alternative tech implementations: “If X is fine from the technical perspective, are there open source alternatives to X that can be installed on our own servers?”

Antigoals should not be considered the main source of information in your design document, but they can become a spice that adds a missing flavor and even grow into an essential part of the document’s structure.

4.3 Design document structure

In this section, we could have focused on theoretical information about the contents and structure of the classic design document, but the truth is, a design document you prepare for an ML system will rarely follow the practices of traditional software development. On top of that, its structure may vary from company to company, so we do not think it makes sense to dwell on layout nuances. Instead, we recommend focusing on what items need to be covered. Plus, our goal is to showcase the design document as an entity within ML system design. For that reason, starting with this section and for the rest of the book, at the end of each chapter there will be a large practical block representing a part of a design document that incorporates the main message of the given chapter. We see it as a crucial component of this book, one that goes side by side with theory and campfire stories while offering examples of real-life solutions applied to problems.

In what follows, we will introduce you to two fictional cases, each with its own specifics, features, problems, and context. These two cases will form the basis of two different design documents, which will gradually grow and evolve from chapter to chapter, adding more depth and complexity. Eventually we will have two fully formed documents at our disposal.

In this section, we start to outline a design document for a project as it might have been written in real life. For this purpose, we introduce a fictional company, Supermegaretail, a retail company with a demand forecast project to launch.

In section 4.4, we give a very brief example of what the first chapter of a design document can look like. We will include only major topics; otherwise, it would not fit into a single book.

As you can see, even a brief overview of the problem to solve and research using the previously gathered data can easily force us to write a 10-page doc. This draft will help us decide if we need to go further or if it is better to stop right now and avoid a complicated ML solution.

The next section of this chapter is no less important: it gives a practical example of how to review a design document. If you’re new to ML system design, you probably haven’t yet reached the stage of your career where you have enough experience and credibility to be involved in this kind of work. However, reviewing your first design doc is just a matter of time, so it’s better to be prepared beforehand, and the next section offers some practical advice on the basics of reviewing.

4.4 Reviewing a design document

Audi alteram partem [Let the other side be heard as well]
—Latin proverb

So far, we haven’t seen a draft design doc written by a single person that would be complete enough to implement right from the start. However, we’ve come across some really decent drafts, which is more than enough after the first iteration.

This fact is essential and quite easily explained. Complex systems require input from many people with diverse expertise and backgrounds. As a design document author, part of your job is to make the document easy for all involved parties to navigate. Outlining your doc with chapters and subchapters will help domain experts see where to go from the beginning. Otherwise, the natural reaction of most people who see a 10+ page doc is to close it and forget about it.

Here come the first two critical points: the design doc must be accessible and visible to as many people as possible and easy to navigate for all participants.

As soon as people start reviewing any kind of content, they begin to criticize and offer alternatives. As an author, you want to encourage this type of behavior. After all, what are the chances you had the best and most appropriate design after the first iteration?

Try to derive an explanation for each proposition/fixture, as they could emerge from different conditions:

Try to understand the reasoning behind every input and solicit additional information until you fully understand the reasons. From our personal experience, the least helpful input (on the first iteration) would sound like “looks good to me.” Try to find a part that looks the most questionable to you and ask the reviewer about it, expressing your concerns. A generally good practice would be to have a list of concerns, including things you are not sure of, to target reviewers’ attention and facilitate requests.

A popular failure mode for design documents is writing too generically. That is a huge drawback for a design doc, and often it is caused by the fact that a single person may not have enough context to fill in all the gaps. As an initial author, you need to facilitate the others’ inputs—for example, highlight some problematic areas with a lack of required information and encourage the reviewers to add missing parts of the puzzle.

We discussed how to create a design doc and what to expect from reviewers, but because the title of this section is “Reviewing a design document,” let’s try to reverse our suggestions and apply them from the reviewer’s standpoint:

4.4.1 Design document review example

The case we’ve chosen for our second example design document is the stock photo company we mentioned in chapter 3. Meet PhotoStock Inc., where we’ve been hired to build a modern search tool that finds the most relevant images in stock in response to customers’ text queries while providing excellent performance.

The business is effectively a marketplace: photographers join the platform and upload their shots; customers who are looking for specific images for illustrative purposes (editors, designers, ad professionals) purchase rights for these photos. The marketplace makes money through commission from sales. The company is highly interested in making an effective search system on its website.

We provide part of a raw and poorly written design document based on what we discussed in the previous chapters and comment on it as if we were reviewing the document. This time, text highlighted in italics represents reviewers’ comments.

You can see some patterns in the comments, such as

Early feedback at the design review stage can save a lot of time in the later stages. Questions should initiate and facilitate a healthy discussion and unlock better solutions and should never be aggressive or toxic.

4.5 A design doc is a living thing

This section was initially planned to be myth #5 in the list from the beginning of this chapter, but we believe this point is important enough to have its own spot as a separate section.

So why should there be no fear or hesitation in editing or criticizing a design doc at any stage? The answer is that a design doc is truly a living thing.

Usually, the evolution of design docs looks like this:

  1. First iteration
  2. Feedback from peers
  3. Rewrite 60% of the doc
  4. Feedback from peers
  5. Rewrite 30% of the doc
  6. Feedback from peers
  7. Rewrite 10% of the doc
  8. Start implementing the system
  9. (Three months later) input from the real world
  10. Rewrite 30% of the doc

With an evolution like that, you should expect that the design doc can be considered complete only once the system is implemented, and even that is not guaranteed.
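As a rough illustration of how little of the first draft tends to survive, the sketch below multiplies the untouched shares left after each rewrite step in the list above (the 60%, 30%, 10%, and 30% rewrites). It assumes, purely for illustration, that every rewrite hits a random portion of the current text independently of earlier rewrites; the `surviving_share` helper is hypothetical. Real rewrites tend to concentrate on the same weak spots, so treat the result as an order-of-magnitude estimate rather than a measurement.

```python
from math import prod

# Share of the document rewritten at each rewrite step of the evolution list
# (steps 3, 5, 7, and 10).
rewrite_shares = [0.60, 0.30, 0.10, 0.30]


def surviving_share(shares: list[float]) -> float:
    """Fraction of first-iteration text left untouched, assuming each rewrite
    hits a random portion of the current text independently of earlier ones."""
    return prod(1.0 - s for s in shares)


print(f"{surviving_share(rewrite_shares):.1%}")  # 17.6%
```

Under these assumptions, less than a fifth of the first iteration remains untouched.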

As soon as your system is implemented, life will expose its flaws, which you will have to address; or product managers will decide they need new features, and the system will have to be extended; or the government will issue a new piece of legislation that you have to consider; or there will be an infrastructure migration or a new use case. You name it. To perform these changes, engineers need to understand the system and read design documents. By that time, a new pattern or technology could arise that perfectly fits the system.

If this is not the case, new features and refactoring need to be reflected in the design doc, bringing us to the design doc evolution mentioned earlier.

That is why a design document is never finished. It is a living thing for as long as the service it describes exists. Even if you leave the company, others will need to carry the banner if they don’t want to end up with a completely unsupportable system.

Rewriting a solid share of the design doc may seem discouraging, but it is something you can benefit from in the long run. For complex systems, it even makes sense to practice the “design it twice” approach: admit that your first design is likely not the best one, and design it twice, taking two radically different approaches. As practice shows, this approach can reveal hidden problems and opportunities. Let us quote A Philosophy of Software Design (Yaknyam Press, 2018) by John Ousterhout (a nice book we recommend for getting a feel for good design):

I have noticed that the design-it-twice principle is sometimes hard for really smart people to embrace. When they are growing up, smart people discover that their first quick idea about any problem is sufficient for a good grade; there is no need to consider a second or third possibility. This makes it easy to develop bad work habits. However, as these people get older, they get promoted into environments with harder and harder problems. Eventually, everyone reaches a point where your first ideas are no longer good enough; if you want to get really great results, you have to consider a second possibility, or perhaps a third, no matter how smart you are. The design of large software systems falls in this category: no-one is good enough to get it right with their first try.

A good design (and a good design doc, respectively) should reduce various complexity aspects of the system, be it understanding, building, modifying, or maintaining. And if the system promises to be complex from the very beginning, spending additional time to reduce this complexity in advance via multiple iterations is often a good investment.

People who prefer building over thinking may feel irritated by this point: “Come on, first you suggest writing docs instead of writing code, and now you suggest doing it over and over again?” Well, it makes little sense to run many iterations once you no longer receive new information, and sometimes you can’t improve the design before some proof of concept is written. However, designing things twice is often a fair tradeoff between agility and preparedness.

Summary
