Deadlocks - Effective Rust - David Drysdale


balance_ -= amount;

return true;

}

private:

mutable std::mutex mu_; // protects balance_

int64_t balance_;

};

The internal balance_ field is now protected by a mutex mu_: a synchronization

object that ensures that only one thread can successfully hold the mutex at a time. A

caller can acquire the mutex with a call to std::mutex::lock(); the second and sub‐

sequent callers of std::mutex::lock() will block until the original caller invokes

std::mutex::unlock(), and then one of the blocked threads will unblock and pro‐

ceed through std::mutex::lock().

All access to the balance now takes place with the mutex held, ensuring that its value

is consistent between check and modification. The std::lock_guard is also worth highlighting: it’s an RAII class that calls lock() on creation and unlock() on destruction. This ensures that the mutex is unlocked when the scope exits, reducing the chances of making a mistake around balancing manual lock() and unlock() calls.

7 The third category of behavior is thread-hostile: code that’s dangerous in a multithreaded environment even if all access to it is externally synchronized.

Chapter 3: Concepts

However, the thread safety here is still fragile; all it takes is one erroneous modifica‐

tion to the class:

// Add a new C++ method...

void pay_interest(int32_t percent) {

// ...but forgot about mu_

int64_t interest = (balance_ * percent) / 100;

balance_ += interest;

}

and the thread safety has been destroyed.

Data races in Rust

For a book about Rust, this Item has covered a lot of C++, so consider a straightfor‐

ward translation of this class into Rust:

pub struct BankAccount {

balance: i64,

}

impl BankAccount {

pub fn new() -> Self {

BankAccount { balance: 0 }

}

pub fn balance(&self) -> i64 {

if self.balance < 0 {

panic!("** Oh no, gone overdrawn: {}", self.balance);

}

self.balance

}

pub fn deposit(&mut self, amount: i64) {

self.balance += amount

}

pub fn withdraw(&mut self, amount: i64) -> bool {

if self.balance < amount {

return false;

}

self.balance -= amount;

true

}

}

8 The Clang C++ compiler includes a -Wthread-safety option, sometimes known as annotalysis, that allows data to be annotated with information about which mutexes protect which data, and functions to be annotated with information about the locks they acquire. This gives compile-time errors when these invariants are broken, like Rust; however, there is nothing to enforce the use of these annotations in the first place (for example, when a thread-compatible library is used in a multithreaded environment for the first time).

Item 17: Be wary of shared-state parallelism

along with the functions that try to pay into or withdraw from an account forever:

pub fn pay_in(account: &mut BankAccount) {
    loop {
        if account.balance() < 200 {
            println!("[A] Running low, deposit 400");
            account.deposit(400);
        }
        std::thread::sleep(std::time::Duration::from_millis(5));
    }
}

pub fn take_out(account: &mut BankAccount) {
    loop {
        if account.withdraw(100) {
            println!("[B] Withdrew 100, balance now {}", account.balance());
        } else {
            println!("[B] Failed to withdraw 100");
        }
        std::thread::sleep(std::time::Duration::from_millis(20));
    }
}

This works fine in a single-threaded context—even if that thread is not the main

thread:

{
    let mut account = BankAccount::new();
    let _payer = std::thread::spawn(move || pay_in(&mut account));
    // At the end of the scope, the `_payer` thread is detached
    // and is the sole owner of the `BankAccount`.
}

but a naive attempt to use the BankAccount across multiple threads:

DOES NOT COMPILE

{
    let mut account = BankAccount::new();
    let _taker = std::thread::spawn(move || take_out(&mut account));
    let _payer = std::thread::spawn(move || pay_in(&mut account));
}

immediately falls foul of the compiler:

error[E0382]: use of moved value: `account`
   --> src/main.rs:102:41
    |
100 |         let mut account = BankAccount::new();
    |             ----------- move occurs because `account` has type
    |                         `broken::BankAccount`, which does not implement the
    |                         `Copy` trait
101 |         let _taker = std::thread::spawn(move || take_out(&mut account));
    |                                         -------               ------- variable
    |                                         |                             moved due to
    |                                         |                             use in closure
    |                                         |
    |                                         value moved into closure here
102 |         let _payer = std::thread::spawn(move || pay_in(&mut account));
    |                                         ^^^^^^^             ------- use occurs due
    |                                         |                           to use in closure
    |                                         |
    |                                         value used here after move

The rules of the borrow checker make the problem clear: there are two

mutable references to the same item, one more than is allowed. The rules of the bor‐

row checker are that you can have a single mutable reference to an item, or multiple

(immutable) references, but not both at the same time.

This has a curious resonance with the definition of a data race at the start of this Item:

enforcing that there is a single writer, or multiple readers (but never both), means

that there can be no data races. By enforcing memory safety, Rust also gets thread safety.

As with C++, some kind of synchronization is needed to make this struct thread-safe. The most common mechanism is also called Mutex, but the Rust version “wraps” the protected data rather than being a standalone object (as in C++):

pub struct BankAccount {

balance: std::sync::Mutex<i64>,

}

The lock() method on this Mutex returns a MutexGuard object with RAII behavior, like C++’s std::lock_guard: the mutex is automatically released at the end

of the scope when the guard is dropped. (In contrast to C++, Rust’s Mutex has no

methods that manually acquire or release the mutex, as they would expose developers

to the danger of forgetting to keep these calls exactly in sync.)

To be more precise, lock() actually returns a Result that holds the MutexGuard, to

cope with the possibility that the Mutex has been poisoned. Poisoning happens if a

thread fails while holding the lock, because this might mean that any mutex-protected

invariants can no longer be relied on. In practice, lock poisoning is sufficiently rare

(and it’s sufficiently desirable that the program terminates when it happens) that it’s

common to just .unwrap() the Result (despite the advice in Item 18).
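The poisoning behavior described above can be observed directly. The following is a minimal sketch, not code from the book: the helper name poisoned_after_panic is hypothetical, and the panic is deliberately simulated in a worker thread.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: returns `true` if the mutex is poisoned after a
/// thread panics while holding its guard.
fn poisoned_after_panic() -> bool {
    let data = Arc::new(Mutex::new(0_i64));
    let data2 = Arc::clone(&data);

    // This thread panics while its `MutexGuard` is still alive.
    let handle = thread::spawn(move || {
        let mut guard = data2.lock().unwrap();
        *guard += 1;
        panic!("simulated failure with the lock held");
    });
    // `join()` reports the worker's panic as an `Err`; ignore it here.
    let _ = handle.join();

    // A later `lock()` now yields `Err(PoisonError)` instead of a guard.
    let poisoned = data.lock().is_err();
    poisoned
}

fn main() {
    println!("poisoned: {}", poisoned_after_panic());
}
```

Code that wants to carry on regardless can recover the guard from the error with PoisonError::into_inner(), at the cost of trusting data whose invariants may be broken.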

The MutexGuard object also acts as a proxy for the data that is enclosed by the Mutex, by implementing the Deref and DerefMut traits, allowing it to be used both for read operations:

impl BankAccount {
    pub fn balance(&self) -> i64 {
        let balance = *self.balance.lock().unwrap();
        if balance < 0 {
            panic!("** Oh no, gone overdrawn: {}", balance);
        }
        balance
    }
}

and for write operations:

impl BankAccount {
    // Note: no longer needs `&mut self`.
    pub fn deposit(&self, amount: i64) {
        *self.balance.lock().unwrap() += amount
    }

    pub fn withdraw(&self, amount: i64) -> bool {
        let mut balance = self.balance.lock().unwrap();
        if *balance < amount {
            return false;
        }
        *balance -= amount;
        true
    }
}

There’s an interesting detail lurking in the signatures of these methods: although they

are modifying the balance of the BankAccount, the methods now take &self rather

than &mut self. This is inevitable: if multiple threads are going to hold references to

the same BankAccount, by the rules of the borrow checker, those references had better

not be mutable. It’s also another instance of the interior mutability

pattern: borrow checks are effectively moved from compile time to runtime but

now with cross-thread synchronization behavior. If a mutable reference already

exists, an attempt to get a second blocks until the first reference is dropped.
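The runtime nature of this check can be seen without blocking by using try_lock(), which reports failure instead of waiting. This is a small sketch under that assumption; the function name second_lock_would_block is made up for illustration.

```rust
use std::sync::{Mutex, TryLockError};

/// Hypothetical helper: returns (blocked while held, available after drop).
fn second_lock_would_block() -> (bool, bool) {
    let balance = Mutex::new(100_i64);

    // First (mutable) access: the guard stays alive until `drop(guard)`.
    let guard = balance.lock().unwrap();

    // `try_lock()` never blocks, so it is safe to call here; a plain
    // `lock()` at this point would deadlock or panic.
    let blocked = matches!(balance.try_lock(), Err(TryLockError::WouldBlock));

    // Once the first guard is gone, the mutex can be acquired again.
    drop(guard);
    let available = balance.try_lock().is_ok();

    (blocked, available)
}

fn main() {
    println!("{:?}", second_lock_would_block());
}
```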

Wrapping up shared state in a Mutex mollifies the borrow checker, but there are still lifetime issues to fix:

DOES NOT COMPILE

{
    let account = BankAccount::new();
    let taker = std::thread::spawn(|| take_out(&account));
    let payer = std::thread::spawn(|| pay_in(&account));
    // At the end of the scope, `account` is dropped but
    // the `taker` and `payer` threads are detached and
    // still hold (immutable) references to `account`.
}

error[E0373]: closure may outlive the current function, but it borrows `account`
              which is owned by the current function
   --> src/main.rs:206:40
    |
206 |         let taker = std::thread::spawn(|| take_out(&account));
    |                                        ^^           ------- `account` is
    |                                        |                    borrowed here
    |                                        |
    |                                        may outlive borrowed value `account`
    |
note: function requires argument type to outlive `'static`
   --> src/main.rs:206:21
    |
206 |         let taker = std::thread::spawn(|| take_out(&account));
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `account` (and any other
      referenced variables), use the `move` keyword
    |
206 |         let taker = std::thread::spawn(move || take_out(&account));
    |                                        ++++

error[E0373]: closure may outlive the current function, but it borrows `account`
              which is owned by the current function
   --> src/main.rs:207:40
    |
207 |         let payer = std::thread::spawn(|| pay_in(&account));
    |                                        ^^         ------- `account` is
    |                                        |                  borrowed here
    |                                        |
    |                                        may outlive borrowed value `account`
    |
note: function requires argument type to outlive `'static`
   --> src/main.rs:207:21
    |
207 |         let payer = std::thread::spawn(|| pay_in(&account));
    |                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: to force the closure to take ownership of `account` (and any other
      referenced variables), use the `move` keyword
    |
207 |         let payer = std::thread::spawn(move || pay_in(&account));
    |                                        ++++

The error message makes the problem clear: the BankAccount is going to be dropped

at the end of the block, but there are two new threads that have a reference to it and

that may carry on running afterward. (The compiler’s suggestion for how to fix the

problem is less helpful—if the BankAccount item is moved into the first closure, it will

no longer be available for the second closure to receive a reference to it!)

The standard tool for ensuring that an object remains active until all references to it

are gone is a reference-counted pointer, and Rust’s variant of this for multithreaded

use is std::sync::Arc:

let account = std::sync::Arc::new(BankAccount::new());

account.deposit(1000);

let account2 = account.clone();

let _taker = std::thread::spawn(move || take_out(&account2));

let account3 = account.clone();

let _payer = std::thread::spawn(move || pay_in(&account3));

Each thread gets its own copy of the reference-counting pointer, moved into the clo‐

sure, and the underlying BankAccount will be dropped only when the refcount drops

to zero. This combination of Arc<Mutex<T>> is common in Rust programs that use

shared-state parallelism.
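As a self-contained illustration of the Arc<Mutex<T>> combination, the sketch below shares a plain counter rather than the BankAccount from the text. The function parallel_count and its parameters are assumptions made for the example.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical helper: each of `threads` workers adds 1 to a shared
/// counter `per_thread` times; returns the final total.
fn parallel_count(threads: usize, per_thread: u64) -> u64 {
    let counter = Arc::new(Mutex::new(0_u64));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            // Each thread gets its own clone of the `Arc`, moved into
            // the closure; the `Mutex` inside is shared.
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                for _ in 0..per_thread {
                    *counter.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let total = *counter.lock().unwrap();
    total
}

fn main() {
    println!("total = {}", parallel_count(4, 1000));
}
```

Unlike the detached threads in the text, this version keeps the JoinHandle values and joins them, so the final read is guaranteed to see every increment.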

Stepping back from the technical details, observe that Rust has entirely avoided the

problem of data races that plagues multithreaded programming in other languages.

Of course, this good news is restricted to safe Rust (unsafe code and FFI boundaries in particular may not be data-race free), but it’s still a remarkable phenomenon.

Standard marker traits

There are two standard traits that affect the use of Rust objects between threads. Both

of these traits are marker traits that have no associated methods but have special significance to the compiler in multithreaded scenarios:

• The Send trait indicates that items of a type are safe to transfer between threads; ownership of an item of this type can be passed from one thread to another.

• The Sync trait indicates that items of a type can be safely accessed by multiple threads, subject to the rules of the borrow checker.

Another way of saying this is to observe that Send means T can be transferred between threads, and Sync means that &T can be transferred between threads.

Both of these traits are auto traits: the compiler automatically derives them for new types, as long as the constituent parts of the type also implement Send/Sync.

The majority of safe types implement Send and Sync, so much so that it’s clearer to

understand what types don’t implement these traits (written in the form impl !Sync

for Type).

A type that doesn’t implement Send is one that can be used only in a single thread.

The canonical example of this is the unsynchronized reference-counting pointer Rc<T>. The implementation of this type explicitly assumes single-threaded use (for speed); there is no attempt at synchronizing the internal refcount for multithreaded use. As such, transferring an Rc<T> between threads is not allowed; use

Arc<T> (with its additional synchronization overhead) for this case.

A type that doesn’t implement Sync is one that’s not safe to use from multiple threads

via non-mut references (as the borrow checker will ensure there are never multiple

mut references). The canonical examples of this are the types that provide interior mutability in an unsynchronized way, such as Cell<T> and RefCell<T>. Use Mutex<T> or RwLock<T> to provide interior mutability in a multithreaded environment.

Raw pointer types like *const T and *mut T also implement neither Send nor Sync.
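One way to check these marker traits for a given type is to ask the compiler directly. The sketch below uses two helper functions, assert_send and assert_sync, which are hypothetical names rather than standard library items; they do nothing at runtime, and a successful compile is the check.

```rust
use std::cell::RefCell;
use std::sync::{Arc, Mutex};

// Hypothetical helpers: the trait bound is the whole point, so the
// bodies are empty.
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

fn check_marker_traits() -> bool {
    // `Arc<Mutex<i64>>` can be moved to, and shared between, threads.
    assert_send::<Arc<Mutex<i64>>>();
    assert_sync::<Arc<Mutex<i64>>>();

    // `RefCell<i64>` can be moved to another thread, but not shared.
    assert_send::<RefCell<i64>>();

    // Uncommenting either of these lines fails to compile:
    // assert_send::<std::rc::Rc<i64>>();
    // assert_sync::<RefCell<i64>>();
    true
}

fn main() {
    println!("checks compiled: {}", check_marker_traits());
}
```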

Deadlocks

Now for the bad news. Although Rust has solved the problem of data races (as previ‐

ously described), it is still susceptible to the second terrible problem for multithreaded

code with shared state: deadlocks.

Consider a simplified multiple-player game server, implemented as a multithreaded

application to service many players in parallel. Two core data structures might be a

collection of players, indexed by username, and a collection of games in progress,

indexed by some unique identifier:

struct GameServer {

// Map player name to player info.

players: Mutex<HashMap<String, Player>>,

// Current games, indexed by unique game ID.

games: Mutex<HashMap<GameId, Game>>,

}

Both of these data structures are Mutex-protected and so are safe from data races.

However, code that manipulates both data structures opens up potential problems. A

single interaction between the two might work fine:

impl GameServer {
    /// Add a new player and join them into a current game.
    fn add_and_join(&self, username: &str, info: Player) -> Option<GameId> {
        // Add the new player.
        let mut players = self.players.lock().unwrap();
        players.insert(username.to_owned(), info);

        // Find a game with available space for them to join.
        let mut games = self.games.lock().unwrap();
        for (id, game) in games.iter_mut() {
            if game.add_player(username) {
                return Some(id.clone());
            }
        }
        None
    }
}

However, a second interaction between the two independently locked data structures

is where problems start:

impl GameServer {
    /// Ban the player identified by `username`, removing them from
    /// any current games.
    fn ban_player(&self, username: &str) {
        // Find all games that the user is in and remove them.
        let mut games = self.games.lock().unwrap();
        games
            .iter_mut()
            .filter(|(_id, g)| g.has_player(username))
            .for_each(|(_id, g)| g.remove_player(username));

        // Wipe them from the user list.
        let mut players = self.players.lock().unwrap();
        players.remove(username);
    }
}

To understand the problem, imagine two separate threads using these two methods,

where their execution happens in the order shown in Table 3-1.

Table 3-1. Thread deadlock sequence

| Thread 1 | Thread 2 |
|---|---|
| Enters add_and_join() and immediately acquires the players lock. | |
| | Enters ban_player() and immediately acquires the games lock. |
| Tries to acquire the games lock; this is held by thread 2, so thread 1 blocks. | |
| | Tries to acquire the players lock; this is held by thread 1, so thread 2 blocks. |

At this point, the program is deadlocked: neither thread will ever progress, nor will

any other thread that does anything with either of the two Mutex-protected data

structures.

The root cause of this is a lock inversion: one function acquires the locks in the order

players then games, whereas the other uses the opposite order (games then players).

This is a simple example of a more general problem; the same situation can arise with

longer chains of nested locks (thread 1 acquires lock A, then B, then it tries to acquire

C; thread 2 acquires C, then tries to acquire A) and across more threads (thread 1

locks A, then B; thread 2 locks B, then C; thread 3 locks C, then A).


A simplistic attempt to solve this problem involves reducing the scope of the locks, so

there is no point where both locks are held at the same time:

/// Add a new player and join them into a current game.
fn add_and_join(&self, username: &str, info: Player) -> Option<GameId> {
    // Add the new player.
    {
        let mut players = self.players.lock().unwrap();
        players.insert(username.to_owned(), info);
    }

    // Find a game with available space for them to join.
    {
        let mut games = self.games.lock().unwrap();
        for (id, game) in games.iter_mut() {
            if game.add_player(username) {
                return Some(id.clone());
            }
        }
    }
    None
}

/// Ban the player identified by `username`, removing them from
/// any current games.
fn ban_player(&self, username: &str) {
    // Find all games that the user is in and remove them.
    {
        let mut games = self.games.lock().unwrap();
        games
            .iter_mut()
            .filter(|(_id, g)| g.has_player(username))
            .for_each(|(_id, g)| g.remove_player(username));
    }

    // Wipe them from the user list.
    {
        let mut players = self.players.lock().unwrap();
        players.remove(username);
    }
}

(A better version of this would be to encapsulate the manipulation of the players

data structure into add_player() and remove_player() helper methods, to reduce

the chances of forgetting to close out a scope.)

This solves the deadlock problem but leaves behind a data consistency problem: the

players and games data structures can get out of sync with each other, given an execution sequence like the one shown in Table 3-2.

Table 3-2. State inconsistency sequence

| Thread 1 | Thread 2 |
|---|---|
| Enters add_and_join("Alice") and adds Alice to the players data structure (then releases the players lock). | |
| | Enters ban_player("Alice") and removes Alice from all games (then releases the games lock). |
| | Removes Alice from the players data structure; thread 1 has already released the lock, so this does not block. |
| Carries on and acquires the games lock (already released by thread 2). With the lock held, adds “Alice” to a game in progress. | |

At this point, there is a game that includes a player that doesn’t exist, according to the

players data structure!

The heart of the problem is that there are two data structures that need to be kept in

sync with each other. The best way to do this is to have a single synchronization

primitive that covers both of them:

struct GameState {

players: HashMap<String, Player>,

games: HashMap<GameId, Game>,

}

struct GameServer {

state: Mutex<GameState>,

// ...

}
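With a single lock, both operations can be written so that each one sees and leaves a consistent state. The following is a rough sketch rather than the book's implementation: Player, Game, and GameId are stand-in definitions (the text does not show them), a game here accepts any player, and is_consistent is a hypothetical helper added only to check the invariant.

```rust
use std::collections::HashMap;
use std::sync::Mutex;

// Stand-in types (assumptions for this sketch).
type GameId = u32;

#[derive(Default)]
struct Player;

#[derive(Default)]
struct Game {
    members: Vec<String>,
}

#[derive(Default)]
struct GameState {
    players: HashMap<String, Player>,
    games: HashMap<GameId, Game>,
}

#[derive(Default)]
struct GameServer {
    state: Mutex<GameState>,
}

impl GameServer {
    /// Add a player and join them into any existing game, under one lock.
    fn add_and_join(&self, username: &str, info: Player) -> Option<GameId> {
        let mut state = self.state.lock().unwrap();
        state.players.insert(username.to_owned(), info);
        let joined = state.games.iter_mut().next().map(|(id, game)| {
            game.members.push(username.to_owned());
            *id
        });
        joined
    }

    /// Remove a player from all games and the player list, under one lock.
    fn ban_player(&self, username: &str) {
        let mut state = self.state.lock().unwrap();
        for game in state.games.values_mut() {
            game.members.retain(|m| m.as_str() != username);
        }
        state.players.remove(username);
    }

    /// True if every game member is also present in `players`.
    fn is_consistent(&self) -> bool {
        let state = self.state.lock().unwrap();
        let ok = state
            .games
            .values()
            .all(|g| g.members.iter().all(|m| state.players.contains_key(m)));
        ok
    }
}

fn main() {
    let server = GameServer::default();
    server.state.lock().unwrap().games.insert(1, Game::default());
    let joined = server.add_and_join("alice", Player);
    server.ban_player("alice");
    println!("joined {:?}, consistent: {}", joined, server.is_consistent());
}
```

Because each method holds the one lock for its whole body, no other thread can observe the state between the two updates, and there is no second lock to acquire in the wrong order.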

Advice

The most obvious advice for avoiding the problems that arise with shared-state parallelism is simply to avoid shared-state parallelism. As the Go proverb puts it: “Do not communicate by sharing memory; instead, share memory by communicating.”

The Go language has channels that are suitable for this built in; for Rust, equivalent functionality is included in the standard library in the std::sync::mpsc module: the channel() function returns a (Sender, Receiver) pair that allows values of a particular type to be communicated between threads.
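A short sketch of the channel-based style follows, assuming a toy workload in which each worker thread sends back a single number. The function sum_via_channel is a made-up name; the std::sync::mpsc calls are the standard library's.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical helper: spawns `workers` threads that each send their
/// 1-based ID over a channel, and returns the sum of received values.
fn sum_via_channel(workers: u64) -> u64 {
    let (tx, rx) = mpsc::channel::<u64>();

    for id in 1..=workers {
        // Each worker owns its own clone of the sending half.
        let tx = tx.clone();
        thread::spawn(move || {
            tx.send(id).unwrap();
        });
    }

    // Drop the original sender so that the receiver's iterator ends
    // once every worker has finished and dropped its clone.
    drop(tx);

    let total: u64 = rx.iter().sum();
    total
}

fn main() {
    println!("sum = {}", sum_via_channel(4));
}
```

No state is shared here: each value is owned by exactly one thread at a time, so there is no lock to forget and no lock ordering to get wrong.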

If shared-state concurrency can’t be avoided, then there are some ways to reduce the

chances of writing deadlock-prone code:

• Put data structures that must be kept consistent with each other under a single lock.

• Keep lock scopes small and obvious; wherever possible, use helper methods that

get and set things under the relevant lock.


• Avoid invoking closures with locks held; this puts the code at the mercy of whatever closure gets added to the codebase in the future.

• Similarly, avoid returning a MutexGuard to a caller: it’s like handing out a loaded

gun, from a deadlock perspective.

• Include deadlock detection tools in your CI system.

• As a last resort: design, document, test, and police a locking hierarchy that

describes what lock orderings are allowed/required. This should be a last resort

because any strategy that relies on engineers never making a mistake is likely to

be doomed to failure in the long term.

More abstractly, multithreaded code is an ideal place to apply the following general

advice: prefer code that’s so simple that it is obviously not wrong, rather than code

that’s so complex that it’s not obviously wrong.

Item 18: Don’t panic

It looked insanely complicated, and this was one of the reasons why the snug plastic cover it

fitted into had the words DON’T PANIC printed on it in large friendly letters.

—Douglas Adams

The title of this Item would be more accurately described as prefer returning a Result

to using panic! (but don’t panic is much catchier).

Rust’s panic mechanism is primarily designed for unrecoverable bugs in your pro‐

gram, and by default it terminates the thread that issues the panic!. However, there

are alternatives to this default.

In particular, newcomers to Rust who have come from languages that have an exception system (such as Java) sometimes reach for std::panic::catch_unwind as a way to simulate exceptions, because it appears to provide a mechanism for catching panics at a point further up the call stack.

Consider a function that panics on an invalid input:

fn divide(a: i64, b: i64) -> i64 {

if b == 0 {

panic!("Cowardly refusing to divide by zero!");

}

a / b

}

Trying to invoke this with an invalid input fails as expected:

// Attempt to discover what 0/0 is...

let result = divide(0, 0);

thread 'main' panicked at 'Cowardly refusing to divide by zero!', main.rs:11:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

A wrapper that uses catch_unwind to catch the panic:

fn divide_recover(a: i64, b: i64, default: i64) -> i64 {
    let result = std::panic::catch_unwind(|| divide(a, b));
    match result {
        Ok(x) => x,
        Err(_) => default,
    }
}

appears to work and to simulate catch:

let result = divide_recover(0, 0, 42);

println!("result = {result}");

result = 42

Appearances can be deceptive, however. The first problem with this approach is that

panics don’t always unwind; there is a compiler option (which is also accessible via a Cargo.toml profile setting) that shifts panic behavior so that it immediately aborts the process:

thread 'main' panicked at 'Cowardly refusing to divide by zero!', main.rs:11:9
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
/bin/sh: line 1: 29100 Abort trap: 6 cargo run --release

This leaves any attempt to simulate exceptions entirely at the mercy of the wider

project settings. It’s also the case that some target platforms (for example, WebAssem‐

bly) always abort on panic, regardless of any compiler or project settings.

A more subtle problem is exception safety: if a panic occurs midway through an operation on a data structure, it removes any guarantees that the data structure has been left in a self-consistent state. Preserving internal invariants in the presence of exceptions has long been known to be extremely difficult.

Finally, panic propagation also interacts poorly with FFI (foreign function interface) boundaries; use catch_unwind to prevent panics in Rust code from propagating to non-Rust calling code across an FFI boundary.

9 Tom Cargill’s 1994 article explores how difficult exception safety is for C++ template code, as does Herb Sutter’s later writing on the subject.

So what’s the alternative to panic! for dealing with error conditions? For library code, the best alternative is to make the error someone else’s problem, by returning a Result with an appropriate error type. This allows the library user to make their own decisions about what to do next, which may involve passing the problem on to the next caller in line, via the ? operator.
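Applied to the earlier divide() example, a Result-returning version might look like the sketch below. The DivideError type and the mean() caller are assumptions introduced for illustration; they are not from the text.

```rust
use std::fmt;

/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
struct DivideError;

impl fmt::Display for DivideError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "division by zero or overflow")
    }
}

impl std::error::Error for DivideError {}

/// Returns an error instead of panicking on invalid input.
fn divide(a: i64, b: i64) -> Result<i64, DivideError> {
    // `checked_div` yields `None` for a zero divisor or on overflow.
    a.checked_div(b).ok_or(DivideError)
}

/// A caller that passes any problem on to *its* caller with `?`.
fn mean(values: &[i64]) -> Result<i64, DivideError> {
    let total: i64 = values.iter().sum();
    let result = divide(total, values.len() as i64)?;
    Ok(result)
}

fn main() {
    println!("{:?}", mean(&[2, 4, 6]));
    println!("{:?}", mean(&[]));
}
```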

The buck has to stop somewhere, and a useful rule of thumb is that it’s OK to panic!

(or to unwrap(), expect(), etc.) if you have control of main; at that point, there’s no

further caller that the buck could be passed to.

Another sensible use of panic!, even in library code, is in situations where it’s very

rare to encounter errors, and you don’t want users to have to litter their code

with .unwrap() calls.

If an error situation should occur only because (say) internal data is corrupted, rather

than as a result of invalid inputs, then triggering a panic! is legitimate.

It can even be occasionally useful to allow panics that can be triggered by invalid

input but where such invalid inputs are out of the ordinary. This works best when the

relevant entrypoints come in pairs:

• An “infallible” version whose signature implies it always succeeds (and which

panics if it can’t succeed)

• A “fallible” version that returns a Result
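A paired entrypoint design could be sketched as follows. The Percent type and its try_new()/new() methods are hypothetical, chosen only to show the shape of the pattern.

```rust
/// Hypothetical type: a percentage in the range 0..=100.
#[derive(Debug, PartialEq, Clone, Copy)]
struct Percent(u8);

impl Percent {
    /// Fallible version: reports out-of-range input as an `Err`.
    fn try_new(value: u8) -> Result<Percent, String> {
        if value <= 100 {
            Ok(Percent(value))
        } else {
            Err(format!("{} is not a valid percentage", value))
        }
    }

    /// "Infallible" version, for callers that know the input is valid.
    ///
    /// # Panics
    ///
    /// Panics if `value` is greater than 100.
    fn new(value: u8) -> Percent {
        Self::try_new(value).expect("percentage must be in 0..=100")
    }
}

fn main() {
    println!("{:?}", Percent::new(50));
    println!("{:?}", Percent::try_new(101));
}
```

Implementing the panicking version in terms of the fallible one keeps the validation logic in a single place.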

For the former, Rust’s API guidelines suggest that the panic! should be documented in a specific section of the inline documentation. The String::from_utf8_unchecked and String::from_utf8 entrypoints in the standard library are an example of the latter (although in this case, the panics are actually deferred to the point where a String constructed from invalid input gets used).

Assuming that you are trying to comply with the advice in this Item, there are a few

things to bear in mind. The first is that panics can appear in different guises; avoiding

panic! also involves avoiding the following:

• unwrap() and unwrap_err()

• expect() and expect_err()

• unreachable!()

Harder to spot are things like these:

• slice[index] when the index is out of range

• x / y when y is zero
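Both of these operators have non-panicking counterparts in the standard library: slice::get() returns an Option instead of indexing, and the integer checked_div() method returns None for a zero divisor. A minimal sketch, with ratio_at as a made-up function name:

```rust
/// Hypothetical helper: divides the element at `index` by `divisor`,
/// returning `None` rather than panicking when either step is invalid.
fn ratio_at(values: &[i64], index: usize, divisor: i64) -> Option<i64> {
    // `get` replaces `values[index]`, which panics when out of range.
    let value = values.get(index)?;
    // `checked_div` replaces `/`, which panics on a zero divisor
    // (and on overflow, such as `i64::MIN / -1`).
    value.checked_div(divisor)
}

fn main() {
    let data = [10, 20, 30];
    println!("{:?}", ratio_at(&data, 1, 5));
    println!("{:?}", ratio_at(&data, 9, 5));
    println!("{:?}", ratio_at(&data, 1, 0));
}
```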

The second observation around avoiding panics is that a plan that involves constant

vigilance of humans is never a good idea.

However, constant vigilance of machines is another matter: adding a check to your continuous integration system that spots new, potentially panicking code is much more reliable. A simple version could be a simple grep for the most common panicking entrypoints (as shown previously); a more thorough check could involve additional tooling from the Rust ecosystem, such as setting up a build variant that pulls in the no_panic crate.

Item 19: Avoid reflection

Programmers coming to Rust from other languages are often used to reaching for

reflection as a tool in their toolbox. They can waste a lot of time trying to implement

reflection-based designs in Rust, only to discover that what they’re attempting can

only be done poorly, if at all. This Item hopes to save that time wasted exploring dead

ends, by describing what Rust does and doesn’t have in the way of reflection, and

what can be used instead.

Reflection is the ability of a program to examine itself at runtime. Given an item at

runtime, it covers these questions:

• What information can be determined about the item’s type?

• What can be done with that information?

Programming languages with full reflection support have extensive answers to these

questions. Languages with reflection typically support some or all of the following at

runtime, based on the reflection information:

• Determining an item’s type

• Exploring its contents

• Modifying its fields

• Invoking its methods

Languages that have this level of reflection support also tend to be dynamically typed

languages (e.g., Ruby), but there are also some notable statically typed languages that also support reflection.

Rust does not support this type of reflection, which makes the advice to avoid reflec‐

tion easy to follow at this level—it’s just not possible. For programmers coming from

languages with support for full reflection, this absence may seem like a significant gap

at first, but Rust’s other features provide alternative ways of solving many of the same

problems.

C++ has a more limited form of reflection, known as run-time type identification (RTTI). The typeid operator returns a unique identifier for every type, for objects of polymorphic type (roughly: classes with virtual functions):

typeid
Can recover the concrete class of an object referred to via a base class reference

dynamic_cast<T>
Allows base class references to be converted to derived classes, when it is safe and correct to do so

Rust does not support this RTTI style of reflection either, continuing the theme that

the advice of this Item is easy to follow.

Rust does support some features that provide similar functionality in the std::any

module, but they’re limited (in ways we will explore) and so best avoided unless no

other alternatives are possible.

The first reflection-like feature from std::any looks like magic at first—a way of

determining the name of an item’s type. The following example uses a user-defined

tname() function:

let x = 42u32;

let y = vec![3, 4, 2];

println!("x: {} = {}", tname(&x), x);

println!("y: {} = {:?}", tname(&y), y);

to show types alongside values:

x: u32 = 42

y: alloc::vec::Vec<i32> = [3, 4, 2]

The implementation of tname() reveals what’s up the compiler’s sleeve: the function is generic, and so each invocation of it is actually a different function (tname::<u32> or tname::<Square>):

fn tname<T: ?Sized>(_v: &T) -> &'static str {
    std::any::type_name::<T>()
}

The implementation is provided by the std::any::type_name library function, which is also generic. This function has access only to compile-time information; there is no code run that determines the type at runtime. Returning to trait objects demonstrates this:

let square = Square::new(1, 2, 2);
let draw: &dyn Draw = &square;
let shape: &dyn Shape = &square;

println!("square: {}", tname(&square));
println!("shape: {}", tname(&shape));
println!("draw: {}", tname(&draw));

Only the types of the trait objects are available, not the type (Square) of the concrete

underlying item:

square: reflection::Square

shape: &dyn reflection::Shape

draw: &dyn reflection::Draw

The string returned by type_name is suitable only for diagnostics—it’s explicitly a

“best-effort” helper whose contents may change and may not be unique—so don’t

attempt to parse type_name results. If you need a globally unique type identifier, use TypeId instead:

use std::any::TypeId;

fn type_id<T: 'static + ?Sized>(_v: &T) -> TypeId {
    TypeId::of::<T>()
}

println!("x has {:?}", type_id(&x));

println!("y has {:?}", type_id(&y));

x has TypeId { t: 18349839772473174998 }

y has TypeId { t: 2366424454607613595 }

The output is less helpful for humans, but the guarantee of uniqueness means that the

result can be used in code. However, it’s usually best not to use TypeId directly but to

use the Any trait instead, because the standard library has additional functionality for working with Any instances (described below).

The Any trait has a single method, type_id(), which returns the TypeId value for the type that implements the trait. You can’t implement this trait yourself, though, because Any already comes with a blanket implementation for most arbitrary types T:

impl<T: 'static + ?Sized> Any for T {
    fn type_id(&self) -> TypeId {
        TypeId::of::<T>()
    }
}

The blanket implementation doesn’t cover every type T: the T: 'static lifetime

bound means that if T includes any references that have a non-'static lifetime, then

Any is not implemented for T. This is a deliberate restriction that’s imposed because lifetimes aren’t fully part of the type: TypeId::of::<&'a T> would be the

same as TypeId::of::<&'b T>, despite the differing lifetimes, increasing the likeli‐

hood of confusion and unsound code.

Recall that a trait object is a fat pointer that holds a pointer to the underlying item, together with a pointer to the trait implementation’s vtable. For Any, the vtable has a single entry, for a type_id() method that returns the item’s type, as shown in Figure 3-4:

let x_any: Box<dyn Any> = Box::new(42u64);
let y_any: Box<dyn Any> = Box::new(Square::new(3, 4, 3));

Figure 3-4. Any trait objects, each with pointers to concrete items and vtables

Aside from a couple of indirections, a dyn Any trait object is effectively a combination

of a raw pointer and a type identifier. This means that the standard library can

offer some additional generic methods that are defined for a dyn Any trait object;

these methods are generic over some additional type T:

is::<T>()

Indicates whether the trait object’s type is equal to some specific other type T

downcast_ref::<T>()

Returns a reference to the concrete type T, provided that the trait object’s type

matches T

downcast_mut::<T>()

Returns a mutable reference to the concrete type T, provided that the trait object’s

type matches T
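As a minimal sketch of how the generic methods on dyn Any can be used, the following defines two helper functions. The names describe and double_if_u64 are hypothetical and exist only for illustration.

```rust
use std::any::Any;

/// Describe a value held behind a `dyn Any` trait object.
fn describe(item: &dyn Any) -> String {
    if item.is::<u64>() {
        // `is` only answers yes/no; `downcast_ref` also hands back the value.
        let n = item.downcast_ref::<u64>().expect("type already checked");
        format!("u64 with value {n}")
    } else if let Some(s) = item.downcast_ref::<String>() {
        format!("String of length {}", s.len())
    } else {
        // No way to find out more: `Any` only supports "is it exactly T?".
        "some other type".to_string()
    }
}

/// Double a `u64` in place if that is what the trait object holds.
fn double_if_u64(item: &mut dyn Any) -> bool {
    match item.downcast_mut::<u64>() {
        Some(n) => {
            *n *= 2;
            true
        }
        None => false,
    }
}
```

Note that every type the code is prepared to handle has to be named explicitly; there is no way to enumerate the possibilities at runtime.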

Observe that the Any trait is only approximating reflection functionality: the programmer

chooses (at compile time) to explicitly build something (&dyn Any) that

keeps track of an item’s compile-time type as well as its location. The ability to (say)

downcast back to the original type is possible only if the overhead of building an Any

trait object has already happened.

There are comparatively few scenarios where Rust has different compile-time and

runtime types associated with an item. Chief among these is trait objects: an item of a

concrete type Square can be coerced into a trait object dyn Shape for a trait that the

type implements. This coercion builds a fat pointer (object + vtable) from a simple

pointer (object/item).

Note that Rust’s trait objects are not really object-oriented. It’s not the case that a Square is-a Shape; it’s just that a Square implements Shape’s interface.

The same is true for trait bounds: a trait bound Shape: Draw does not mean is-a; it

just means also-implements because the vtable for Shape includes the entries for the

methods of Draw.

For some simple trait bounds:

trait Draw: Debug {

fn bounds(&self) -> Bounds;

}

trait Shape: Draw {

fn render_in(&self, bounds: Bounds);

fn render(&self) {

self.render_in(overlap(SCREEN_BOUNDS, self.bounds()));

}

}

the equivalent trait objects:

let square = Square::new(1, 2, 2);

let draw: &dyn Draw = &square;

let shape: &dyn Shape = &square;

have a layout with arrows (shown in Figure 3-5, repeated from an earlier Item) that make the problem clear: given a dyn Shape object, there’s no immediate way to build a dyn

Draw trait object, because there’s no way to get back to the vtable for impl Draw for

Square—even though the relevant part of its contents (the address of the

Square::bounds() method) is theoretically recoverable. (This is likely to change in

later versions of Rust; see the final section of this Item.)

Figure 3-5. Trait objects for trait bounds, with distinct vtables for Draw and Shape

Comparing this with the previous diagram, it’s also clear that an explicitly constructed

&dyn Any trait object doesn’t help. Any allows recovery of the original concrete

type of the underlying item, but there is no runtime way to see what traits it implements,

or to get access to the relevant vtable that might allow creation of a trait object.

So what’s available instead?


The primary tool to reach for is trait definitions, and this is in line with advice for

other languages: Effective Java Item 65 recommends, “Prefer interfaces to reflection.”

If code needs to rely on the availability of certain behavior for an item, encode that

behavior as a trait. Even if the desired behavior can’t be expressed as a set of method signatures, use marker traits to indicate compliance with the desired behavior. This is safer and more efficient than (say) introspecting the name of a class to check

for a particular prefix.
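A marker trait can be sketched as follows. The SafeToLog trait and the two types are hypothetical, invented purely to illustrate the idea: the trait has no methods and only records that a type has opted in to some behavior.

```rust
/// Hypothetical marker trait: no methods, it just records a promise that
/// the type's contents are safe to write to logs.
trait SafeToLog: std::fmt::Debug {}

#[derive(Debug)]
struct RequestId(u64);

#[derive(Debug)]
struct Password(String);

// Opt in explicitly; `Password` deliberately does not implement the marker.
impl SafeToLog for RequestId {}

/// Only accepts types that have opted in to the marker trait.
fn log_line<T: SafeToLog>(item: &T) -> String {
    format!("log: {item:?}")
}
```

With this in place, log_line(&RequestId(7)) compiles, but passing a Password is rejected at compile time, with no runtime inspection of type names required.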

Code that expects trait objects can also be used with objects having backing code that

was not available at program link time, because it has been dynamically loaded at

runtime (via dlopen(3) or equivalent), which means that monomorphization of a

generic isn’t possible.

Relatedly, reflection is sometimes also used in other languages to allow multiple

incompatible versions of the same dependency library to be loaded into the program

at once, bypassing linkage constraints that There Can Be Only One. This is not

needed in Rust, where Cargo already copes with multiple versions of the same library.

Finally, macros (especially derive macros) can be used to auto-generate ancillary

code that understands an item’s type at compile time, as a more efficient and more

type-safe equivalent to code that parses an item’s contents at runtime. A later Item

discusses Rust’s macro system.

Upcasting in Future Versions of Rust

The text of this Item was first written in 2021, and remained accurate all the way until

the book was being prepared for publication in 2024, at which point a new feature is

due to be added to Rust that changes some of the details.

This change enables upcasts that convert a trait object dyn T to a trait object dyn U, when U is one of T’s supertraits (trait T: U {...}). The feature

is gated on #![feature(trait_upcasting)] in advance of its official release,

expected to be Rust version 1.76.

For the preceding example, that means a &dyn Shape trait object can now be converted

to a &dyn Draw trait object, edging closer to an is-a relationship. Allowing this

conversion has a knock-on effect on the internal details of the vtable implementation, which are likely to become more complex than the versions

shown in the preceding figures.

However, the central points of this Item are not affected: the Any trait has no supertraits,

so the ability to upcast adds nothing to its functionality.


Item 20: Avoid the temptation to over-optimize

Just because Rust allows you to write super cool non-allocating zero-copy algorithms safely,

doesn’t mean every algorithm you write should be super cool, zero-copy and non-allocating.

—

Most of the Items in this book are designed to help existing programmers become

familiar with Rust and its idioms. This Item, however, is all about a problem that can

arise when programmers stray too far in the other direction and become obsessed

with exploiting Rust’s potential for efficiency, at the expense of usability and maintainability.

Data Structures and Allocation

Like pointers in other languages, Rust’s references allow you to reuse data without

making copies. Unlike other languages, Rust’s rules around reference lifetimes and

borrows allow you to reuse data safely. However, complying with the borrow checking

rules that make this possible can lead to code that’s harder to use.

This is particularly relevant for data structures, where you can choose between allocating

a fresh copy of something that’s stored in the data structure or including a reference

to an existing copy of it.

As an example, consider some code that parses a data stream of bytes, extracting data

encoded as type-length-value (TLV) structures where data is transferred in the following

format:

• One byte describing the type of the value (stored in the type_code field here)

• One byte describing the length of the value in bytes (used here to create a slice of

the specified length)

• Followed by the specified number of bytes for the value (stored in the value

field):

/// A type-length-value (TLV) from a data stream.

#[derive(Clone, Debug)]

pub struct Tlv<'a> {

pub type_code: u8,

pub value: &'a [u8],

}

pub type Error = &'static str; // Some local error type.

10 The field can’t be named type because that’s a reserved keyword in Rust. It’s possible to work around this restriction by using the raw identifier prefix r# (giving a field r#type: u8), but it’s normally easier just to rename the field.


/// Extract the next TLV from the `input`, also returning the remaining

/// unprocessed data.

pub fn get_next_tlv(input: &[u8]) -> Result<(Tlv, &[u8]), Error> {

if input.len() < 2 {

return Err("too short for a TLV");

}

// The TL parts of the TLV are one byte each.

let type_code = input[0];

let len = input[1] as usize;

if 2 + len > input.len() {

return Err("TLV longer than remaining data");

}

let tlv = Tlv {

type_code,

// Reference the relevant chunk of input data

value: &input[2..2 + len],

};

Ok((tlv, &input[2 + len..]))

}

This Tlv data structure is efficient because it holds a reference to the relevant chunk

of the input data, without copying any of the data, and Rust’s memory safety ensures

that the reference is always valid. That’s perfect for some scenarios, but things become

more awkward if something needs to hang onto an instance of the data structure.

For example, consider a network server that is receiving messages in the form of

TLVs. The received data can be parsed into Tlv instances, but the lifetime of those

instances will match that of the incoming message—which might be a transient

Vec<u8> on the heap or might be a buffer somewhere that gets reused for multiple

messages.

That induces a problem if the server code ever wants to store an incoming message so

that it can be consulted later:

pub struct NetworkServer<'a> {

// ...

/// Most recent max-size message.

max_size: Option<Tlv<'a>>,

}

/// Message type code for a set-maximum-size message.

const SET_MAX_SIZE: u8 = 0x01;

impl<'a> NetworkServer<'a> {

pub fn process(&mut self, mut data: &'a [u8]) -> Result<(), Error> {

while !data.is_empty() {

let (tlv, rest) = get_next_tlv(data)?;

match tlv.type_code {


SET_MAX_SIZE => {

// Save off the most recent `SET_MAX_SIZE` message.

self.max_size = Some(tlv);

}

// (Deal with other message types)

// ...

_ => return Err("unknown message type"),

}

data = rest; // Process remaining data on next iteration.

}

Ok(())

}

}

This code compiles as is but is effectively impossible to use: the lifetime of the NetworkServer

has to be smaller than the lifetime of any data that gets fed into its process()

method. That means that a straightforward processing loop:

DOES NOT COMPILE

let mut server = NetworkServer::default();

while !server.done() {

// Read data into a fresh vector.

let data: Vec<u8> = read_data_from_socket();

if let Err(e) = server.process(&data) {

log::error!("Failed to process data: {:?}", e);

}

}

fails to compile because the lifetime of the ephemeral data gets attached to the longer-

lived server:

error[E0597]: `data` does not live long enough

--> src/main.rs:375:40

372 | while !server.done() {

| ------------- borrow later used here

373 | // Read data into a fresh vector.

374 | let data: Vec<u8> = read_data_from_socket();

| ---- binding `data` declared here

375 | if let Err(e) = server.process(&data) {

| ^^^^^ borrowed value does not live

| long enough

...

378 | }

| - `data` dropped here while still borrowed

Switching the code so it reuses a longer-lived buffer doesn’t help either:


DOES NOT COMPILE

let mut perma_buffer = [0u8; 256];

let mut server = NetworkServer::default(); // lifetime within `perma_buffer`

while !server.done() {

// Reuse the same buffer for the next load of data.

read_data_into_buffer(&mut perma_buffer);

if let Err(e) = server.process(&perma_buffer) {

log::error!("Failed to process data: {:?}", e);

}

}

This time, the compiler complains that the code is trying to hang on to a reference

while also handing out a mutable reference to the same buffer:

error[E0502]: cannot borrow `perma_buffer` as mutable because it is also

borrowed as immutable

--> src/main.rs:353:31

353 | read_data_into_buffer(&mut perma_buffer);

| ^^^^^^^^^^^^^^^^^ mutable borrow occurs here

354 | if let Err(e) = server.process(&perma_buffer) {

| -----------------------------

| | |

| | immutable borrow occurs here

| immutable borrow later used here

The core problem is that the Tlv structure references transient data—which is fine for

transient processing but is fundamentally incompatible with storing state for later.

However, if the Tlv data structure is converted to own its contents:

#[derive(Clone, Debug)]

pub struct Tlv {

pub type_code: u8,

pub value: Vec<u8>, // owned heap data

}

and the get_next_tlv() code is correspondingly tweaked to include an additional

call to .to_vec():

// ...

let tlv = Tlv {

type_code,

// Copy the relevant chunk of data to the heap.

// The length field in the TLV is a single `u8`,

// so this copies at most 255 bytes.

value: input[2..2 + len].to_vec(),

};


then the server code has a much easier job. The data-owning Tlv structure has no

lifetime parameter, so the server data structure doesn’t need one either, and both variants

of the processing loop work fine.
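Putting the owning version together gives something like the following sketch. The run function is a hypothetical stand-in for the socket-reading loop, and max_size is made public here only so that the result is easy to inspect.

```rust
/// A type-length-value (TLV) that owns its contents.
#[derive(Clone, Debug, PartialEq)]
pub struct Tlv {
    pub type_code: u8,
    pub value: Vec<u8>, // owned heap data
}

pub type Error = &'static str;

/// Extract the next TLV, returning it along with the unprocessed remainder.
pub fn get_next_tlv(input: &[u8]) -> Result<(Tlv, &[u8]), Error> {
    if input.len() < 2 {
        return Err("too short for a TLV");
    }
    let type_code = input[0];
    let len = input[1] as usize;
    if 2 + len > input.len() {
        return Err("TLV longer than remaining data");
    }
    // Copy the value bytes so the TLV no longer borrows from `input`.
    let tlv = Tlv { type_code, value: input[2..2 + len].to_vec() };
    Ok((tlv, &input[2 + len..]))
}

const SET_MAX_SIZE: u8 = 0x01;

/// No lifetime parameter needed: the server owns whatever it stores.
#[derive(Default)]
pub struct NetworkServer {
    pub max_size: Option<Tlv>,
}

impl NetworkServer {
    pub fn process(&mut self, mut data: &[u8]) -> Result<(), Error> {
        while !data.is_empty() {
            let (tlv, rest) = get_next_tlv(data)?;
            match tlv.type_code {
                SET_MAX_SIZE => self.max_size = Some(tlv),
                _ => return Err("unknown message type"),
            }
            data = rest;
        }
        Ok(())
    }
}

/// Hypothetical stand-in for reading from a socket: each message is a
/// short-lived `Vec<u8>` that is dropped at the end of its loop iteration.
pub fn run(messages: Vec<Vec<u8>>) -> NetworkServer {
    let mut server = NetworkServer::default();
    for data in messages {
        if let Err(e) = server.process(&data) {
            eprintln!("Failed to process data: {e:?}");
        }
    }
    server
}
```

Each message buffer is dropped as soon as it has been processed, and the server still holds on to the most recent SET_MAX_SIZE message because it has its own copy of the bytes.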

Who’s Afraid of the Big Bad Copy?

One reason why programmers can become overly obsessed with reducing copies is

that Rust generally makes copies and allocations explicit. A visible call to a method

like .to_vec() or .clone(), or to a function like Box::new(), makes it clear that

copying and allocation are occurring. This is in contrast to C++, where it’s easy to

inadvertently write code that blithely performs allocation under the covers, particularly

in a copy-constructor or assignment operator.

Making an allocation or copy operation visible rather than hidden isn’t a good reason

to optimize it away, especially if that happens at the expense of usability. In many situations,

it makes more sense to focus on usability first, and fine-tune for optimal efficiency

only if performance is genuinely a concern, and if benchmarking

indicates that reducing copies will have a significant impact.

Also, the efficiency of your code is usually important only if it needs to scale up for

extensive use. If it turns out that the trade-offs in the code are wrong, and it doesn’t

cope well when millions of users start to use it—well, that’s a nice problem to have.

However, there are a couple of specific points to remember. The first was hidden

behind the weasel word generally when pointing out that copies are generally visible.

The big exception to this is Copy types, where the compiler silently makes copies

willy-nilly, shifting from move semantics to copy semantics. As such, the advice

given elsewhere in the book bears repeating here: don’t implement Copy unless a bitwise copy is valid and

fast. But the converse is true too: do consider implementing Copy if a bitwise copy is

valid and fast. For example, enum types that don’t carry additional data are usually easier

to use if they derive Copy.
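As a small illustration of that last point, here is a sketch with a hypothetical field-less Direction enum. Because it derives Copy, a value can be passed to a function and then used again without any explicit .clone() calls.

```rust
/// A field-less enum: a bitwise copy is valid and cheap, so derive `Copy`.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Direction {
    North,
    South,
}

fn opposite(d: Direction) -> Direction {
    match d {
        Direction::North => Direction::South,
        Direction::South => Direction::North,
    }
}

/// Uses `d` twice; without `Copy`, the first call would move it and the
/// comparison on the second line would fail to compile.
fn round_trip(d: Direction) -> bool {
    let flipped = opposite(d);
    opposite(flipped) == d
}
```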

The second point that might be relevant is the potential trade-off with no_std use.

It’s often possible to write code that’s no_std-compatible with

only minor modifications, and code that avoids allocation altogether makes this more

straightforward. However, targeting a no_std environment that supports heap allocation

(via the alloc library) may give the best balance of

usability and no_std support.


References and Smart Pointers

So very recently, I’ve consciously tried the experiment of not worrying about the hypothetical

perfect code. Instead, I call .clone() when I need to, and use Arc to get local objects into

threads and futures more smoothly.

And it feels glorious.

—

Designing a data structure so that it owns its contents can certainly make for better

ergonomics, but there are still potential problems if multiple data structures need to

make use of the same information. If the data is immutable, then each place having its

own copy works fine, but if the information might change (which is very commonly

the case), then multiple copies means multiple places that need to be updated, in sync

with each other.

Using Rust’s smart pointer types helps solve this problem, by allowing the design to

shift from a single-owner model to a shared-owner model. The Rc (for single-threaded

code) and Arc (for multithreaded code) smart pointers provide reference

counting that supports this shared-ownership model. Continuing with the assumption

that mutability is needed, they are typically paired with an inner type that allows

interior mutability, independently of Rust’s borrow checking rules:

RefCell

For interior mutability in single-threaded code, giving the common

Rc<RefCell<T>> combination

Mutex

For interior mutability in multithreaded code, giving the common

Arc<Mutex<T>> combination
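Both combinations can be sketched in a few lines. The function names below are hypothetical; the point is only to show two handles updating one shared value in each setting.

```rust
use std::cell::RefCell;
use std::rc::Rc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Single-threaded sharing: two handles to one mutable list.
fn shared_single_threaded() -> Vec<u32> {
    let list = Rc::new(RefCell::new(Vec::<u32>::new()));
    let other = Rc::clone(&list); // bumps the reference count, no deep copy
    list.borrow_mut().push(1);
    other.borrow_mut().push(2);
    let result = list.borrow().clone();
    result
}

/// Multithreaded sharing: each thread gets its own `Arc` handle.
fn shared_multithreaded() -> u32 {
    let counter = Arc::new(Mutex::new(0u32));
    let handles: Vec<_> = (0..4)
        .map(|_| {
            let counter = Arc::clone(&counter);
            thread::spawn(move || {
                // The lock guard is released at the end of the statement.
                *counter.lock().unwrap() += 1;
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    let total = *counter.lock().unwrap();
    total
}
```

In both cases the borrow rules are still enforced, just at runtime: RefCell panics on a conflicting borrow, and Mutex blocks until the lock is available.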

This transition is covered in more detail in the GuestRegister example elsewhere in the book,

but the point here is that you don’t have to treat Rust’s smart pointers as a last resort.

It’s not an admission of defeat if your design uses smart pointers instead of a complex

web of interconnected reference lifetimes: smart pointers can lead to a simpler, more

maintainable, and more usable design.


CHAPTER 4

Dependencies

When the Gods wish to punish us, they answer our prayers.

—Oscar Wilde

For decades, the idea of code reuse was merely a dream. The idea that code could be

written once, packaged into a library, and reused across many different applications

was an ideal, realized only for a few standard libraries and for corporate in-house

tools.

The growth of the internet and the rise of open source software finally changed that.

The first openly accessible repository that held a wide collection of useful libraries

was CPAN, the Comprehensive Perl Archive Network, online since 1995. Today, almost every modern language has a

comprehensive collection of open source libraries available, housed in a package

repository that makes adding a new dependency easy and quick.

However, new problems come along with that ease, convenience, and speed. It’s usually

still easier to reuse existing code than to write it yourself, but there are potential

pitfalls and risks that come along with dependencies on someone else’s code. This

chapter of the book will help you be aware of these.

The focus is specifically on Rust and its tooling, but many of the concerns, topics, and issues covered apply equally well to other toolchains (and other

languages).

1 With the notable exception of C and C++, where package management remains somewhat fragmented.


Item 21: Understand what semantic versioning promises

If we acknowledge that SemVer is a lossy estimate and represents only a subset of the possible

scope of changes, we can begin to see it as a blunt instrument.

—Titus Winters, Software Engineering at Google (O’Reilly)

Cargo, Rust’s package manager, allows automatic selection of dependencies

for Rust code according to semantic versioning (semver). A Cargo.toml stanza like:

[dependencies]

serde = "1.4"

indicates to cargo what ranges of semver versions are acceptable for this dependency.

The Cargo documentation has the details on specifying precise ranges of acceptable versions, but the following are the most commonly used variants:

"1.2.3"

Specifies that any version that’s semver-compatible with 1.2.3 is acceptable

"^1.2.3"

Is another way of specifying the same thing more explicitly

"=1.2.3"

Pins to one particular version, with no substitutes accepted

"~1.2.3"

Allows versions that are semver-compatible with 1.2.3 but only where the last

specified component changes (so 1.2.4 is acceptable but 1.3.0 is not)

"1.2.*"

Accepts any version that matches the wildcard

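Examples of what these specifications allow are shown in Table 4-1.

Table 4-1. Cargo dependency version specification

Specification   1.2.2   1.2.3   1.2.4   1.3.0   2.0.0

"1.2.3"         No      Yes     Yes     Yes     No

"^1.2.3"        No      Yes     Yes     Yes     No

"=1.2.3"        No      Yes     No      No      No

"~1.2.3"        No      Yes     Yes     No      No

"1.2.*"         Yes     Yes     Yes     No      No

"1.*"           Yes     Yes     Yes     Yes     No

"*"             Yes     Yes     Yes     Yes     Yes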


When choosing dependency versions, Cargo will generally pick the largest version

that’s within the combination of all of these semver ranges.

Because semantic versioning is at the heart of cargo’s dependency resolution process,

this Item explores more details about what semver means.

Semver Essentials

The essentials of semantic versioning are listed in the summary of the semver

specification, reproduced here:

Given a version number MAJOR.MINOR.PATCH, increment the:

• MAJOR version when you make incompatible API changes

• MINOR version when you add functionality in a backward compatible manner

• PATCH version when you make backward compatible bug fixes

An important point lurks in the detailed rules of the specification:

3. Once a versioned package has been released, the contents of that version MUST

NOT be modified. Any modifications MUST be released as a new version.

Putting this into different words:

• Changing anything requires a new patch version.

• Adding things to the API in a way that means existing users of the crate still compile

and work requires a minor version upgrade.

• Removing or changing things in the API requires a major version upgrade.

There is one more important addition to the semver rules:

4. Major version zero (0.y.z) is for initial development. Anything MAY change at any

time. The public API SHOULD NOT be considered stable.

Cargo adapts this last rule slightly, “left-shifting” the earlier rules so that changes in

the leftmost non-zero component indicate incompatible changes. This means that

0.2.3 to 0.3.0 can include an incompatible API change, as can 0.0.4 to 0.0.5.
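The “left-shifted” compatibility rule can be written down as a short sketch. This is not Cargo’s actual implementation: the Version type, parse, and caret_matches are hypothetical names, and pre-release and build metadata are ignored. It only models what a full three-component requirement such as "1.2.3" accepts.

```rust
/// A parsed MAJOR.MINOR.PATCH version (no pre-release or build metadata).
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct Version {
    major: u64,
    minor: u64,
    patch: u64,
}

/// Parse exactly three dot-separated numeric components.
fn parse(s: &str) -> Option<Version> {
    let mut parts = s.split('.').map(|p| p.parse::<u64>().ok());
    let version = Version {
        major: parts.next()??,
        minor: parts.next()??,
        patch: parts.next()??,
    };
    // Reject trailing components such as "1.2.3.4".
    if parts.next().is_some() { None } else { Some(version) }
}

/// Does `candidate` satisfy a caret requirement on `req`?
/// The candidate must be at least `req`, and the leftmost non-zero
/// component of `req` must be unchanged.
fn caret_matches(req: Version, candidate: Version) -> bool {
    if candidate < req {
        return false;
    }
    if req.major != 0 {
        candidate.major == req.major
    } else if req.minor != 0 {
        candidate.major == 0 && candidate.minor == req.minor
    } else {
        // 0.0.z: every patch release is potentially incompatible.
        candidate == req
    }
}
```

Under this model, 1.3.0 satisfies a requirement of 1.2.3, but 0.3.0 does not satisfy 0.2.3, and 0.0.5 does not satisfy 0.0.4.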

Semver for Crate Authors

In theory, theory is the same as practice. In practice, it’s not.

As a crate author, the first of these rules is easy to comply with, in theory: if you touch

anything, you need a new release. Using Git tags to match releases can help with this: by default, a tag is fixed to a particular commit and can be moved only with a

manual --force option. Crates published to crates.io also get automatic policing of

this, as the registry will reject a second attempt to publish the same crate version. The

main danger for noncompliance is when you notice a mistake just after a release has

gone out, and you have to resist the temptation to just nip in a fix.

The semver specification covers API compatibility, so if you make a minor change to

behavior that doesn’t alter the API, then a patch version update should be all that’s

needed. (However, if your crate is widely depended on, then in practice you may need

to be aware of Hyrum’s Law: however minor the change,

someone out there is likely to depend on the old behavior, even if the API is

unchanged.)

The difficult part for crate authors is the latter rules, which require an accurate determination

of whether a change is backward compatible or not. Some changes are obviously

incompatible (removing public entrypoints or types, changing method signatures)

and some changes are obviously backward compatible (e.g., adding a new method to

a struct, or adding a new constant), but there’s a lot of gray area left in between.

To help with this, the Cargo book goes into considerable detail as to what is and is not backward compatible. Most of these details are unsurprising, but there are a few areas

worth highlighting:

• Adding new items is usually safe, but may cause clashes if code using the crate

already makes use of something that happens to have the same name as the new

item.

— This is a particular danger if the user does a wildcard import from the crate, because all of the crate’s items are then automatically in the user’s main namespace.

— Even without a wildcard import, a new trait method (with a default implementation) or a new inherent method has a chance of clashing with an existing name.

• Rust’s insistence on covering all possibilities means that changing the set of available

possibilities can be a breaking change.

— Performing a match on an enum must cover all possibilities, so if a crate adds a new enum

variant, that’s a breaking change (unless the enum is already marked as non_exhaustive; adding non_exhaustive is also a breaking change).

— Explicitly creating an instance of a struct requires an initial value for all

fields, so adding a field to a structure that can be publicly instantiated is a breaking change. Structures that have private fields are OK, because crate

users can’t explicitly construct them anyway; a struct can also be marked

as non_exhaustive to prevent external users from performing explicit

construction.


• Changing a trait so it is no longer object safe is a breaking change: any users that build trait objects for the trait will stop being able to compile their

code.

• Adding a new blanket implementation for a trait is a breaking change; any users

that already implement the trait will now have two conflicting implementations.

• Changing the license of an open source crate is an incompatible change: users of

your crate who have strict restrictions on what licenses are acceptable may be

broken by the change. Consider the license to be part of your API.

• Changing the default features of a crate is potentially a breaking

change. Removing a default feature is almost certain to break things (unless the

feature was already a no-op); adding a default feature may break things depending

on what it enables. Consider the default feature set to be part of your API.

• Changing library code so that it uses a new feature of Rust might be an incompatible

change, because users of your crate who have not yet upgraded their compiler

to a version that includes the feature will be broken by the change. However,

most Rust crates treat a minimum supported Rust version (MSRV) increase as a

non-breaking change, so consider whether the MSRV forms part of your API.
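To make the non_exhaustive point from the list above concrete, here is a sketch using a hypothetical Shape enum. The attribute only has an effect for code in other crates, so within a single crate the wildcard arm is technically unreachable; the allow attribute is there only so this one-file illustration compiles without a warning.

```rust
/// Hypothetical library enum. `#[non_exhaustive]` forces users in *other*
/// crates to include a wildcard arm, so adding a variant later is not a
/// breaking change for them.
#[non_exhaustive]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Shape {
    Circle,
    Square,
}

/// How downstream code has to be written: the `_` arm absorbs any
/// variants added in future minor releases.
#[allow(unreachable_patterns)] // only needed inside the defining crate
pub fn corners(shape: Shape) -> Option<u32> {
    match shape {
        Shape::Circle => Some(0),
        Shape::Square => Some(4),
        _ => None,
    }
}
```

The cost is that downstream users lose the compiler’s help in spotting unhandled variants, so the attribute is a trade-off rather than a free win.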

An obvious corollary of the rules is this: the fewer public items a crate has, the fewer

things there are that can induce an incompatible change.

However, there’s no escaping the fact that comparing all public API items for compatibility

from one release to the next is a time-consuming process that is likely to yield

only an approximate (major/minor/patch) assessment of the level of change, at best.

Given that this comparison is a somewhat mechanical process, hopefully tooling

will arrive to make the process easier.

If you do need to make an incompatible major version change, it’s nice to make life

easier for your users by ensuring that the same overall functionality is available after

the change, even if the API has radically changed. If possible, the most helpful

sequence for your crate users is as follows:

1. Release a minor version update that includes the new version of the API and that

marks the older variant as deprecated, including an indication of how to migrate.

2. Release a major version update that removes the deprecated parts of the API.
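The first step of that sequence might look like the following sketch, which uses Rust’s built-in deprecated attribute. The function names, the Rect type, and the version number in since are all hypothetical.

```rust
#[derive(Debug, Clone, Copy)]
pub struct Rect {
    pub width: u32,
    pub height: u32,
}

/// New version of the API, added in a minor release.
pub fn area_of(rect: Rect) -> u32 {
    rect.width * rect.height
}

/// Old entry point, kept until the next major release. Callers get a
/// compiler warning that includes the migration hint in `note`.
#[deprecated(since = "1.5.0", note = "use `area_of` instead, passing a `Rect`")]
pub fn area(width: u32, height: u32) -> u32 {
    area_of(Rect { width, height })
}
```

Existing callers of area() keep compiling, with a warning that tells them what to change, and the later major release can then delete the deprecated function.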

A more subtle point is to make breaking changes breaking. If your crate is changing its

behavior in a way that’s actually incompatible for existing users but that could reuse

2 Tools that attempt to do something along these lines already exist.


the same API: don’t. Force a change in types (and a major version bump) to ensure

that users can’t inadvertently use the new version incorrectly.
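As a hedged illustration, imagine a hypothetical connect_timeout function whose argument changes meaning from seconds to milliseconds between major versions. Wrapping the value in a new type (Millis, also hypothetical) means that old call sites fail to compile instead of silently misbehaving.

```rust
use std::time::Duration;

// Version 1.x (hypothetical): the timeout was a bare number of seconds.
//
//     pub fn connect_timeout(timeout: u64) -> Duration
//
// Suppose version 2.0 wants milliseconds instead. Keeping the `u64`
// signature would let old callers compile while silently waiting 1000x
// less time. Changing the parameter type turns that into a compile error.

/// Version 2.0: the unit is carried by the type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Millis(pub u64);

pub fn connect_timeout(timeout: Millis) -> Duration {
    Duration::from_millis(timeout.0)
}
```

A caller that still writes connect_timeout(30) gets a type error on upgrade, which is a prompt to check what the argument now means.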

For the less tangible parts of your API, such as the MSRV or the license, consider

setting up a CI check that detects changes, using tooling (e.g., cargo-deny).

Finally, don’t be afraid of version 1.0.0 because it’s a commitment that your API is

now fixed. Lots of crates fall into the trap of staying at version 0.x forever, but that

reduces the already-limited expressivity of semver from three categories (major/

minor/patch) to two (effective-major/effective-minor).

Semver for Crate Users

For the user of a crate, the theoretical expectations for a new version of a dependency

are as follows:

• A new patch version of a dependency crate Should Just Work.™

• A new minor version of a dependency crate Should Just Work,™ but the new parts

of the API might be worth exploring to see if there are now cleaner or better ways

of using the crate. However, if you do use the new parts, you won’t be able to

revert the dependency back to the old version.

• All bets are off for a new major version of a dependency; chances are that your

code will no longer compile, and you’ll need to rewrite parts of your code to

comply with the new API. Even if your code does still compile, you should check

that your use of the API is still valid after a major version change, because the constraints

and preconditions of the library may have changed.

In practice, even the first two types of change may cause unexpected behavior

changes, even in code that still compiles fine, due to Hyrum’s Law.

As a consequence of these expectations, your dependency specifications will commonly

take a form like "1.4.3" or "0.7", which includes subsequent compatible versions;

avoid specifying a completely wildcard dependency like "*" or "0.*". A

completely wildcard dependency says that any version of the dependency, with any

API, can be used by your crate, which is unlikely to be what you really want. Avoiding

wildcards is also a requirement for publishing to crates.io; submissions with

"*" wildcards will be rejected.


However, in the longer term, it’s not safe to just ignore major version changes in

dependencies. Once a library has had a major version change, the chances are that no

further bug fixes (and more importantly, security updates) will be made to the previous

major version. A version specification like "1.4" will then fall further and further

behind as new 2.x releases arrive, with any security problems left unaddressed.

As a result, you need to either accept the risks of being stuck on an old version or

eventually follow major version upgrades to your dependencies. Tools such as cargo

update can let you know when updates are available; you can then schedule the upgrade for a time that’s convenient for you.

Discussion

Semantic versioning has a cost: every change to a crate has to be assessed against its

criteria, to decide the appropriate type of version bump. Semantic versioning is also a

blunt tool: at best, it reflects a crate owner’s guess as to which of three categories the

current release falls into. Not everyone gets it right, not everything is clear-cut about

exactly what “right” means, and even if you get it right, there’s always a chance you

may fall foul of Hyrum’s Law.

However, semver is the only game in town for anyone who doesn’t have the luxury of

working in an environment like a heavily tested internal monorepo. As such, understanding its concepts and limitations is necessary for managing dependencies.

Item 22: Minimize visibility

Rust allows elements of the code to either be hidden from or exposed to other parts of

the codebase. This Item explores the mechanisms provided for this and suggests

advice for where and when they should be used.

Visibility Syntax

Rust’s basic unit of visibility is the module. By default, a module’s items (types, methods,

constants) are private and accessible only to code in the same module and its

submodules.

Code that needs to be more widely available is marked with the pub keyword, making

it public to some other scope. For most Rust syntactic features, making the feature

pub does not automatically expose the contents—the types and functions in a pub mod

are not public, nor are the fields in a pub struct. However, there are a couple of

exceptions where applying the visibility to the contents makes sense:

• Making an enum public automatically makes the type’s variants public too

(together with any fields that might be present in those variants).


• Making a trait public automatically makes the trait’s methods public too.

So a collection of types in a module:

pub mod somemodule {

// Making a `struct` public does not make its fields public.

#[derive(Debug, Default)]

pub struct AStruct {

// By default fields are inaccessible.

count: i32,

// Fields have to be explicitly marked `pub` to be visible.

pub name: String,

}

// Likewise, methods on the struct need individual `pub` markers.

impl AStruct {

// By default methods are inaccessible.

fn canonical_name(&self) -> String {

self.name.to_lowercase()

}

// Methods have to be explicitly marked `pub` to be visible.

pub fn id(&self) -> String {

format!("{}-{}", self.canonical_name(), self.count)

}

}

// Making an `enum` public also makes all of its variants public.

#[derive(Debug)]

pub enum AnEnum {

VariantOne,

// Fields in variants are also made public.

VariantTwo(u32),

VariantThree { name: String, value: String },

}

// Making a `trait` public also makes all of its methods public.

pub trait DoSomething {

fn do_something(&self, arg: i32);

}

}

allows access to pub things and the exceptions previously mentioned:

use somemodule::*;

let mut s = AStruct::default();

s.name = "Miles".to_string();

println!("s = {:?}, name='{}', id={}", s, s.name, s.id());

let e = AnEnum::VariantTwo(42);

println!("e = {e:?}");

#[derive(Default)]


pub struct DoesSomething;

impl DoSomething for DoesSomething {

fn do_something(&self, _arg: i32) {}

}

let d = DoesSomething::default();

d.do_something(42);

but non-pub things are generally inaccessible:

let mut s = AStruct::default();

s.name = "Miles".to_string();

println!("(inaccessible) s.count={}", s.count);

println!("(inaccessible) s.canonical_name()={}", s.canonical_name());

error[E0616]: field `count` of struct `somemodule::AStruct` is private

--> src/main.rs:230:45

230 | println!("(inaccessible) s.count={}", s.count);

| ^^^^^ private field

error[E0624]: method `canonical_name` is private

--> src/main.rs:231:56

86 | fn canonical_name(&self) -> String {

| ---------------------------------- private method defined here

...

231 | println!("(inaccessible) s.canonical_name()={}", s.canonical_name());

| private method ^^^^^^^^^^^^^^

Some errors have detailed explanations: E0616, E0624.

For more information about an error, try `rustc --explain E0616`.

The most common visibility marker is the bare pub keyword, which makes the item

visible to anything that’s able to see the module it’s in. That last detail is important: if a

somecrate::somemodule module isn’t visible to other code in the first place, anything

that’s pub inside it is still not visible.
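A minimal sketch of that last point, using hypothetical module and function names (`outer`, `hidden`, `answer`) that are not from the original text: a `pub` function inside a private module is reachable only by code that can already see the module.

```rust
mod outer {
    // `hidden` has no `pub` marker, so only `outer` (and its submodules)
    // can name it.
    mod hidden {
        // `pub` here only means "visible to whoever can see `hidden`".
        pub fn answer() -> i32 {
            42
        }
    }

    // `outer` can see `hidden`, so it may call the `pub` function inside it.
    pub fn answer_via_outer() -> i32 {
        hidden::answer()
    }
}

// Code outside `outer` reaches the value only through the public wrapper.
// Writing `outer::hidden::answer()` here would be rejected because the
// module `hidden` is private.
pub fn visible_answer() -> i32 {
    outer::answer_via_outer()
}
```

Calling `visible_answer()` returns the value produced inside the hidden module, even though the module itself cannot be named from the outside.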

However, there are also some more-specific variants of pub that allow the scope of the

visibility to be constrained. In descending order of usefulness, these are as follows:

pub(crate)

Accessible anywhere within the owning crate. This is particularly useful for crate-

wide internal helper functions that should not be exposed to external crate users.

pub(super)

Accessible to the parent module of the current module and its submodules. This

is occasionally useful for selectively increasing visibility in a crate that has a deep

module structure. It’s also the effective visibility level for modules: a plain mod

mymodule is visible to its parent module or crate and the corresponding

submodules.

pub(in <path>)

Accessible to code in <path>, which has to be a description of some ancestor

module of the current module. This can occasionally be useful for organizing

source code, because it allows subsets of functionality to be moved into submodules

that aren’t necessarily visible in the public API. For example, the Rust standard

library consolidates all of the iterator adapters into an internal adapters submodule

and has the following:

• A pub(in crate::iter) visibility marker on all of the required adapter methods

• A pub use of all of the adapters:: types in the outer iter module.

pub(self)

Equivalent to pub(in self), which is equivalent to not being pub. Uses for this

are very obscure, such as reducing the number of special cases needed in code-

generation macros.
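The four restricted markers can be seen side by side in a small sketch. The module layout and function names here (`shapes`, `circle`, `detail`, and the helpers) are hypothetical, chosen only to give each marker an ancestor module to refer to.

```rust
pub mod shapes {
    // Visible anywhere in this crate, but not to external users.
    pub(crate) fn crate_helper() -> &'static str {
        "crate"
    }

    pub mod circle {
        // Visible to the parent module `shapes` and everything under it.
        pub(super) fn parent_helper() -> &'static str {
            "super"
        }

        pub mod detail {
            // Visible to `crate::shapes` and everything under it; the path
            // must name an ancestor of the current module.
            pub(in crate::shapes) fn ancestor_helper() -> &'static str {
                "in-path"
            }

            // Same effect as writing no visibility marker at all.
            pub(self) fn private_helper() -> &'static str {
                "self"
            }

            // A fully public wrapper around the module-private helper.
            pub fn describe_private() -> &'static str {
                private_helper()
            }
        }
    }

    // `shapes` is allowed to call all of the restricted helpers above.
    pub fn summary() -> String {
        format!(
            "{} {} {} {}",
            crate_helper(),
            circle::parent_helper(),
            circle::detail::ancestor_helper(),
            circle::detail::describe_private()
        )
    }
}
```

An external crate depending on this code could call `shapes::summary()` and `shapes::circle::detail::describe_private()`, but none of the four helpers directly.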

The Rust compiler will warn you if you have a code item that is private to the module

but not used within that module (and its submodules):

pub mod anothermodule {

// Private function that is not used within its module.

fn inaccessible_fn(x: i32) -> i32 {

x + 3

}

}

Although the warning indicates that the code is “never used” in its owning module, in

practice this warning often indicates that code can’t be used from outside the module,

because the visibility restrictions don’t allow it:

warning: function `inaccessible_fn` is never used

--> src/main.rs:56:8

56 | fn inaccessible_fn(x: i32) -> i32 {

| ^^^^^^^^^^^^^^^

= note: `#[warn(dead_code)]` on by default

Visibility Semantics

Separate from the question of how to increase visibility is the question of when to do

so. The generally accepted answer to this is as little as possible, at least for any code

that may possibly get used and reused in the future.

The first reason for this advice is that visibility changes can be hard to undo. Once a

crate item is public, it can’t be made private again without breaking any code that uses

the crate, thus necessitating a major version bump. The converse is not true:

moving a private item to be public generally needs only a minor version bump and

leaves crate users unaffected. Read through Rust’s API compatibility guidelines

and notice how many are relevant only if there are pub items in play.

A more important—but more subtle—reason to prefer privacy is that it keeps your

options open. The more things that are exposed, the more things there are that need

to stay fixed for the future (absent an incompatible change). If you expose the internal

implementation details of a data structure, a putative future change to use a more

efficient algorithm becomes a breaking change. If you expose internal helper functions,

it’s inevitable that some external code will come to depend on the exact details of

those functions.
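As a rough illustration of keeping options open, consider a hypothetical `Histogram` type (not from the original text) whose storage is private. Because callers can only use the public methods, the field could later be replaced, for example by a single running maximum, without a breaking change.

```rust
pub struct Histogram {
    // Private: callers cannot depend on how samples are stored.
    samples: Vec<u32>,
}

impl Histogram {
    pub fn new() -> Self {
        Histogram { samples: Vec::new() }
    }

    pub fn record(&mut self, value: u32) {
        self.samples.push(value);
    }

    // The public contract is only "return the largest recorded value",
    // not "scan a vector to find it".
    pub fn max(&self) -> Option<u32> {
        self.samples.iter().copied().max()
    }
}
```

Had `samples` been declared `pub`, any external code that indexed into it would turn a change of internal representation into a major version bump.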

Of course, this is a concern only for library code that potentially has multiple users

and a long lifespan. But nothing is as permanent as a temporary solution, and so it’s a

good habit to fall into.

It’s also worth observing that this advice to restrict visibility is by no means unique to

this Item or to Rust:

• The Rust API Guidelines include a recommendation that structs should have private fields.

• Effective Java by Joshua Bloch, 3rd edition (Addison-Wesley Professional), has the following:

— Item 15: Minimize the accessibility of classes and members.

— Item 16: In public classes, use accessor methods, not public fields.

• Effective C++ by Scott Meyers (Addison-Wesley Professional) has the following

in its second edition:

— Item 18: Strive for class interfaces that are complete and minimal (my italics).

— Item 20: Avoid data members in the public interface.

Item 23: Avoid wildcard imports

Rust’s use statement pulls in a named item from another crate or module and makes

that name available for use in the local module’s code without qualification. A wildcard

import (or glob import) of the form use somecrate::module::* says that every

public symbol from that module should be added to the local namespace.

As described in Item 21, an external crate may add new items to its API as part of a

minor version upgrade; this is considered a backward-compatible change.

The combination of these two observations raises the worry that a nonbreaking

change to a dependency might break your code: what happens if the dependency adds

a new symbol that clashes with a name you’re already using?

At the simplest level, this turns out not to be a problem: the names in a wildcard

import are treated as being lower priority, so any matching names that are in your

code take precedence:

use bytes::*;

// Local `Bytes` type does not clash with `bytes::Bytes`.

struct Bytes(Vec<u8>);
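The same precedence rule can be tried without an external crate. In this sketch a local module named `dep` stands in for the dependency; `dep`, `helper`, and the `origin` functions are hypothetical names used only for illustration.

```rust
// Stand-in for an external dependency (hypothetical).
mod dep {
    pub struct Bytes;

    impl Bytes {
        pub fn origin() -> &'static str {
            "dep"
        }
    }

    pub fn helper() -> u32 {
        1
    }
}

use dep::*;

// A local definition with the same name as a glob-imported item.
struct Bytes(Vec<u8>);

impl Bytes {
    fn origin() -> &'static str {
        "local"
    }
}

pub fn shadowing_demo() -> (&'static str, &'static str, u32, usize) {
    let local = Bytes(vec![1, 2, 3]);
    // Plain `Bytes` is the local type; the dependency's type needs its
    // full path. `helper` has no local rival, so the glob import supplies it.
    (Bytes::origin(), dep::Bytes::origin(), helper(), local.0.len())
}
```

The unqualified `Bytes` resolves to the local struct, while `helper` still arrives through the wildcard import.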

Unfortunately, there are still cases where clashes can occur. For example, consider the

case when the dependency adds a new trait and implements it for some type:

trait BytesLeft {

// Name clashes with the `remaining` method on the wildcard-imported

// `bytes::Buf` trait.

fn remaining(&self) -> usize;

}

impl BytesLeft for &[u8] {

// Implementation clashes with `impl bytes::Buf for &[u8]`.

fn remaining(&self) -> usize {

self.len()

}

}

If any method names from the new trait clash with existing method names that apply

to the type, then the compiler can no longer unambiguously figure out which method

is intended:

DOES NOT COMPILE

let arr = [1u8, 2u8, 3u8];

let v = &arr[1..];

assert_eq!(v.remaining(), 2);

as indicated by the compile-time error:

error[E0034]: multiple applicable items in scope

--> src/main.rs:40:18

40 | assert_eq!(v.remaining(), 2);

| ^^^^^^^^^ multiple `remaining` found

note: candidate #1 is defined in an impl of the trait `BytesLeft` for the

type `&[u8]`

--> src/main.rs:18:5

18 | fn remaining(&self) -> usize {

| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^

= note: candidate #2 is defined in an impl of the trait `bytes::Buf` for the

type `&[u8]`

help: disambiguate the method for candidate #1

40 | assert_eq!(BytesLeft::remaining(&v), 2);

| ~~~~~~~~~~~~~~~~~~~~~~~~

help: disambiguate the method for candidate #2

40 | assert_eq!(bytes::Buf::remaining(&v), 2);

| ~~~~~~~~~~~~~~~~~~~~~~~~~

As a result, you should avoid wildcard imports from crates that you don’t control.
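One way to follow this advice is to import only the names that are actually needed. The sketch below again uses a local `dep` module as a hypothetical stand-in for the dependency (with an invented `checksum` function) so that it is self-contained; with the explicit import, the dependency's trait never enters scope and the method call is unambiguous.

```rust
// Stand-in for an external dependency (hypothetical), modeled on the
// `bytes::Buf` example above.
mod dep {
    pub trait Buf {
        fn remaining(&self) -> usize;
    }

    impl Buf for &[u8] {
        fn remaining(&self) -> usize {
            self.len()
        }
    }

    pub fn checksum(data: &[u8]) -> u32 {
        data.iter().map(|b| u32::from(*b)).sum()
    }
}

// Explicit import: only `checksum` enters the local namespace, so the
// dependency's `Buf` trait (and anything it adds later) stays out of scope.
use dep::checksum;

trait BytesLeft {
    fn remaining(&self) -> usize;
}

impl BytesLeft for &[u8] {
    fn remaining(&self) -> usize {
        self.len()
    }
}

pub fn explicit_import_demo() -> (usize, usize, u32) {
    let arr = [1u8, 2u8, 3u8];
    let v = &arr[1..];
    (
        // Unambiguous: `BytesLeft` is the only trait in scope with `remaining`.
        v.remaining(),
        // Fully qualified syntax still reaches the dependency's trait on demand.
        dep::Buf::remaining(&v),
        checksum(v),
    )
}
```

If the dependency's trait is genuinely needed alongside the local one, the fully qualified form suggested by the compiler's help text keeps each call site explicit about which implementation it means.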

If you do control the source of the wildcard import, then the previously mentioned

concerns disappear. For example, it’s common for a test module to do use

super::*;. It’s also possible for crates that use modules primarily as a way of dividing
