Integration Tests - Effective Rust - David Drysdale - RutLib.com - Your Home Library


an exception to the general advice to avoid wildcard imports).

• The normal visibility rules for modules mean that a unit test has the ability to use

anything from the parent module, whether it is pub or not. This allows for “open-

box” testing of the code, where the unit tests exercise internal features that aren’t

visible to normal users.

• The test code makes use of expect() or unwrap() for its expected results. The usual advice against panicking isn’t really relevant for test-only code, where panic! is used to signal a failing test. Similarly, the test code also checks expected results with assert_eq!, which will panic on failure.

• The code under test includes a function that panics on some kinds of invalid input; to exercise that, there’s a unit test function that’s marked with the #[should_panic] attribute. This might be needed when testing an internal function that normally expects the rest of the code to respect its invariants and preconditions, or it might be a public function that has some reason to ignore the usual advice against panicking. (Such a function should have a “Panics” section in its doc comment.)

The advice on documentation suggests not documenting things that are already expressed by the type system. Similarly, there’s no need to test things that are guaranteed by the type system. If

your enum types start holding values that aren’t in the list of allowed variants, you’ve

got bigger problems than a failing unit test!
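As a rough illustration of the unit-test features listed above, the following sketch shows a test module with a wildcard import, a test that reaches a private function, a call to expect(), and a #[should_panic] test. The function names (parse_count, percent_share) are invented for this sketch and are not taken from the book's own example.

```rust
/// Private helper (hypothetical): parses a decimal count.
fn parse_count(text: &str) -> Option<u32> {
    text.trim().parse().ok()
}

/// Hypothetical function under test.
///
/// # Panics
///
/// Panics if `parts` is zero.
pub fn percent_share(parts: u32) -> u32 {
    if parts == 0 {
        panic!("parts must be non-zero");
    }
    100 / parts
}

#[cfg(test)]
mod tests {
    // Wildcard import of the parent module, private items included.
    use super::*;

    #[test]
    fn private_helper_is_reachable() {
        // expect() is fine here: a panic simply marks the test as failed.
        let count = parse_count(" 4 ").expect("input should parse");
        assert_eq!(percent_share(count), 25);
    }

    #[test]
    #[should_panic]
    fn zero_parts_panics() {
        percent_share(0);
    }
}
```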

However, if your code relies on specific functionality from your dependencies, it can be helpful to include basic tests of that functionality. The aim here is not to repeat testing that’s already done by the dependency itself but instead to have an early warning system that indicates whether the behavior that you need from the dependency has changed, separately from whether the public API signature has changed, as indicated by the semantic version number.

228 | Chapter 5: Tooling

Integration Tests

The other common form of test included with a Rust project is integration tests, held

under tests/. Each file in that directory is run as a separate test program that executes

all of the functions marked with #[test].

Integration tests do not have access to crate internals and so act as behavior tests that

can exercise only the public API of the crate.
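A minimal sketch of what such a test might look like, assuming a hypothetical crate called somecrate with a public function is_even. To keep the snippet self-contained, the crate is stood in for by an inline module; in a real project the test function would sit in its own file under tests/.

```rust
// Stand-in for the library crate. In a real project this code would live in
// src/lib.rs and the test below would live in a file such as tests/parity.rs.
mod somecrate {
    /// Public API: reachable from an integration test.
    pub fn is_even(n: u64) -> bool {
        remainder(n) == 0
    }

    // Private helper: code outside this module cannot call it directly.
    fn remainder(n: u64) -> u64 {
        n % 2
    }
}

// An integration test needs no #[cfg(test)] wrapper and no `use super::*`;
// it refers to the crate by name, as any external user would.
#[test]
fn parity_is_reported_through_public_api() {
    assert!(somecrate::is_even(10));
    assert!(!somecrate::is_even(7));
}
```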

Doc Tests

An earlier Item described the inclusion of short code samples in documentation comments, to illustrate the use of a particular public API item. Each such chunk of code is enclosed in an implicit fn main() { ... } and run as part of cargo test, effectively making it an additional test case for your code, known as a doc test. Individual tests can also be executed selectively by running cargo test --doc <item-name>.

Regularly running tests as part of your CI environment (Item 32) ensures that your code samples don’t drift too far from the current reality of your API.
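For illustration, a doc test might look something like the following sketch. The function larger and the crate name somecrate are placeholders invented here; the code inside the doc comment is what cargo test would compile and run for a library crate.

```rust
/// Returns the larger of two values.
///
/// # Examples
///
/// ```
/// // `somecrate` is a placeholder for the name of the crate being documented.
/// assert_eq!(somecrate::larger(2, 7), 7);
/// ```
pub fn larger(a: i32, b: i32) -> i32 {
    if a >= b {
        a
    } else {
        b
    }
}
```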

Examples

An earlier Item also described the ability to provide example programs that exercise your public API. Each Rust file under examples/ (or each subdirectory under examples/ that includes a main.rs) can be run as a standalone binary with cargo run --example <name> or cargo test --example <name>.

These programs have access to only the public API of your crate and are intended to

illustrate the use of your API as a whole. Examples are not specifically designated as

test code (no #[test], no #[cfg(test)]), and they’re a poor place to put code that

exercises obscure nooks and crannies of your crate—particularly as examples are not

run by cargo test by default.

Nevertheless, it’s a good idea to ensure that your CI system (Item 32) builds and runs all the associated examples for a crate (with cargo test --examples), because it can act as a good early warning system for regressions that are likely to affect lots of users.

As noted, if your examples demonstrate mainline use of your API, then a failure in

the examples implies that something significant is wrong:

• If it’s a genuine bug, then it’s likely to affect lots of users—the very nature of

example code means that users are likely to have copied, pasted, and adapted the

example.

• If it’s an intended change to the API, then the examples need to be updated to

match. A change to the API also implies a backward incompatibility, so if the

Item 30: Write more than unit tests | 229

crate is published, then the semantic version number needs a corresponding update to indicate this.

The likelihood of users copying and pasting example code means that it should have a different style than test code. Set a good example for your users by avoiding unwrap() calls for Results. Instead, make each example’s main() function return something like Result<(), Box<dyn Error>>, and then use the question mark operator throughout.
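The style described above might be sketched as follows. The parse_port function is a hypothetical stand-in for part of a crate's public API, and example_main plays the role that main() would have in a real file under examples/.

```rust
use std::error::Error;

/// Hypothetical public API function of the crate being demonstrated.
pub fn parse_port(text: &str) -> Result<u16, std::num::ParseIntError> {
    text.trim().parse::<u16>()
}

/// Body of the example program. In a real examples/<name>.rs file this
/// function would be `main`, with the same return type.
pub fn example_main() -> Result<(), Box<dyn Error>> {
    // `?` converts the ParseIntError into Box<dyn Error> and returns early
    // on failure, so no unwrap() is needed.
    let port = parse_port("8080")?;
    println!("listening on port {port}");
    Ok(())
}
```

Users who copy this pattern get error propagation by default rather than a latent panic.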

Benchmarks

Elsewhere, this book attempts to persuade you that fully optimizing the performance of your code isn’t always necessary. Nevertheless, there are definitely times when performance is critical, and if that’s the case, then it’s a good idea to measure and track that performance. Having benchmarks that are run regularly (e.g., as part of CI) allows you to detect when changes to the code or the toolchains adversely affect that performance.

The cargo bench command runs special test cases that repeatedly perform an operation, and emits average timing information for the operation. At the time of writing, support for benchmarks is not stable, so the precise command may need to be cargo +nightly bench. (The feature used here is one of Rust’s unstable features.)

However, there’s a danger that compiler optimizations may give misleading results,

particularly if you restrict the operation that’s being performed to a small subset of

the real code. Consider a simple arithmetic function:

pub fn factorial(n: u128) -> u128 {
    match n {
        0 => 1,
        n => n * factorial(n - 1),
    }
}

A naive benchmark for this code:

#![feature(test)]
extern crate test;

#[bench]
fn bench_factorial(b: &mut test::Bencher) {
    b.iter(|| {
        let result = factorial(15);
        assert_eq!(result, 1_307_674_368_000);
    });
}

gives incredibly positive results:


test bench_factorial ... bench: 0 ns/iter (+/- 0)

With fixed inputs and a small amount of code under test, the compiler is able to optimize away the iteration and directly emit the result, leading to an unrealistically optimistic result.

The std::hint::black_box function can help with this: it’s an identity function whose implementation the compiler is encouraged, but not required, to pessimize.

Moving the benchmark code to use this hint:

#[bench]
fn bench_factorial(b: &mut test::Bencher) {
    b.iter(|| {
        let result = factorial(std::hint::black_box(15));
        assert_eq!(result, 1_307_674_368_000);
    });
}

gives more realistic results:

test blackboxed::bench_factorial ... bench: 16 ns/iter (+/- 3)

It can also help to examine the machine code emitted by the compiler, which may make it obvious when the compiler has performed optimizations that would be unrealistic for code running a real scenario.

Finally, if you are including benchmarks for your Rust code, a third-party benchmarking crate may provide an alternative that is more convenient (it runs with stable Rust) and more fully featured (it has support for statistics and graphs).
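For readers on stable Rust, the same black_box idea can be tried without the unstable #[bench] attribute. The following is only a rough sketch using std::time::Instant; the time_factorial helper is invented here, and a crude timing loop like this is no substitute for a real benchmarking harness.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

pub fn factorial(n: u128) -> u128 {
    match n {
        0 => 1,
        n => n * factorial(n - 1),
    }
}

/// Hypothetical helper: runs `factorial(15)` the given number of times and
/// returns the total elapsed wall-clock time.
pub fn time_factorial(iterations: u32) -> Duration {
    let start = Instant::now();
    for _ in 0..iterations {
        // black_box on the argument hides the constant input from the
        // optimizer; black_box on the result discourages the compiler from
        // discarding the computation as unused.
        black_box(factorial(black_box(15)));
    }
    start.elapsed()
}
```

Dividing the returned duration by the iteration count gives an approximate per-call cost, subject to all the usual noise of an unmanaged timing loop.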

Fuzz Testing

Fuzz testing is the process of exposing code to randomized inputs in the hope of find‐

ing bugs, particularly crashes that result from those inputs. Although this can be a

useful technique in general, it becomes much more important when your code is

exposed to inputs that may be controlled by someone who is deliberately trying to

attack the code—so you should run fuzz tests if your code is exposed to potential

attackers.

Historically, the majority of defects in C/C++ code that have been exposed by fuzzers

have been memory safety problems, typically found by combining fuzz testing with

runtime instrumentation of memory access patterns. Rust is immune to some (but not all) of these memory safety problems, particularly when there is no unsafe code involved. However, Rust does not prevent bugs in general, and a code path that triggers a panic! can still result in a denial-of-service (DoS) attack on the codebase as a whole.

The most effective forms of fuzz testing are coverage-guided: the test infrastructure monitors which parts of the code are executed and favors random mutations of the inputs that explore new code paths. Earlier standalone fuzzers were the original heavyweight champions of this technique, but in more recent years equivalent functionality has been included in the LLVM toolchain as libFuzzer.

The Rust compiler is built on LLVM, and so the cargo fuzz subcommand exposes libFuzzer functionality for Rust (albeit for only a limited number of platforms).

The primary requirement for a fuzz test is to identify an entrypoint of your code that

takes (or can be adapted to take) arbitrary bytes of data as input:

UNDESIRED BEHAVIOR

/// Determine if the input starts with "FUZZ".
pub fn is_fuzz(data: &[u8]) -> bool {
    if data.len() >= 3 /* oops */
        && data[0] == b'F'
        && data[1] == b'U'
        && data[2] == b'Z'
        && data[3] == b'Z'
    {
        true
    } else {
        false
    }
}

With a target entrypoint identified, the next step is to arrange the fuzzing subproject. At its core is a small driver that connects the target entrypoint to the fuzzing infrastructure:

// fuzz/fuzz_targets/target1.rs file
#![no_main]
use libfuzzer_sys::fuzz_target;

fuzz_target!(|data: &[u8]| {
    let _ = somecrate::is_fuzz(data);
});

Running cargo +nightly fuzz run target1 continuously executes the fuzz target

with random data, stopping only if a crash is found. In this case, a failure is found

almost immediately:

INFO: Running with entropic power schedule (0xFF, 100).

INFO: Seed: 1607525774


INFO: Loaded 1 modules: 1624 [0x108219fa0, 0x10821a5f8),

INFO: Loaded 1 PC tables (1624 PCs): 1624 [0x10821a5f8,0x108220b78),

INFO: 9 files found in fuzz/corpus/target1

INFO: seed corpus: files: 9 min: 1b max: 8b total: 46b rss: 38Mb

#10 INITED cov: 26 ft: 26 corp: 6/22b exec/s: 0 rss: 39Mb

thread panicked at 'index out of bounds: the len is 3 but the index is 3', testing/src/lib.rs:77:12

stack backtrace:

0: rust_begin_unwind

at /rustc/f77bfb7336f2/library/std/src/panicking.rs:579:5

1: core::panicking::panic_fmt

at /rustc/f77bfb7336f2/library/core/src/panicking.rs:64:14

2: core::panicking::panic_bounds_check

at /rustc/f77bfb7336f2/library/core/src/panicking.rs:159:5

3: somecrate::is_fuzz

4: _rust_fuzzer_test_input

5: ___rust_try

6: _LLVMFuzzerTestOneInput

7: __ZN6fuzzer6Fuzzer15ExecuteCallbackEPKhm

8: __ZN6fuzzer6Fuzzer6RunOneEPKhmbPNS_9InputInfoEbPb

9: __ZN6fuzzer6Fuzzer16MutateAndTestOneEv

10: __ZN6fuzzer6Fuzzer4LoopERNSt3__16vectorINS_9SizedFileENS_16fuzzer_allocatorIS3_EEEE

11: __ZN6fuzzer12FuzzerDriverEPiPPPcPFiPKhmE

12: _main

and the input that triggered the failure is emitted.
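One possible fix for the bug that the fuzzer exposed is sketched below, together with a regression test for the three-byte input that caused the panic. This is just one way to repair the function, not necessarily the one the author had in mind.

```rust
/// Determine if the input starts with "FUZZ".
pub fn is_fuzz(data: &[u8]) -> bool {
    // `starts_with` checks the length before comparing, so inputs shorter
    // than four bytes cannot trigger an out-of-bounds index.
    data.starts_with(b"FUZZ")
}

#[test]
fn short_input_does_not_panic() {
    // A three-byte input is the kind that triggered the original panic.
    assert!(!is_fuzz(b"FUZ"));
}
```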

Normally, fuzz testing does not find failures so quickly, and so it does not make sense to run fuzz tests as part of your CI. The open-ended nature of the testing, and the consequent compute costs, mean that you need to consider how and when to run fuzz tests: perhaps only for new releases or major changes, or perhaps for a limited period of time.

You can also make subsequent runs of the fuzzing infrastructure more efficient, by

storing and reusing a corpus of previous inputs that the fuzzer found to explore new

code paths; this helps subsequent runs of the fuzzer explore new ground, rather than

retesting code paths previously visited.

6 If your code is a widely used open source crate, a public fuzzing service may be willing to run fuzzing on your behalf.


Testing Advice

An Item about testing wouldn’t be complete without repeating some common advice

(which is mostly not Rust-specific):

• As this Item has endlessly repeated, run all your tests in CI on every change (with

the exception of fuzz tests).

• When you’re fixing a bug, write a test that exhibits the bug before fixing the bug.

That way you can be sure that the bug is fixed and that it won’t be accidentally

reintroduced in the future.

• If your crate has features, run tests over every possible combination of available features.

• More generally, if your crate includes any config-specific code (e.g., #[cfg(target_os = "windows")]), run tests for every platform that has distinct code.
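To make the last point concrete, here is a small sketch of config-specific code. The path_separator function is invented for illustration; the point is that each cfg branch is compiled only on its own platform, so a test run on a single platform exercises only one of them.

```rust
/// Hypothetical platform-specific function: Windows variant.
#[cfg(target_os = "windows")]
pub fn path_separator() -> char {
    '\\'
}

/// Hypothetical platform-specific function: variant for everything else.
#[cfg(not(target_os = "windows"))]
pub fn path_separator() -> char {
    '/'
}

#[test]
fn separator_matches_platform() {
    // cfg! evaluates to a compile-time boolean for the current target.
    let expected = if cfg!(target_os = "windows") { '\\' } else { '/' };
    assert_eq!(path_separator(), expected);
}
```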

This Item has covered a lot of different types of tests, so it’s up to you to decide how

much each of them is relevant and worthwhile for your project.

If you have a lot of test code and you are publishing your crate, you might need to consider which of the tests make sense to include in the published crate. By default, cargo will include unit tests, integration tests, benchmarks, and examples (but not fuzz tests, because the cargo-fuzz tools store these as a separate crate in a subdirectory), which may be more than end users need. If that’s the case, you can either exclude some of the files or (for behavior tests) move the tests out of the crate and into a separate test crate.

Things to Remember

• Write unit tests for comprehensive testing that includes testing of internal-only

code. Run them with cargo test.

• Write integration tests to exercise your public API. Run them with cargo test.

• Write doc tests that exemplify how to use individual items in your public API.

Run them with cargo test.

• Write example programs that show how to use your public API as a whole. Run

them with cargo test --examples or cargo run --example <name>.

• Write benchmarks if your code has significant performance requirements. Run

them with cargo bench.

• Write fuzz tests if your code is exposed to untrusted inputs. Run them (continuously) with cargo fuzz.


Item 31: Take advantage of the tooling ecosystem

The Rust ecosystem has a rich collection of additional tools, which provide functionality above and beyond the essential task of converting Rust into machine code.

When setting up a Rust development environment, you’re likely to want most of the

following basic tools:

• The cargo tool, for driving the compiler

• The rustup tool, which manages the installed Rust toolchains

• An IDE with Rust support, or an IDE/editor plug-in that allows you to quickly navigate around a Rust codebase, and provides autocompletion support for writing Rust code

• A playground, for standalone explorations of Rust’s syntax and for sharing the results with colleagues

Beyond these basics, Rust includes many tools that help with the wider task of maintaining a codebase and improving the quality of that codebase. The tools included in the official Cargo toolchain cover various essential tasks beyond the basics of cargo build, cargo test, and cargo run, for example:

cargo fmt

Reformats Rust code according to standard conventions.

cargo check

Performs compilation checks without generating machine code, which can be

useful to get a fast syntax check.

cargo clippy

Performs lint checks, detecting inefficient or unidiomatic code.

cargo doc

Generates documentation.

cargo bench

Runs benchmarks.

7 This list may be reduced in some environments. For example, Rust development for Android uses a centrally controlled toolchain (so no rustup) and integrates with Android’s Soong build system (so no cargo).


cargo update

Upgrades dependencies to the latest versions, selecting versions that are compliant with semantic versioning by default.

cargo tree

Displays the dependency graph.

cargo metadata

Emits metadata about the packages that are present in the workspace and in their

dependencies.

The last of these is particularly useful, albeit indirectly: because there’s a tool that emits information about crates in a well-defined format, it’s much easier for people to produce other tools that make use of that information (typically via a crate that provides a set of Rust types to hold the metadata information).

An earlier Item described some of the tools that are enabled by this metadata availability, such as cargo-udeps (which allows detection of unused dependencies) or cargo-deny (which allows checks for many things, including duplicate dependencies, allowed licenses, and security advisories).

The extensibility of the Rust toolchain is not just limited to package metadata; the compiler’s abstract syntax tree can also be built upon. This information is what makes procedural macros so potent but also powers a variety of other tools:

Shows the complete source code produced by macro expansion, which can be

essential for debugging tricky macro definitions.

Supports the generation and tracking of code coverage information.

Any list of specific tools will always be subjective, out of date, and incomplete; the

more general point is to explore the available tools.

For example, a search for Cargo subcommands gives dozens of results; some will be inappropriate and some will be abandoned, but some might just do exactly what you want.

Other tools may be helpful if your code needs higher levels of assurance about its correctness.

Finally, a reminder: if a tool is useful on more than a one-off basis, you should integrate the tool into your CI system (as per Item 32). If the tool is fast and false-positive free, it may also make sense to integrate the tool into your editor or IDE; the Rust project provides links to relevant documentation for this.


Tools to Remember

In addition to the tools that should be configured to run over your codebase regularly and automatically (Item 32), there are various other tools that have been mentioned elsewhere in the book. For reference, these are collated here, but remember that there are many more tools out there:

• A tool for checking unsafe code.

• A tool for managing dependency updates.

• A tool for checking that semantic versioning has been done correctly.

• Clippy, which has an Item entirely dedicated to its use.

• A tool for viewing the machine code corresponding to your source code.

• A tool for auto-generating Rust FFI wrappers from C code.

Item 32: Set up a continuous integration (CI) system

A CI system is a mechanism for automatically running tools over your codebase,

which is triggered whenever there’s a change to the codebase—or a proposed change

to the codebase.

The recommendation to set up a CI system is not at all Rust-specific, so this Item is a

mélange of general advice mixed with Rust-specific tool suggestions.

CI Steps

Moving to specifics, what kinds of steps should be included in your CI system? The

obvious initial candidates are the following:

• Build the code.

• Run the tests for the code.

In each case, a CI step should run cleanly, quickly, deterministically, and with a zero

false positive rate; more on this in the next section.


The “deterministic” requirement also leads to advice for the build step: use a rust-toolchain.toml file to specify a fixed version of the toolchain in your CI build.

The rust-toolchain.toml file indicates which version of Rust should be used to build the code: either a specific version (e.g., 1.70), or a channel (stable, beta, or nightly) possibly with an optional date (e.g., nightly-2023-09-19). Choosing a floating channel value here would make the CI results vary as new toolchain versions are released; a fixed value is more deterministic and allows you to deal with toolchain upgrades separately.

Throughout this book, various Items have suggested tools and techniques that can

help improve your codebase; wherever possible, these should be included with the CI

system. For example, the two fundamental parts of a CI system previously mentioned

can be enhanced:

• Build the code.

— An earlier Item describes the use of features to conditionally include different chunks of code. If your crate has features, build every valid combination of features in CI (and realize that this may involve 2^N different variants, hence the advice to avoid feature creep).

— Item 33 suggests that you consider making library code no_std compatible where possible. You can be confident that your code is genuinely no_std compatible only if you test no_std compatibility in CI. One option is to make use of the Rust compiler’s cross-compilation abilities and build for an explicitly no_std target (e.g., thumbv6m-none-eabi).

— An earlier Item includes a discussion around declaring a minimum supported Rust version (MSRV) for your code. If you have this, check your MSRV in CI by including a step that tests with that specific Rust version.

• Run the tests for the code.

— Item 30 describes the various different styles of test; run all test types in CI. Some test types are automatically included in cargo test (unit tests, integration tests, and doc tests), but other test types (e.g., example programs) may need to be explicitly triggered.
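The 2^N growth mentioned for feature combinations can be seen in a short sketch. The feature_combinations helper below is hypothetical: it enumerates every subset of a list of feature names as a comma-separated string of the sort that could be passed to cargo's --features option, and it assumes that every combination is valid, which may not hold for a real crate.

```rust
/// Hypothetical helper: returns every subset of `features`, each rendered
/// as a comma-separated list (the empty subset becomes an empty string).
pub fn feature_combinations(features: &[&str]) -> Vec<String> {
    // N features give 2^N subsets; each bit of `mask` selects one feature.
    let subset_count = 1usize << features.len();
    (0..subset_count)
        .map(|mask| {
            features
                .iter()
                .enumerate()
                .filter(|&(index, _)| (mask & (1usize << index)) != 0)
                .map(|(_, name)| *name)
                .collect::<Vec<&str>>()
                .join(",")
        })
        .collect()
}
```

With three features this already yields eight builds, which is why the number of features is worth keeping small.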

However, there are other tools and suggestions that can help improve the quality of

your codebase:

8 If your code relies on particular features that are available only in the nightly compiler, a rust-toolchain.toml file also makes that toolchain dependency clear.


• An earlier Item waxes lyrical about the advantages of running Clippy over your code; run Clippy in CI. To ensure that failures are flagged, set the -Dwarnings option (for example, via cargo clippy -- -Dwarnings).

• An earlier Item suggests documenting your public API; use the cargo doc tool to check that the documentation generates correctly and that any hyperlinks in it resolve correctly.

• An earlier Item mentions tools such as cargo-udeps and cargo-deny that can help manage your dependency graph; running these as a CI step prevents regressions.

• Item 31 discusses the Rust tool ecosystem; consider which of these tools are worth regularly running over your codebase. For example, running rustfmt / cargo fmt in CI allows detection of code that doesn’t comply with your project’s style guidelines. To ensure that failures are flagged, set the --check option.

You can also include CI steps that measure particular aspects of your code:

• Generate code coverage statistics to show what proportion of your codebase is exercised by your tests.

• Run benchmarks (e.g., with cargo bench) to measure the performance of your code on key scenarios. However, note that most CI systems run in shared environments where external factors can affect the results; getting more reliable benchmark data is likely to require a more dedicated environment.

These measurement suggestions are a bit more complicated to set up, because the

output of a measurement step is more useful when it’s compared to previous results.

In an ideal world, the CI system would detect when a code change is not fully tested

or has an adverse effect on performance; this typically involves integration with some

external tracking system.

Here are other suggestions for CI steps that may or may not be relevant for your

codebase:

• If your project is a library, recall that any checked-in Cargo.lock file will be ignored by the users of your library. In theory, the semantic version constraints in Cargo.toml should mean that everything works correctly anyway; in practice, consider including a CI step that builds without any local Cargo.lock, to detect whether the current versions of dependencies still work correctly.

• If your project includes any kind of machine-generated resources that are version-controlled (e.g., code generated from protocol buffer messages), then include a CI step that regenerates the resources and checks that there are no differences compared to the checked-in version.


• If your codebase includes platform-specific (e.g., #[cfg(target_arch =

"arm")]) code, run CI steps that confirm that the code builds and (ideally) works

on that platform. (The former is easier than the latter because the Rust toolchain

includes support for cross-compilation.)

• If your project manipulates secret values such as access tokens or cryptographic

keys, consider including a CI step that searches the codebase for secrets that have

been inadvertently checked in. This is particularly important if your project is public (in which case it may be worth moving the check from CI to a presubmit check).

CI checks don’t always need to be integrated with Cargo and the Rust toolchains; sometimes a simple shell script can give more bang for the buck, particularly when a codebase has a local convention that’s not universally followed. For example, a codebase might include a convention that any panic-inducing method invocation has a special marker comment or that every TODO: comment has an owner (a person or a tracking ID), and a shell script is ideal for checking this.
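The text recommends a shell script for this kind of check; purely to show the logic involved, here is the same idea sketched in Rust. The convention assumed here, that an owned marker is written as TODO(owner):, is invented for this example, as are the function names.

```rust
/// Returns the 1-based line numbers of TODO markers that lack an owner,
/// under the assumed convention that owned markers look like `TODO(owner)`.
pub fn unowned_todos(source: &str) -> Vec<usize> {
    source
        .lines()
        .enumerate()
        .filter(|(_, line)| match line.find("TODO") {
            // Inspect whatever follows the first marker on the line.
            Some(pos) => !has_owner(&line[pos + "TODO".len()..]),
            None => false,
        })
        .map(|(index, _)| index + 1)
        .collect()
}

/// True if the text starts with a non-empty parenthesized owner.
fn has_owner(after_marker: &str) -> bool {
    match after_marker.strip_prefix('(') {
        Some(rest) => matches!(rest.find(')'), Some(end) if end > 0),
        None => false,
    }
}
```

A CI step built on a check like this would fail whenever the returned list is non-empty.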

Finally, consider examining the CI systems of public Rust projects to get ideas for additional CI steps that might be useful for your project. For example, some established projects have a CI system that includes many steps that may provide inspiration.

CI Principles

Moving from the specific to the general, there are some overall principles that should

guide the details of your CI system.

The most fundamental principle is don’t waste the time of humans. If a CI system

unnecessarily wastes people’s time, they will start looking for ways to avoid it.

The most annoying waste of an engineer’s time is a flaky test: sometimes it passes and

sometimes it fails, even when the setup and codebase are identical. Whenever possi‐

ble, be ruthless with flaky tests: hunt them down, and put in the time up front to

investigate and fix the cause of the flakiness—it will pay for itself in the long run.

Another common waste of engineering time is a CI system that takes a long time to

run and that runs only after a request for a code review has been triggered. In this

situation, there’s the potential to waste two people’s time: both the author and also the

code reviewer, who may spend time spotting and pointing out issues with the code

that the CI bots could have flagged.

To help with this, try to make it easy to run the CI checks manually, independent

from the automated system. This allows engineers to get into the habit of triggering

them regularly so that code reviewers never even see problems that the CI would have

flagged. Better still, make the integration even more continuous by incorporating


some of the tools into your editor or IDE setup so that (for example) poorly format‐

ted code never even makes it to disk.

This may also require splitting the checks up if there are time-consuming tests that

rarely find problems but are there as a backstop to prevent obscure scenarios

breaking.

More generally, a large project may need to divide up its CI checks according to the

cadence at which they are run:

• Checks that are integrated into each engineer’s development environment (e.g.,

rustfmt)

• Checks that run on every code review request (e.g., cargo build, cargo clippy)

and are easy to run manually

• Checks that run on every change that makes it to the main branch of the project

(e.g., full cargo test in all supported environments)

• Checks that run at scheduled intervals (e.g., daily or weekly), which can catch

rare regressions after the fact (e.g., long-running integration tests and benchmark

comparison tests)

• Checks that run on the current code at all times (e.g., fuzz tests)

It’s important that the CI system be integrated with whatever code review system is used for your project so that a code reviewer can clearly see a green set of checks and be confident that the review can focus on the important meaning of the code, not on trivial details.

This need for a green build also means that there can be no exceptions to whatever

checks your CI system has put in place. This is worthwhile even if you have to work

around an occasional false positive from a tool; once your CI system has an accepted

failure (“Oh, everyone knows that test never passes”), then it’s vastly harder to spot

new regressions.

Item 30 advised writing a test that exhibits a bug before fixing the bug. The same principle applies to your CI system: when you discover process problems, add a CI step that detects a process issue before fixing the issue. For example, if you discover that some auto-generated code has gotten out of sync with its source, add a check for this to the CI system. This check will initially fail but then turn green once the problem is solved, giving you confidence that this category of process error will not occur again in the future.


Public CI Systems

If your codebase is open source and visible to the public, there are a few extra things

to think about with your CI system.

First is the good news: there are lots of free, reliable options for building a CI system

for open source code. At the time of writing, are probably the best choice, but it’s far from the only choice, and more systems appear all the time.

Second, for open source code it’s worth bearing in mind that your CI system can act

as a guide for how to set up any prerequisites needed for the codebase. This isn’t a

concern for pure Rust crates, but if your codebase requires additional dependencies—

databases, alternative toolchains for FFI code, configuration, etc.—then your CI

scripts will be an existence proof of how to get all of that working on a fresh system.

Encoding these setup steps in reusable scripts allows both the humans and the bots to

get a working system in a straightforward way.

Finally, there’s bad news for publicly visible crates: the possibility of abuse and attacks.

This can range from attempts to perform cryptocurrency mining in your CI system to other attacks, and worse. To mitigate these risks, consider these guidelines:

• Restrict access so that CI scripts run automatically only for known collaborators

and have to be triggered manually for new contributors.

• Pin the versions of any external scripts to particular versions, or (better yet) spe‐

cific known hashes.

• Closely monitor any integration steps that need more than just read access to the

codebase.


CHAPTER 6

Beyond Standard Rust

The Rust toolchain includes support for a much wider variety of environments than

just pure Rust application code, running in userspace:

• It supports cross-compilation, where the system running the toolchain (the host)

is not the same as the system that the compiled code will run on (the target),

which makes it easy to target embedded systems.

• It supports linking with code compiled from languages other than Rust, via built-

in FFI capabilities.

• It supports configurations without the full standard library std, allowing systems

that do not have a full operating system (e.g., no filesystem, no networking) to be

targeted.

• It even supports configurations that do not support heap allocation but only have

a stack (by omitting use of the standard alloc library).

These nonstandard Rust environments can be harder to work in and may be less

safe—they can even be unsafe—but they give more options for getting the job done.

This chapter of the book discusses just a few of the basics for working in these environments. Beyond these basics, you’ll need to consult more environment-specific documentation.

Item 33: Consider making library code no_std compatible

Rust comes with a standard library called std, which includes code for a wide variety of common tasks, from standard data structures to networking, from multithreading support to file I/O. For convenience, several of the items from std are automatically imported into your program, via the prelude: a set of common use statements that make common types available without needing to use their full names (e.g., Vec rather than std::vec::Vec).

Rust also supports building code for environments where it’s not possible to provide

this full standard library, such as bootloaders, firmware, or embedded platforms in

general. Crates indicate that they should be built in this way by including the

#![no_std] crate-level attribute at the top of src/lib.rs.

This Item explores what’s lost when building for no_std and what library functions

you can still rely on—which turns out to be quite a lot.

However, this Item is specifically about no_std support in library code. The difficul‐

ties of making a no_std binary are beyond this text, so the focus here is how to make sure that library code is available for those poor souls who do have to work in such a

minimal environment.

core

Even when building for the most restricted of platforms, many of the fundamental types from the standard library are still available. For example, Option and Result are still available, albeit under a different name, as are various flavors of iterator.

The different names for these fundamental types start with core::, indicating that

they come from the core library, a standard library that’s available even in the most

no_std of environments. These core:: types behave exactly the same as the equiva‐

lent std:: types, because they’re actually the same types—in each case, the std:: ver‐

sion is just a re-export of the underlying core:: type.

This means that there’s a quick and dirty way to tell if a std:: item is available in a no_std environment: visit the documentation page for the std item you’re interested in and follow the “source” link (at the top right). If that takes you to a src/core/… location, then the item is available under no_std via core::.

The types from core are available for all Rust programs automatically. However, they

typically need to be explicitly used in a no_std environment, because the std prelude

is absent.
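As a rough illustration of the point above, the sketch below uses only items reached through core:: paths, so the same source should build with or without std. The helper names (checked_average, compare_timeouts) are hypothetical, and a real no_std crate would also carry the #![no_std] attribute at the top of src/lib.rs.

```rust
// Only `core::` paths are used here, so nothing depends on `std`.
// (A real `no_std` crate would also have `#![no_std]` in src/lib.rs.)
use core::cmp::Ordering;
use core::time::Duration;

/// Hypothetical helper: integer average of a slice, or `None` if it is empty.
pub fn checked_average(values: &[u32]) -> Option<u32> {
    if values.is_empty() {
        return None;
    }
    // Sum in a wider type so that realistic inputs cannot overflow.
    let sum: u64 = values.iter().map(|&v| u64::from(v)).sum();
    Some((sum / values.len() as u64) as u32)
}

/// Hypothetical helper: order two timeouts using `core::cmp::Ordering`.
pub fn compare_timeouts(a: Duration, b: Duration) -> Ordering {
    a.cmp(&b)
}
```

Because std re-exports these items, code written this way continues to work unchanged for users who do link std.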

1 See Philipp Oppermann’s writing for information about what’s involved in creating a no_std binary.

2 Be aware that this can occasionally go wrong. For example, at the time of writing, the Error trait is defined in core:: but is marked as unstable there; only the std:: version is stable.

244 | Chapter 6: Beyond Standard Rust

In practice, relying purely on core is too limiting for many environments, even no_std ones. A core (pun intended) constraint of core is that it performs no heap allocation.

Although Rust excels at putting items on the stack and safely tracking the corresponding lifetimes, this restriction still means that standard data structures (vectors, maps, sets) can’t be provided, because they need to allocate heap space for their contents. In turn, this also drastically reduces the number of available crates that work in this environment.

alloc

However, if a no_std environment does support heap allocation, then many of the standard data structures from std can still be supported. These data structures, along with other allocation-using functionality, are grouped into Rust’s alloc library.

As with core, these alloc variants are actually the same types under the covers. For example, the real name for std::vec::Vec is actually alloc::vec::Vec.

A no_std Rust crate needs to explicitly opt in to the use of alloc, by adding an extern crate alloc; declaration to src/lib.rs:

//! My `no_std` compatible crate.

#![no_std]

// Requires `alloc`.

extern crate alloc;

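Pulling in the alloc crate enables many familiar friends, now addressed by their true names:

• alloc::boxed::Box<T>

• alloc::rc::Rc<T>

• alloc::sync::Arc<T>

• alloc::vec::Vec<T>

• alloc::string::String

• alloc::format!

• alloc::collections::BTreeMap<K, V>

• alloc::collections::BTreeSet<T>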

With these things available, it becomes possible for many library crates to be no_std

compatible—for example, if a library doesn’t involve I/O or networking.
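A minimal sketch of what this looks like in practice, assuming a target that provides a global allocator: the functions below name their types via alloc:: rather than std::. The function names (labels, boxed_sum) are hypothetical, and the #![no_std] attribute that a real crate would carry is left as a comment so that the snippet also builds in an ordinary std environment.

```rust
// A real `no_std` crate would start with `#![no_std]`.
extern crate alloc;

use alloc::boxed::Box;
use alloc::format;
use alloc::string::String;
use alloc::vec::Vec;

/// Hypothetical helper: build `count` labels on the heap.
pub fn labels(count: usize) -> Vec<String> {
    (0..count).map(|i| format!("item-{}", i)).collect()
}

/// Hypothetical helper: return a heap-allocated sum.
pub fn boxed_sum(values: &[u32]) -> Box<u32> {
    Box::new(values.iter().sum())
}
```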

3 Prior to Rust 2018, extern crate declarations were used to pull in dependencies. This is now entirely handled by Cargo.toml, but the extern crate mechanism is still used to pull in those parts of the Rust standard library that are optional in no_std environments.


There’s a notable absence from the data structures that alloc makes available, though: HashMap and HashSet are specific to std, not alloc. That’s because these hash-based containers rely on random seeds to protect against hash collision attacks, but safe random number generation requires assistance from the operating system, which alloc can’t assume exists.

Another notable absence is synchronization functionality like Mutex, which is required for multithreaded code. These types are specific to std because they rely on OS-specific synchronization primitives, which aren’t available without an OS. If you need to write code that is both no_std and multithreaded, third-party crates are probably your best option.

Writing Code for no_std

The previous sections made it clear that for some library crates, making the code

no_std compatible just involves the following:

• Replacing std:: types with identical core:: or alloc:: types (which requires use of the full type name, due to the absence of the std prelude)

• Shifting from HashMap/HashSet to BTreeMap/BTreeSet
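The second transformation can be sketched as follows. The tally function is a hypothetical example; the main visible differences from a HashMap version are that keys need to implement Ord rather than Hash, and that iteration happens in key order.

```rust
// A real `no_std` crate would start with `#![no_std]`.
extern crate alloc;

use alloc::collections::BTreeMap;

/// Hypothetical helper: count occurrences of each value.
/// With `std`, this might have returned a `HashMap<u32, usize>` instead.
pub fn tally(values: &[u32]) -> BTreeMap<u32, usize> {
    let mut counts = BTreeMap::new();
    for &value in values {
        *counts.entry(value).or_insert(0) += 1;
    }
    counts
}
```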

However, this only makes sense if all of the crates that you depend on are also no_std compatible: there’s no point in becoming no_std compatible if any user of your crate is forced to link in std anyway.

There’s also a catch here: the Rust compiler will not tell you if your no_std crate

depends on a std-using dependency. This means that it’s easy to undo the work of

making a crate no_std compatible—all it takes is an added or updated dependency

that pulls in std.

To protect against this, add a CI check for a no_std build so that your CI system will warn you if this happens. The Rust toolchain supports cross-compilation out of the box, so this can be as simple as performing a cross-compile for a target system that does not support std (e.g., --target thumbv6m-none-eabi); any code that inadvertently requires std will then fail to compile for this target.

So: if your dependencies support it, and the simple transformations above are all

that’s needed, then consider making library code no_std compatible. When it is possi‐

ble, it’s not much additional work, and it allows for the widest reuse of the library.

If those transformations don’t cover all of the code in your crate but the parts that aren’t covered are only a small or well-contained fraction of the code, then consider adding a feature to your crate that turns on just those parts.


Such a feature is conventionally named either std, if it enables use of std-specific

functionality:

#![cfg_attr(not(feature = "std"), no_std)]

or alloc, if it turns on use of alloc-derived functionality:

#[cfg(feature = "alloc")]

extern crate alloc;

Note that there’s a trap for the unwary here: don’t have a no_std feature that disables functionality requiring std (or a no_alloc feature similarly). Features need to be additive, and there’s no way to combine two users of the crate where one configures no_std and one doesn’t: the former will trigger the removal of code that the latter relies on.

As ever with feature-gated code, make sure that your CI system builds all

the relevant combinations—including a build with the std feature disabled on an

explicitly no_std platform.
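To make the additive approach concrete, here is a minimal sketch of feature-gated code, assuming the crate's Cargo.toml declares a feature called std. The ParseError type and parse_digit function are hypothetical: the core functionality only needs core, and enabling the feature adds an extra trait implementation without removing anything.

```rust
use core::fmt;

/// Hypothetical error type that is usable without `std`.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseError {
    pub position: usize,
}

impl fmt::Display for ParseError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "invalid digit at position {}", self.position)
    }
}

// Additive: only compiled when the (assumed) `std` feature is enabled.
#[cfg(feature = "std")]
impl std::error::Error for ParseError {}

/// Hypothetical helper: parse the first byte of `input` as a decimal digit.
pub fn parse_digit(input: &str) -> Result<u8, ParseError> {
    match input.bytes().next() {
        Some(b @ b'0'..=b'9') => Ok(b - b'0'),
        _ => Err(ParseError { position: 0 }),
    }
}
```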

Fallible Allocation

The earlier sections of this Item considered two different no_std environments: a

fully embedded environment with no heap allocation whatsoever (core) and a more

generous environment where heap allocation is allowed (core + alloc).

However, there are some important environments that fall between these two

camps— in particular, those where heap allocation is possible but may fail because

there’s a limited amount of heap.

Unfortunately, Rust’s standard alloc library includes a pervasive assumption that

heap allocations cannot fail, and that’s not always a valid assumption.

Even a simple use of alloc::vec::Vec could potentially allocate on every line:

let mut v = Vec::new();

v.push(1); // might allocate

v.push(2); // might allocate

v.push(3); // might allocate

v.push(4); // might allocate

None of these operations returns a Result, so what happens if those allocations fail?

The answer depends on the toolchain, target, and configuration but is likely to descend into panic! and program termination. There is certainly no answer that

allows an allocation failure on line 3 to be handled in a way that allows the program

to move on to line 4.


This assumption of infallible allocation gives good ergonomics for code that runs in a

“normal” userspace, where there’s effectively infinite memory—or at least where run‐

ning out of memory indicates that the computer as a whole has bigger problems

elsewhere.

However, infallible allocation is utterly unsuitable for code that needs to run in envi‐

ronments where memory is limited and programs are required to cope. This is a

(rare) area where there’s better support in older, less memory-safe, languages:

• C is sufficiently low-level that allocations are manual, and so the return value

from malloc can be checked for NULL.

• C++ can use its exception mechanism to catch allocation failures in the form of std::bad_alloc exceptions.

Historically, the inability of Rust’s standard library to cope with failed allocation was flagged in some high-profile contexts (such as the Linux kernel and Android), and so work to fix the omission is ongoing.

The first step was to add fallible alternatives to many of the collection APIs that involve allocation. This generally adds a try_<operation> variant that results in a Result<_, AllocError>; for example:

• Vec::try_reserve is available as an alternative to Vec::reserve.

• Box::try_new is available (with the nightly toolchain) as an alternative to Box::new.

These fallible APIs only go so far; for example, there is (as yet) no fallible equivalent to Vec::push, so code that assembles a vector may need to do careful calculations to ensure that allocation errors can’t happen:

fn try_build_a_vec() -> Result<Vec<u8>, String> {
    let mut v = Vec::new();

    // Perform a careful calculation to figure out how much space is needed,
    // here simplified to...
    let required_size = 4;

    v.try_reserve(required_size)
        .map_err(|_e| format!("Failed to allocate {} items!", required_size))?;

    // We now know that it's safe to do:
    v.push(1);
    v.push(2);
    v.push(3);
    v.push(4);

    Ok(v)
}

4 It’s also possible to add the std::nothrow overload to calls to new and check for nullptr return values. However, there are still container methods that allocate under the covers and that can therefore signal allocation failure only via an exception.

As well as adding fallible allocation entrypoints, it’s also possible to disable infallible

allocation operations, by turning off the config flag (which is on by default). Environments with limited heap (such as the Linux kernel) can

explicitly disable this flag, ensuring that no use of infallible allocation can inadver‐

tently creep into the code.
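As a further sketch of the fallible style, the hypothetical try_copy function below reserves all of the space it needs up front and passes any failure back to the caller as a TryReserveError, rather than converting it to a String. It uses the std path for the error type so that it runs in a normal environment; a no_std crate would name it via alloc::collections instead.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: copy a slice into a new `Vec`, reporting
/// allocation failure to the caller instead of panicking.
pub fn try_copy(src: &[u8]) -> Result<Vec<u8>, TryReserveError> {
    let mut out = Vec::new();
    // The only allocation happens here, and its failure is reported.
    out.try_reserve_exact(src.len())?;
    // Capacity is already sufficient, so this does not need to allocate.
    out.extend_from_slice(src);
    Ok(out)
}
```

A request that can never be satisfied (such as reserving usize::MAX bytes) comes back as an Err rather than terminating the program.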

Things to Remember

• Many items in the std crate actually come from core or alloc.

• As a result, making library code no_std compatible may be more straightforward

than you might think.

• Confirm that no_std code remains no_std compatible by checking it in CI.

• Be aware that working in a limited-heap environment currently has limited

library support.

Item 34: Control what crosses FFI boundaries

Even though Rust comes with a comprehensive standard library and a burgeoning crate ecosystem, there is still a lot more non-Rust code in the world than there is Rust code.

As with other recent languages, Rust helps with this problem by offering a foreign

function interface (FFI) mechanism, which allows interoperation with code and data

structures written in different languages—despite the name, FFI is not restricted to

just functions. This opens up the use of existing libraries in different languages, not

just those that have succumbed to the Rust community’s efforts to “rewrite it in Rust”

(RiiR).

The default target for Rust’s interoperability is the C programming language, which is

the same interop target that other languages aim at. This is partly driven by the ubiq‐

uity of C libraries but is also driven by simplicity: C acts as a “least common denomi‐

nator” of interoperability, because it doesn’t need toolchain support of any of the

more advanced features that would be necessary for compatibility with other lan‐

guages (e.g., garbage collection for Java or Go, exceptions and templates for C++,

function overloads for Java and C++, etc.).


However, that’s not to say that interoperability with plain C is simple. By including

code written in a different language, all of the guarantees and protections that Rust

offers are up for grabs, particularly those involving memory safety.

As a result, FFI code in Rust is automatically unsafe, and the usual advice to avoid unsafe code has to be bypassed. This Item explores some replacement advice; tooling such as bindgen helps to avoid some (but not all) of the footguns involved in working with FFI. (The FFI chapter of the Rustonomicon also contains helpful advice and information.)

Invoking C Functions from Rust

The simplest FFI interaction is for Rust code to invoke a C function, taking “immedi‐

ate” arguments that don’t involve pointers, references, or memory addresses:

/* File lib.c */

#include "lib.h"

/* C function definition. */

int add(int x, int y) {

return x + y;

}

This C code provides a definition of the function and is typically accompanied by a

header file that provides a declaration of the function, which allows other C code to

use it:

/* File lib.h */

#ifndef LIB_H

#define LIB_H

/* C function declaration. */

int add(int x, int y);

#endif /* LIB_H */

The declaration roughly says: somewhere out there is a function called add, which

takes two integers as input and returns another integer as output. This allows C code

to use the add function, subject to a promise that the actual code for add will be pro‐

vided at a later date—specifically, at link time.

Rust code that wants to use add needs to have a similar declaration, with a similar

purpose: to describe the signature of the function and to indicate that the corre‐

sponding code will be available later:

use std::os::raw::c_int;

extern "C" {

pub fn add(x: c_int, y: c_int) -> c_int;

}


The declaration is marked as extern "C" to indicate that an external C library will provide the code for the function. The extern "C" marker also automatically marks the function as #[no_mangle].

Linking logistics

The details of how the C toolchain generates an external C library—and its format—

are environment-specific and beyond the scope of a Rust book like this. However, one

simple variant that’s common on Unix-like systems is a static library file, which will

normally have the form lib<something>.a (e.g., libcffi.a) and which can be generated with the ar tool.

The Rust build system then needs an indication of which library holds the relevant C code. This can be specified either via the link attribute in the code:

#[link(name = "cffi")] // An external library like `libcffi.a` is needed

extern "C" {

// ...

}

or via a build script that emits a cargo:rustc-link-lib instruction to cargo:

// File build.rs

fn main() {

// An external library like `libcffi.a` is needed

println!("cargo:rustc-link-lib=cffi");

}

The latter option is more flexible, because the build script can examine its environ‐

ment and behave differently depending on what it finds.

In either case, the Rust build system is also likely to need information about how to

find the C library, if it’s not in a standard system location. This can be specified by

having a build script that emits a cargo:rustc-link-search instruction to cargo, containing the library location:

// File build.rs
fn main() {
    // ...

    // Retrieve the location of `Cargo.toml`.
    let dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();

    // Look for native libraries one directory higher up.
    println!(
        "cargo:rustc-link-search=native={}",
        std::path::Path::new(&dir).join("..").display()
    );
}

5 If the FFI functionality you want to use is part of the standard C library, then you don’t need to create these declarations; the libc crate already provides them.

6 A links key in the Cargo.toml manifest can help to make this dependency visible to Cargo.

Code concerns

Returning to the source code, even this simplest of examples comes with some

gotchas. First, use of FFI functions is automatically unsafe:

let x = add(1, 1);

error[E0133]: call to unsafe function is unsafe and requires unsafe function

or block

--> src/main.rs:176:13

176 | let x = add(1, 1);

| ^^^^^^^^^ call to unsafe function

= note: consult the function's documentation for information on how to

avoid undefined behavior

and so needs to be wrapped in unsafe { }.

The next thing to watch out for is the use of C’s int type, represented as std::os::raw::c_int. How big is an int? It’s probably true that the following two things are the same:

• The size of an int for the toolchain that compiled the C library

• The size of a std::os::raw::c_int for the Rust toolchain

But why take the chance? Prefer sized types at FFI boundaries, where possible—which

for C means making use of the types (e.g., uint32_t) defined in <stdint.h>. How‐

ever, if you’re dealing with an existing codebase that already uses int/long/size_t,

this may be a luxury you don’t have.
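Where an existing C interface does use int, one hedge is to make the size assumption explicit so that a mismatch is caught at build time rather than at runtime. The sketch below assumes that the C library was built with a 32-bit int; the to_c_int helper is hypothetical.

```rust
use std::convert::TryFrom;
use std::os::raw::c_int;

// Compile-time check of the assumption that `c_int` is 32 bits wide
// for the current target; the build fails if it is not.
const _: () = assert!(std::mem::size_of::<c_int>() == std::mem::size_of::<i32>());

/// Hypothetical helper: convert a wider Rust value to `c_int`,
/// returning `None` rather than silently truncating.
pub fn to_c_int(value: i64) -> Option<c_int> {
    c_int::try_from(value).ok()
}
```

Note that this only checks the Rust side's view of the target; it cannot confirm how the C library itself was compiled.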

The final practical concern is that the C code and the equivalent Rust declaration

need to exactly match. Worse still, if there’s a mismatch, the build tools will not emit a

warning—they will just silently emit incorrect code.

The bindgen tool can be used to prevent this problem, but it’s worth

understanding the basics of what’s going on under the covers to understand why the

build tools can’t detect the problem on their own. In particular, it’s worth understand‐

ing the basics of name mangling.


Name mangling

Compiled languages generally support separate compilation, where different parts of

the program are converted into machine code as separate chunks (object files), which

can then be combined into a complete program by the linker. This means that if only

one small part of the program’s source code changes, only the corresponding object

file needs to be regenerated; the link step then rebuilds the program, combining both

the changed object and all the other unmodified objects.

The link step is, roughly speaking, a game of join-the-dots: some object files provide definitions of functions and variables, and other object files have placeholder

markers indicating that they expect to use a definition from some other object, but it

wasn’t available at compile time. The linker combines the two: it ensures that any

placeholder in the compiled code is replaced with a reference to the corresponding

concrete definition.

The linker performs this correlation between the placeholders and the definitions by

simply checking for a matching name, meaning that there is a single global name‐

space for all of these correlations.

Historically, this was fine for linking C language programs, where a single name

could not be reused in any way—the name of a function is exactly what appears in the

object file. (As a result, a common convention for C libraries is to manually add a

prefix to all symbols so that lib1_process doesn’t clash with lib2_process.)

However, the introduction of C++ caused a problem because C++ allows overloaded

definitions with the same name:

// C++ code

namespace ns1 {

int32_t add(int32_t a, int32_t b) { return a+b; }

int64_t add(int64_t a, int64_t b) { return a+b; }

}

namespace ns2 {

int32_t add(int32_t a, int32_t b) { return a+b; }

}

The solution for this is name mangling: the compiler encodes the signature and type information for the overloaded functions into the name that’s emitted in the object file, and the linker continues to perform its simple-minded 1:1 correlation between placeholders and definitions.

On Unix-like systems, the nm command shows the symbols that the linker works with:

% nm ffi-lib.o | grep add # what the linker sees for C

0000000000000000 T _add

% nm ffi-cpp-lib.o | grep add # what the linker sees for C++


0000000000000000 T __ZN3ns13addEii

0000000000000020 T __ZN3ns13addExx

0000000000000040 T __ZN3ns23addEii

In this case, it shows three mangled symbols, all of which refer to code (the T indi‐

cates the text section of the binary, which is the traditional name for where code lives).

The c++filt tool helps translate this back into what would be visible in C++ code:

% nm ffi-cpp-lib.o | grep add | c++filt # what the programmer sees

0000000000000000 T ns1::add(int, int)

0000000000000020 T ns1::add(long long, long long)

0000000000000040 T ns2::add(int, int)

Because the mangled name includes type information, the linker can and will com‐

plain about any mismatch in the type information between placeholder and defini‐

tion. This gives some measure of type safety: if the definition changes but the place

using it is not updated, the toolchain will complain.

Returning to Rust, extern "C" foreign functions are implicitly marked as #[no_mangle], and the symbol in the object file is the bare name, exactly as it would be for a C

program. This means that the type safety of function signatures is lost: because the

linker sees only the bare names for functions, if there are any differences in type

expectations between definition and use, the linker will carry on regardless and prob‐

lems will arise only at runtime.

Accessing C Data from Rust

The C add example in the previous section passed the simplest possible type of data

back and forth between Rust and C: an integer that fits in a machine register. Even so,

there were still things to be careful about, so it’s no surprise then that dealing with

more complex data structures also has wrinkles to watch out for.

Both C and Rust use the struct to combine related data into a single data structure.

However, when a struct is realized in memory, the two languages may well choose to put different fields in different places or even in different orders. To prevent mismatches, use #[repr(C)] for Rust types used in FFI; this representation is designed to match the layout that C uses:

/* C data structure definition. */

/* Changes here must be reflected in lib.rs. */

typedef struct {

uint8_t byte;

uint32_t integer;

} FfiStruct;

// Equivalent Rust data structure.

// Changes here must be reflected in lib.h / lib.c.

#[repr(C)]

pub struct FfiStruct {


pub byte: u8,

pub integer: u32,

}

The structure definitions have a comment to remind the humans involved that the

two places need to be kept in sync. Relying on the constant vigilance of humans is

likely to go wrong in the long term; as for function signatures, it’s better to automate

this synchronization between the two languages via a tool like bindgen.
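Short of full automation, a layout check can at least be written down in code. The sketch below repeats the FfiStruct definition and inspects its size, alignment and field offset; the integer_offset helper is hypothetical, and the expected values in the comments assume a typical target where uint32_t has 4-byte alignment.

```rust
use std::mem::{align_of, size_of};

#[repr(C)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

/// Hypothetical helper: byte offset of `integer` within `FfiStruct`.
pub fn integer_offset() -> usize {
    let v = FfiStruct { byte: 0, integer: 0 };
    let base = &v as *const FfiStruct as usize;
    let field = &v.integer as *const u32 as usize;
    field - base
}

/// Returns (size, alignment, offset of `integer`); on a typical target
/// with 4-byte `u32` alignment, C lays this structure out as (8, 4, 4).
pub fn layout_summary() -> (usize, usize, usize) {
    (size_of::<FfiStruct>(), align_of::<FfiStruct>(), integer_offset())
}
```

Comparing these values against sizeof and offsetof results from the C side (for example, in a test) gives an early warning if the two definitions drift apart.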

One particular type of data that’s worth thinking about carefully for FFI interactions

is strings. The default definitions of what makes up a string are somewhat different

between C and Rust:

• A Rust String holds UTF-8 encoded data, possibly including zero bytes, with an explicitly known length.

• A C string (char *) holds byte values (which may or may not be signed), with its

length implicitly determined by the first zero byte (\0) found in the data.

Fortunately, dealing with C-style strings in Rust is comparatively straightforward,

because the Rust library designers have already done the heavy lifting by providing a

pair of types to encode them. Use the CString type to hold (owned) strings that need to be interoperable with C, and use the corresponding CStr type when dealing with borrowed string values. The latter type includes the as_ptr() method, which can be used to pass the string’s contents to any FFI function that’s expecting a const char* C

string. Note that the const is important: this can’t be used for an FFI function that

needs to modify the contents (char *) of the string that’s passed to it.
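A minimal sketch of this pair of types in use follows. To keep it self-contained, the function that receives the const char* (count_bytes) is a hypothetical stand-in written in Rust with a C calling convention; real code would declare an external C function instead. The c_length wrapper is likewise hypothetical.

```rust
use std::ffi::{CStr, CString};
use std::os::raw::c_char;

/// Hypothetical stand-in for a C function that takes a `const char*`
/// and counts the bytes before the terminating zero byte.
///
/// # Safety
/// `s` must be NULL or point to a valid zero-terminated string.
pub unsafe extern "C" fn count_bytes(s: *const c_char) -> usize {
    if s.is_null() {
        return 0;
    }
    // Borrow the C string without taking ownership of it.
    unsafe { CStr::from_ptr(s) }.to_bytes().len()
}

/// Hypothetical safe wrapper: `None` if `text` contains an interior
/// zero byte and so can't be represented as a C string.
pub fn c_length(text: &str) -> Option<usize> {
    let owned = CString::new(text).ok()?;
    // `owned` stays alive for the duration of the call, so the pointer
    // returned by `as_ptr()` remains valid.
    Some(unsafe { count_bytes(owned.as_ptr()) })
}
```

Keeping the CString in a named variable matters: calling as_ptr() on a temporary would leave a dangling pointer once the temporary is dropped.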

Lifetimes

Most data structures are too big to fit in a register and so have to be held in memory

instead. That in turn means that access to the data is performed via the location of

that memory. In C terms, this means a pointer: a number that encodes a memory address, with no other semantics attached.

In Rust, a location in memory is generally represented as a reference, and its numeric

value can be extracted as a raw pointer, ready to feed into an FFI boundary:

extern "C" {

// C function that does some operation on the contents

// of an `FfiStruct`.

pub fn use_struct(v: *const FfiStruct) -> u32;

}

let v = FfiStruct {

byte: 1,

integer: 42,


};

let x = unsafe { use_struct(&v as *const FfiStruct) };

However, a Rust reference comes with additional constraints around the lifetime of the associated chunk of memory, and these constraints get lost in the conversion to a raw pointer.

As a result, the use of raw pointers is inherently unsafe, as a marker that Here Be

Dragons: the C code on the other side of the FFI boundary could do any number of

things that will destroy Rust’s memory safety:

• The C code could hang onto the value of the pointer and use it at a later point

when the associated memory has either been freed from the heap or reused on

the stack (use-after-free).

• The C code could decide to cast away the const-ness of a pointer that’s passed to

it and modify data that Rust expects to be immutable.

• The C code is not subject to Rust’s Mutex protections, so the specter of data races rears its ugly head.

• The C code could mistakenly return associated heap memory to the allocator (by

calling C’s free() library function), meaning that the Rust code might now be

performing use-after-free operations.

All of these dangers form part of the cost-benefit analysis of using an existing library

via FFI. On the plus side, you get to reuse existing code that’s (presumably) in good

working order, with only the need to write (or auto-generate) corresponding declara‐

tions. On the minus side, you lose the memory protections that are a big reason to

use Rust in the first place.

As a first step to reduce the chances of memory-related problems, allocate and free

memory on the same side of the FFI boundary. For example, this might appear as a

symmetric pair of functions:

/* C functions. */

/* Allocate an `FfiStruct` */

FfiStruct* new_struct(uint32_t v);

/* Free a previously allocated `FfiStruct` */

void free_struct(FfiStruct* s);

with corresponding Rust FFI declarations:

extern "C" {

// C code to allocate an `FfiStruct`.

pub fn new_struct(v: u32) -> *mut FfiStruct;

// C code to free a previously allocated `FfiStruct`.

pub fn free_struct(s: *mut FfiStruct);

}


To make sure that allocation and freeing are kept in sync, it can be a good idea to implement an RAII wrapper that automatically prevents C-allocated memory from being leaked. The wrapper structure owns the C-allocated memory:

/// Wrapper structure that owns memory allocated by the C library.

struct FfiWrapper {

// Invariant: inner is non-NULL.

inner: *mut FfiStruct,

}

and the Drop implementation returns that memory to the C library to avoid the

potential for leaks:

/// Manual implementation of [`Drop`], which ensures that memory allocated
/// by the C library is freed by it.
impl Drop for FfiWrapper {
    fn drop(&mut self) {
        // Safety: `inner` is non-NULL, and besides `free_struct()` copes
        // with NULL pointers.
        unsafe { free_struct(self.inner) }
    }
}

The same principle applies to more than just heap memory: implement Drop to apply

RAII to FFI-derived resources: open files, database connections, etc.

Encapsulating the interactions with the C library into a wrapper struct also makes it

possible to catch some other potential footguns, for example, by transforming an

otherwise invisible failure into a Result:

type Error = String;

impl FfiWrapper {
    pub fn new(val: u32) -> Result<Self, Error> {
        let p: *mut FfiStruct = unsafe { new_struct(val) };
        // Raw pointers are not guaranteed to be non-NULL.
        if p.is_null() {
            Err("Failed to get inner struct!".into())
        } else {
            Ok(Self { inner: p })
        }
    }
}

The wrapper structure can then offer safe methods that allow use of the C library’s

functionality:

impl FfiWrapper {
    pub fn set_byte(&mut self, b: u8) {
        // Safety: relies on invariant that `inner` is non-NULL.
        let r: &mut FfiStruct = unsafe { &mut *self.inner };
        r.byte = b;
    }
}

Alternatively, if the underlying C data structure has an equivalent Rust mapping, and if it’s safe to directly manipulate that data structure, then implementations of the AsRef and AsMut traits allow more direct use:

impl AsMut<FfiStruct> for FfiWrapper {
    fn as_mut(&mut self) -> &mut FfiStruct {
        // Safety: `inner` is non-NULL.
        unsafe { &mut *self.inner }
    }
}

let mut wrapper = FfiWrapper::new(42).expect("real code would check");

// Directly modify the contents of the C-allocated data structure.

wrapper.as_mut().byte = 12;

This example illustrates a useful principle for dealing with FFI: encapsulate access to

an unsafe FFI library inside safe Rust code. This allows the rest of the application to avoid writing unsafe code. It also concentrates all of the dangerous code in one place, which you can then study (and test) carefully to

uncover problems—and treat as the most likely suspect when something does go

wrong.

Invoking Rust from C

What counts as “foreign” depends on where you’re standing: if you’re writing an

application in C, then it may be a Rust library that’s accessed via a foreign function

interface.

The basics of exposing a Rust library to C code are similar to the opposite direction:

• Rust functions that are exposed to C need an extern "C" marker to ensure

they’re C-compatible.

• Rust symbols are name mangled by default (like C++), so function definitions

also need a #[no_mangle] attribute to ensure that they’re accessible via a simple

name. This in turn means that the function name is part of a single global name‐

space that can clash with any other symbol defined in the program. As such, con‐

sider using a prefix for exposed names to avoid ambiguities (mylib_…).

• Data structure definitions need the #[repr(C)] attribute to ensure that the layout

of the contents is compatible with an equivalent C data structure.

7 A Rust equivalent of the c++filt tool for translating mangled names back to programmer-visible names is the rustfilt command.


Also like the opposite direction, more subtle problems arise when dealing with point‐

ers, references, and lifetimes. A C pointer is different from a Rust reference, and you

forget that at your peril:

UNDESIRED BEHAVIOR

#[no_mangle]

pub extern "C" fn add_contents(p: *const FfiStruct) -> u32 {

// Convert the raw pointer provided by the caller into

// a Rust reference.

let s: &FfiStruct = unsafe { &*p }; // Ruh-roh

s.integer + s.byte as u32

}

/* C code invoking Rust. */

uint32_t result = add_contents(NULL); // Boom!

When you’re dealing with raw pointers, it’s your responsibility to ensure that any use

of them complies with Rust’s assumptions and guarantees around references:

#[no_mangle]

pub extern "C" fn add_contents_safer(p: *const FfiStruct) -> u32 {

let s = match unsafe { p.as_ref() } {

Some(r) => r,

None => return 0, // Pesky C code gave us a NULL.

};

s.integer + s.byte as u32

}

In these examples, the C code provides a raw pointer to the Rust code, and the Rust

code converts it to a reference in order to operate on the structure. But where did that

pointer come from? What does the Rust reference refer to?

Rust’s memory safety normally prevents references to expired stack objects from being returned; those problems reappear if you hand out a raw pointer:

UNDESIRED BEHAVIOR

impl FfiStruct {
    pub fn new(v: u32) -> Self {
        Self {
            byte: 0,
            integer: v,
        }
    }
}

// No compilation errors here.
#[no_mangle]
pub extern "C" fn new_struct(v: u32) -> *mut FfiStruct {
    let mut s = FfiStruct::new(v);
    &mut s // return raw pointer to a stack object that's about to expire!
}

Any pointers passed back from Rust to C should generally refer to heap memory, not

stack memory. But naively trying to put the object on the heap via a Box doesn’t help:

UNDESIRED BEHAVIOR

// No compilation errors here either.

#[no_mangle]

pub extern "C" fn new_struct_heap(v: u32) -> *mut FfiStruct {

let s = FfiStruct::new(v); // create `FfiStruct` on stack

let mut b = Box::new(s); // move `FfiStruct` to heap

&mut *b // return raw pointer to a heap object that's about to expire!

}

The owning Box is on the stack, so when it goes out of scope, it will free the heap

object and the returned raw pointer will again be invalid.

The tool for the job is Box::into_raw, which abnegates responsibility for the heap object, effectively “forgetting” about it:

#[no_mangle]

pub extern "C" fn new_struct_raw(v: u32) -> *mut FfiStruct {

let s = FfiStruct::new(v); // create `FfiStruct` on stack

let b = Box::new(s); // move `FfiStruct` to heap

// Consume the `Box` and take responsibility for the heap memory.

Box::into_raw(b)

}

This raises the question of how the heap object now gets freed. The previous advice was to perform allocation and freeing of memory on the same side of the FFI boundary, which means that we need to persuade the Rust side of things to do the freeing. The corresponding tool for the job is Box::from_raw, which builds a Box from a raw pointer:

#[no_mangle]
pub extern "C" fn free_struct_raw(p: *mut FfiStruct) {
    if p.is_null() {
        return; // Pesky C code gave us a NULL
    }
    let _b = unsafe {
        // Safety: p is known to be non-NULL
        Box::from_raw(p)
    };
} // `_b` drops at end of scope, freeing the `FfiStruct`

This still leaves the Rust code at the mercy of the C code; if the C code gets confused and asks Rust to free the same pointer twice, Rust’s allocator is likely to become terminally confused.
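The intended pairing of allocation and release can be sketched as follows, with Rust standing in for the C caller. This is an illustration under stated assumptions rather than production code: the DROPS counter, the Drop implementation, and the round_trip helper are hypothetical additions that exist only to make the single release observable, and the no_mangle attribute is omitted so that the sketch compiles on its own.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many `FfiStruct` values have been dropped (illustration only).
static DROPS: AtomicUsize = AtomicUsize::new(0);

#[repr(C)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

impl Drop for FfiStruct {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Allocate on the Rust heap and hand ownership to the caller.
pub extern "C" fn new_struct_raw(v: u32) -> *mut FfiStruct {
    Box::into_raw(Box::new(FfiStruct { byte: 0, integer: v }))
}

/// Take ownership back and free the object on the Rust side.
pub extern "C" fn free_struct_raw(p: *mut FfiStruct) {
    if p.is_null() {
        return;
    }
    // Safety: relies on `p` having come from `new_struct_raw` and not
    // having been freed already; Rust cannot check either condition.
    drop(unsafe { Box::from_raw(p) });
}

/// Plays the role of well-behaved C code: allocate, use, free exactly once.
/// Returns the value read and the number of drops observed.
pub fn round_trip(v: u32) -> (u32, usize) {
    let before = DROPS.load(Ordering::SeqCst);
    let p = new_struct_raw(v);
    // Safety: `p` was just returned by `new_struct_raw` and is still live.
    let value = unsafe { (*p).integer };
    free_struct_raw(p);
    free_struct_raw(std::ptr::null_mut()); // NULL is tolerated as a no-op.
    // A second `free_struct_raw(p)` here would be a double free.
    (value, DROPS.load(Ordering::SeqCst) - before)
}
```

Nothing in the signature of free_struct_raw stops a caller from passing the same pointer twice, so the "exactly once" rule remains a documented contract rather than something the compiler enforces.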

That illustrates the general theme of this Item: using FFI exposes you to risks that

aren’t present in standard Rust. That may well be worthwhile, as long as you’re aware

of the dangers and costs involved. Controlling the details of what passes across the

FFI boundary helps to reduce that risk but by no means eliminates it.

Controlling the FFI boundary for C code invoking Rust also involves one final concern: if your Rust code can panic, you should prevent panic!s from crossing the FFI boundary, as this always results in undefined behavior.

Things to Remember

• Interfacing with code in other languages uses C as a least common denominator,

which means that symbols all live in a single global namespace.

• Minimize the chances of problems at the FFI boundary by doing the following:

— Encapsulating unsafe FFI code in safe wrappers

— Allocating and freeing memory consistently on one side of the boundary or

the other

— Making data structures use C-compatible layouts

— Using sized integer types

— Using FFI-related helpers from the standard library

— Preventing panic!s from escaping from Rust
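As an illustration of the last point, one possible approach is to wrap the body of each exported function in std::panic::catch_unwind and translate a caught panic into an error code. The sketch below assumes a hypothetical div_or_status function and hypothetical STATUS_OK and STATUS_ERROR codes; it also assumes the default panic = "unwind" strategy, because with panic = "abort" there is nothing to catch. The no_mangle attribute is omitted so that the sketch compiles on its own.

```rust
use std::panic;

/// Hypothetical status codes reported to the C caller.
pub const STATUS_OK: i32 = 0;
pub const STATUS_ERROR: i32 = -1;

/// Divide `a` by `b`, writing the result through `out`.
/// Any panic is caught and reported as `STATUS_ERROR` instead of unwinding
/// into the caller's stack frames.
pub extern "C" fn div_or_status(a: i32, b: i32, out: *mut i32) -> i32 {
    if out.is_null() {
        return STATUS_ERROR;
    }
    // `a / b` panics for `b == 0` (and for `i32::MIN / -1`).
    match panic::catch_unwind(|| a / b) {
        Ok(v) => {
            // Safety: `out` is non-NULL; the caller must ensure that it is
            // valid for writes.
            unsafe { *out = v };
            STATUS_OK
        }
        Err(_) => STATUS_ERROR,
    }
}
```

The default panic hook still prints its message to standard error before catch_unwind returns, so a caught panic is not silent.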

Item 35: Prefer bindgen to manual FFI mappings

Item 34 discussed the mechanics of invoking C code from a Rust program, describing how declarations of C structures and functions need to have an equivalent Rust declaration to allow them to be used over FFI. The C and Rust declarations need to be kept in sync, and the toolchain wouldn’t help with this: mismatches would be silently ignored, hiding problems that would arise later.

8 Note that Rust version 1.71 includes the C-unwind ABI, which makes some cross-language unwinding functionality possible.


Keeping two things perfectly in sync sounds like a good target for automation, and

the Rust project provides the right tool for the job: bindgen. The primary function of bindgen is to parse a C header file and emit the corresponding Rust declarations.

Taking some of the example C declarations from Item 34:

/* File lib.h */
#include <stdint.h>

typedef struct {
    uint8_t byte;
    uint32_t integer;
} FfiStruct;

int add(int x, int y);
uint32_t add32(uint32_t x, uint32_t y);

the bindgen tool can be manually invoked (or invoked by a build.rs build script) to create a corresponding Rust file:

% bindgen --no-layout-tests \

--allowlist-function="add.*" \

--allowlist-type=FfiStruct \

-o src/generated.rs \

lib.h

The generated Rust is identical to the handcrafted declarations in Item 34:

/* automatically generated by rust-bindgen 0.59.2 */

#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}
extern "C" {
    pub fn add(
        x: ::std::os::raw::c_int,
        y: ::std::os::raw::c_int,
    ) -> ::std::os::raw::c_int;
}
extern "C" {
    pub fn add32(x: u32, y: u32) -> u32;
}

and can be pulled into Rust code with the include! macro:

// Include the auto-generated Rust declarations.

include!("generated.rs");
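The layout that the C and Rust sides must agree on can also be spot-checked from Rust. The following sketch uses a hand-written copy of FfiStruct so that it compiles without generated.rs; the integer_offset and layout_summary helpers are hypothetical names introduced for illustration. On mainstream targets where uint32_t is 4-byte aligned, the summary is expected to be a size of 8, an alignment of 4, and an offset of 4 for the integer field.

```rust
use std::mem::{align_of, size_of};

/// Hand-written copy of the generated structure, so that the sketch
/// compiles on its own without `generated.rs`.
#[repr(C)]
#[derive(Debug, Copy, Clone)]
pub struct FfiStruct {
    pub byte: u8,
    pub integer: u32,
}

/// Byte offset of the `integer` field from the start of the structure.
pub fn integer_offset() -> usize {
    let s = FfiStruct { byte: 0, integer: 0 };
    let base = &s as *const FfiStruct as usize;
    let field = &s.integer as *const u32 as usize;
    field - base
}

/// Returns (size, alignment, offset of `integer`), for comparison against
/// what the C compiler reports for the same declaration.
pub fn layout_summary() -> (usize, usize, usize) {
    (size_of::<FfiStruct>(), align_of::<FfiStruct>(), integer_offset())
}
```

The padding after the byte field is what makes the size 8 rather than 5; without #[repr(C)], Rust would be free to choose a different field order.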


For anything but the most trivial FFI declarations, use bindgen to generate Rust bindings for C code. This is an area where machine-made, mass-produced code is definitely preferable to artisanal handcrafted declarations. If a C function definition changes, the C compiler will complain if the C declaration no longer matches the C definition, but nothing will complain that a handcrafted Rust declaration no longer matches the C declaration; auto-generating the Rust declaration from the C declaration ensures that the two stay in sync.

This also means that the bindgen step is an ideal candidate to include in a CI system; if the generated code is included in source control, the CI system can error out if a freshly generated file doesn’t match the checked-in version.

The bindgen tool comes into its own when you’re dealing with an existing C codebase that has a large API. Creating Rust equivalents to a big lib_api.h header file is manual and tedious, and therefore error-prone; as noted, many categories of mismatch error will not be detected by the toolchain. bindgen also has options that allow specific subsets of an API to be targeted (such as the --allowlist-function and --allowlist-type options previously illustrated).

This also allows a layered approach for exposing an existing C library in Rust; a common convention for wrapping some xyzzy library is to have the following:

• An xyzzy-sys crate that holds (just) the bindgen-erated code—use of which is

necessarily unsafe

• An xyzzy crate that encapsulates the unsafe code and provides safe Rust access

to the underlying functionality

This concentrates the unsafe code in one layer and allows the rest of the program to follow the advice on avoiding unsafe code.
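The two layers can be sketched in a single file, with a module standing in for each crate. Everything named here is hypothetical: xyzzy_sum is an invented function, and it is implemented in Rust purely so that the sketch links without a real C library. In a real xyzzy-sys crate it would be an extern "C" declaration emitted by bindgen.

```rust
/// Stand-in for the `xyzzy-sys` crate: raw, unsafe entry points only.
pub mod xyzzy_sys {
    /// Sums `len` values starting at `data`, wrapping on overflow.
    ///
    /// # Safety
    /// `data` must point to at least `len` readable, aligned `u32` values.
    pub unsafe extern "C" fn xyzzy_sum(data: *const u32, len: usize) -> u32 {
        let mut total: u32 = 0;
        for i in 0..len {
            // Safety: the caller guarantees that `data..data + len` is readable.
            total = total.wrapping_add(unsafe { *data.add(i) });
        }
        total
    }
}

/// Stand-in for the safe `xyzzy` crate: the slice carries both the pointer
/// and the length, so callers cannot get the pairing wrong.
pub fn sum_values(values: &[u32]) -> u32 {
    // Safety: a slice is always valid for reads of `values.len()` elements.
    unsafe { xyzzy_sys::xyzzy_sum(values.as_ptr(), values.len()) }
}
```

Callers of sum_values never write unsafe themselves; the only place that needs careful review is the one-line wrapper.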

Beyond C

The bindgen tool has the ability to handle some C++ constructs, but only a subset and in a limited fashion. For better (but still somewhat limited) integration, consider using the cxx crate for C++/Rust interoperation. Instead of generating Rust code from C++ declarations, cxx takes the approach of auto-generating both Rust and C++ code from a common schema, allowing for tighter integration.

9 The example also used the --no-layout-tests option to keep the output simple; by default, the generated

code will include #[test] code to check that structures are indeed laid out correctly.


Afterword

Hopefully the advice, suggestions, and information in this book will help you become a fluent, productive Rust programmer. As the Preface describes, this book is intended to cover the second step in this process, after you’ve learned the basics from a core Rust reference book. But there are more steps you can take and directions to explore:

• Async Rust is not covered in this book but is likely to be needed for efficient, concurrent server-side applications. Online documentation provides an introduction to async, and Async Rust by Maxwell Flitton and Caroline Morton (O’Reilly, 2024) may also help.

• Moving in the other direction, bare-metal Rust might align with your interests and requirements. This goes beyond the introduction to no_std to a world where there’s no operating system and no allocation. The bare-metal Rust section of the Comprehensive Rust online course provides a good introduction here.

• Regardless of whether your interests are low-level or high-level, the ecosystem of third-party, open source crates is worth exploring and contributing to. Curated summaries can help navigate the huge number of possibilities.

• Rust discussion forums can provide help, and include a searchable index of questions that have been asked (and answered!) previously.

• If you find yourself relying on an existing library that’s not written in Rust (as per Item 34), you could rewrite it in Rust (RiiR). But don’t underestimate the effort required to reproduce a battle-tested, mature codebase.

• As you become more skilled in Rust, Jon Gjengset’s Rust for Rustaceans (No

Starch, 2022) is an essential reference for more advanced aspects of Rust.

Good luck!

