Direct Memory Access (DMA)

This section covers the core requirements for building a memory-safe API around DMA transfers.

The DMA peripheral is used to perform memory transfers in parallel with the work of the processor (the execution of the main program). A DMA transfer is more or less equivalent to spawning a thread (see thread::spawn) to do a memcpy. We'll use the fork-join model to illustrate the requirements of a memory-safe API.
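The analogy can be sketched on a hosted target. This is only an illustration of the fork-join model, not part of the API we'll build; the function name is made up.

```rust
use std::thread::{self, JoinHandle};

/// "Start" a transfer: spawn a worker that performs the memcpy in
/// parallel with the caller (the "processor" running the main program)
fn start_copy(src: Vec<u8>) -> JoinHandle<Vec<u8>> {
    thread::spawn(move || {
        let mut dst = vec![0; src.len()];
        dst.copy_from_slice(&src); // the `memcpy`
        dst
    })
}

fn main() {
    let transfer = start_copy(b"Hello".to_vec());

    // .. the main program keeps doing other work here ..

    let dst = transfer.join().unwrap(); // "wait" (join) for the transfer
    assert_eq!(dst, b"Hello");
}
```

Just like a DMA transfer, the copy runs concurrently with the main program, and "waiting" on it hands the filled buffer back.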

Consider the following DMA primitives:

#![allow(unused)]
fn main() {
/// A singleton that represents a single DMA channel (channel 1 in this case)
///
/// This singleton has exclusive access to the registers of the DMA channel 1
pub struct Dma1Channel1 {
    // ..
}

impl Dma1Channel1 {
    /// Data will be written to this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_destination_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Data will be read from this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_source_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Number of bytes to transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_transfer_length(&mut self, len: usize) {
        // ..
    }

    /// Starts the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn start(&mut self) {
        // ..
    }

    /// Stops the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn stop(&mut self) {
        // ..
    }

    /// Returns `true` if there's a transfer in progress
    ///
    /// NOTE this performs a volatile read
    pub fn in_progress() -> bool {
        // ..
    }
}
}

Let's say that Dma1Channel1 is statically configured to work with serial port (AKA UART or USART) #1, Serial1, in one-shot mode (i.e. not circular mode). Serial1 provides the following blocking API:

#![allow(unused)]
fn main() {
/// A singleton that represents serial port #1
pub struct Serial1 {
    // ..
}

impl Serial1 {
    /// Reads out a single byte
    ///
    /// NOTE: blocks if no byte is available to be read
    pub fn read(&mut self) -> Result<u8, Error> {
        // ..
    }

    /// Sends out a single byte
    ///
    /// NOTE: blocks if the output FIFO buffer is full
    pub fn write(&mut self, byte: u8) -> Result<(), Error> {
        // ..
    }
}
}

Let's say we want to extend the Serial1 API to (a) asynchronously send out a buffer and (b) asynchronously fill a buffer.

We'll start with a memory-unsafe API and we'll iterate on it until it's completely memory safe. At each step we'll show you how the API can be broken, to make you aware of the issues that need to be addressed when dealing with asynchronous memory operations.

A first stab

As a first stab, let's try to mirror the Write::write_all API. For simplicity, let's ignore all error handling.

#![allow(unused)]
fn main() {
/// A singleton that represents serial port #1
pub struct Serial1 {
    // NOTE: we extend this struct by adding the DMA channel singleton
    dma: Dma1Channel1,
    // ..
}

impl Serial1 {
    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<'a>(&mut self, buffer: &'a [u8]) -> Transfer<&'a [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}

/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
}

impl<B> Transfer<B> {
    /// Returns `true` if the DMA transfer has finished
    pub fn is_done(&self) -> bool {
        !Dma1Channel1::in_progress()
    }

    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> B {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        self.buffer
    }
}
}

Note: Instead of an API like the one above, the Transfer API could also expose a futures- or generator-based API. That's an API design question that has little bearing on the memory safety of the overall API, so we won't delve into it in this text.

We can also implement an asynchronous version of Read::read_exact:

#![allow(unused)]
fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<'a>(&mut self, buffer: &'a mut [u8]) -> Transfer<&'a mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}
}

Here's how to use the write_all API:

#![allow(unused)]
fn main() {
fn write(serial: Serial1) {
    // fire and forget
    serial.write_all(b"Hello, world!\n");

    // do other stuff
}
}

And here's an example of using the read_exact API:

#![allow(unused)]
fn main() {
fn read(mut serial: Serial1) {
    let mut buf = [0; 16];
    let t = serial.read_exact(&mut buf);

    // do other stuff

    t.wait();

    match buf.split(|b| *b == b'\n').next() {
        Some(b"some-command") => { /* do something */ }
        _ => { /* do something else */ }
    }
}
}

mem::forget

mem::forget is a safe API. If our API is truly safe then we should be able to use both together without running into undefined behavior. However, that's not the case; consider the following example:

#![allow(unused)]
fn main() {
fn unsound(mut serial: Serial1) {
    start(&mut serial);
    bar();
}

#[inline(never)]
fn start(serial: &mut Serial1) {
    let mut buf = [0; 16];

    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(&mut buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
}

In start we start a DMA transfer to fill an array allocated on the stack, then mem::forget the returned Transfer value. Then we proceed to return from start and execute the function bar.

This series of operations results in undefined behavior. The DMA transfer writes to stack memory, but that memory is released when start returns and is then reused by bar to allocate variables like x and y. At runtime this could cause the variables x and y to randomly change their values. The DMA transfer could also overwrite the state (e.g. the link register) pushed onto the stack by the prologue of function bar.

Note that if we had used mem::drop instead of mem::forget, it would have been possible to make the program safe by having Transfer's destructor stop the DMA transfer. But one can not rely on destructors running to enforce memory safety, because mem::forget and memory leaks (see Rc cycles) are safe in Rust.
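That leaking is safe can be demonstrated in a few lines of entirely safe Rust: the destructor below simply never runs.

```rust
use std::mem;
use std::sync::atomic::{AtomicBool, Ordering};

static DROPPED: AtomicBool = AtomicBool::new(false);

struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        DROPPED.store(true, Ordering::SeqCst);
    }
}

fn main() {
    // `mem::forget` is a safe function: `Guard::drop` is never invoked
    mem::forget(Guard);
    assert!(!DROPPED.load(Ordering::SeqCst));
}
```

If `Guard` were a `Transfer` whose destructor stops the DMA, that stop would silently never happen.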

We can fix this particular problem by changing the lifetime of the buffer from 'a to 'static in both APIs.

#![allow(unused)]
fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(&mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(&mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..
    }
}
}

If we try to replicate the previous problem, we note that mem::forget no longer causes trouble.

#![allow(unused)]
fn main() {
#[allow(dead_code)]
fn sound(mut serial: Serial1, buf: &'static mut [u8; 16]) {
    // NOTE `buf` is moved into `foo`
    foo(&mut serial, buf);
    bar();
}

#[inline(never)]
fn foo(serial: &mut Serial1, buf: &'static mut [u8]) {
    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
}

As seen before, the DMA transfer continues after mem::forget-ing the Transfer value. This time that's not an issue because buf is statically allocated (e.g. a static mut variable), not on the stack.
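One way to safely hand out such a statically allocated buffer is a helper that returns a `&'static mut` reference exactly once (a hypothetical sketch; the `cortex-m` crate provides a `singleton!` macro in this spirit):

```rust
use std::ptr::addr_of_mut;
use std::sync::atomic::{AtomicBool, Ordering};

/// Hands out a `&'static mut` buffer the first time it's called and
/// `None` afterwards, so the mutable reference is never aliased
fn take_buffer() -> Option<&'static mut [u8; 16]> {
    static mut BUFFER: [u8; 16] = [0; 16];
    static TAKEN: AtomicBool = AtomicBool::new(false);

    if TAKEN
        .compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
        .is_ok()
    {
        // SAFETY: the `TAKEN` flag guarantees this `&mut` is created only once
        Some(unsafe { &mut *addr_of_mut!(BUFFER) })
    } else {
        None
    }
}

fn main() {
    let buf = take_buffer().expect("first call succeeds");
    buf[0] = 42;
    assert!(take_buffer().is_none()); // the buffer can only be taken once
}
```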

Overlapping use

Our API doesn't prevent the user from using the Serial interface while a DMA transfer is in progress. This could lead to a failed transfer or data loss.

There are several ways to prevent overlapping use. One way is to have Transfer take ownership of Serial1 and return it back when wait is called.

#![allow(unused)]
fn main() {
/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
    // NOTE: added
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    // NOTE: the return value has changed
    pub fn wait(self) -> (B, Serial1) {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        (self.buffer, self.serial)
    }

    // ..
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }
}
}

The move semantics statically prevent access to Serial1 while the transfer is in progress.

#![allow(unused)]
fn main() {
fn read(serial: Serial1, buf: &'static mut [u8; 16]) {
    let t = serial.read_exact(buf);

    // let byte = serial.read(); //~ ERROR: `serial` has been moved

    // .. do stuff ..

    let (serial, buf) = t.wait();

    // .. do more stuff ..
}
}

There are other ways to prevent overlapping use. For example, a (Cell) flag that indicates whether a DMA transfer is in progress could be added to Serial1. When the flag is set, read, write, read_exact and write_all would all return an error (e.g. Error::InUse) at runtime. The flag would be set when write_all / read_exact is used, and cleared in Transfer.wait.
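A minimal sketch of that runtime-checked alternative (the type names, including Error::InUse, are illustrative, and the DMA configuration is elided):

```rust
use std::cell::Cell;

#[derive(Debug, PartialEq)]
enum Error {
    InUse,
}

/// Stand-in for `Serial1` with a flag that tracks an in-progress transfer
struct Serial {
    in_progress: Cell<bool>,
}

impl Serial {
    fn new() -> Self {
        Serial { in_progress: Cell::new(false) }
    }

    /// Runtime-checked `write_all`: errors out if a transfer is ongoing
    fn write_all(&self, _buffer: &'static [u8]) -> Result<(), Error> {
        if self.in_progress.get() {
            return Err(Error::InUse);
        }
        self.in_progress.set(true);
        // .. configure and start the DMA transfer ..
        Ok(())
    }

    /// Called from `Transfer.wait`: clears the flag once the transfer is done
    fn finish(&self) {
        self.in_progress.set(false);
    }
}

fn main() {
    let serial = Serial::new();
    assert!(serial.write_all(b"hello").is_ok());
    assert_eq!(serial.write_all(b"world"), Err(Error::InUse)); // rejected
    serial.finish();
    assert!(serial.write_all(b"again").is_ok());
}
```

Unlike the move-based design, misuse here is caught at runtime rather than at compile time.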

Compiler (mis)optimizations

The compiler is free to re-order and merge non-volatile memory operations to better optimize a program. With our current API, this freedom can lead to undefined behavior. Consider the following example:

#![allow(unused)]
fn main() {
fn reorder(serial: Serial1, buf: &'static mut [u8]) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    let t = serial.read_exact(buf);

    // ... do other stuff ..

    let (buf, serial) = t.wait();

    buf.reverse();

    // .. do stuff with `buf` ..
}
}

Here the compiler is free to move buf.reverse() before t.wait(), which would result in a data race: both the processor and the DMA would end up modifying buf at the same time. Likewise the compiler can move the zeroing operation to after read_exact, which would also result in a data race.

To prevent these problematic reorderings, we can use a compiler_fence:

#![allow(unused)]
fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> (B, Serial1) {
        // NOTE: this is a volatile *read*
        while !self.is_done() {}

        // NOTE: added
        atomic::compiler_fence(Ordering::Acquire);

        (self.buffer, self.serial)
    }

    // ..
}
}

我们在read_exactwrite_all中使用Ordering::Release以避免所有的正在进行中的存储操作被移动到self.dma.start()后面去,其执行了一个volatile写入。

Likewise, we use Ordering::Acquire in Transfer.wait to prevent all subsequent memory operations from being moved before self.is_done(), which performs a volatile read.

To better visualize the effect of the fences, here's a slightly tweaked version of the example from the previous section. We have added the fences and their orderings in the comments.

#![allow(unused)]
fn main() {
fn reorder(serial: Serial1, buf: &'static mut [u8], x: &mut u32) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    *x += 1;

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // NOTE: the processor can't access `buf` between the fences
    // ... do other stuff ..
    *x += 2;

    let (buf, serial) = t.wait(); // compiler_fence(Ordering::Acquire) ▼

    *x += 3;

    buf.reverse();

    // .. do stuff with `buf` ..
}
}

The zeroing operation can not be moved after read_exact due to the Release fence. Similarly, the reverse operation can not be moved before wait due to the Acquire fence. The memory operations between both fences can be freely reordered across the fences, but none of those operations involves buf, so such reorderings do not result in undefined behavior.

Note that compiler_fence is a bit stronger than what's required. For example, the fences will prevent the operations on x from being merged even though we know that buf doesn't overlap with x (due to Rust aliasing rules). However, no intrinsic finer grained than compiler_fence exists.

Don't we need a memory barrier?

That depends on the target architecture. In the case of Cortex M0 to M4F cores, AN321 says:

3.2 Typical usage

(..)

The use of DMB is rarely needed in Cortex-M processors because they do not reorder memory transfers. However, it is needed if the software is to be reused on other ARM processors, especially multi-master systems. For example:

  • DMA controller configuration. A barrier is needed between a CPU memory access and a DMA operation.

(..)

4.18 Multi-master systems

(..)

Omitting the DMB or DSB instruction in the examples in Figure 41 on page 47 and Figure 42 would not cause any error because the Cortex-M processors:

  • do not re-order memory transfers
  • do not permit two write transfers to overlap

Where Figure 41 shows a DMB (memory barrier) instruction being used before starting a DMA transaction.

In the case of Cortex-M7 cores you'll need memory barriers (DMB/DSB) if you are using the data cache (DCache), unless you manually invalidate the buffer used by the DMA. Even with the data cache disabled, memory barriers might still be required to avoid reordering in the store buffer.

If your target is a multi-core system then it's very likely that you'll need memory barriers.

If you do need the memory barrier then you need to use atomic::fence instead of compiler_fence. That should generate a DMB instruction on Cortex-M devices.
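On such targets the only change needed is swapping the intrinsic. A sketch of the configure-then-start path with a full fence (the function and closure parameters here are illustrative stand-ins for the register writes):

```rust
use std::cell::RefCell;
use std::sync::atomic::{fence, Ordering};

/// Sketch: configure-then-start with a full barrier in between.
/// `configure` stands for the non-volatile buffer/register setup and
/// `start` for the volatile write that starts the DMA transfer.
fn start_with_barrier(configure: impl FnOnce(), start: impl FnOnce()) {
    configure();
    // `fence` subsumes `compiler_fence`: it restricts the compiler the
    // same way *and* emits a hardware barrier (DMB on Cortex-M) where
    // the architecture needs one
    fence(Ordering::Release);
    start();
}

fn main() {
    let order = RefCell::new(Vec::new());
    start_with_barrier(
        || order.borrow_mut().push("configure"),
        || order.borrow_mut().push("start"),
    );
    assert_eq!(*order.borrow(), ["configure", "start"]);
}
```

The matching change on the wait side would use fence(Ordering::Acquire) after the volatile read.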

Generic buffer

Our API is more restrictive than it needs to be. For example, the following program won't be accepted even though it's valid.

#![allow(unused)]
fn main() {
fn reuse(serial: Serial1, msg: &'static mut [u8]) {
    // send a message
    let t1 = serial.write_all(msg);

    // ..

    let (msg, serial) = t1.wait(); // `msg` is now `&'static [u8]`

    msg.reverse();

    // now send it in reverse
    let t2 = serial.write_all(msg);

    // ..

    let (buf, serial) = t2.wait();

    // ..
}
}

To accept such a program we can make the buffer argument more generic.

#![allow(unused)]
fn main() {
// as-slice = "0.1.0"
use as_slice::{AsMutSlice, AsSlice};

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: B) -> Transfer<B>
    where
        B: AsMutSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_mut_slice();
        let (ptr, len) = (slice.as_mut_ptr(), slice.len());

        self.dma.set_source_address(USART1_RX, false);

        // NOTE: tweaked
        self.dma.set_destination_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        B: AsSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_slice();
        let (ptr, len) = (slice.as_ptr(), slice.len());

        self.dma.set_destination_address(USART1_TX, false);

        // NOTE: tweaked
        self.dma.set_source_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}

}

Note: AsRef<[u8]> (AsMut<[u8]>) could have been used instead of AsSlice<Element = u8> (AsMutSlice<Element = u8>).
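For reference, here's what that alternative bound looks like with core's AsRef trait. This is only a sketch of the generic signature; the helper name is made up, and the body mirrors the pointer/length extraction in write_all above.

```rust
/// Sketch of the bound `write_all` would use with core's `AsRef` trait;
/// returns the (pointer, length) pair that would be programmed into the DMA
fn dma_parameters<B>(buffer: &B) -> (*const u8, usize)
where
    B: AsRef<[u8]>,
{
    let slice = buffer.as_ref();
    (slice.as_ptr(), slice.len())
}

fn main() {
    // the same signature accepts arrays, slices, `Vec<u8>`, boxed slices, ..
    let (_, n1) = dma_parameters(&[0u8; 4]);
    let (_, n2) = dma_parameters(&vec![0u8; 8]);
    assert_eq!((n1, n2), (4, 8));
}
```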

Now the reuse program will be accepted.

Immovable buffers

With this modification the API will also accept arrays by value (e.g. [u8; 16]). However, using arrays can result in pointer invalidation. Consider the following program.

#![allow(unused)]
fn main() {
fn invalidate(serial: Serial1) {
    let t = start(serial);

    bar();

    let (buf, serial) = t.wait();
}

#[inline(never)]
fn start(serial: Serial1) -> Transfer<[u8; 16]> {
    // array allocated in this frame
    let buffer = [0; 16];

    serial.read_exact(buffer)
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
}

The read_exact operation will use the address of the buffer local to the start function. That local buffer will be freed when start returns, and the pointer used in read_exact will become invalidated. You'll end up with a situation like the one in the unsound example.

To avoid this problem we require that the buffer used with our API retains its memory location even when it's moved. The Pin newtype provides such a guarantee. We can update our API to require that all buffers are "pinned" first.

Note: To compile all the programs below this point you'll need Rust >=1.33.0. At the time of writing (2019-01-04) that meant using a nightly build of Rust.

#![allow(unused)]
fn main() {
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: changed
    buffer: Pin<B>,
    serial: Serial1,
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: DerefMut,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: Deref,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
}

Note: We could have used the StableDeref trait instead of the Pin newtype, but opted for Pin since it's provided in the standard library.

With this new API we can use &'static mut references, Box-ed slices, Rc-ed slices, etc.

#![allow(unused)]
fn main() {
fn static_mut(serial: Serial1, buf: &'static mut [u8]) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}

fn boxed(serial: Serial1, buf: Box<[u8]>) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}
}

'static bound

Does pinning let us safely use stack allocated arrays? The answer is no. Consider the following example.

#![allow(unused)]
fn main() {
fn unsound(serial: Serial1) {
    start(serial);

    bar();
}

// pin-utils = "0.1.0-alpha.4"
use pin_utils::pin_mut;

#[inline(never)]
fn start(serial: Serial1) {
    let buffer = [0; 16];

    // pin the `buffer` to this stack frame
    // `buffer` now has type `Pin<&mut [u8; 16]>`
    pin_mut!(buffer);

    mem::forget(serial.read_exact(buffer));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
}

As seen many times before, the above program runs into undefined behavior due to stack frame corruption.

The API is unsound for buffers of type Pin<&'a mut [u8]> where 'a is not 'static. To prevent the problem we have to add a 'static bound in some places.

#![allow(unused)]
fn main() {
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
}

Now the problematic program will be rejected.

Destructors

Now that the API accepts Box-es and other types that have destructors we need to decide what to do when Transfer is early-dropped.

Normally, Transfer values are consumed using the wait method but it's also possible to, implicitly or explicitly, drop the value before the transfer is over. For example, dropping a Transfer<Box<[u8]>> value will cause the buffer to be deallocated. This can result in undefined behavior if the transfer is still in progress as the DMA would end up writing to deallocated memory.

In such scenario one option is to make Transfer.drop stop the DMA transfer. The other option is to make Transfer.drop wait for the transfer to finish. We'll pick the former option as it's cheaper.

#![allow(unused)]
fn main() {
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: always `Some` variant
    inner: Option<Inner<B>>,
}

// NOTE: previously named `Transfer<B>`
struct Inner<B> {
    buffer: Pin<B>,
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(mut self) -> (Pin<B>, Serial1) {
        while !self.is_done() {}

        atomic::compiler_fence(Ordering::Acquire);

        let inner = self
            .inner
            .take()
            .unwrap_or_else(|| unsafe { hint::unreachable_unchecked() });
        (inner.buffer, inner.serial)
    }
}

impl<B> Drop for Transfer<B> {
    fn drop(&mut self) {
        if let Some(inner) = self.inner.as_mut() {
            // NOTE: this is a volatile write
            inner.serial.dma.stop();

            // we need a read here to make the Acquire fence effective
            // we do *not* need this if `dma.stop` does a RMW operation
            unsafe {
                ptr::read_volatile(&0);
            }

            // we need a fence here for the same reason we need one in `Transfer.wait`
            atomic::compiler_fence(Ordering::Acquire);
        }
    }
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }
}
}

Now the DMA transfer will be stopped before the buffer is deallocated.

#![allow(unused)]
fn main() {
fn reuse(serial: Serial1) {
    let buf = Pin::new(Box::new([0; 16]));

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // ..

    // this stops the DMA transfer and frees memory
    mem::drop(t); // compiler_fence(Ordering::Acquire) ▼

    // this likely reuses the previous memory allocation
    let mut buf = Box::new([0; 16]);

    // .. do stuff with `buf` ..
}
}

Summary

To sum it up, we need to consider all the following points to achieve memory safe DMA transfers:

  • Use immovable buffers plus indirection: Pin<B>. Alternatively, you can use the StableDeref trait.

  • The ownership of the buffer must be passed to the DMA: B: 'static.

  • Do not rely on destructors running for memory safety. Consider what happens if mem::forget is used with your API.

  • Do add a custom destructor that stops the DMA transfer, or waits for it to finish. Consider what happens if mem::drop is used with your API.


This text leaves out several details required to build a production grade DMA abstraction, like configuring the DMA channels (e.g. streams, circular vs one-shot mode, etc.), alignment of buffers, error handling, how to make the abstraction device-agnostic, etc. All those aspects are left as an exercise for the reader / community (:P).