Direct Memory Access (DMA)
This section covers the core requirements for building a memory safe API around DMA transfers.

The DMA peripheral is used to perform memory transfers in parallel with the work of the processor (the execution of the main program). A DMA transfer is more or less equivalent to spawning a thread (see `thread::spawn`) to perform a `memcpy`. We'll use the fork-join model to explain the requirements of a memory safe API.
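To make the fork-join analogy concrete, here's a rough host-side sketch in plain `std` Rust (an illustration added here, not embedded code): the spawned thread plays the role of the DMA engine doing the `memcpy`, and `join` plays the role of waiting for the transfer to complete.

```rust
use std::thread;

fn main() {
    let src = b"Hello, world!".to_vec();

    // "start the transfer": the copy runs in parallel with the code below,
    // just like a DMA transfer runs in parallel with the main program
    let transfer = thread::spawn(move || {
        let mut dst = vec![0; src.len()];
        dst.copy_from_slice(&src); // the `memcpy` part
        dst
    });

    // .. do other work while the "transfer" is in progress ..

    // "waiting for the transfer to finish" is the join part of fork-join
    let dst = transfer.join().unwrap();
    assert_eq!(dst, b"Hello, world!".to_vec());
}
```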
Consider the following DMA primitives:
```rust
/// A singleton that represents a single DMA channel (channel 1 in this case)
///
/// This singleton has exclusive access to the registers of the DMA channel 1
pub struct Dma1Channel1 {
    // ..
}

impl Dma1Channel1 {
    /// Data will be written to this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_destination_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Data will be read from this `address`
    ///
    /// `inc` indicates whether the address will be incremented after every byte transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_source_address(&mut self, address: usize, inc: bool) {
        // ..
    }

    /// Number of bytes to transfer
    ///
    /// NOTE this performs a volatile write
    pub fn set_transfer_length(&mut self, len: usize) {
        // ..
    }

    /// Starts the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn start(&mut self) {
        // ..
    }

    /// Stops the DMA transfer
    ///
    /// NOTE this performs a volatile write
    pub fn stop(&mut self) {
        // ..
    }

    /// Returns `true` if there's a transfer in progress
    ///
    /// NOTE this performs a volatile read
    pub fn in_progress() -> bool {
        // ..
    }
}
```
Assume that `Dma1Channel1` is statically configured to work with serial port (AKA UART or USART) #1, `Serial1`, in one-shot mode (i.e. not in circular mode). `Serial1` provides the following blocking API:
```rust
/// A singleton that represents serial port #1
pub struct Serial1 {
    // ..
}

impl Serial1 {
    /// Reads out a single byte
    ///
    /// NOTE: blocks if no byte is available to be read
    pub fn read(&mut self) -> Result<u8, Error> {
        // ..
    }

    /// Sends out a single byte
    ///
    /// NOTE: blocks if the output FIFO buffer is full
    pub fn write(&mut self, byte: u8) -> Result<(), Error> {
        // ..
    }
}
```
Let's say we want to extend the `Serial1` API so that it can (a) asynchronously send out a buffer and (b) asynchronously fill a buffer.

We'll start with a memory unsafe API and iterate on it until it's completely memory safe. At each step we'll show you how the API can be broken, to make you aware of the issues that need to be addressed when dealing with asynchronous memory operations.
A first stab
As a starting point, let's use the `Write::write_all` API as a reference. To keep things simple, let's ignore all error handling.
```rust
/// A singleton that represents serial port #1
pub struct Serial1 {
    // NOTE: we extend this struct by adding the DMA channel singleton
    dma: Dma1Channel1,
    // ..
}

impl Serial1 {
    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<'a>(mut self, buffer: &'a [u8]) -> Transfer<&'a [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}

/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
}

impl<B> Transfer<B> {
    /// Returns `true` if the DMA transfer has finished
    pub fn is_done(&self) -> bool {
        !Dma1Channel1::in_progress()
    }

    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> B {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        self.buffer
    }
}
```
NOTE: Instead of the API shown above, the `Transfer` API could also have exposed a futures- or generator-based API. That's an API design question that has little bearing on the overall memory safety of the API, so we won't go into it in this text.
We can also implement an asynchronous version of `Read::read_exact`.
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<'a>(&mut self, buffer: &'a mut [u8]) -> Transfer<&'a mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        self.dma.start();

        Transfer { buffer }
    }
}
```
Here's how to use the `write_all` API:
```rust
fn write(serial: Serial1) {
    // fire and forget
    serial.write_all(b"Hello, world!\n");

    // do other stuff
}
```
And here's an example of using the `read_exact` API:
```rust
fn read(mut serial: Serial1) {
    let mut buf = [0; 16];
    let t = serial.read_exact(&mut buf);

    // do other stuff

    t.wait();

    match buf.split(|b| *b == b'\n').next() {
        Some(b"some-command") => { /* do something */ }
        _ => { /* do something else */ }
    }
}
```
mem::forget
`mem::forget` is a safe API. If our API is truly safe then we should be able to use both together without running into undefined behavior. However, that's not the case; consider the following example:
```rust
fn unsound(mut serial: Serial1) {
    start(&mut serial);

    bar();
}

#[inline(never)]
fn start(serial: &mut Serial1) {
    let mut buf = [0; 16];

    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(&mut buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
In `start` we start a DMA transfer to fill an array allocated on the stack and then `mem::forget` the returned `Transfer` value. Then we proceed to return from `start` and execute the function `bar`.
This series of operations results in undefined behavior. The DMA transfer writes to stack memory, but that memory is released when `start` returns and is then reused by `bar` to allocate variables like `x` and `y`. At runtime this could cause the variables `x` and `y` to change their values at random. The DMA transfer could also overwrite the state (e.g. the link register) pushed onto the stack by the prologue of function `bar`.
Note that if we had used `mem::drop` instead of `mem::forget`, it would have been possible for the destructor of `Transfer` to stop the DMA transfer, making the program safe. But one can not rely on destructors running to enforce memory safety, because `mem::forget` and memory leaks (see Rc cycles) are safe in Rust.
We can fix this particular problem by changing the lifetime of the buffer from `'a` to `'static` in both APIs.
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(&mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..
    }
}
```
If we try to replicate the previous problem, we note that `mem::forget` no longer causes trouble.
```rust
#[allow(dead_code)]
fn sound(mut serial: Serial1, buf: &'static mut [u8; 16]) {
    // NOTE `buf` is moved into `foo`
    foo(&mut serial, buf);

    bar();
}

#[inline(never)]
fn foo(serial: &mut Serial1, buf: &'static mut [u8]) {
    // start a DMA transfer and forget the returned `Transfer` value
    mem::forget(serial.read_exact(buf));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
As before, the DMA transfer continues after `mem::forget`-ing the `Transfer` value. This time that's not an issue because `buf` is statically allocated (e.g. a `static mut` variable) rather than on the stack.
Overlapping use
Our API doesn't prevent the user from using the `Serial` interface again while the DMA transfer is in progress. This could lead to the transfer failing or to data loss.
There are several ways to prevent overlapping use. One way is to have `Transfer` take ownership of `Serial1` and return it back when `wait` is called.
```rust
/// A DMA transfer
pub struct Transfer<B> {
    buffer: B,
    // NOTE: added
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    // NOTE: the return value has changed
    pub fn wait(self) -> (B, Serial1) {
        // Busy wait until the transfer is done
        while !self.is_done() {}

        (self.buffer, self.serial)
    }

    // ..
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    // NOTE we now take `self` by value
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        // .. same as before ..

        Transfer {
            buffer,
            // NOTE: added
            serial: self,
        }
    }
}
```
Move semantics statically prevent access to `Serial1` while the transfer is in progress.
```rust
fn read(serial: Serial1, buf: &'static mut [u8; 16]) {
    let t = serial.read_exact(buf);

    // let byte = serial.read(); //~ ERROR: `serial` has been moved

    // .. do stuff ..

    let (serial, buf) = t.wait();

    // .. do more stuff ..
}
```
There are other ways to prevent overlapping use. For example, a (`Cell`) flag that indicates whether a DMA transfer is in progress could be added to `Serial1`. When the flag is set, `read`, `write`, `read_exact` and `write_all` would all return an error (e.g. `Error::InUse`) at runtime. The flag would be set when `write_all` / `read_exact` is used and cleared in `Transfer.wait`; a sketch of this approach follows.
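This is a minimal sketch of that flag-based alternative. It assumes an `Error::InUse` variant and, as a made-up design choice, passes `&Serial1` into `wait` so the flag can be cleared there; in this variant `Transfer` only holds the buffer.

```rust
use core::cell::Cell;

pub struct Serial1 {
    dma: Dma1Channel1,
    // NOTE: added; `Cell` provides interior mutability for the flag
    in_use: Cell<bool>,
    // ..
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Errors with `Error::InUse` if a DMA transfer is already in progress
    pub fn read_exact(
        &mut self,
        buffer: &'static mut [u8],
    ) -> Result<Transfer<&'static mut [u8]>, Error> {
        if self.in_use.get() {
            return Err(Error::InUse);
        }
        self.in_use.set(true);

        // .. start the DMA transfer as before ..

        Ok(Transfer { buffer })
    }

    // `write_all`, `read` and `write` would perform the same check
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self, serial: &Serial1) -> B {
        while !self.is_done() {}

        // the serial port can be used again
        serial.in_use.set(false);

        self.buffer
    }
}
```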
Compiler (mis)optimizations
The compiler is free to re-order and merge non-volatile memory operations to better optimize a program. With our current API, this freedom can lead to undefined behavior. Consider the following example:
```rust
fn reorder(serial: Serial1, buf: &'static mut [u8]) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    let t = serial.read_exact(buf);

    // ... do other stuff ..

    let (buf, serial) = t.wait();

    buf.reverse();

    // .. do stuff with `buf` ..
}
```
Here the compiler is free to move `buf.reverse()` before `t.wait()`, which would result in a data race: both the processor and the DMA would end up modifying `buf` at the same time. Likewise, the compiler can move the zeroing operation to after `read_exact`, which would also result in a data race.
To prevent these problematic reorderings we can use a `compiler_fence`:
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all(mut self, buffer: &'static [u8]) -> Transfer<&'static [u8]> {
        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(buffer.as_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: added
        atomic::compiler_fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(self) -> (B, Serial1) {
        // NOTE: this is a volatile *read*
        while !self.is_done() {}

        // NOTE: added
        atomic::compiler_fence(Ordering::Acquire);

        (self.buffer, self.serial)
    }

    // ..
}
```
We use `Ordering::Release` in `read_exact` and `write_all` to prevent all preceding memory operations from being moved after `self.dma.start()`, which performs a volatile write. Likewise, we use `Ordering::Acquire` in `Transfer.wait` to prevent all subsequent memory operations from being moved before `self.is_done()`, which performs a volatile read.
To better visualize the effect of the fences, here's the example from the previous section slightly tweaked. The fences and their orderings are shown in the comments.
```rust
fn reorder(serial: Serial1, buf: &'static mut [u8], x: &mut u32) {
    // zero the buffer (for no particular reason)
    buf.iter_mut().for_each(|byte| *byte = 0);

    *x += 1;

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // NOTE: the processor can't access `buf` between the fences

    // ... do other stuff ..
    *x += 2;

    let (buf, serial) = t.wait(); // compiler_fence(Ordering::Acquire) ▼

    *x += 3;

    buf.reverse();

    // .. do stuff with `buf` ..
}
```
The zeroing operation can not be moved after `read_exact` because of the `Release` fence. Likewise, the `reverse` operation can not be moved before `wait` because of the `Acquire` fence.

The memory operations between the two fences can be freely reordered across them, but none of those operations involves `buf`, so such reorderings do not result in undefined behavior.
Note that `compiler_fence` is a bit stronger than required. For example, the fences will prevent the operations on `x` from being merged even though we know that `buf` doesn't overlap with `x` (due to Rust's aliasing rules). However, no intrinsic more fine-grained than `compiler_fence` exists.
Don't we need a memory barrier?
That depends on the target architecture. In the case of Cortex-M0 to M4F cores, AN321 says:
3.2 Typical usage

(..)

The use of DMB is rarely needed in Cortex-M processors because they do not re-order memory transfers. However, it is needed if the software is to be reused on other ARM processors, especially in multi-master systems. For example:

- DMA controller configuration. A barrier is needed between a CPU memory access and a DMA operation.

(..)

4.18 Multi-master systems

(..)

Omitting the DMB or DSB instructions in the examples of Figure 41 on page 47 and Figure 42 would not cause any error because the Cortex-M processors:

- do not re-order memory transfers
- do not permit two write transfers to overlap
Here, Figure 41 shows a DMB (memory barrier) instruction being used before starting a DMA transaction.
In the case of Cortex-M7 cores, you'll need memory barriers (DMB/DSB) if you are using the data cache (DCache), unless you manually invalidate the buffers used by the DMA. Even with the data cache disabled, memory barriers might still be required to avoid reordering in the store buffer.
If your target is a multi-core system then it's very likely that you'll need memory barriers.
If you do need memory barriers then you'll need to use `atomic::fence` instead of `compiler_fence`. That should generate a DMB instruction on Cortex-M devices.
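As a sketch, here's what that swap would look like in `read_exact`; the same change applies to `write_all` and `Transfer.wait`.

```rust
use core::sync::atomic::{self, Ordering};

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact(mut self, buffer: &'static mut [u8]) -> Transfer<&'static mut [u8]> {
        self.dma.set_source_address(USART1_RX, false);
        self.dma
            .set_destination_address(buffer.as_mut_ptr() as usize, true);
        self.dma.set_transfer_length(buffer.len());

        // NOTE: a full memory barrier (`fence`) instead of just `compiler_fence`;
        // on Cortex-M this should emit a DMB in addition to restricting the compiler
        atomic::fence(Ordering::Release);

        // NOTE: this is a volatile *write*
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}
```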
Generic buffer
Our API is more restrictive than it needs to be. For example, the following program won't be accepted even though it's valid.
```rust
fn reuse(serial: Serial1, msg: &'static mut [u8]) {
    // send a message
    let t1 = serial.write_all(msg);

    // ..

    let (msg, serial) = t1.wait(); // `msg` is now `&'static [u8]`

    msg.reverse();

    // now send it in reverse
    let t2 = serial.write_all(msg);

    // ..

    let (buf, serial) = t2.wait();

    // ..
}
```
To accept such a program we can make the buffer argument generic.
```rust
// as-slice = "0.1.0"
use as_slice::{AsMutSlice, AsSlice};

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: B) -> Transfer<B>
    where
        B: AsMutSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_mut_slice();
        let (ptr, len) = (slice.as_mut_ptr(), slice.len());

        self.dma.set_source_address(USART1_RX, false);

        // NOTE: tweaked
        self.dma.set_destination_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        B: AsSlice<Element = u8>,
    {
        // NOTE: added
        let slice = buffer.as_slice();
        let (ptr, len) = (slice.as_ptr(), slice.len());

        self.dma.set_destination_address(USART1_TX, false);

        // NOTE: tweaked
        self.dma.set_source_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}
```
NOTE: `AsRef<[u8]>` (`AsMut<[u8]>`) could have been used instead of `AsSlice<Element = u8>` (`AsMutSlice<Element = u8>`).
Now the `reuse` program will be accepted.
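For illustration, here's a sketch of `write_all` written against the standard library's `AsRef<[u8]>` bound mentioned in the note above; `read_exact` would use `AsMut<[u8]>` analogously.

```rust
impl Serial1 {
    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        // standard library trait instead of the `as-slice` crate
        B: AsRef<[u8]>,
    {
        let slice = buffer.as_ref();
        let (ptr, len) = (slice.as_ptr(), slice.len());

        self.dma.set_destination_address(USART1_TX, false);
        self.dma.set_source_address(ptr as usize, true);
        self.dma.set_transfer_length(len);

        atomic::compiler_fence(Ordering::Release);
        self.dma.start();

        Transfer {
            buffer,
            serial: self,
        }
    }
}
```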
Immovable buffers
With this modification the API also accepts arrays by value (e.g. `[u8; 16]`). However, using arrays like this can result in pointer invalidation. Consider the following program.
```rust
fn invalidate(serial: Serial1) {
    let t = start(serial);

    bar();

    let (buf, serial) = t.wait();
}

#[inline(never)]
fn start(serial: Serial1) -> Transfer<[u8; 16]> {
    // array allocated in this frame
    let buffer = [0; 16];

    serial.read_exact(buffer)
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
The `read_exact` operation will use the address of the `buffer` local to the `start` function. That local `buffer` will be freed when `start` returns and the pointer used in `read_exact` will become invalid. You end up in the same situation as in the `unsound` example.
To avoid this problem we require that the buffers used by our API retain their memory location even when they are moved. The `Pin` newtype provides such a guarantee. The first thing we can do is update our API to require that all buffers are "pinned".
NOTE: To compile all the programs below this point you'll need Rust `>=1.33.0`. As of the time of writing (2019-01-04) that means using the nightly channel.
```rust
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: changed
    buffer: Pin<B>,
    serial: Serial1,
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: DerefMut,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: bounds changed
        B: Deref,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
```
NOTE: We could have used the `StableDeref` trait instead of the `Pin` newtype but opted for `Pin` since it's provided in the standard library.
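As a rough sketch only (an assumed shape based on the `stable_deref_trait` crate, not something spelled out in this text), the `StableDeref` alternative mentioned in the note might bound the buffer like this:

```rust
// stable_deref_trait = "1"
use stable_deref_trait::StableDeref;

impl Serial1 {
    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: B) -> Transfer<B>
    where
        // `StableDeref` promises that the pointed-to memory keeps its address
        // even if `buffer` itself is moved, so no `Pin` wrapper is needed;
        // in this variant `Transfer` would store `B` directly instead of `Pin<B>`
        B: StableDeref,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
```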
With this new API we can use `&'static mut` references, `Box`-ed slices, `Rc`-ed slices, etc.
```rust
fn static_mut(serial: Serial1, buf: &'static mut [u8]) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}

fn boxed(serial: Serial1, buf: Box<[u8]>) {
    let buf = Pin::new(buf);

    let t = serial.read_exact(buf);

    // ..

    let (buf, serial) = t.wait();

    // ..
}
```
'static bound
Does pinning let us safely use stack allocated arrays? The answer is no. Consider the following example.
```rust
fn unsound(serial: Serial1) {
    start(serial);

    bar();
}

// pin-utils = "0.1.0-alpha.4"
use pin_utils::pin_mut;

#[inline(never)]
fn start(serial: Serial1) {
    let buffer = [0; 16];

    // pin the `buffer` to this stack frame
    // `buffer` now has type `Pin<&mut [u8; 16]>`
    pin_mut!(buffer);

    mem::forget(serial.read_exact(buffer));
}

#[inline(never)]
fn bar() {
    // stack variables
    let mut x = 0;
    let mut y = 0;

    // use `x` and `y`
}
```
As seen many times before, the above program runs into undefined behavior due to stack frame corruption.
The API is unsound for buffers of type `Pin<&'a mut [u8]>` where `'a` is not `'static`. To prevent the problem we have to add a `'static` bound in some places.
```rust
impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        // NOTE: added 'static bound
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..
    }
}
```
Now the problematic program will be rejected.
Destructors
Now that the API accepts `Box`-es and other types that have destructors, we need to decide what to do when `Transfer` is early-dropped.
Normally, `Transfer` values are consumed using the `wait` method, but it's also possible to, implicitly or explicitly, `drop` the value before the transfer is over. For example, dropping a `Transfer<Box<[u8]>>` value will cause the buffer to be deallocated. This can result in undefined behavior if the transfer is still in progress, as the DMA would end up writing to deallocated memory.

In such a scenario one option is to make `Transfer.drop` stop the DMA transfer. The other option is to make `Transfer.drop` wait for the transfer to finish. We'll pick the former option as it's cheaper.
```rust
/// A DMA transfer
pub struct Transfer<B> {
    // NOTE: always `Some` variant
    inner: Option<Inner<B>>,
}

// NOTE: previously named `Transfer<B>`
struct Inner<B> {
    buffer: Pin<B>,
    serial: Serial1,
}

impl<B> Transfer<B> {
    /// Blocks until the transfer is done and returns the buffer
    pub fn wait(mut self) -> (Pin<B>, Serial1) {
        while !self.is_done() {}

        atomic::compiler_fence(Ordering::Acquire);

        let inner = self
            .inner
            .take()
            .unwrap_or_else(|| unsafe { hint::unreachable_unchecked() });

        (inner.buffer, inner.serial)
    }
}

impl<B> Drop for Transfer<B> {
    fn drop(&mut self) {
        if let Some(inner) = self.inner.as_mut() {
            // NOTE: this is a volatile write
            inner.serial.dma.stop();

            // we need a read here to make the Acquire fence effective
            // we do *not* need this if `dma.stop` does a RMW operation
            unsafe {
                ptr::read_volatile(&0);
            }

            // we need a fence here for the same reason we need one in `Transfer.wait`
            atomic::compiler_fence(Ordering::Acquire);
        }
    }
}

impl Serial1 {
    /// Receives data into the given `buffer` until it's filled
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn read_exact<B>(mut self, mut buffer: Pin<B>) -> Transfer<B>
    where
        B: DerefMut + 'static,
        B::Target: AsMutSlice<Element = u8> + Unpin,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }

    /// Sends out the given `buffer`
    ///
    /// Returns a value that represents the in-progress DMA transfer
    pub fn write_all<B>(mut self, buffer: Pin<B>) -> Transfer<B>
    where
        B: Deref + 'static,
        B::Target: AsSlice<Element = u8>,
    {
        // .. same as before ..

        Transfer {
            inner: Some(Inner {
                buffer,
                serial: self,
            }),
        }
    }
}
```
Now the DMA transfer will be stopped before the buffer is deallocated.
```rust
fn reuse(serial: Serial1) {
    let buf = Pin::new(Box::new([0; 16]));

    let t = serial.read_exact(buf); // compiler_fence(Ordering::Release) ▲

    // ..

    // this stops the DMA transfer and frees memory
    mem::drop(t); // compiler_fence(Ordering::Acquire) ▼

    // this likely reuses the previous memory allocation
    let mut buf = Box::new([0; 16]);

    // .. do stuff with `buf` ..
}
```
Summary
To sum it up, we need to consider all the following points to achieve memory safe DMA transfers:
- Use immovable buffers plus indirection: `Pin<B>`. Alternatively, you can use the `StableDeref` trait.
- The ownership of the buffer must be passed to the DMA: `B: 'static`.
- Do not rely on destructors running for memory safety. Consider what happens if `mem::forget` is used with your API.
- Do add a custom destructor that stops the DMA transfer, or waits for it to finish. Consider what happens if `mem::drop` is used with your API.
This text leaves out several details required to build a production grade DMA abstraction, like configuring the DMA channels (e.g. streams, circular vs one-shot mode, etc.), alignment of buffers, error handling, how to make the abstraction device-agnostic, etc. All those aspects are left as an exercise for the reader / community (:P).