Rust vs. C: Performance and Security in Low-Level Programming (2023)

According to the 2020 Stack Overflow Developer Survey, Rust is the most loved programming language. It won that title for the fifth consecutive year, and the good news doesn't stop there. Also in 2020, Linux kernel developers proposed including Rust in the Linux kernel, which was originally written in C. Facebook recently partnered with the Rust Foundation, the organization that drives the development of the Rust language, with the intention of contributing to its wider adoption.

Given all of this, we decided to check whether Rust can replace C in low-level network programming and provide greater security without sacrificing high performance. As our proof of concept we chose the DPDK library, because it is used for writing user-space applications for packet processing where performance is a critical factor.

Rust as a system programming language

Rust was created to provide high performance, comparable to C++ and C, with a strong emphasis on code safety. C compilers, on the other hand, care very little about safety, so programmers must be careful not to write a program that causes a memory violation or a data race.

In Rust, most of these issues are caught during compilation. You can write code in two different modes: safe Rust, which imposes additional restrictions on the programmer (e.g. object ownership management) but ensures that the code works correctly; and unsafe Rust, which gives the programmer more autonomy (e.g. it can operate on raw C-style pointers) but may break in dangerous ways. For these reasons, Rust is an excellent choice for systems programming that requires both high performance and safety. You can also compare the performance of Rust and Python.
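As a quick illustration of the kind of bug safe Rust rejects at compile time, consider this minimal, deliberately broken sketch (our example, not part of the benchmarks below):

// Returning a reference to a local variable would be a dangling pointer in C.
// Safe Rust refuses to compile it: `x` does not live long enough.
fn dangling() -> &'static i32 {
    let x = 42;
    &x // compile error: cannot return a reference to the local variable `x`
}

fn main() {
    println!("{}", dangling());
}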

If you are facing the dilemma of choosing between Go and Rust, we have a great comparison article.


C libraries in Rust

Many projects related to low-level systems, such as operating systems, game engines and network applications, are written in C or C++. This is mainly because there has never been a real alternative that guarantees high performance together with easy access to memory and operating system features.

Today Rust is considered such an alternative, even if rewriting entire projects in Rust would blow most budgets. Luckily, we don't have to: Rust supports calling C functions from Rust code with no additional performance overhead.


Suppose we have a simple library written in C:

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

struct CStruct {
    int32_t a;
};

struct CStruct *init_struct(int32_t a)
{
    struct CStruct *s = malloc(sizeof(*s));
    if (s != NULL)
        s->a = a;
    return s;
}

void free_struct(struct CStruct *s)
{
    free(s);
}

We can create bindings to these functions and structures and use them like this:

#[repr(C)]
struct CStruct {
    a: i32,
}

extern "C" {
    fn init_struct(a: i32) -> *mut CStruct;
    fn free_struct(s: *mut CStruct);
}

fn main() {
    unsafe {
        let s = init_struct(5);
        if !s.is_null() {
            println!("s->a = {}", (*s).a);
        }
        free_struct(s);
    }
}

These bindings are easy to write. There are even tools (rust-bindgen) that can generate them automatically. Unfortunately, they are very crude and require unsafe blocks to use, so we can't take full advantage of Rust's features. In this case, the programmer has to check whether the allocation succeeded (s is not null) before dereferencing it and accessing the field. In theory, we could ignore the fact that init_struct may return a null pointer in some cases and read a without this check. That would work most of the time, but sometimes it would fail at runtime, which is why it isn't safe. We also have to remember to free the memory, or risk a memory leak. However, these bindings are a good first step towards a more decent API that enforces correct use of this library. The code above can be wrapped as follows:

struct RustStruct {
    cstr: *mut CStruct,
}

impl RustStruct {
    fn new(a: i32) -> Result<Self, ()> {
        // Use the raw bindings to create the structure
        let cstr = unsafe { init_struct(a) };
        // Return Err if the allocation failed, Ok otherwise
        if cstr.is_null() {
            return Err(());
        }
        Ok(RustStruct { cstr })
    }

    fn get_a(&self) -> i32 {
        // At this point we know that self.cstr is not null,
        // so dereferencing it is fine
        unsafe { (*self.cstr).a }
    }
}

impl Drop for RustStruct {
    fn drop(&mut self) {
        // Freeing the structure cannot fail
        unsafe { free_struct(self.cstr) };
    }
}

// We don't have to remember to free the structure; it is released
// when it goes out of scope.
fn main() -> Result<(), ()> {
    // We have to handle the case where RustStruct::new fails and
    // returns Err, so we can't use an unchecked s
    let s = RustStruct::new(5)?;
    println!("s->a = {}", s.get_a());
    Ok(())
}

The unsafe blocks are moved into the library code, so in main() we can use a safe Rust API.

DPDK in Rust

DPDK is a library for packet processing in user space, used for programming high-performance network applications. DPDK is written in C, so using it from Rust without a properly prepared API is inconvenient and unsafe.

We're not the first to try to create Rust bindings for DPDK. We decided to base our API on another project: ANLAB-KAIST/rust-dpdk. This project runs bindgen at compile time to generate bindings for the specified DPDK version.

This makes it easy to update the API to the latest DPDK version. Also, much of the high-level API was already well written, so there was no need to write it from scratch. In the end, we just added some features to this library and fixed some issues. Our final version of the API is here: codilime/rust-dpdk.


The interface for communicating with DPDK has been designed so that the programmer does not have to remember non-obvious dependencies that often lead to errors in DPDK applications. Below are some API examples together with the corresponding C code.

EAL initialization

{
    int ret = rte_eal_init(argc, argv);
    argc -= ret;
    argv += ret;

    // Parse application-specific arguments
    // ...

    rte_eal_cleanup();
}
{
    // args - command-line arguments
    let eal = Eal::new(&mut args)?;
    // Parse EAL-specific arguments
    // Parse application-specific arguments
    // ...
} // eal.drop() calls rte_eal_cleanup() and more

Most DPDK functions can only be called after EAL initialization. In Rust, this was solved by requiring an instance of the Eal structure: further functionality is exposed through its methods. In addition, because the EAL is wrapped in a structure, there is no need to perform cleanup at the end of the program; it is done automatically when the Eal structure is dropped.
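A minimal sketch of how dropping the structure can trigger the cleanup (illustrative only; the ffi module, argument handling and error type here are our assumptions, not the actual rust-dpdk code):

mod ffi {
    extern "C" {
        pub fn rte_eal_init(argc: i32, argv: *mut *mut i8) -> i32;
        pub fn rte_eal_cleanup() -> i32;
    }
}

pub struct Eal {
    // Private field so an Eal can only be obtained through Eal::new()
    _private: (),
}

impl Eal {
    pub fn new(args: &mut [*mut i8]) -> Result<Self, i32> {
        let ret = unsafe { ffi::rte_eal_init(args.len() as i32, args.as_mut_ptr()) };
        if ret < 0 {
            return Err(ret);
        }
        Ok(Eal { _private: () })
    }
}

impl Drop for Eal {
    fn drop(&mut self) {
        // Runs automatically when the Eal instance goes out of scope
        unsafe { ffi::rte_eal_cleanup() };
    }
}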

Initializing the Ethdev and RX/TX queues

{
    RTE_ETH_FOREACH_DEV(portid) {
        // Configure the device, create 1 RX and 1 TX queue
        rte_eth_dev_configure(portid, 1, 1, &port_config);
        // Configure the queues
        rte_eth_rx_queue_setup(portid, 0, nb_rxd,
                               rte_eth_dev_socket_id(portid),
                               &rxq_config, pktmbuf_pool);
        rte_eth_tx_queue_setup(portid, 0, nb_txd,
                               rte_eth_dev_socket_id(portid),
                               &txq_config);
        rte_eth_dev_start(portid);
        rte_eth_promiscuous_enable(portid);
    }

    // ...

    // Deinitialization
    RTE_ETH_FOREACH_DEV(portid) {
        rte_eth_dev_stop(portid);
        rte_eth_dev_close(portid);
    }
}
{
    let uninit_ports = eal.ports()?;
    let ports_with_queues = uninit_ports
        .into_iter()
        .map(|uninit_port| {
            // Configure the device, create and configure 1 RX and 1 TX queue
            let (port, (rxqs, txqs)) = uninit_port.init(1, 1, None);
            port.start().unwrap();
            port.set_promiscuous(true);
            (port, (rxqs, txqs))
        })
        .collect::<Vec<_>>();

    // ...
} // port.drop() calls rte_eth_dev_stop() and rte_eth_dev_close()

In DPDK applications, ethdevs are normally initialized once at program startup, and multiple queues can be configured during that initialization. In Rust, eal.ports() returns a list of uninitialized ports that can be initialized separately, each producing a structure corresponding to the initialized port together with lists of its RX and TX queues. Calling eal.ports() a second time causes a runtime error, which prevents a single device from being initialized multiple times.
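One way such a one-shot accessor can be enforced at runtime is to keep the port list in an Option and hand it out only once (a hypothetical sketch extending the wrapper idea above; the names and error type are ours, not the actual rust-dpdk implementation):

struct UninitPort {
    port_id: u16,
}

struct Eal {
    ports: Option<Vec<UninitPort>>,
}

#[derive(Debug)]
struct PortsAlreadyTaken;

impl Eal {
    // The first call hands the list out; every later call returns an error,
    // so a device cannot be initialized twice.
    fn ports(&mut self) -> Result<Vec<UninitPort>, PortsAlreadyTaken> {
        self.ports.take().ok_or(PortsAlreadyTaken)
    }
}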

Our Rust API simplifies initialization: most of the DPDK function calls are hidden inside the uninit_port.init() implementation. Of course, this means we don't have as much control when configuring devices and queues in Rust, but in exchange we get a simpler implementation.

RX/TX burst

{
    struct rte_mbuf *pkts_burst[MAX_PKT_BURST];

    uint16_t nb_recv = rte_eth_rx_burst(portid, queueid,
                                        pkts_burst, MAX_PKT_BURST);
    // ...
    uint16_t nb_sent = rte_eth_tx_burst(portid, queueid,
                                        pkts_burst, MAX_PKT_BURST);
}
{
    let mut pkts = ArrayVec::<Packet<TestPriv>, MAX_PKT_BURST>::new();

    let nb_recv = rx_queue.rx(&mut pkts);
    // ...
    let nb_sent = tx_queue.tx(&mut pkts);
}

Each queue is associated with an ethdev, so when sending and receiving packets it is easier to work with queues than with devices. In Rust, we have dedicated structures for the RX and TX queues to make working with them easier.

Also, the RX and TX queues in DPDK are not thread-safe, so the structures are designed in such a way that any attempt to use them from multiple threads results in a compile-time error.
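A rough illustration of the mechanism, using a hypothetical queue type rather than the actual rust-dpdk one: a type that holds a raw pointer is neither Send nor Sync, so the compiler rejects moving it to another thread.

use std::marker::PhantomData;

// Hypothetical RX queue handle. PhantomData<*mut u8> makes the type
// neither Send nor Sync, so it cannot be moved to or shared with
// another thread.
struct RxQueue {
    _not_thread_safe: PhantomData<*mut u8>,
}

fn main() {
    let rxq = RxQueue { _not_thread_safe: PhantomData };

    // This does not compile: `*mut u8` is not Send, so neither is RxQueue.
    // std::thread::spawn(move || {
    //     let _q = rxq;
    // });

    let _ = rxq;
}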

Changing packet contents

{
    struct rte_ether_hdr *eth = rte_pktmbuf_mtod(pkt, struct rte_ether_hdr *);

    struct rte_ether_addr smac = (struct rte_ether_addr) {
        .addr_bytes = {1, 2, 3, 4, 5, 6}
    };
    struct rte_ether_addr dmac = (struct rte_ether_addr) {
        .addr_bytes = {6, 5, 4, 3, 2, 1}
    };

    memcpy(&eth->d_addr.addr_bytes, &dmac.addr_bytes, sizeof(dmac.addr_bytes));
    memcpy(&eth->s_addr.addr_bytes, &smac.addr_bytes, sizeof(smac.addr_bytes));
}
{
    let mut eth = EthernetFrame::new_checked(pkt.mut_data())?;
    eth.set_src_addr(EthernetAddress([1, 2, 3, 4, 5, 6]));
    eth.set_dst_addr(EthernetAddress([6, 5, 4, 3, 2, 1]));
}

With Rust, we can easily pull in functionality written by someone else simply by adding crate names to the Cargo.toml file. For packet modifications (e.g. header changes), we can therefore easily use external libraries. We tested and compared several open-source libraries; the conclusions and performance tests are below.

Threads

{
    RTE_LCORE_FOREACH_WORKER(lcore_id) {
        rte_eal_remote_launch(lcore_function, arg, lcore_id);
    }
}
{
    dpdk::thread::scope(|scope| {
        for lcore in eal.lcores() {
            lcore.launch(scope, |arg| lcore_function(arg));
        }
    })?;
}

DPDK allows us to launch code on a specific logical core (lcore). We combined Rust's thread management with DPDK's lcore management. The API allows threads to be spawned on specific lcores while providing the convenience and safety that Rust threads are known for.

Example: l2fwd

To test the Rust API, we implemented a DPDK sample application in Rust. We decided to try l2fwd (see the l2fwd description). It's a simple application that receives packets on one port, changes the MAC addresses in the Ethernet header and forwards the packets to another port. Here you can find the source code of l2fwd in C and l2fwd in Rust.
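In broad strokes, the forwarding loop of the Rust version combines the snippets shown above (a simplified sketch that reuses rx_queue, tx_queue and the packet types from the earlier examples; statistics and error handling are omitted):

loop {
    let mut pkts = ArrayVec::<Packet<TestPriv>, MAX_PKT_BURST>::new();

    // Receive a burst of packets on the RX queue of one port...
    rx_queue.rx(&mut pkts);

    for pkt in pkts.iter_mut() {
        // ...rewrite the source and destination MAC addresses...
        if let Ok(mut eth) = EthernetFrame::new_checked(pkt.mut_data()) {
            eth.set_src_addr(EthernetAddress([1, 2, 3, 4, 5, 6]));
            eth.set_dst_addr(EthernetAddress([6, 5, 4, 3, 2, 1]));
        }
    }

    // ...and send the whole burst out on the TX queue of the other port.
    tx_queue.tx(&mut pkts);
}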



Rust vs C: Performance Comparison

We compared the performance of both applications in a test environment with two Intel Xeon Gold 6252 CPUs. l2fwd used one core of the first CPU (NUMA node 0), while the TRex traffic generator used 16 cores of the other CPU (NUMA node 1). Both applications used memory local to their NUMA node, so they didn't share any resources that could affect performance (e.g. caches, memory).

Additionally, l2fwd used a single interface of a 25 Gbps Intel Ethernet XXV710 network adapter connected to NUMA node 0, with one RX and one TX queue for traffic handling. TRex used a NIC with a 25 Gbps interface connected to NUMA node 1.

The generated traffic consisted of L2 packets, each with a single IPv4 and UDP header, with varying IP addresses and UDP ports. When testing larger packet sizes, additional data was appended to the end of the packet. Further details about the environment can be found in our repository.

Fig. 1 The environment used for the test

During the test, we also measured core utilization. In its main function, l2fwd polls for incoming packets in an endless loop. This gives better performance than traditional interrupt-driven packet handling, but CPU utilization is then always around 100%, making it harder to measure the actual core utilization. We used the method described here to measure CPU usage.

On each pass through the loop, l2fwd tries to read up to 32 packets with rte_eth_rx_burst(). From these calls we can calculate the average number of packets received per burst. If the average is high, it means l2fwd is receiving and processing packets on most loop iterations. If it is low, l2fwd is mostly spinning without doing any meaningful work.
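A small sketch of this bookkeeping (illustrative; not the exact instrumentation we used):

// Counts rte_eth_rx_burst()-style calls and received packets so the
// average RX burst size can be reported.
struct RxStats {
    calls: u64,
    packets: u64,
}

impl RxStats {
    fn new() -> Self {
        RxStats { calls: 0, packets: 0 }
    }

    // Record one RX burst call that returned `nb_recv` packets.
    fn record(&mut self, nb_recv: u16) {
        self.calls += 1;
        self.packets += nb_recv as u64;
    }

    // Close to 32 (the maximum burst) means the core is saturated;
    // close to 0 means l2fwd is mostly looping without work.
    fn avg_burst(&self) -> f64 {
        if self.calls == 0 {
            0.0
        } else {
            self.packets as f64 / self.calls as f64
        }
    }
}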

Overload test

The overload test consisted of sending as much traffic as possible, exceeding l2fwd's processing capacity. In this case, we wanted to test pure software performance, knowing that l2fwd would always have something to do.

Tab. Rust vs. C: overload test results

Rust performed noticeably worse in this test. Rust l2fwd received about 1.2 times fewer packets than C l2fwd and consequently sent fewer packets. We can also see that the average RX burst for Rust l2fwd hit the maximum (32), while C l2fwd achieved an RX burst size of about 24.5. All of this means the C implementation is generally faster, because it can handle more packets. We believe the main reason for these differences is that the handling of dropped packets in the Rust implementation is less efficient than in C.

The drop rate is very high for both applications. The most likely cause is that we were using a single TX queue, which couldn't handle that many packets.

This also explains why the drop rate of the C implementation is higher than Rust's: both applications wanted to send more packets than the TX queue could handle, but C was faster and tried to send more of them, resulting in a higher drop rate.

RFC2544

We ran the RFC2544 test on both implementations of l2fwd. The results are shown below.

Note that the packet sizes shown in the charts consist of the packet data plus an additional 13 bytes added by TRex (preamble and IFG).

Fig. 2 Performance comparison between C and Rust [bps]

Fig. 3 Performance comparison between C and Rust [pps]. The bigger the packet, the fewer packets need to be sent to reach 25 Gbps, which explains the decrease in pps for larger packets.

Fig. 4 Average latency comparison between C and Rust

Fig. 5 Jitter comparison between C and Rust

Fig. 6 TRex RX comparison for Rust l2fwd and C l2fwd

During the test, we observed that the average number of packets per RX burst was always below 2. This means l2fwd was mostly idle, which explains why the results for C and Rust are so similar. To see visible differences, we would have to test more complex applications.

We couldn't reach 25 Gbps with smaller packets, even though l2fwd was mostly idle. We attribute this to the single TX queue not being able to handle all of the traffic.

We can also see that in some tests Rust performed slightly better than C, even though it performed worse in the overload test. We suspect the Rust implementation is mainly worse at handling dropped packets, so in RFC2544, where drops are not allowed, the Rust implementation gives results comparable to, and sometimes slightly better than, the C one.

Appendix: Packet processing libraries

We tested several libraries for packet modification. The tests consisted of modifying packet data (setting the source and destination MAC addresses and the source IP address to constant values). All testing was done locally on a single machine, so it exercised only the memory modifications performed by the libraries, not traffic handling. The test fixtures can be found in our repository.
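As an illustration, the smoltcp variant of that modification looks roughly like this (a sketch with made-up constant values; the actual fixtures are in our repository):

use smoltcp::wire::{EthernetAddress, EthernetFrame, Ipv4Address, Ipv4Packet};

// Set the source and destination MAC addresses and the source IP
// to constant values; the constants here are illustrative.
fn modify_packet(buf: &mut [u8]) -> Result<(), ()> {
    let mut eth = EthernetFrame::new_checked(&mut buf[..]).map_err(|_| ())?;
    eth.set_src_addr(EthernetAddress([0x02, 0, 0, 0, 0, 0x01]));
    eth.set_dst_addr(EthernetAddress([0x02, 0, 0, 0, 0, 0x02]));

    let mut ip = Ipv4Packet::new_checked(eth.payload_mut()).map_err(|_| ())?;
    ip.set_src_addr(Ipv4Address::new(10, 0, 0, 1));
    Ok(())
}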

Builds with link-time optimizations enabled performed significantly better than those without, so we compared both cases.

Etherparse library

For etherparse, we tried different methods of modifying the packet:

  • Building the packet from scratch instead of only changing the required fields (a sketch of this approach is shown after the figures below):

Fig. 7 Link-time optimizations enabled vs. disabled with the etherparse library (building packets from scratch)

  • Using Rust's std::io::Cursor, which only copies the required memory:

Fig. 8 Link-time optimizations enabled vs. disabled with the etherparse library (using std::io::Cursor)

  • Using pure slices instead of cursors:

Fig. 9 Link-time optimizations enabled vs. disabled with the etherparse library (using slices)
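For reference, the "build from scratch" variant mentioned above follows etherparse's PacketBuilder pattern, roughly like this (a sketch; addresses, ports and payload are illustrative values):

use etherparse::PacketBuilder;

// Build an Ethernet/IPv4/UDP packet from scratch; all values are illustrative.
fn build_packet() -> Vec<u8> {
    let builder = PacketBuilder::ethernet2(
            [1, 2, 3, 4, 5, 6],   // source MAC
            [6, 5, 4, 3, 2, 1])   // destination MAC
        .ipv4(
            [10, 0, 0, 1],        // source IP
            [10, 0, 0, 2],        // destination IP
            64)                   // TTL
        .udp(1234, 5678);         // source and destination UDP ports

    let payload = [0u8; 16];
    let mut result = Vec::with_capacity(builder.size(payload.len()));
    builder.write(&mut result, &payload).unwrap();
    result
}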

We compared the assembly instructions generated in each of these cases after link-time optimization:

  • Slice method:

    • 465 asm instructions
    • includes calls to memcpy, memset, malloc
  • Cursor method:

    • 871 asm instructions, nearly twice as many as the slice method, which is why its benchmark results are worse than the slice method's
    • includes calls to memcpy, memset, malloc
  • Building the packet from scratch:

    • 881 asm instructions
    • a lot of memory manipulation
    • allocates and frees memory

pnet library

In pnet, the packet is modified in place:

Fig. 10 Link-time optimizations enabled vs. disabled with the pnet library

Reviewing the generated assembly instructions after link-time optimization:

  • 19 asm instructions
  • 2 branches
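For comparison, an in-place modification with pnet looks roughly like this (a sketch; field values are illustrative):

use pnet::packet::ethernet::MutableEthernetPacket;
use pnet::packet::ipv4::MutableIpv4Packet;
use pnet::packet::MutablePacket;
use pnet::util::MacAddr;
use std::net::Ipv4Addr;

// Modify the Ethernet addresses and the IPv4 source address in place.
fn modify_packet(buf: &mut [u8]) -> Option<()> {
    let mut eth = MutableEthernetPacket::new(buf)?;
    eth.set_source(MacAddr::new(1, 2, 3, 4, 5, 6));
    eth.set_destination(MacAddr::new(6, 5, 4, 3, 2, 1));

    let mut ip = MutableIpv4Packet::new(eth.payload_mut())?;
    ip.set_source(Ipv4Addr::new(10, 0, 0, 1));
    Some(())
}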

smoltcp library

In smoltcp, the packet is modified in place:

Fig. 11 Link-time optimizations enabled vs. disabled with the smoltcp library

Reviewing the generated assembly instructions after link-time optimization:

  • 27 asm instructions
  • 5 branches

Conclusions from the performance results

A comparison of all libraries can be found below:

Fig. 12 Comparison of all libraries with link-time optimization enabled

Fig. 13 Comparison of all libraries with link-time optimization disabled

Without link-time optimizations, pnet and smoltcp differ considerably, even though both modify the packet in place. A possible reason is that the pnet implementation was harder for the compiler to optimize. After enabling link-time optimizations, pnet and smoltcp returned similar timing results, although smoltcp generated more assembly instructions and branches than pnet. All of these branches were error checks, so the branch predictor had little trouble with them.

Etherparse generates far more assembly instructions than pnet and smoltcp, and it also seems better suited to building packets from scratch than to modifying them.

We used smoltcp in the l2fwd implementation because it performed very well both with and without link-time optimizations.

Final thoughts

Our tests show that replacing C with Rust resulted in some performance degradation. The implementation written in Rust achieved about 85% of the C performance in the overload test, but we still see room for improvement in the bindings, which could bring us closer to the performance of C. On the other hand, we get Rust's safety checks, which make it easier to write correct and secure code. In systems programming, e.g. network applications, this is very valuable.

We also used a very simple application, l2fwd, which is why the RFC2544 results were so close. We can't be sure how Rust would perform in a more demanding scenario. That's why in the future we plan to build a more complex application in both C and Rust using the bindings we described. That would allow a more detailed performance comparison.

Language designers have learned their lessons and created Rust, a modern alternative to C and C++ that solves many of those languages' problems (e.g. around memory management and multithreaded programming). We are happy to see it replacing C and C++ in some use cases.
