Comprehensive review of the RDNA architecture and AMD Navi graphics cards

June 30, 2019

At the Next Horizon event, held during E3 this year, AMD shared detailed information about its Navi graphics processors and their new RDNA architecture.

The graphics processor will be used in two graphics cards, the Radeon RX 5700 XT and the Radeon RX 5700.

AMD is positioning these two cards against Nvidia’s Turing-based RTX 2070 and RTX 2060.

This article examines what has changed in the RDNA architecture of the Navi graphics processor and then takes a closer look at the latest AMD graphics cards based on it, the RX 5700 XT and the RX 5700.

AMD will not be using hardware-based ray tracing in the first wave of Navi graphics processors (called Navi 10).

David Wang, head of the Radeon Technologies Group (RTG), believes that for now ray tracing is best done in the cloud, with the results of the calculations then sent to the display.

Even so, AMD’s Navi 20 graphics processor, expected by 2021, may include hardware ray tracing.

The Navi graphics processor, which powers the RX 5700 series cards, is built on TSMC’s 7-nanometer manufacturing process and, alongside AMD’s third-generation Ryzen CPUs, supports the PCIe 4.0 communication standard.

As a result, AMD’s RX 5700 series graphics cards will be the first GPUs to support the fourth-generation PCIe expansion slot and its full bandwidth.

With its Radeon Media and Radeon Display engines, the Navi graphics processor addresses the needs of both streamers and content creators and brings them a set of new display technologies.

Although Navi graphics processors inherit from the established GCN architecture, the new RDNA architecture brings many far-reaching improvements.

In other words, Navi can be seen as a blend of the GCN and RDNA architectures. AMD is well aware that GCN is still a very good fit for heavy computing tasks, where high throughput and a well-balanced workload play a key role.

On paper, the Vega 64 graphics processor had the specifications to overtake the GeForce GTX 1080, but it failed to do so.

The reason is that the GCN-based Vega chip does not make good use of its arsenal of cores and its cache memory.

Navi graphics processors do better in both of these areas, because, according to AMD, these chips come with a new compute unit (CU) design, a new cache hierarchy, and a reworked graphics pipeline. The remainder of this article reviews these features.

First, it helps to look at the block diagram of the Navi 10 graphics processor.

This GPU powers the RX 5700 XT graphics card (fully enabled) and the RX 5700 graphics card (with part of the chip disabled).

The Navi 10 graphics processor includes 40 compute units (arranged as 20 dual compute units), each with 64 stream processors, for a total of 2560 computing cores at the heart of the chip.

This is fewer cores than the graphics chips of the Vega 64 and Vega 56 cards (with 4096 and 3584 stream processors, respectively), but each CU now has a new and more efficient design under the RDNA architecture.

Compared with GCN, each CU in the Navi 10 chip includes an additional scalar unit (which handles operations that are not vector calculations) and an additional scheduler; together, these double the instruction rate of the previous generation.

This combination is much more effective than GCN for gaming and graphics workloads.

Reducing latency, improving single-thread performance, and making the hardware a better fit for game workloads are among the main goals of the RDNA architecture.

In the new architecture, the layout of the SIMDs has also changed dramatically. A SIMD is an array of arithmetic logic units (ALUs), each of which executes one work item, or thread, of an issued instruction per clock cycle.

In the old GCN architecture, each compute unit contains four SIMD16 units (16 cores each), while in the new RDNA architecture each CU contains two SIMD32 units (32 cores each).

Each SIMD in RDNA has its own scalar unit and scheduler, whereas in the GCN design one scalar unit and one scheduler are shared among all SIMDs. This counts as one of the strengths of the new architecture.

In the GCN architecture, each instruction is (in the most complex case) loaded onto a wavefront of 64 threads (Wave64) and assigned to one SIMD16 to run.

The instruction is then spread across the ALUs over four clock cycles.

So, in the old architecture, a SIMD cannot process an instruction in a single clock cycle.

As a result, a single wavefront keeps only 25% of the ALUs in a compute unit busy on each clock cycle, and resource utilization is unsatisfactory.

In RDNA, by contrast, an instruction with 64 work items is issued as two wavefronts of 32 threads each (Wave32), distributed simultaneously across the two SIMD32 units.

In this case, all work items are processed in a single clock cycle.

The wait for results is therefore shorter, and 100% of the compute unit’s resources are put to work.
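
The difference between the two layouts comes down to simple arithmetic. The following Python sketch is a toy model, not a description of AMD’s hardware scheduler: the function names are hypothetical, and it assumes, as described above, that a 64-item instruction in RDNA is split across both SIMD32 units at once.

```python
from math import ceil

def issue_cycles(wave_size: int, simd_width: int, simds_used: int = 1) -> int:
    """Clock cycles needed to run one instruction for every work item."""
    lanes = simd_width * simds_used
    return ceil(wave_size / lanes)

def single_wave_utilization(simd_width: int, simds_used: int, simds_per_cu: int) -> float:
    """Fraction of a CU's ALUs kept busy by this one instruction in a cycle."""
    return (simd_width * simds_used) / (simd_width * simds_per_cu)

# GCN: one Wave64 runs on one of the four SIMD16 units in a CU.
gcn_cycles = issue_cycles(wave_size=64, simd_width=16)
gcn_util = single_wave_utilization(simd_width=16, simds_used=1, simds_per_cu=4)

# RDNA (as described in the article): 64 work items issued as two Wave32s
# that run at the same time on the two SIMD32 units in a CU.
rdna_cycles = issue_cycles(wave_size=64, simd_width=32, simds_used=2)
rdna_util = single_wave_utilization(simd_width=32, simds_used=2, simds_per_cu=2)

print(f"GCN:  {gcn_cycles} cycles, {gcn_util:.0%} of the CU's ALUs busy")
print(f"RDNA: {rdna_cycles} cycle, {rdna_util:.0%} of the CU's ALUs busy")
```

Under these assumptions the model reproduces the figures quoted in the text: four cycles and 25% utilization for GCN, one cycle and 100% utilization for RDNA.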

In short, by simplifying how instructions are issued, RDNA effectively turns a compute-focused architecture into a game-friendly one.

Work is divided more finely in this architecture: instead of being placed on a 64-thread wavefront, each instruction is loaded onto a 32-thread wavefront (or, in the most complex case, onto two 32-thread wavefronts) and runs in one clock cycle on the SIMD32 units.

The compiler still has the option of choosing how an instruction is issued. It can issue instructions as Wave32, or issue a Wave64 instruction that runs on two SIMD32 units; which of the two it selects depends on the size of the workload and the load on the processor.

In RDNA, the processing resources of two compute units (CUs) sit next to each other, and because their work can overlap, the two units can be combined, which makes it possible to load and execute larger workgroups and ultimately reduces latency.

Overall, the goals of RDNA relative to GCN are lower latency, better single-thread performance, and more efficient caching. As a result, the new architecture does more work per clock cycle in each compute unit.

By changing the layout of the compute units, RDNA can process a 64-thread instruction in a single clock cycle.

One might ask why, when graphics workloads involve tens of thousands of threads, this architecture focuses on single-thread performance.

The answer is that keeping the compute units of a GCN machine full is not easy across a wide range of workloads, even when many threads are waiting to be processed.

This is why RDNA makes such dramatic changes: they let all of the machine’s compute resources work in parallel at the same time, so that a thread can be executed in a single clock cycle (in other words, in one unit of time) without leaving any of the cores idle.

Assuming that the issued instructions are largely independent of one another, having two scalar units and two schedulers per CU, together with the rearranged SIMD units, yields higher and more consistent performance in RDNA.

In its new graphics architecture, AMD has added a dedicated L1 cache to the Navi chip and doubled the bandwidth of the cache closest to the ALUs (L0).

The goal is to reduce cache access latency at every level. In other words, effective bandwidth increases in this architecture, because the required data, instead of being fetched from the slower frame buffer, is kept at the various cache levels and can be retrieved more quickly.
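
To illustrate why an extra cache level helps, here is a minimal Python sketch of a lookup that walks a cache hierarchy from the level nearest the ALUs outward. It is only a toy model: the `lookup` function, the level latencies, and the sample addresses are hypothetical and are not AMD figures.

```python
# Hypothetical per-level latencies in arbitrary units; these are NOT AMD figures.
LEVELS = [("L0", 1), ("L1", 4), ("L2", 12), ("VRAM", 60)]

def lookup(address: int, cached: dict[str, set[int]]) -> tuple[str, int]:
    """Walk the hierarchy outward; return (level that served the data, total cost)."""
    cost = 0
    for name, latency in LEVELS:
        cost += latency
        # Video memory is the backing store, so the walk ends there at the latest.
        if name == "VRAM" or address in cached.get(name, set()):
            return name, cost
    raise AssertionError("unreachable: VRAM always satisfies the request")

# Made-up cache contents: one address resident at each cache level.
cached = {"L0": {0x10}, "L1": {0x20}, "L2": {0x30}}
for addr in (0x10, 0x20, 0x30, 0x40):
    level, cost = lookup(addr, cached)
    print(f"address {addr:#x}: served from {level}, cost {cost}")
```

In this model, every request that hits in a cache level avoids the much larger cost of going all the way to video memory, which is the effect the text describes as higher effective bandwidth.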

AMD says that RDNA improves color compression across the pipeline. In graphics workloads, data of all kinds is compressed locally to reduce the amount of bandwidth required.

The RDNA architecture improves the Delta Color Compression (DCC) algorithm, and shaders can now read and write compressed color data directly.

The display engine in this architecture can also read compressed data directly from memory.

The goal is to increase available bandwidth and reduce power consumption compared with the previous-generation GCN architecture.
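
The general idea behind delta-based compression can be shown with a few lines of Python. This is a generic delta-coding sketch on made-up sample data, not AMD’s actual DCC algorithm, whose block formats are not described in this article; the function names are hypothetical.

```python
def delta_encode(values: list[int]) -> list[int]:
    """Keep the first value, then store each value as a difference from its predecessor."""
    if not values:
        return []
    return [values[0]] + [cur - prev for prev, cur in zip(values, values[1:])]

def delta_decode(deltas: list[int]) -> list[int]:
    """Rebuild the original values with a running sum (lossless)."""
    values: list[int] = []
    for d in deltas:
        values.append(d if not values else values[-1] + d)
    return values

# One 8-bit color channel sampled across a smooth gradient (made-up data).
channel = [200, 201, 201, 203, 202, 202]
deltas = delta_encode(channel)

print("original:", channel)
print("deltas:  ", deltas)
print("largest delta magnitude:", max(abs(d) for d in deltas[1:]))
```

Because neighbouring pixels usually have similar colors, the differences are small and can be stored in fewer bits than the full values, which is what saves memory bandwidth.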

The RDNA architecture has other improvements as well. Overall, the new architecture is more agile, more efficient, less costly, and better suited to game code than GCN.

On paper, the combined gains in computing power and bandwidth should give RDNA higher frame rates in games than GCN.

To put numbers on RDNA’s advantage over GCN, AMD claims 25% better performance at the same clock speed.

Moreover, taking into account Navi’s 7 nm manufacturing process and its higher achievable clock speeds compared with the Vega chips, RDNA is up to 50 percent faster than GCN on a CU-for-CU basis.

According to AMD, the power efficiency (performance per watt) of the Navi graphics chip is 10 to 50 percent higher than that of GCN.

Review of Navi 10 based graphics cards

AMD’s RX 5700 XT graphics card uses the full computing power of the Navi 10 graphics processor, while the weaker RX 5700 has some of the chip’s resources disabled.

The Navi 10 chip used in both cards has a die area of 251 mm² and is built on TSMC’s 7-nanometer process. AMD has packed 10 billion transistors into this small space.

Like AMD’s third-generation Ryzen processors, Navi chips support the PCIe 4.0 communication standard. With this standard, the slot offers twice the bandwidth of PCIe 3.0.

Today’s games are unlikely to use that much bandwidth, but it may prove useful for content creators who work at very high resolutions and process heavy data sets.
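
The doubling follows directly from the per-lane transfer rates of the two PCIe generations. The short Python sketch below works through the numbers for a x16 slot; the function name is hypothetical, and the result is the theoretical figure per direction, before protocol overhead.

```python
def pcie_bandwidth_gb_s(transfer_rate_gt_s: float, lanes: int = 16) -> float:
    """Theoretical bandwidth per direction in GB/s for PCIe 3.0/4.0 (128b/130b encoding)."""
    encoding_efficiency = 128 / 130  # 128 payload bits per 130 transferred bits
    return transfer_rate_gt_s * encoding_efficiency * lanes / 8  # bits -> bytes

gen3 = pcie_bandwidth_gb_s(8.0)   # PCIe 3.0: 8 GT/s per lane
gen4 = pcie_bandwidth_gb_s(16.0)  # PCIe 4.0: 16 GT/s per lane

print(f"PCIe 3.0 x16: {gen3:.2f} GB/s")
print(f"PCIe 4.0 x16: {gen4:.2f} GB/s ({gen4 / gen3:.0f}x)")
```

This gives roughly 15.75 GB/s for PCIe 3.0 and 31.5 GB/s for PCIe 4.0 in each direction.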

Physically, the graphics card has a relatively large 10.5-inch PCB and takes up the space of two slots on the motherboard.

A blower-style fan is used for cooling, which should ideally run quietly.

The dent on the edge of the card is mostly a cosmetic touch, and AMD staff describe it as a “power meter”.

Neither of the two cards has a USB Type-C port, although the architecture itself supports one.

The memory controller’s bus width is 256 bits on both cards, and it works with GDDR6 memory chips running at 14 Gbps per pin. The total memory bandwidth is 448 GB/s.
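
The total bandwidth follows from the bus width and the data rate. The Python sketch below works through the calculation, reading the 14 figure as a per-pin data rate in gigabits per second; the function name is hypothetical.

```python
def memory_bandwidth_gb_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bus width times per-pin data rate, bits -> bytes."""
    return bus_width_bits * data_rate_gbps / 8

bandwidth = memory_bandwidth_gb_s(bus_width_bits=256, data_rate_gbps=14)
print(f"{bandwidth:.0f} GB/s")
```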

The key difference between the RX 5700 XT and the RX 5700 is that the more powerful XT has 40 compute units and 2560 stream processors, while the standard model has 36 compute units and 2304 stream processors.

The RX 5700 also runs at lower clock speeds than the RX 5700 XT, which further lowers its performance.

One of the main goals of the Navi chip architecture is to reach higher clock speeds than the Vega graphics processors.

The clock figures AMD has published for the new graphics cards bear this out.

The base and boost clocks of the RX 5700 XT are 1605 and 1905 MHz, respectively, and those of the RX 5700 are 1465 and 1725 MHz.

AMD has introduced a third clock figure, called the game clock, on Navi-based graphics cards.

The game clock is a conservative estimate of the frequency that can be expected from a Navi card; in other words, it is the clock speed the card is expected to reach under a typical gaming load.

AMD wants to give less experienced users an idea of the speed they should expect while running a game.

The game clock of the RX 5700 XT is 1755 MHz and that of the RX 5700 is 1625 MHz.

The computing power of the RX 5700 XT and RX 5700 graphics cards is 9 and 7.5 teraflops, respectively.
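
These figures can be checked against the core counts and clocks given earlier. The Python sketch below uses the common rule of thumb that each stream processor can complete one fused multiply-add (counted as two floating-point operations) per clock cycle; the function name is hypothetical, and pairing the quoted figures with the 1755 and 1625 MHz clocks is an assumption based on the numbers matching.

```python
def peak_fp32_tflops(stream_processors: int, clock_mhz: float) -> float:
    """Peak FP32 throughput, counting a fused multiply-add as 2 operations per cycle."""
    return stream_processors * 2 * clock_mhz * 1e6 / 1e12

# Core counts and clocks (MHz) as quoted in the article.
cards = {
    "RX 5700 XT": {"stream_processors": 2560, "game": 1755, "boost": 1905},
    "RX 5700": {"stream_processors": 2304, "game": 1625, "boost": 1725},
}

for name, card in cards.items():
    game = peak_fp32_tflops(card["stream_processors"], card["game"])
    boost = peak_fp32_tflops(card["stream_processors"], card["boost"])
    print(f"{name}: {game:.2f} TFLOPS at 1st clock set, {boost:.2f} TFLOPS at boost clock")
```

Under this rule of thumb, the 1755 and 1625 MHz clocks give values that round to 9 and 7.5 teraflops, while the boost clocks give somewhat higher peaks.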

Despite the advantages of RDNA over GCN, these numbers are higher than the RX 590’s but lower than those of the RX Vega 64 and RX Vega 56 cards.

The thermal design power (TDP) of the RX 5700 XT is 225 watts, and that of the RX 5700 is 180 watts.

It is reasonable to expect that the similarities between parts of the new architecture and GCN will leave the Navi graphics cards more power-hungry than NVIDIA’s.

Both cards draw power through a combination of 8-pin and 6-pin power connectors.

In most games at 1440p, the RX 5700 XT and RX 5700 perform better than the RTX 2070 and RTX 2060, respectively.

Based on its own tests, AMD claims that in some games, running on the best-performing API, the RX 5700 XT is slightly faster than the Nvidia RTX 2070.

According to the charts shown by the company, the RX 5700 XT is up to 22% faster than the RTX 2070 in Battlefield 5 at 1440p with Ultra graphics settings, while in Shadow of the Tomb Raider, at the same resolution and the highest graphics settings, it trails the Turing card by 3%.

AMD also showed a chart of test results comparing the RX 5700 with the RTX 2060.

According to this chart, the card beats its rival in every title (including Shadow of the Tomb Raider) at 1440p.

A closer look at the performance of these two Navi 10 graphics cards shows that both exceed the Vega 56 and land at roughly the level of a Vega 64.

Neither card matches the performance of AMD’s current flagship, the Radeon VII, with its Vega 20 chip and very fast HBM2 memory.

What matters, however, is that they reach this level of performance with a smaller chip and better power efficiency.

Navi graphics processors owe their respectable performance to a more streamlined architecture, a more efficient memory hierarchy, and a more advanced manufacturing process.

The Navi 10 graphics processor was never designed to take on GeForce’s flagship processors and does not even come close to the RTX 2080 Ti, but in the 250 to 400 euro price range it replaces the current Vega cards and poses a serious threat to Nvidia.

Of course, it should not be forgotten that NVIDIA is developing a new line of RTX series graphics cards called Super, which will soon enter the hardware market, replacing the current RTX 2000 cards and raising performance across the range, up to the flagship.

In that case, the advantage of the Navi cards over some of the RTX 2000 replacements may not count for much. Of course, we will have to wait for reviewers to test all of these cards independently.

Both Navi 10 cards will launch on July 7 at an official event in Los Angeles. AMD has priced the RX 5700 at $379 and the more powerful RX 5700 XT at $449.

The company also announced a 50th Anniversary Edition of the RX 5700 XT graphics card, which has gold accents around the fan and on the shroud.

This card runs at higher base, game, and boost clocks of 1680, 1830, and 1980 MHz and will be available at $499 ($50 more than the reference model).