Performance Aspects of Synthesizable Computing Systems

Abstract

Embedded systems are used in a broad range of applications that demand high performance under tight mechanical, power, and cost constraints. Embedded systems implemented in ASIC technology tend to provide the highest performance, the lowest power consumption, and the lowest unit cost. However, high setup and design costs make ASICs economically viable only for high-volume production. FPGAs are therefore increasingly used in low- and medium-volume markets. FPGAs have evolved to the point where multiple processor cores, dedicated accelerators, and a large number of interfaces can be integrated on a single device.

This thesis consists of five parts that address performance aspects of synthesizable computing systems on FPGAs. The first part evaluates how synthesizable processor cores can exploit current state-of-the-art FPGA architectures. This evaluation results in a processor architecture optimized for high throughput on modern FPGAs. The current hardware implementation, the Tinuso I core, can be clocked as high as 376 MHz on a Xilinx Virtex 6 device and consumes fewer hardware resources than similar commercial processor configurations. The Tinuso architecture leverages predicated execution to circumvent costly pipeline stalls due to branches, and it exposes hazards to the compiler to keep the hardware simple.

The second part investigates whether a production compiler, GCC, can successfully leverage predicated execution and schedule instructions so as to mitigate these hazards. The third part describes the design and implementation of communication structures for Tinuso multicore configurations and evaluates the scalability of these systems. The fourth part is a case study that shows how to map a high-performance synthetic aperture radar application to a synthesizable multicore system. The proposed system includes 64 processor cores and a 2D mesh interconnect on a single FPGA device and consumes only about 10 watts. Finally, a task-based programming model is proposed that makes it easy to express parallelism and simplifies memory management.
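
As an illustration of the predicated execution mentioned in the abstract, here is a minimal sketch in Python. It is a toy model, not the Tinuso instruction set: the register names, the tuple encoding of instructions, and the functions `run_predicated` and `max_predicated` are all hypothetical. It shows only the general idea of if-conversion, where a branch is replaced by straight-line instructions that each commit a result only when their guarding predicate is true.

```python
# Illustrative only: a toy model of predicated execution, not the Tinuso ISA.
# All names and the instruction encoding are assumptions made for this sketch.


def max_branching(a, b):
    """Control-flow version: the branch outcome decides which path runs."""
    if a > b:
        return a
    return b


def run_predicated(program, regs):
    """Execute a straight-line program of (predicate, dest, fn) tuples.

    Every instruction is visited in order. An instruction writes its result
    only when its predicate register holds True (None means 'always').
    """
    for pred, dest, fn in program:
        if pred is None or regs[pred]:
            regs[dest] = fn(regs)
    return regs


def max_predicated(a, b):
    """If-converted version: no branch, two guarded moves."""
    regs = {"a": a, "b": b, "p": False, "np": False, "r": 0}
    program = [
        (None, "p", lambda r: r["a"] > r["b"]),  # p  = (a > b)
        (None, "np", lambda r: not r["p"]),      # np = not p
        ("p", "r", lambda r: r["a"]),            # if p:  r = a
        ("np", "r", lambda r: r["b"]),           # if np: r = b
    ]
    return run_predicated(program, regs)["r"]
```

Both functions compute the same result. The difference is that the predicated form has a fixed instruction sequence, so a pipeline that fetches it never has to guess or wait for a branch target; the price is that instructions on the path not taken are still issued.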

This paper was published in Online Research Database In Technology.
