SHAKTI Processor Systems

An Open Source Hardware Initiative.

The SHAKTI processor project aims to build 6 variants of processors based on the RISC-V ISA from UC Berkeley (www.riscv.org). The project will develop a complete reference SoC for each family, which will serve as an exemplar for that category of processor. While the cores and most of the SoC components (including bus and interconnect fabrics) will be open source, some standard components such as the PCIe controller, DDR controller and PHY IP will be proprietary third-party IP. All source will be licensed under a 3-clause BSD license and will be royalty and patent free (as far as IIT-Madras is concerned, we will not assert any patents).

While the primary focus is research, the SoCs are being designed to be competitive with commercial processors with respect to features, silicon area, power profile and frequency. This of course assumes that an optimal layout process is used to tape out our design. All FPGA data (Xilinx) will also be made available. While we do plan to tape out a few variants, the foundry NDA requirements mean that we will not be able to publish any layout/backend data.

The various class variants are:

C class

  • 32-bit, 3-8 stage in-order variant aimed at 50-250 MHz microcontrollers
  • Optional memory protection
  • Very low power static design
  • Fault-tolerant variants for ISO 26262 applications
  • IoT variants will have compressed/reduced ISA support (RV32E)

I class

  • The I class is the 64-bit variant of the SHAKTI processor and features an 8-stage pipeline: Fetch, Decode, Rename (Map), Wakeup, Select, Drive, Execute and Commit. Each pipeline stage takes at most a single cycle to execute.
  • The I class processor supports all standard instructions of the RISC-V ISA (RV32I and RV64I, RV32M and RV64M).
  • Each instruction has its operand registers renamed in order, is selected for issue to the execution units out of order, and finally commits in order.
  • Register renaming uses the merged register file approach. The merged register file stores both the architectural register values and the speculated values. There are 32 architectural registers and 64 physical registers. A buffer (the register alias table) maintains the map from architectural registers to physical registers.
  • Speculative branching uses a tournament branch predictor, in which a bimodal predictor and a global predictor compete with each other.
  • The functional units are parameterised in the design. The current design uses two arithmetic and logic units, one branch unit and one load-store unit.
  • The I-Cache and D-Cache use the physical address for both index and tag (PIPT). Each cache is 32 KB in size. The caches are fully parameterised in terms of size, associativity, number of blocks within a cache line, number of sets, etc. They are implemented using the BRAMs provided in the Bluespec library. These BRAMs map directly onto FPGA Block RAMs, which makes translation to an FPGA-based design easy.
  • This variant of the I class processor supports dual issue. It has no hardware for address translation or memory protection.
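The renaming scheme in the list above can be illustrated with a small software model. This is a minimal sketch, not the SHAKTI implementation (which is written in BSV): the class name `RenameStage`, the reset mapping, the free list and the stall behaviour are assumptions made for illustration. Only the register counts (32 architectural, 64 physical) come from the text.

```python
from collections import deque

NUM_ARCH_REGS = 32  # architectural registers (from the text)
NUM_PHYS_REGS = 64  # physical registers in the merged register file (from the text)


class RenameStage:
    """Toy model of in-order renaming with a register alias table (hypothetical)."""

    def __init__(self):
        # Register alias table: architectural index -> physical index.
        # Assumption: at reset, x0..x31 map to p0..p31.
        self.rat = list(range(NUM_ARCH_REGS))
        # The remaining physical registers are free to hold speculative results.
        self.free_list = deque(range(NUM_ARCH_REGS, NUM_PHYS_REGS))

    def rename(self, rd, rs1, rs2):
        """Rename one instruction; returns (pd, ps1, ps2, old_pd)."""
        # Sources read the current mapping before the destination is remapped.
        ps1, ps2 = self.rat[rs1], self.rat[rs2]
        if rd == 0:
            # x0 is hard-wired to zero in RISC-V, so it is never remapped.
            return self.rat[0], ps1, ps2, None
        if not self.free_list:
            # Assumption: a real pipeline would stall here instead of raising.
            raise RuntimeError("no free physical register")
        pd = self.free_list.popleft()
        old_pd = self.rat[rd]
        self.rat[rd] = pd
        return pd, ps1, ps2, old_pd

    def commit(self, old_pd):
        """At in-order commit the previous mapping is dead and can be reused."""
        if old_pd is not None:
            self.free_list.append(old_pd)
```

Renaming `add x1, x2, x3` followed by `add x4, x1, x1` gives the second instruction the new physical register of x1 as both sources, which is how the dependency is tracked once instructions issue out of order.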

    Access to Code»
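The tournament branch predictor mentioned for the I class can be sketched as follows. This is a generic textbook arrangement under stated assumptions, not the SHAKTI design: the table size (`index_bits`), the 2-bit saturating counters, the indexing by PC and by global history, and all names are hypothetical.

```python
def _bump(counter, up):
    """Move a 2-bit saturating counter (0..3) one step up or down."""
    return min(3, counter + 1) if up else max(0, counter - 1)


class TournamentPredictor:
    """Bimodal and global predictors with a chooser (illustrative only)."""

    def __init__(self, index_bits=10):  # table size is an assumption
        self.mask = (1 << index_bits) - 1
        size = 1 << index_bits
        self.bimodal = [1] * size   # indexed by PC, starts weakly not-taken
        self.global_ = [1] * size   # indexed by global branch history
        self.chooser = [1] * size   # >= 2 trusts global, otherwise bimodal
        self.history = 0            # most recent outcome in the low bit

    def _indices(self, pc):
        # Assumption: 4-byte instructions, so the low two PC bits are dropped.
        return (pc >> 2) & self.mask, self.history & self.mask

    def predict(self, pc):
        b, g = self._indices(pc)
        bimodal_taken = self.bimodal[b] >= 2
        global_taken = self.global_[g] >= 2
        return global_taken if self.chooser[b] >= 2 else bimodal_taken

    def update(self, pc, taken):
        b, g = self._indices(pc)
        bimodal_ok = (self.bimodal[b] >= 2) == taken
        global_ok = (self.global_[g] >= 2) == taken
        # The chooser moves only when exactly one component was right.
        if bimodal_ok != global_ok:
            self.chooser[b] = _bump(self.chooser[b], global_ok)
        self.bimodal[b] = _bump(self.bimodal[b], taken)
        self.global_[g] = _bump(self.global_[g], taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

In this model a branch that strictly alternates between taken and not-taken defeats the bimodal counter but is captured by the history-indexed table, so the chooser drifts towards the global component for that branch.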

M Class

  • Enhanced variants of the I-class processors aimed at general purpose compute, low end server and mobile applications
  • Enhancements over the I class: larger issue size, quad-threaded, up to 8 cores, frequency up to 2.5 GHz, optional NoC fabric

S class

  • 64-bit superscalar, multi-threaded variant for desktop/server applications.
  • 1.2-3 GHz, 2-16 cores, crossbar/ring interconnect, segmented L3 cache
  • RapidIO based external cache coherent interconnect for multi-socket applications (up to 256 sockets)
  • Hybrid Memory Cube support
  • 256/512 bit SIMD/VPU (we may go for a pure VPU as opposed to a packed SIMD variant)
  • Specialized Functional units for database acceleration, security acceleration, data analytics
  • Experimental variants will be used as a test-bed for our Adaptive System Fabric project, which aims to design a data-center architecture that uses NV RAM devices and unified interconnects for memory, storage and networking, and leverages persistent memory techniques

H class

  • 64-bit in-order, multi-threaded, HPC variant with 32-100 cores
  • 512/1024 bit VPU/SIMD
  • Mesh interconnect with multiple cache coherent domains
  • Goal is 3-5+ TFLOPS (DP, sustained)

T class processors

  • Experimental security oriented 64-bit variants with tagged ISA
  • single address design being explored
  • decoupling of protection from memory management
  • HW based capability management
  • ultra low-power variants for smart card applications

F class processors

  • Fault tolerant variants for aerospace and auto-safety applications
  • FT techniques used in all parts of the system
  • RESO techniques used in the cores
  • duplicated I/O paths
  • ECC for all memory and buffers
  • optional multi-core variants for n-way voting
  • research test bed for exploring FT logic techniques
  • FT silicon processes and FPGAs can be used to enhance reliability but are not mandatory
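Two of the techniques in the list above lend themselves to a short illustration: RESO (recomputing with shifted operands), where an operation is repeated on shifted inputs so that a fault in one bit position corrupts the two results differently, and n-way voting across redundant cores. The sketch below is a software model under stated assumptions, not the F class logic: the fault model (a single result bit stuck at 0), the function names and the strict-majority rule are all hypothetical.

```python
from collections import Counter


def faulty_add(a, b, stuck_bit=None):
    """Adder model; optionally forces one result bit to 0 (stuck-at-0 fault)."""
    result = a + b
    if stuck_bit is not None:
        result &= ~(1 << stuck_bit)
    return result


def reso_add(a, b, shift=1, stuck_bit=None):
    """Return (sum, error_detected) using recomputation with shifted operands."""
    first = faulty_add(a, b, stuck_bit)
    # Second pass: shift the operands left, add, then shift the result back.
    # The same physical fault now lands on a different bit of the true sum.
    second = faulty_add(a << shift, b << shift, stuck_bit) >> shift
    return first, first != second


def majority_vote(results):
    """Return the value produced by a strict majority of redundant units."""
    value, count = Counter(results).most_common(1)[0]
    if count * 2 <= len(results):
        raise ValueError("no majority among redundant results")
    return value
```

With no fault, `reso_add(5, 6)` reports no error; with bit 0 stuck at 0 the two passes disagree and the error flag is raised. `majority_vote([7, 7, 3])` masks the single disagreeing unit, while three different results raise an error because no value has a majority.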

Processor Interconnect

We are also developing a processor-to-processor cache-coherent interconnect to allow multi-socket S class systems to be built. The interconnect is based on RapidIO. We are investigating a two-tier scheme in which a MOESI/MESIF style scheme is used for 2-8 socket systems and a directory based scheme is used for larger configurations (maximum 256 sockets).

Design Approach

The approach is to build optimal (high performance) building blocks that can be shared among the variants and then add variant-specific blocks. The above variants are just canonical references, and the SHAKTI family will also see hybrid variants. Where possible, we have also provided the Synopsys and Xilinx synthesis results for each module. Final versions will contain the full BSV code, the generated Verilog code, testbenches, verification IP and FPGA support files.

Related Projects

The SHAKTI effort is part of a larger effort to build complete systems. As part of this effort, IIT-M is developing interconnects (optical and copper) based on Gen 3 (10/25G per lane) RapidIO and a scale-out SSD storage system called lightstor (see lightstor.org) based on this interconnect.

The final goal is to build a fabric called Adaptive System Fabric, which will use a combination of Hybrid Memory Cubes and RapidIO to unify support for compute, networking and storage.

Source Code

The Bitbucket Repo for the I-Class: Here»

Project Coordinators

  1. Prof. V. Kamakoti (veezhi (at) gmail (dot) com)
  2. G S Madhusudan (gs.madhusudan (at) cse (dot) iitm (dot) ac (dot) in)

Student Contributors

  1. Neel Gala - PhD Scholar
  2. Rahul Bodunna - Project Associate
  3. Arjun Menon - MS Scholar
  • For Queries or Collaboration you can contact our team at: shakti (dot) iitm (at) gmail (dot) com

    Videos/Slides on Shakti

    1. 1st RISCV Workshop 2014 - Video
    2. 1st RISCV Workshop 2014 - Slides
    3. OrConf 2015 - Slides

    Internship

    Students seeking an internship under the SHAKTI Processor initiative are requested to fill in the following form: Form