Architecture & Design
Fujitsu Enters Deep Learning, AI Markets With Custom Architecture
For the past couple of years, the battle over AI, deep learning, and other HPC (High-Performance Computing) workloads has been primarily a two-horse race. It's between Nvidia, the first company to launch a GPGPU architecture that could in theory handle such workloads, and Intel, which has continued to focus on increasing the number of FLOPS its Core processors can handle per clock cycle. AMD is ramping up its own Radeon Instinct and Vega Frontier Edition cards to take on AI as well, though the company has yet to win much market share in that arena. Now there's an emerging fourth player: Fujitsu.

Fujitsu's new DLU (Deep Learning Unit) is meant to be 10x faster than existing solutions from its competitors, with support for Fujitsu's torus interconnect. It's not clear whether this refers to Tofu (torus fusion) 1, which the existing K computer uses, or whether the platform will also support Tofu 2, which boosts link bandwidth from 40Gbps to 100Gbps (from 5GB/s to 12.5GB/s). Tofu 2 would appear to be the better option, but Fujitsu hasn't clarified that point yet.

Underneath the DLU are an unspecified number of DPUs (Deep Learning Processing Units).
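The defining property of a torus interconnect like Tofu is that every dimension wraps around, so nodes at the "edge" of the network have the same number of direct neighbors as nodes in the middle. A minimal sketch of that wraparound neighbor calculation (illustrative only; the dimension sizes below are made up, and this is not Fujitsu's actual Tofu routing logic):

```python
def torus_neighbors(coord, dims):
    """Return the direct neighbors of `coord` on a torus of shape `dims`.

    Each axis wraps around via modular arithmetic, so every node has
    exactly 2 neighbors per dimension, even at the topology's "edges".
    """
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, +1):
            n = list(coord)
            n[axis] = (n[axis] + step) % size  # wraparound link
            neighbors.append(tuple(n))
    return neighbors

# A small 3D torus (4x4x4) as a stand-in for Tofu's higher-dimensional topology.
ns = torus_neighbors((0, 0, 0), (4, 4, 4))
print(ns)  # includes (3, 0, 0): the -1 step on axis 0 wraps to the far side
```

The wraparound halves the worst-case hop count compared with a plain mesh of the same shape, which is why torus topologies are common in large HPC machines.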
The DPUs are capable of running FP32, FP16, INT16, and INT8 data types. According to Top500, Fujitsu has previously demonstrated that INT8 can be used without a substantial loss of precision. Depending on the design specs, this may be one way Fujitsu hopes to hit its performance-per-watt targets.

Here's what we know about the underlying design: Each of the DPUs consists of 16 DPEs (Deep Learning Processing Elements), and each DPE has 8 SIMD units with a very large register file (no cache) under software control. The whole DPU is controlled by a separate master core, which manages execution and handles memory access between the DPU and its on-chip memory controller.

So just to clarify: The DLU is the entire silicon chip, including memory and register files. The DPUs are managed by a separate master controller and negotiate memory access through it.
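The INT8 claim rests on quantization: mapping FP32 values onto an 8-bit integer grid with a shared scale factor, so the rounding error stays small relative to the values. Here is a minimal sketch of symmetric linear quantization (the generic technique, not Fujitsu's implementation; the sample weights are invented):

```python
def quantize_int8(values):
    """Symmetric linear quantization of floats to INT8 with a per-tensor scale."""
    scale = max(abs(v) for v in values) / 127.0  # largest magnitude maps to 127
    q = [max(-128, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map INT8 codes back to approximate floats."""
    return [x * scale for x in q]

# Hypothetical FP32 weights for illustration.
weights = [0.423, -1.27, 0.051, 0.994, -0.338]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(q)        # integers in [-128, 127]
print(max_err)  # worst-case rounding error, bounded by scale / 2
```

Because the error is bounded by half the quantization step, networks whose weights and activations tolerate that noise can run in INT8 at a fraction of the silicon area and energy per operation of FP32, which is the performance-per-watt angle mentioned above.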
The DPUs are composed of DPEs with their 8 SIMD units each, and this is where the number crunching happens. At a very high level, we've seen both AMD and Nvidia use similar schemes for organizing resources into CUs, with certain resources duplicated per Compute Unit and each compute unit containing an associated number of cores.

Fujitsu is also preparing a second-generation core that will be embedded directly into a CPU, rather than shipping as a discrete off-chip part. The company aims to have the first-generation device ready for sale sometime in 2018, with no firm date offered for the introduction of the second-gen device.