Is multicore still a challenge?
09 December 2014
The latest Multicore Conference, now in its 5th year, attracted over 250 experts. The themes at this year’s event were architectures, languages and mobile.
Jim Whittaker, the man responsible for the MIPS processor IP, kicked things off with a look at the disruption required in the multicore landscape to address new application opportunities, from wearables and robots to HPC and big data.
Jim highlighted that one of the keys to delivering leading-edge CPU IP cores for SoCs is flexible connectivity between cores, clusters and threads, both coherent and non-coherent. The workload should dictate the optimal architecture of the compute resources required to support it. For example, graphics and many compute problems are embarrassingly parallel and programmable; hence GPUs are massively parallel compute engines. Video, however, is a well-defined space, and there are many examples of dedicated IP blocks that deliver better performance and efficiency than a programmable core or multicore cluster. The end result is that modern ICs need to offer a ‘multicore’ architecture that supports fast and efficient connectivity.
One of the biggest challenges in multicore is providing programming languages that developers can use to harness the theoretical performance of the hardware. Simon McIntosh-Smith, from the University of Bristol computer research group, led the languages theme with a look at the continuing renaissance in parallel programming languages. Using parallelism to gain performance is no longer an HPC niche; it is rapidly becoming the norm on desktop, mobile and embedded platforms, where almost all programming is now parallel. Simon outlined approaches ranging from adding parallelism extensions to existing languages to creating completely new languages. Emerging standards compete with proprietary approaches, and both high-level and low-level approaches often trade performance against ease of use.
The emergence of ‘industry-standard’ parallel programming languages provides relatively high-level, cross-platform and open APIs. Solutions such as “Metal” attempt to harness graphics and compute in a single low-level language, offering a proprietary approach to squeezing maximum performance and efficiency from the underlying hardware.
The army of C++ developers also continues to march toward parallelism, with future releases of the standard scheduled to include mechanisms for handling concurrency and parallelism. APIs such as OpenCL, and new domain-specific languages and libraries built on top of them, may make the task a little easier.
Processors for intelligence
The afternoon keynote by Simon Knowles, the CTO at XMOS and founder of ICERA was titled “Processors for Intelligence” and illuminated the need for a new breed of processor to deliver truly intelligent devices.
While improvements in multicore design for CPU and GPU have led to smart computing, these smart applications only provide faster methods to compute solutions to problems we can express in exact algorithms.
The new challenge is to build intelligent processors that solve problems for which an exact algorithm can’t be conceived, when it is too expensive to run or where there are not enough inputs available. These new processors will need to make judgements based on experience and will work on tough problems like visual recognition and speech. Many of these problems share similar characteristics and lend themselves to being represented as sparse graphs.
Sparse matrix multiplication typically runs at only one to two per cent efficiency on conventional hardware. So if task parallelism lives in the CPU, and dense data parallelism is catered for by the vector machines found in GPUs and wide-SIMD data path CPUs, where does sparse data parallelism live? In ‘sparse’ processors, of course, and it’s these sparse graph processors that will lead to truly intelligent applications.