Resolving the Technical and Business Challenges of Getting Connected to the Internet of Things

Agenda - List of presentations


Sample of Presentations for IoT DevCon


Multicore Debug
Location: Hilton - San Carlos Room
How Tracing Solves the Multicore System Debug Problem

Modern multicore designs often aggregate wildly different hardware and software technologies. Traditional debuggers, which show a snapshot of a portion of the system, do little to uncover issues that arise from the complex interaction of components. Engineers routinely cobble together proprietary tracing facilities in order to have some prayer of catching hard-to-find defects. What is needed to debug these diverse systems is a tool that can analyze trace data coming from many different collection technologies side by side. The MCA Tools Infrastructure Working Group is currently working on a trace data standard to address these needs, the Common Trace Format (CTF). This session will discuss the goals behind CTF, introduce the architecture, and show the first tools that are using it.

Make Your Multicore Program Fail Again

One of the biggest challenges facing developers and testers of concurrent programs is the programs' non-deterministic behavior. Scheduling of threads or processes is affected by a large number of unrelated asynchronous events. An intermittent failure may or may not be captured during a test. Even if such a failure is captured, it does not help debugging because in most cases there is no mechanism to reproduce it. In this talk I will demonstrate how these bugs can be found and reproduced in a randomized yet controlled deterministic environment. I will introduce Maze, a new development tool that implements this environment on the x86 Linux platform. We will look at the methodology and follow up with several practical examples.
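The idea of a randomized yet reproducible schedule can be illustrated with a small sketch. This is not Maze and does not reflect how Maze works internally; it is a toy model, assuming cooperative tasks written as Python generators and a hypothetical scheduler (`run`, `find_failing_seed`) whose only source of randomness is a seeded `random.Random`. Because the seed fully determines the interleaving, a seed that exposes a lost update reproduces it on every rerun.

```python
import random


def incrementer(shared, steps):
    """Task doing a non-atomic read-modify-write on shared['counter']."""
    for _ in range(steps):
        value = shared["counter"]      # read
        yield                          # point where the task may be preempted
        shared["counter"] = value + 1  # write back (may overwrite another update)
        yield


def run(seed, tasks=2, steps=3):
    """Run all tasks under a schedule fully determined by `seed`.

    Returns the final counter value and the sequence of task ids scheduled.
    """
    rng = random.Random(seed)          # the only source of nondeterminism
    shared = {"counter": 0}
    live = {tid: incrementer(shared, steps) for tid in range(tasks)}
    trace = []
    while live:
        tid = rng.choice(sorted(live))  # pick the next task to advance
        trace.append(tid)
        try:
            next(live[tid])
        except StopIteration:
            del live[tid]               # task finished
    return shared["counter"], trace


def find_failing_seed(expected, max_seeds=1000):
    """Search seeds until one produces a wrong final counter."""
    for seed in range(max_seeds):
        result, _ = run(seed)
        if result != expected:
            return seed
    return None


if __name__ == "__main__":
    seed = find_failing_seed(expected=6)
    print("failing seed:", seed)
    print("first run :", run(seed))
    print("replay    :", run(seed))     # identical result and trace
```

Rerunning with the reported seed replays the same interleaving, which is the property that makes an intermittent failure debuggable.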

Non-intrusive Debug and Performance Optimization for Multicore Systems

Efficiently partitioned and bug-free software running on multiple cores is crucial for taking full advantage of the power of multiple cores. Debugging such complex software systems comes with an additional degree of complexity due to the vanishing accessibility of sub-system interfaces and buses and to the concurrent processing paradigm. This presentation will cover on-chip debug technologies, tools and techniques that overcome these constraints and accelerate the debug cycle for such complex multicore systems. The presentation will also cover applied usage of cross triggering, synchronous run, execution trace, common platform tracers and system trace technology, which provide comprehensive system-level visibility for analyzing issues related to synchronization, internal bus transactions and sub-system interactions.

Software Debug on Heterogeneous Multicore Virtual Prototype Systems

Virtual prototypes provide a number of advantages for embedded software development, such as an early start to software development, and full visibility into the execution of the software. This is especially important for heterogeneous multicore systems, which can have significant complexity, making traditional development and debug on hardware a difficult task. Debug of software using virtual prototypes offers the capability of doing 3-dimensional debug: having visibility into software execution on and interaction with every hardware component, including peripherals and memory (spatial); using conditional statements (temporal); and being able to debug at any level of software abstraction, from firmware to OS to application. This presentation will show examples of using 3-D debug.


Multicore Frameworks
Location: Hilton - Almaden Ballroom 1
Improving Heterogeneous Multicore Programming using OpenMP

Systems with heterogeneous cores and memory architectures are increasingly being used to improve the performance of embedded systems. These systems are however a very challenging design environment, as they require the programmer to focus on low-level details rather than on the application itself. These low-level details include moving and managing data between cores, and scheduling tasks to the cores. A lot of these low-level codes are vendor specific, which limits their portability. The Multicore Association recently completed low-level APIs for communication and resource management (MCAPI and MRAPI) as first steps toward a generalized multicore programming model. We show how the widely-adopted OpenMP standard can be used as a high-level programming model to improve programmer productivity and achieve code-portability in the embedded space. Our OpenMP runtime library uses MCA libraries as an underlying API, making the library portable across systems that support the MCA routines. With this solution, programmers can focus on application development in a high-level fashion, and leave low-level details to the runtime.

Parallel Programming for Embedded Multicore Processors Using OpenMP

OpenMP is the de facto standard for shared memory parallel programming. It provides high-level programming constructs that enable a user to easily expose a program's task and loop level parallelism in an incremental fashion. With OpenMP, a user specifies the parallelization strategy for a program at a high level by annotating the program code with compiler directives that specify how a region of code is executed by a team of threads. The compiler works out the detailed mapping of the computation to the machine. OpenMP has the potential to increase programmer productivity, reducing design, development costs and time to market for embedded systems. This presentation will focus on initial experiences adapting OpenMP to serve as a programming model for an embedded multicore processor.

Portable Software Framework for Packet Processing on Multicore Platforms
Multicore processors provide a range of on-chip resources to accelerate performance. However the complexity of the cores, accelerators and offload engines means that most software developers are unable to extract the highest performance that is theoretically achievable from their chosen architecture. Furthermore, optimized code produced for one architecture is generally not portable to another, constraining the ability of architects to select the best processor for their specific application. This presentation explains techniques for the use of a hardware-independent multicore software development framework to produce packet processing software for telecom, networking and security products. The resulting code is compatible with standard OS APIs and portable across multicore architectures.
Psssst ... Pass It On - Selecting an IPC Protocol for your Multicore Design

This presentation will compare and contrast various IPC mechanisms for embedded systems, focusing on choosing an IPC when using two or more different types of operating systems. We will introduce several communication methods available to the developer and discuss how each one operates from a high-level perspective.  We will discuss the most applicable domains for each method, describe possible pros and cons, and weigh the methods against each other.  Upon completion, the attendee will have a good understanding of each method and its intended use cases and will be able to confidently select an IPC method for their next embedded multicore design.

Utilizing Middleware to Build Multicore Systems

Multi-processing systems increase both the number of cores and the possible combinations of system components, such as (inter-chip) I/O devices and transports. Software complexity grows as it becomes more challenging to apply the same software platform to different hardware platforms. Middleware combined with configurable operating systems can provide a set of building blocks, analogous to Lego blocks, that can substantially alleviate this situation. We will discuss how to use and apply "software building blocks" to multicore hardware platforms.


Performance Analysis, Optimizations, and Bottlenecks
Location: Hilton - Almaden Ballroom 2
Analysis and Quantification of How Multicore Processors More Efficiently Provide More Performance

There is no doubt that symmetric multi-core is destined to be the standard processor architecture. All PCs use it. All servers use it. Mobile computing has just joined the bandwagon, and new multi-core solutions are reaching into less traditional embedded applications. This presentation will explore exactly how multi-core is able to deliver more performance throughput whilst being more energy efficient. This is a technical presentation that will describe real-life approaches used in modern operating systems on real SoCs shipping in products today, and will try to quantify and analyse the benefits and issues associated with each approach.

Case Study: Parallelizing Google's vp8 Video Codec
Google's vp8 video codec is positioned to play an important role in future media devices. A reasonably efficient sequential reference implementation is available for commercial use with an open source license. However, this implementation is a complex piece of C code (some 85,000 lines) and it is hard to see how to get to a correct and efficient multicore implementation. This talk shows how to analyze and partition the vp8 algorithm to achieve a required level of performance without studying the vp8 source code in detail. The example will illustrate mapping the vp8 codec to homogeneous multiprocessors, with or without an additional Graphics Processing Unit (GPU), and to a range of heterogeneous multicore architectures.
Using Multicore to Benchmark Performance and Scalability

Developing software for multicore systems presents the issue of application scalability. In particular, when we cannot design for a specific number of cores, we need to accommodate as many deployment scenarios as possible, if not all of them. In this presentation we look at the performance and scalability of an application that customers may run on 1 to N cores in a system. We use a case study to see how a single-threaded application can be scaled up to take advantage of multiple cores, and even multiple processors, on a system to maximise application performance. We use rough predictive mechanisms like Amdahl’s Law to estimate scalability and finally use performance tools to analyse the impact of hyper-threading, core affinity of threads and system interrupts.
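As a rough illustration of the kind of estimate Amdahl's Law gives, the sketch below computes the predicted speedup 1 / ((1 - p) + p / n) for a program whose parallelizable fraction is p when run on n cores. The function name `amdahl_speedup` and the 90% parallel fraction are assumptions chosen for illustration; they are not taken from the talk's case study.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Predicted speedup on `cores` cores per Amdahl's Law.

    parallel_fraction is the share of single-core run time that can be
    parallelized (0.0 to 1.0); the remainder is assumed strictly serial.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    p = 0.9  # hypothetical: 90% of the work parallelizes
    for n in (1, 2, 4, 8, 16, 64):
        print(f"{n:3d} cores -> speedup {amdahl_speedup(p, n):.2f}x")
    # The serial 10% caps the speedup at 1 / 0.1 = 10x however many cores are added.
```

The estimate ignores synchronization cost, cache effects and interrupts, which is why the abstract pairs it with measurement using performance tools.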


Multicore Memory Bandwidth
Location: Hilton - San Carlos Room
The Multi-Channel Solution: Optimizing Memory Bandwidth in Mobile SoCs

Multicore SoCs are packed with heterogeneous processors that all need to access DRAM. Truthfully, consumer SoCs require DRAM bandwidth more than DRAM capacity. Application performance depends on how quickly and efficiently memory is accessed. Simply adding more memory to a system won’t solve the problem. TSV technology enables a number of connections between the stacked SoC and the DRAM, which allows designers to increase bandwidth and reduce traffic contention on-chip. The presenter will discuss efficient memory access data flow algorithms, how to cost-effectively spread SoC traffic across multiple DRAMs, and innovations in stacked memory technology. The presenter will examine cost, performance and power consumption using these new design methodologies for next-generation consumer SoCs.


Manycore Analysis
Location: Hilton - Almaden Ballroom 2
Challenges and Benefits of Manycore Processors in Servers and Cloud Computing Applications

Cloud computing is becoming a dominant consumer of servers. While it's not obvious, there are definitely benefits to using manycore processors in server and cloud computing applications, a market traditionally dominated by large-scale processor architectures. But simply connecting lots of chips with lean cores will not guarantee the required combination of performance, latency, and efficiency advantages over fewer larger cores. This session will review the technical requirements of cloud computing and servers, justifying the value of manycore processors. We'll also explain the requirements of a manycore processor in order to handle the computing and data throughput, and point out the potential bottlenecks. Finally, we'll demonstrate with a case study how to adopt a manycore architecture into the cloud and server domain, including hardware and software challenges.


Executive Summit: Strategies for Integrating Multicore Technology
Location: The St. Claire Hotel
Multicore Industry Update
Markus Levy, Chairman of Multicore Expo

Having chaired the Multicore Expo for the past 6 years, I've seen many changes come about in the multicore ecosystem. In this succinct presentation, I'll provide an overview of where things were, where they are today, and where I see them heading. This should provide you with some background to make important decisions about how and when to use multicore technology.

Avoid Undermining the Capabilities of Your Multicore Design
Pekka Varis, CTO Multicore and Media Infrastructure

With the introduction of multicore processors for nearly all areas of embedded processing, the typical answer to requirements for more processing power is to add more cores. But adding additional cores may actually impair performance if the resulting device is not properly balanced. Balance is achieved through several steps that are based on a deep understanding of the requirements of the target applications.
The first step is selecting the right core.  The best core may be a general purpose core like an ARM or x86 core, a specialized core such as a digital signal processor -- or even a mix of cores. Many applications benefit from the incorporation of specialized processing elements often called accelerators. Identifying the right mix of accelerators is the next step. Once the right mix of cores and processing elements is known, a multicore “infrastructure” can be defined.  This is the most important step. Great cores and high performance accelerators can be crippled by a poorly designed and implemented infrastructure.
Instead of focusing on cores alone, developers should turn their attention to system architecture and the performance of key interfaces when gauging the quality of a new platform. That will enable them to measure the function of key interfaces and correlate those to software operations to reach what should be their end goal: unlocking the full potential of a multicore device. This keynote will focus on the design of a multicore infrastructure from both the hardware and software perspectives. Topics from switch fabrics to busses to memory architecture and cache will be explored.  Hardware elements outside of the cores themselves that can simplify multicore programming will also be discussed.
Expert Panel: A Boundless Appetite for More, More, More: Android + Multicore Yields Synergy for Embedded Devices
How the Rapid Proliferation of Multimedia-Intensive Platforms Is Accelerating Innovation in Multicore Technologies and Dictating New Product Strategy


Android, iOS, MeeGo, and WebOS to name a few. There's no doubt the next-generation multimedia-intensive software platforms are permeating the embedded landscape. These platforms are raising the bar for both mobile and traditional GUI-based devices. The obvious trickle-down effect is the need for more processing horsepower and power management capabilities. The not so obvious effect is the impact on product strategies throughout the value chain, from the expanding GUI-based mobile and end-point devices, to the wireless and core networks and everything connected beyond. Companies with products on either end and anywhere in between must turn to multicore technologies for more: more bandwidth, more processing power, more services, more security, more, more, more.

Join this exclusive VIP panel of industry leaders at the forefront of a once in a lifetime technology disruption opportunity for a lively discussion on the synergy of Android and multicore and the opportunity for innovation. Learn the answers to these thought-provoking questions and uncover some unexpected opportunities:

  • How are the new multimedia platforms, such as Android, and multicore technologies factoring into your product strategy and roadmap?
  • What are the power and performance requirements? Is Green now table stakes?
  • When does a dual-OS strategy make sense, and where will embedded virtualization have the biggest impact: resource allocation, power management, security, etc.?
  • Can it be both real-time and reliable? Is the door open for security to be designed in from the bottom up?
  • What new markets are opened from this new UI and multicore paradigm?

Now is the time to optimize your product strategy for Android and multicore.

The question is no longer "when?" but "how?" Don't miss this once in a lifetime opportunity to be at the forefront of innovation.

Managing the Mess: Getting to A Billion Smartphones, and Other Connected Devices
Richard Kramer, Managing Director and Brett Simpson, Director  

In 2010, something strange happened in the tech industry: the ASPs of the single largest end market (mobile devices) actually rose, driven by higher smartphone penetration – now 56% of industry value. We chart the path to 1bn+ smartphones shipping in '13, alongside a vast range of other connected devices, which will change expectations of end customers and vendors alike.  This also raises a range of interoperability and value shift issues for the ecosystem, from vertically integrated players like Apple to a range of component vendors.

Arete is the leading independent technology research firm exclusively serving investors, and free from conflicts of interest. We have an 11-year track record of charting key themes and their impacts – positive and negative – on industry players from chipset, device and software vendors to operators and content owners.


Realtime Considerations
Location: Hilton - Almaden Ballroom 2
Migration From Single-core to Multicore Software in Hard Realtime Environments

A methodology is described for mapping software to a multicore MCU in an AUTOSAR environment, based on a formal approach to scheduling analysis. The core of the methodology is a generic communication benchmark, which the end user executes on the MCU with their specific compiler and AUTOSAR environment to determine the costs of various communication scenarios. Furthermore, the user has to describe the application in a model with regard to communication and computation. It is also shown how this application model can be generated automatically. The model is combined with the results of the communication benchmark. The user can judge the success of a mapping, based on specifically developed metrics, and explore other mappings.

Multicore for Real-Time and Safety-Critical Software: Avoid the Pitfalls

The move towards multicore processors is so swift that real-time and safety-critical applications will have to conform to the rules dictated by multicore hardware, while off-the-shelf multicore hardware is optimized for average-case throughput.  This talk focuses on the impact that this move has on real-time and safety-critical code. A software developer must be aware of the effects of cache structures and memory models to understand the consequences on performance and correctness of his code.  The effects range from unpredictable performance degradation to severe software failures. Remedies presented include the usage of processor affinities, and an in-depth understanding and use of the memory model of the selected programming language. Use of explicit locks can limit scalability and even result in performance that is worse compared to a single-core system. The alternative to explicit locks is lock-free algorithms.  However, classical lock-free algorithms have unbounded worst-case execution times, such that they are not usable in real-time code. Lock-free code can be written in a way that the execution time is bounded. This talk will present simple code patterns for building lock-free algorithms for real-time code.

Scaling Applications Across Multicore and Multiprocessor DSP Platforms with a Standards-Based Programming Model

DSP programming has evolved over the last few years as we have seen the advent of new algorithms for complex signal processing functionality in applications such as image, video or vision processing. These new algorithms have given rise to a different class of applications that need to scale from supporting a single user to, in some cases, many thousands of users. While scaling has become a necessity, business continues to drive for an improvement in design cycles and time-to-market. To satisfy the need for scalability, various multicore silicon vendors have created devices that are capable of scaling from 2 cores to 8 cores with pin-to-pin compatibility. Multicore development platforms are evolving, offering software solutions that simplify the migration to multicore and ease scalability.


Software Design
Location: Hilton - San Carlos Room
Challenges Faced in Optimizing JIT Compilers for Multicore Mobile Platforms

Currently, mobile vendors are competing on application bring-up time and performance for Android-based platforms. This presentation discusses some of the challenges faced in extracting performance from multicore mobile platforms. JIT compiler optimization is one of the ways to achieve this. We discuss the challenges faced by compiler engineers in such a scenario. Technically, the effects of the size of the hot traces given to a JIT by the interpreter are covered. The presentation also covers which kinds of optimizations work in a mobile/embedded environment versus desktops, and includes quantitative information on the effect of the optimizations tried out on EEMBC benchmarks. It also gives application programmers guidelines for utilizing the parallelism built into the hardware and the compiler.

How to Implement a High Performance Security Solution

IPsec and SSL are today’s mainstream VPN technologies, and their implementations can handle the software required at acceptable cost to CPU, power, and bandwidth. But they often require a dedicated acceleration architecture. NPUs are moving to 40 and 100 Gb switching technology, with Comm processors and custom ASICs soon to follow. There’s a trend to add parallel CPUs, but the bandwidth required for switching and routing grows at the same or greater pace. It’s also common to find IPsec or similar implementations in devices like multi-function security appliances, femto cell gateways, session border controllers, storage devices and many machine-to-machine communication devices. This session will teach you to select the best dedicated acceleration architecture implementation for your device.

Securing Multicore Embedded Systems

Historically, security-critical systems have been developed and qualified using single-core processors. These platforms could easily meet their increases in system performance requirements through higher processor clock speeds. However, the industry is now approaching the limit of a relatively simple upgrade path, and there is an increasing trend toward the adoption of multi-core processor architectures in critical systems to address higher performance demands. In this presentation, the challenges involved in migration to multi-core processor architectures are reviewed as well as the specific challenges related to their use in security-sensitive systems.


Parallel Techniques
Location: Hilton - Almaden Ballroom 1
Case Studies on Migration of Single Core SW Applications to Multicore
Legacy migration to multicore software applications comes with its share of challenges, but they can be mitigated when embedded developers divide and conquer the problem, use the right processes, and leverage the right tools for the job. In this course, we will survey two case studies from industry, one a DSP video application and the other a processor-based networking application. We will review the process and methodology used to convert these applications from single core to multicore, and the lessons learned from each. Topics include identification of multicore software in the application, software partitioning, system configuration, application profiling and optimization, debugging, integration challenges, and tools support.
Design Tradeoffs for Portable and Scalable Multicore Software

Rather than developing custom MultiCore software for a specific hardware architecture, general universal software can be written that runs on a variety of architectures: current architectures as well as yet-unknown future ones. Presented is an examination of software architecture design and trade-offs for use on MultiCore systems using divide-and-conquer algorithms. The presentation also includes a classification of parallelizable problem types based on data considerations, overhead-reduction techniques, resource considerations, and performance tuning using run-time monitoring.

Effective Migration of Software onto Multicore DSP Platforms

This session highlights a package of tools and an underlying methodology with which a software developer with a good understanding of C and signal processing algorithms can start to analyze, explore and decide on the correct parallelization strategy for migrating the sequential digital signal processing algorithm onto a multicore DSP. The tutorial walks the user through the whole flow, highlighting the various decision points on the way, and provides clear guidelines to consider when deciding on various parallelization strategies to implement the algorithm on a multicore DSP platform. The tutorial is based around a practical example in the medical signal processing field. The presentation will also touch upon potential implementation programming models such as OpenMP and MCAPI.

Fine-Grain vs. Coarse-Grain Parallelism in Linux

Multicore CPUs are becoming commonplace in embedded and desktop systems. With these new CPUs come many enhanced instruction sets such as NEON, SSE3 and more. In addition, compilers are now aware of OpenMP and can spread code across multiple cores. But how do these techniques compare with more traditional multi-threaded approaches such as pthreads? This presentation will discuss the pros and cons of these different approaches to improving performance and what the developer needs to know to use them effectively.

Migrating Serial Code to Utilize a Scalable Number of Local and Networked Processing Cores

Software developers are creating and upgrading their code to harness the processing power available from multicore processors. Legacy applications in these domains are often written for a single processing core and thus cannot leverage the parallelism inherent in multicore/multi-processor systems without extensive refactoring and reprogramming. Therefore, despite the price/performance advantages of multicore/multi-processor systems, it is hard to accelerate and scale the performance of single-threaded embedded applications to take advantage of additional computing resources. This talk will explore various techniques that can successfully be used to convert single-threaded code to utilize a scalable number of threads. As our real-world example, we will use the results of an exercise that Zircon Computing and AMD worked through. It shows that, with proper technique and careful attention to the functionality and synchronization requirements of the code to be optimized, near-linear, scalable performance gains can be realized as the number of threads available to that code is increased.