ONE - On-device Neural Engine
Executor (`IExecutor`) is an execution engine for a subgraph: it can execute inference for that subgraph. It is the result of a `Subgraph` compilation. Compared to common programming language tools, it is like an interpreter bundled with the code it executes.
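To make the idea concrete, here is a minimal sketch of what such an executor interface could look like. The names and signatures below are illustrative assumptions for this document, not ONE's actual `IExecutor` API.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-in for a tensor's backing storage.
struct TensorBuffer
{
  void *data;
  std::size_t size;
};

// Hypothetical executor interface: the product of compiling one subgraph.
class IExecutorSketch
{
public:
  virtual ~IExecutorSketch() = default;

  // Run one inference over the compiled subgraph: consume the input
  // buffers, execute every operation, and fill the output buffers.
  virtual void execute(const std::vector<TensorBuffer> &inputs,
                       std::vector<TensorBuffer> &outputs) = 0;
};
```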
We can think of an NNPackage model as a set of tasks with dependencies. In other words, it is a form of DAG (more precisely, a set of DAGs, since we need multiple subgraphs to support control flow operations). That is exactly the same concept as dataflow programming.
That is, an operation can run only when all of its input tensors are ready, and execution must proceed in topological order. The workflow for execution looks like this (a sketch in code follows the list):

1. Prepare the execution (set up the input and output tensors)
2. Mark the graph inputs and constant tensors as ready
3. Find the operations whose input tensors are all ready
4. Execute one of them, then mark its output tensors as ready
5. Repeat steps 3-4 until every operation has been executed
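Here is a toy sketch of steps 2-5, assuming a simple graph representation; none of these types are taken from the ONE codebase.

```cpp
#include <cstddef>
#include <queue>
#include <vector>

// An operation reads some tensors (by id) and writes others.
struct Operation
{
  std::vector<int> inputs;
  std::vector<int> outputs;
};

// Executes `ops` in a valid topological order. `ready_tensor[t]` says
// whether tensor t (e.g. a graph input or a constant) is ready up front.
void execute(const std::vector<Operation> &ops, std::vector<bool> ready_tensor)
{
  std::vector<std::vector<std::size_t>> consumers(ready_tensor.size());
  std::vector<std::size_t> waiting(ops.size(), 0);
  std::queue<std::size_t> ready_ops;

  // Step 2-3: count, per operation, how many inputs are not yet ready.
  for (std::size_t i = 0; i < ops.size(); ++i)
  {
    for (int t : ops[i].inputs)
      if (!ready_tensor[t])
      {
        ++waiting[i];
        consumers[t].push_back(i);
      }
    if (waiting[i] == 0)
      ready_ops.push(i); // all inputs already available
  }

  // Steps 3-5: pick a ready operation, run it, mark outputs ready, repeat.
  while (!ready_ops.empty())
  {
    std::size_t op = ready_ops.front();
    ready_ops.pop();
    // ... invoke the backend kernel for ops[op] here ...
    for (int t : ops[op].outputs)
    {
      ready_tensor[t] = true;
      for (std::size_t user : consumers[t])
        if (--waiting[user] == 0)
          ready_ops.push(user); // all of `user`'s inputs are now ready
    }
  }
}
```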
We have three different types of executors in our codebase, and all of them are based on the explanation above. However, only `LinearExecutor` is official; the other two are experimental.
`LinearExecutor` is the main executor. Since we know the model to run and the model graph does not change at runtime, we do not need to do the scheduling of steps 3-5 above at runtime. During compilation for `LinearExecutor`, the operations are sorted in topological order, so at runtime we can simply execute them in that fixed order. This also means it cannot execute operations in parallel.
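In code, the idea reduces to something like the following sketch, where the topological sort happens once at compile time and the runtime loop is a plain sequential walk. The types are illustrative, not ONE's.

```cpp
#include <functional>
#include <utility>
#include <vector>

using Kernel = std::function<void()>; // a prepared, ready-to-run operation

// The "compiled" artifact: kernels already in topological order.
struct LinearPlan
{
  std::vector<Kernel> kernels;
};

// Built during compilation; `sorted` must be a topological order of the graph.
LinearPlan compile(std::vector<Kernel> sorted)
{
  return LinearPlan{std::move(sorted)};
}

// At runtime there is no readiness bookkeeping at all: just run in order.
void execute(const LinearPlan &plan)
{
  for (const Kernel &k : plan.kernels)
    k();
}
```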
If the tensors are static, it can also analyze their lifetimes and pre-allocate tensor memory, reusing memory between tensors whose lifetimes do not overlap.
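As an illustration of that lifetime analysis, here is a greedy first-fit planner that shares one arena between tensors with non-overlapping lifetimes. This sketches the general technique, not ONE's actual memory planner.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

struct Lifetime
{
  std::size_t first_op, last_op; // inclusive range of ops that use the tensor
  std::size_t size;              // bytes
  std::size_t offset = 0;        // assigned position in one shared arena
};

// First-fit: place each tensor at the lowest offset that does not collide
// with an already-placed tensor whose lifetime overlaps. Returns arena size.
std::size_t plan(std::vector<Lifetime> &tensors)
{
  std::size_t arena = 0;
  for (std::size_t i = 0; i < tensors.size(); ++i)
  {
    // Collect occupied [offset, offset + size) ranges of overlapping tensors.
    std::vector<std::pair<std::size_t, std::size_t>> busy;
    for (std::size_t j = 0; j < i; ++j)
      if (tensors[j].first_op <= tensors[i].last_op &&
          tensors[i].first_op <= tensors[j].last_op)
        busy.push_back({tensors[j].offset, tensors[j].offset + tensors[j].size});
    std::sort(busy.begin(), busy.end());

    std::size_t offset = 0;
    for (auto [lo, hi] : busy)
    {
      if (offset + tensors[i].size <= lo)
        break; // found a gap big enough before this busy range
      offset = std::max(offset, hi);
    }
    tensors[i].offset = offset;
    arena = std::max(arena, offset + tensors[i].size);
  }
  return arena; // total bytes needed for the whole arena
}
```

Tensors whose lifetimes do not overlap end up with overlapping offsets, so the arena can be much smaller than the sum of all tensor sizes.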
Unlike `LinearExecutor`, `DataflowExecutor` performs steps 3-5 at runtime. By doing so, it knows which operations are ready at any given point. However, this executor still executes operations one at a time: it picks any operation that is ready, executes it, waits for it to finish, and repeats. So it may have no advantage over `LinearExecutor`, but `DataflowExecutor` is the parent class of `ParallelExecutor`, and it can also be used to profile executions for the heterogeneous scheduler.
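That parent/child split can be pictured as below: the dataflow loop owns the scheduling, and a subclass only changes where a ready operation runs. All class and member names here are illustrative assumptions, not ONE's classes.

```cpp
#include <functional>
#include <queue>
#include <utility>

class DataflowExecutorSketch
{
public:
  explicit DataflowExecutorSketch(std::queue<std::function<void()>> ready)
    : _ready(std::move(ready))
  {
  }
  virtual ~DataflowExecutorSketch() = default;

  // Steps 3-5 at runtime: keep picking ready operations until none remain.
  void executeAll()
  {
    while (!_ready.empty())
    {
      auto op = std::move(_ready.front());
      _ready.pop();
      dispatch(std::move(op));
      waitForCompletion(); // serial: finish before picking the next one
      // (a real implementation would mark outputs ready and enqueue
      //  newly-ready operations here, as in the earlier loop sketch)
    }
  }

protected:
  // DataflowExecutor behavior: run on the calling thread. A parallel
  // subclass would override this to hand the job to a worker instead.
  virtual void dispatch(std::function<void()> op) { op(); }
  virtual void waitForCompletion() {}

private:
  std::queue<std::function<void()>> _ready;
};
```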
Just like `DataflowExecutor`, `ParallelExecutor` performs steps 3-5 at runtime. The big difference is that it creates a `ThreadPool` for each backend for parallel execution (a `ThreadPool` is supposed to have multiple threads, though for now it can have only one). Multiple operations that are ready to execute can then run on different backends at the same time, which could lead to some performance gain.