ONE - On-device Neural Engine
arm_compute::NEFullyConnectedHybridLayer Class Reference

#include <NEFullyConnectedHybridLayer.h>

Collaboration diagram for arm_compute::NEFullyConnectedHybridLayer:

Public Member Functions

 NEFullyConnectedHybridLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 
 NEFullyConnectedHybridLayer (const NEFullyConnectedHybridLayer &)=delete
 
 NEFullyConnectedHybridLayer (NEFullyConnectedHybridLayer &&)=default
 
NEFullyConnectedHybridLayer & operator= (const NEFullyConnectedHybridLayer &)=delete
 
NEFullyConnectedHybridLayer & operator= (NEFullyConnectedHybridLayer &&)=default
 
void configure (const ITensor *input, const ITensor *weights, const ITensor *biases, ITensor *output, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 
void run () override
 
void prepare () override
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 

Detailed Description

Basic function to compute a Fully Connected layer on NEON. This function calls the following NEON kernels:

  1. NEIm2ColKernel (called when the input comes from a convolutional layer)
  2. NETranspose (if are_weights_reshaped is set to false and transpose_weights is set to true) (called once)
  3. NEGEMMMatrixMultiplyKernel or NEGEMMLowpMatrixMultiplyCore (if quantized asymmetric)
  4. NEGEMMMatrixAccumulateBiasesKernel or NEGEMMLowpQuantizeDownInt32ToUint8ScaleByFixedPoint (if quantized asymmetric) (if biases is not equal to nullptr)
Note
The fully connected layer accepts "weights" tensors only with 2 dimensions.

Definition at line 69 of file NEFullyConnectedHybridLayer.h.
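A minimal usage sketch follows. The call sequence (configure, allocate, run) matches the member functions documented on this page, but the tensor shapes, the weight quantization scale, and the include paths are illustrative assumptions, not requirements of the API.

#include <NEFullyConnectedHybridLayer.h>       // path as reported by this page; may differ per build
#include "arm_compute/runtime/Tensor.h"

using namespace arm_compute;

int main()
{
    // Illustrative shapes: 128 input features, 64 outputs, batch of 1.
    Tensor input, weights, biases, output;
    input.allocator()->init(TensorInfo(TensorShape(128U, 1U), 1, DataType::F32));
    // 2D weights stored as (in_features, out_features); symmetric int8 with an assumed scale of 0.05f.
    weights.allocator()->init(
        TensorInfo(TensorShape(128U, 64U), 1, DataType::QASYMM8_SIGNED, QuantizationInfo(0.05f)));
    biases.allocator()->init(TensorInfo(TensorShape(64U), 1, DataType::F32));
    output.allocator()->init(TensorInfo(TensorShape(64U, 1U), 1, DataType::F32));

    // Configure the function; weights are transposed internally by default.
    NEFullyConnectedHybridLayer fc;
    fc.configure(&input, &weights, &biases, &output);

    // Allocate backing memory, then fill input/weights/biases before running.
    input.allocator()->allocate();
    weights.allocator()->allocate();
    biases.allocator()->allocate();
    output.allocator()->allocate();

    fc.run(); // quantizes the input, runs the int8 GEMM, rescales and accumulates biases
    return 0;
}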

Constructor & Destructor Documentation

◆ NEFullyConnectedHybridLayer() [1/3]

NEFullyConnectedHybridLayer::NEFullyConnectedHybridLayer ( std::shared_ptr< IMemoryManager >  memory_manager = nullptr)

Constructor

Definition at line 67 of file NEFullyConnectedHybridLayer.cpp.

: _memory_group(std::move(memory_manager)), _reshape_weights_function(), _quant_input_kernel(),
  _mm_gemmlowp(), _accumulate_biases_kernel(), _reshape_weights_output(), _quantized_input(),
  _scale_factor(), _original_weights(nullptr), _are_weights_reshaped(false),
  _accumulate_biases(false), _is_prepared(false)
{
}

◆ NEFullyConnectedHybridLayer() [2/3]

arm_compute::NEFullyConnectedHybridLayer::NEFullyConnectedHybridLayer ( const NEFullyConnectedHybridLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ NEFullyConnectedHybridLayer() [3/3]

arm_compute::NEFullyConnectedHybridLayer::NEFullyConnectedHybridLayer ( NEFullyConnectedHybridLayer &&  )
default

Default move constructor

Member Function Documentation

◆ configure()

void NEFullyConnectedHybridLayer::configure ( const ITensor *  input,
const ITensor *  weights,
const ITensor *  biases,
ITensor *  output,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)

Set the input and output tensors.

Parameters
[in]  input    Source tensor. Data type supported: F16/F32.
[in]  weights  Weights tensor. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the first 3 input's dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: S8.
[in]  biases   Bias tensor. Can be nullptr. Data type supported: Same as input.
[out] output   Destination tensor. Its shape should be equal to the output of a matrix multiplication between:
  • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
  • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as input.
[in]  fc_info  (Optional) Fully connected layer additional info
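The fc_info argument controls how the weights are handled. As a hedged sketch (field names follow arm_compute::FullyConnectedLayerInfo; input, weights and output are assumed to be already initialised tensors): if the weights have already been stored in the transposed 2D layout offline, the internal NETranspose step can be skipped.

// Weights already stored transposed, e.g. shape (64, 128) for 128 inputs and 64 outputs,
// so the layer's internal transpose (NETranspose) is not needed.
FullyConnectedLayerInfo fc_info{};
fc_info.transpose_weights = false;

NEFullyConnectedHybridLayer fc;
fc.configure(&input, &weights, /*biases=*/nullptr, &output, fc_info);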

Definition at line 85 of file NEFullyConnectedHybridLayer.cpp.

{
  ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);

  // Perform validate step
  ARM_COMPUTE_ERROR_THROW_ON(NEFullyConnectedHybridLayer::validate(
    input->info(), weights->info(), biases != nullptr ? biases->info() : nullptr, output->info(),
    fc_info));

  _are_weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
  _accumulate_biases = false;
  _is_prepared = fc_info.retain_internal_weights;
  _original_weights = weights;

  // Configure accumulate biases kernel for non quantized asymmetric types
  if (biases != nullptr)
  {
    _accumulate_biases = true;

    // Configure accumulate biases kernel
    _accumulate_biases_kernel.configure(output, biases);
  }

  // With the Fully Connected layer we can have 4 different cases:
  // 1) Convolution layer -> Fully Connected layer without batches
  // 2) Fully Connected layer -> Fully Connected layer without batches
  // 3) Convolution layer -> Fully Connected layer with batches
  // 4) Fully Connected layer -> Fully Connected layer with batches

  const ITensor *weights_to_use = weights;

  // Check if we have a fully connected layer with batches
  const bool is_batched_fc_layer = output->info()->dimension(1) > 1;
  bool _is_fc_after_conv = false;
  if (is_batched_fc_layer)
  {
    _is_fc_after_conv =
      (TensorShape::num_max_dimensions >= 4) &&
      (std::equal(input->info()->tensor_shape().cbegin() + 3, input->info()->tensor_shape().cend(),
                  output->info()->tensor_shape().cbegin() + 1));
  }
  else
  {
    _is_fc_after_conv = input->info()->num_dimensions() > 1 && input->info()->dimension(1) > 1;
  }
  ARM_COMPUTE_ERROR_ON_MSG(_is_fc_after_conv,
                           "NEFullyConnectedHybridLayer does not support after conv");
  ARM_COMPUTE_UNUSED(_is_fc_after_conv);

  // Reshape weights if needed
  if (!_are_weights_reshaped)
  {
    // Reshape the weights
    _reshape_weights_output.allocator()->init(
      weights->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(
        compute_transposed_shape(*weights->info())));
    _reshape_weights_function.configure(weights_to_use, &_reshape_weights_output);
    weights_to_use = &_reshape_weights_output;
  }

  // Quantize input
  _quantized_input.allocator()->init(
    input->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(
      DataType::QASYMM8_SIGNED));
  _scale_factor.allocator()->init(
    TensorInfo(TensorShape{output->info()->dimension(1)}, 1, DataType::F32));
  _quant_input_kernel.configure(input, &_quantized_input, &_scale_factor);

  // GEMM
  _gemmlowp_output.allocator()->init(
    output->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::S32));
  configure_mm(&_quantized_input, weights_to_use, &_gemmlowp_output);

  // Multiply scale
  _multiply_scale_kernel.configure(&_gemmlowp_output, &_scale_factor, output,
                                   weights->info()->quantization_info().uniform().scale);

  _are_weights_reshaped = _are_weights_reshaped || fc_info.retain_internal_weights;

  _quantized_input.allocator()->allocate();
  _scale_factor.allocator()->allocate();
  _gemmlowp_output.allocator()->allocate();
}

References arm_compute::NEMultiplyScaleFactorKernel::configure(), arm_compute::NEQuantizationSymmetricKernel::configure(), arm_compute::NEGEMMMatrixAccumulateBiasesKernel::configure(), and validate().
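The kernel chain configured above implements a hybrid (dynamically quantized) matrix multiply: the float input is quantized symmetrically per batch row, multiplied against the int8 weights in integer arithmetic, and the int32 accumulators are rescaled back to float using the per-row input scale and the weight scale. The scalar sketch below illustrates that arithmetic for one batch row; the variable names and the exact quantization rule (max-abs over 127) are assumptions for illustration, not the kernels' verbatim implementation.

// Hybrid FC for one batch row, scalar reference (illustrative only).
// in:      float input row of length K        (corresponds to the F16/F32 input)
// qw:      int8 weights, out_features rows    (corresponds to the QASYMM8_SIGNED weights)
// w_scale: weights->quantization_info().uniform().scale
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

std::vector<float> hybrid_fc_row(const std::vector<float> &in,
                                 const std::vector<std::vector<int8_t>> &qw,
                                 float w_scale, const std::vector<float> &bias)
{
    // 1) Per-row symmetric quantization of the input (NEQuantizationSymmetricKernel's role).
    float max_abs = 0.f;
    for (float v : in) max_abs = std::max(max_abs, std::fabs(v));
    float in_scale = max_abs / 127.f; // assumed symmetric rule
    if (in_scale == 0.f) in_scale = 1.f; // guard for an all-zero row
    std::vector<int8_t> q_in(in.size());
    for (size_t k = 0; k < in.size(); ++k)
        q_in[k] = static_cast<int8_t>(std::lround(in[k] / in_scale));

    // 2) Integer GEMM accumulating into int32 (the GEMMLowp core's role).
    // 3) Rescale by in_scale * w_scale (NEMultiplyScaleFactorKernel's role) and add the bias.
    std::vector<float> out(qw.size());
    for (size_t o = 0; o < qw.size(); ++o)
    {
        int32_t acc = 0;
        for (size_t k = 0; k < in.size(); ++k)
            acc += static_cast<int32_t>(q_in[k]) * static_cast<int32_t>(qw[o][k]);
        out[o] = static_cast<float>(acc) * in_scale * w_scale + bias[o];
    }
    return out;
}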

◆ operator=() [1/2]

NEFullyConnectedHybridLayer & arm_compute::NEFullyConnectedHybridLayer::operator= ( const NEFullyConnectedHybridLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ operator=() [2/2]

NEFullyConnectedHybridLayer & arm_compute::NEFullyConnectedHybridLayer::operator= ( NEFullyConnectedHybridLayer &&  )
default

Default move assignment operator

References validate().

◆ prepare()

void NEFullyConnectedHybridLayer::prepare ( )
override

Definition at line 254 of file NEFullyConnectedHybridLayer.cpp.

{
  if (!_is_prepared)
  {
    ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());

    auto release_unused = [](Tensor *w) {
      if (!w->is_used())
      {
        w->allocator()->free();
      }
    };

    // Reshape of the weights (happens only once)
    if (!_are_weights_reshaped)
    {
      // Run reshape weights kernel and mark weights as unused
      _reshape_weights_output.allocator()->allocate();
      _reshape_weights_function.run();

      _are_weights_reshaped = true;
      // We can not release _original_weights because it can be used in other nodes
    }

    // Prepare GEMM and release unused weights
    _mm_gemmlowp.prepare();

    // Release reshaped weights if unused
    release_unused(&_reshape_weights_output);

    _is_prepared = true;
  }
}

Referenced by run().

◆ run()

void NEFullyConnectedHybridLayer::run ( )
override

Definition at line 232 of file NEFullyConnectedHybridLayer.cpp.

{
  prepare();

  MemoryGroupResourceScope scope_mg(_memory_group);

  // Quantize input
  NEScheduler::get().schedule(&_quant_input_kernel, Window::DimY);

  // Run matrix multiply
  _mm_gemmlowp.run();

  // Multiply scale factor
  NEScheduler::get().schedule(&_multiply_scale_kernel, Window::DimY);

  // Accumulate biases if provided
  if (_accumulate_biases)
  {
    NEScheduler::get().schedule(&_accumulate_biases_kernel, Window::DimY);
  }
}

References prepare().

Referenced by package.infer.session::inference().
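Because run() starts with a call to prepare(), the one-time weight reshape and GEMM preparation happen lazily on the first run() unless prepare() is invoked explicitly. A hedged warm-up sketch, where fc is a configured NEFullyConnectedHybridLayer and the helpers are hypothetical:

// Pay the one-time cost (weight transpose, GEMM preparation) up front rather than on the first inference.
fc.prepare();

for (int i = 0; i < num_frames; ++i) // illustrative inference loop
{
    fill_input(input);   // hypothetical helper writing the next frame into the input tensor
    fc.run();            // subsequent prepare() calls inside run() are no-ops
    read_output(output); // hypothetical helper
}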

◆ validate()

Status NEFullyConnectedHybridLayer::validate ( const ITensorInfo *  input,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  output,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)
static

Static function to check if given info will lead to a valid configuration of NEFullyConnectedHybridLayer

Parameters
[in]  input    Source tensor info. Data type supported: F16/F32.
[in]  weights  Weights tensor info. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the first 3 input's dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: S8.
[in]  biases   Bias tensor info. Can be nullptr. Data type supported: Same as input.
[out] output   Destination tensor info. Its shape should be equal to the output of a matrix multiplication between:
  • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
  • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as input.
[in]  fc_info  (Optional) Fully connected layer additional info
Returns
a status
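Since validate() only takes ITensorInfo pointers, a planned configuration can be checked before any tensor memory exists. A hedged sketch (shapes and the weight scale are assumptions):

// Check a planned configuration without allocating any tensors.
const TensorInfo input_info(TensorShape(128U, 1U), 1, DataType::F32);
const TensorInfo weights_info(TensorShape(128U, 64U), 1, DataType::QASYMM8_SIGNED,
                              QuantizationInfo(0.05f));
const TensorInfo biases_info(TensorShape(64U), 1, DataType::F32);
const TensorInfo output_info(TensorShape(64U, 1U), 1, DataType::F32);

const Status st = NEFullyConnectedHybridLayer::validate(&input_info, &weights_info, &biases_info,
                                                        &output_info);
if (!st)
{
    // st.error_description() explains why the configuration would be rejected.
}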

Definition at line 171 of file NEFullyConnectedHybridLayer.cpp.

{
  ARM_COMPUTE_UNUSED(fc_info.retain_internal_weights);
  ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F16, DataType::F32);
  ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(weights, 1, DataType::QASYMM8_SIGNED);
  ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
  ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 2);
  ARM_COMPUTE_RETURN_ERROR_ON(output->num_dimensions() > 2);

  bool weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;

  const ITensorInfo &reshaped_weights =
    TensorInfo(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(
      compute_transposed_shape(*weights)));

  // Configure accumulate biases kernel for non quantized asymmetric types
  if (biases != nullptr)
  {
    ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
    ARM_COMPUTE_RETURN_ON_ERROR(NEGEMMMatrixAccumulateBiasesKernel::validate(output, biases));
  }

  // With the Fully Connected layer we can have 4 different cases:
  // 1) Convolution layer -> Fully Connected layer without batches
  // 2) Fully Connected layer -> Fully Connected layer without batches
  // 3) Convolution layer -> Fully Connected layer with batches
  // 4) Fully Connected layer -> Fully Connected layer with batches

  const ITensorInfo *weights_to_use = weights;

  if (!weights_reshaped)
  {
    // Validate reshape weights kernel
    ARM_COMPUTE_RETURN_ON_ERROR(NETranspose::validate(weights_to_use, &reshaped_weights));
    weights_to_use = &reshaped_weights;
  }

  // Fully Connected layer after a Fully Connected Layer without batches
  ARM_COMPUTE_RETURN_ERROR_ON(input->dimension(0) != weights_to_use->dimension(1));

  // Validate quantization kernel
  const ITensorInfo &quantized_input = TensorInfo(
    input->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::QASYMM8_SIGNED));
  const ITensorInfo &scale_factor = TensorInfo(TensorShape{output->dimension(1)}, 1, DataType::F32);
  ARM_COMPUTE_RETURN_ON_ERROR(
    NEQuantizationSymmetricKernel::validate(input, &quantized_input, &scale_factor));

  const ITensorInfo &gemmlowp_output = TensorInfo(
    output->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::S32));
  // Validate matrix multiply kernel
  ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(quantized_input, *weights_to_use, gemmlowp_output));

  ARM_COMPUTE_RETURN_ON_ERROR(NEMultiplyScaleFactorKernel::validate(
    &gemmlowp_output, &scale_factor, output, weights->quantization_info().uniform().scale));

  return Status{};
}

References arm_compute::NEGEMMMatrixAccumulateBiasesKernel::validate(), arm_compute::NEQuantizationSymmetricKernel::validate(), and arm_compute::NEMultiplyScaleFactorKernel::validate().

Referenced by configure(), and operator=().


The documentation for this class was generated from the following files:

NEFullyConnectedHybridLayer.h
NEFullyConnectedHybridLayer.cpp