ONE - On-device Neural Engine
arm_compute::CLFullyConnectedHybridLayer Class Reference

#include <CLFullyConnectedHybridLayer.h>

Collaboration diagram for arm_compute::CLFullyConnectedHybridLayer:

Public Member Functions

 CLFullyConnectedHybridLayer (std::shared_ptr< IMemoryManager > memory_manager=nullptr)
 
 CLFullyConnectedHybridLayer (const CLFullyConnectedHybridLayer &)=delete
 
 CLFullyConnectedHybridLayer (CLFullyConnectedHybridLayer &&)=default
 
CLFullyConnectedHybridLayer & operator= (const CLFullyConnectedHybridLayer &)=delete
 
CLFullyConnectedHybridLayer & operator= (CLFullyConnectedHybridLayer &&)=default
 
void configure (const ICLTensor *input, const ICLTensor *weights, const ICLTensor *biases, ICLTensor *output, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 
void run () override
 
void prepare () override
 

Static Public Member Functions

static Status validate (const ITensorInfo *input, const ITensorInfo *weights, const ITensorInfo *biases, const ITensorInfo *output, FullyConnectedLayerInfo fc_info=FullyConnectedLayerInfo())
 

Detailed Description

Basic function to compute a Fully Connected layer on OpenCL. This function calls the following OpenCL kernels:

  1. CLIm2ColKernel (called when the input comes from a convolutional layer)
  2. CLTranspose (if are_weights_reshaped is set to false and transpose_weights is set to true) (called once)
  3. CLGEMMLowpMatrixMultiplyCore (if quantized symmetric)
  4. CLGEMMMatrixAccumulateBiasesKernel (if biases is not equal to nullptr)
Note
The fully connected layer accepts only 2-dimensional "weights" tensors.

Definition at line 69 of file CLFullyConnectedHybridLayer.h.
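
The kernel sequence above amounts to a "hybrid" matrix multiply: the float input is quantized symmetrically to int8, multiplied against the pre-quantized int8 weights in int32, and the accumulator is scaled back to float by the product of the input and weight scales. The following standalone sketch (plain C++, not the actual OpenCL kernels; it assumes a single per-tensor max-abs/127 symmetric scale for the input) illustrates the arithmetic:

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical standalone sketch of the hybrid computation, per-tensor variant.
// weights is [out_dim][in_dim], row-major, already quantized with weight_scale.
std::vector<float> hybrid_fc(const std::vector<float> &input,
                             const std::vector<int8_t> &weights,
                             float weight_scale, std::size_t out_dim)
{
  const std::size_t in_dim = input.size();

  // Scale factor extraction (cf. CLScaleFactorSymm8Kernel): max|x| / 127
  float max_abs = 0.f;
  for (float v : input)
    max_abs = std::max(max_abs, std::fabs(v));
  const float input_scale = max_abs / 127.f;

  // Symmetric quantization of the input (cf. CLQuantizationSymmetricKernel)
  std::vector<int8_t> q_input(in_dim);
  for (std::size_t i = 0; i < in_dim; ++i)
    q_input[i] = static_cast<int8_t>(std::lround(input[i] / input_scale));

  // Integer matmul (cf. CLGEMMLowpMatrixMultiplyCore), then rescale to float
  // (cf. CLMultiplyScaleFactorKernel)
  std::vector<float> output(out_dim);
  for (std::size_t o = 0; o < out_dim; ++o)
  {
    int32_t acc = 0;
    for (std::size_t i = 0; i < in_dim; ++i)
      acc += static_cast<int32_t>(q_input[i]) * weights[o * in_dim + i];
    output[o] = acc * input_scale * weight_scale;
  }
  return output;
}
```

The bias accumulation step (CLGEMMMatrixAccumulateBiasesKernel) would then add the float biases to the rescaled output.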

Constructor & Destructor Documentation

◆ CLFullyConnectedHybridLayer() [1/3]

CLFullyConnectedHybridLayer::CLFullyConnectedHybridLayer ( std::shared_ptr< IMemoryManager >  memory_manager = nullptr)

Constructor

Definition at line 68 of file CLFullyConnectedHybridLayer.cpp.

70 : _memory_group(memory_manager), _reshape_weights_kernel(), _quant_input_kernel(),
71 _mm_gemmlowp(memory_manager), _multiply_scale_kernel(), _accumulate_biases_kernel(),
72 _reshape_weights_output(), _quantized_input(), _scale_factor(), _gemmlowp_output(),
73 _are_weights_reshaped(true), _accumulate_biases(false), _is_prepared(false),
74 _original_weights(nullptr)
75{
76}

◆ CLFullyConnectedHybridLayer() [2/3]

arm_compute::CLFullyConnectedHybridLayer::CLFullyConnectedHybridLayer ( const CLFullyConnectedHybridLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ CLFullyConnectedHybridLayer() [3/3]

arm_compute::CLFullyConnectedHybridLayer::CLFullyConnectedHybridLayer ( CLFullyConnectedHybridLayer &&  )
default

Default move constructor

Member Function Documentation

◆ configure()

void CLFullyConnectedHybridLayer::configure ( const ICLTensor *  input,
const ICLTensor *  weights,
const ICLTensor *  biases,
ICLTensor *  output,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)

Set the input and output tensors.

Parameters
  [in]  input    Source tensor. Data type supported: F16/F32.
  [in]  weights  Weights tensor. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the first 3 input's dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: S8.
  [in]  biases   Bias tensor. Can be nullptr. Data type supported: Same as input.
  [out] output   Destination tensor. Its shape should be equal to the output of a matrix multiplication between:
                   • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
                   • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as input.
  [in]  fc_info  (Optional) Fully connected layer additional info

Definition at line 88 of file CLFullyConnectedHybridLayer.cpp.

91{
92 ARM_COMPUTE_ERROR_ON_NULLPTR(input, weights, output);
93
94 // Perform validate step
95 ARM_COMPUTE_ERROR_THROW_ON(CLFullyConnectedHybridLayer::validate(
96 input->info(), weights->info(), biases != nullptr ? biases->info() : nullptr, output->info(),
97 fc_info));
98
99 _are_weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
100 _accumulate_biases = false;
101 _is_prepared = fc_info.retain_internal_weights;
102 _original_weights = weights;
103
104 // Configure accumulate biases kernel for non quantized asymmetric types
105 if (biases != nullptr)
106 {
107 ARM_COMPUTE_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
108
109 _accumulate_biases = true;
110
111 // Configure accumulate biases kernel
112 _accumulate_biases_kernel.set_target(CLScheduler::get().target());
113 _accumulate_biases_kernel.configure(output, biases);
114 }
115
116 const ICLTensor *weights_to_use = weights;
117
118 // With the Fully Connected layer we can have 4 different cases:
119 // 1) Convolution layer -> Fully Connected layer without batches
120 // 2) Fully Connected layer -> Fully Connected layer without batches
121 // 3) Convolution layer -> Fully Connected layer with batches
122 // 4) Fully Connected layer -> Fully Connected layer with batches
123
124 // Check if we have a fully connected layer with batches
125 const bool is_batched_fc_layer = output->info()->dimension(1) > 1;
126 bool is_fc_after_conv = false;
127 if (is_batched_fc_layer)
128 {
129 is_fc_after_conv =
130 (TensorShape::num_max_dimensions >= 4) &&
131 (std::equal(input->info()->tensor_shape().cbegin() + 3, input->info()->tensor_shape().cend(),
132 output->info()->tensor_shape().cbegin() + 1));
133 }
134 else
135 {
136 is_fc_after_conv = input->info()->num_dimensions() > 1 && input->info()->dimension(1) > 1;
137 }
138 ARM_COMPUTE_ERROR_ON_MSG(is_fc_after_conv,
139 "CLFullyConnectedHybridLayer does not support after conv");
140 ARM_COMPUTE_UNUSED(is_fc_after_conv);
141
142 // Reshape weights if needed
143 if (!_are_weights_reshaped)
144 {
145 // Reshape the weights
146 _reshape_weights_output.allocator()->init(
147 weights->info()->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(
148 compute_transposed_shape(*weights->info())));
149 _reshape_weights_kernel.configure(weights_to_use, &_reshape_weights_output);
150 weights_to_use = &_reshape_weights_output;
151 }
152
153 // Extract scale factor
154 _scale_factor.allocator()->init(
155 TensorInfo(TensorShape{output->info()->dimension(1)}, 1, input->info()->data_type()));
156 _memory_group.manage(&_scale_factor);
157 _scale_factor_kernel.configure(input, &_scale_factor);
158
159 // Quantize input
160 _quantized_input.allocator()->init(
161 input->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(
162 DataType::QASYMM8_SIGNED));
163 _memory_group.manage(&_quantized_input);
164 _quant_input_kernel.configure(input, &_scale_factor, &_quantized_input);
165
166 // GEMMLowp
167 _gemmlowp_output.allocator()->init(
168 output->info()->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::S32));
169 _memory_group.manage(&_gemmlowp_output);
170 configure_mm(&_quantized_input, weights_to_use, &_gemmlowp_output,
171 fc_info.retain_internal_weights);
172 _quantized_input.allocator()->allocate();
173
174 // Multiply scale
175 _multiply_scale_kernel.configure(&_gemmlowp_output, &_scale_factor, output,
176 weights->info()->quantization_info().uniform().scale);
177 _gemmlowp_output.allocator()->allocate();
178 _scale_factor.allocator()->allocate();
179
180 _are_weights_reshaped = _are_weights_reshaped || fc_info.retain_internal_weights;
181}

References arm_compute::CLQuantizationSymmetricKernel::configure(), arm_compute::CLMultiplyScaleFactorKernel::configure(), arm_compute::CLScaleFactorSymm8Kernel::configure(), arm_compute::CLGEMMMatrixAccumulateBiasesKernel::configure(), and validate().
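
In the configure() body above, _scale_factor is initialized with shape {output->dimension(1)}, i.e. one value per batch row of the input. Assuming CLScaleFactorSymm8Kernel computes a per-row max-abs/127 scale (an assumption consistent with the symmetric int8 scheme, not taken from the kernel source), a minimal standalone sketch of that step is:

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical sketch: one symmetric scale factor per batch row, so each
// row of the input can be quantized to int8 with its own dynamic range.
std::vector<float> row_scale_factors(const std::vector<std::vector<float>> &batch)
{
  std::vector<float> scales;
  scales.reserve(batch.size());
  for (const auto &row : batch)
  {
    float max_abs = 0.f;
    for (float v : row)
      max_abs = std::max(max_abs, std::fabs(v));
    scales.push_back(max_abs / 127.f); // int8 symmetric range is [-127, 127]
  }
  return scales;
}
```

Per-row scales are what make the scheme "hybrid": weights are statically quantized once, while the input's quantization parameters are recomputed on every run.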

◆ operator=() [1/2]

CLFullyConnectedHybridLayer & arm_compute::CLFullyConnectedHybridLayer::operator= ( CLFullyConnectedHybridLayer &&  )
default

Default move assignment operator


◆ operator=() [2/2]

CLFullyConnectedHybridLayer & arm_compute::CLFullyConnectedHybridLayer::operator= ( const CLFullyConnectedHybridLayer &  )
delete

Prevent instances of this class from being copied (As this class contains pointers)

◆ prepare()

void CLFullyConnectedHybridLayer::prepare ( )
override

Definition at line 290 of file CLFullyConnectedHybridLayer.cpp.

291{
292 if (!_is_prepared)
293 {
294 ARM_COMPUTE_ERROR_ON(!_original_weights->is_used());
295
296 auto release_unused = [](CLTensor *w) {
297 if (!w->is_used())
298 {
299 CLScheduler::get().queue().finish();
300 w->allocator()->free();
301 }
302 };
303
304 // Reshape of the weights if needed (happens only once)
305 if (!_are_weights_reshaped)
306 {
307 // Run reshape weights kernel and mark weights as unused
308 _reshape_weights_output.allocator()->allocate();
309 _reshape_weights_kernel.run();
310
311 _are_weights_reshaped = true;
312 // We can not release _original_weights because it can be used in other nodes
313 }
314
315 // Prepare GEMM and release unused weights
316 _mm_gemmlowp.prepare();
317
318 // Release reshaped weights if unused
319 release_unused(&_reshape_weights_output);
320
321 _is_prepared = true;
322 }
323}

Referenced by run().
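
prepare() follows a run-once pattern: guarded by _is_prepared, the weight reshape executes at most once, after which buffers that are no longer referenced are freed. A minimal standalone sketch of that pattern (the PreparedOnce type is hypothetical, not part of the library):

```cpp
// One-shot preparation: the first run() pays the setup cost, later runs skip it.
struct PreparedOnce
{
  bool prepared = false;
  int prepare_calls = 0; // instrumentation for the sketch only

  void prepare()
  {
    if (prepared)
      return;            // subsequent calls are no-ops
    ++prepare_calls;     // e.g. run the weight-reshape kernel, free unused buffers
    prepared = true;
  }

  void run()
  {
    prepare();           // mirrors CLFullyConnectedHybridLayer::run() calling prepare()
    // ... enqueue the per-inference kernels here ...
  }
};
```

Note that fc_info.retain_internal_weights pre-sets _is_prepared in configure(), which skips this setup entirely when the caller guarantees the internal weights are already in place.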

◆ run()

void CLFullyConnectedHybridLayer::run ( )
override

Definition at line 265 of file CLFullyConnectedHybridLayer.cpp.

266{
267 prepare();
268
269 MemoryGroupResourceScope scope_mg(_memory_group);
270
271 // Extract scale_factor
272 CLScheduler::get().enqueue(_scale_factor_kernel);
273
274 // Quantize input
275 CLScheduler::get().enqueue(_quant_input_kernel);
276
277 // Run matrix multiply
278 _mm_gemmlowp.run();
279
280 // Multiply scale factor
281 CLScheduler::get().enqueue(_multiply_scale_kernel);
282
283 // Accumulate biases if provided
284 if (_accumulate_biases)
285 {
286 CLScheduler::get().enqueue(_accumulate_biases_kernel);
287 }
288}

References prepare().

Referenced by package.infer.session::inference().

◆ validate()

Status CLFullyConnectedHybridLayer::validate ( const ITensorInfo *  input,
const ITensorInfo *  weights,
const ITensorInfo *  biases,
const ITensorInfo *  output,
FullyConnectedLayerInfo  fc_info = FullyConnectedLayerInfo() 
)
static

Static function to check if given info will lead to a valid configuration of CLFullyConnectedHybridLayer

Parameters
  [in]  input    Source tensor info. Data type supported: F16/F32.
  [in]  weights  Weights tensor info. The weights must be 2 dimensional. If this function is called after a Convolution Layer, the (transposed) weights will have as many rows as the product of the first 3 input's dimensions. If it is called after another FullyConnected Layer, the (transposed) weights will have as many rows as the input's first dimension. Data type supported: S8.
  [in]  biases   Bias tensor info. Can be nullptr. Data type supported: Same as input.
  [out] output   Destination tensor info. Its shape should be equal to the output of a matrix multiplication between:
                   • The output of im2col on the input and the (transposed) 2D weights, if the function is called after a Convolution Layer
                   • The input tensor and the (transposed) 2D weights, if the function is called after another FullyConnected Layer. Data type supported: Same as input.
  [in]  fc_info  (Optional) Fully connected layer additional info
Returns
a status

Definition at line 183 of file CLFullyConnectedHybridLayer.cpp.

186{
187 ARM_COMPUTE_RETURN_ERROR_ON_NULLPTR(input, weights, output);
188 ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(input, 1, DataType::F16, DataType::F32);
189 ARM_COMPUTE_RETURN_ERROR_ON_DATA_TYPE_CHANNEL_NOT_IN(weights, 1, DataType::QASYMM8_SIGNED);
190 ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, output);
191 ARM_COMPUTE_RETURN_ERROR_ON(weights->num_dimensions() > 2);
192
193 bool weights_reshaped = fc_info.transpose_weights ? fc_info.are_weights_reshaped : true;
194 bool is_fc_after_conv = true;
195 const GPUTarget gpu_target = CLScheduler::get().target();
196
197 const ITensorInfo &reshaped_weights =
198 TensorInfo(weights->clone()->set_is_resizable(true).reset_padding().set_tensor_shape(
199 compute_transposed_shape(*weights)));
200
201 // Configure accumulate biases kernel for non quantized asymmetric types
202 if (biases != nullptr)
203 {
204 ARM_COMPUTE_RETURN_ERROR_ON_MISMATCHING_DATA_TYPES(input, biases);
205 ARM_COMPUTE_RETURN_ON_ERROR(
206 CLGEMMMatrixAccumulateBiasesKernel::validate(output, biases, gpu_target));
207 }
208
209 // With the Fully Connected layer we can have 4 different cases:
210 // 1) Convolution layer -> Fully Connected layer without batches
211 // 2) Fully Connected layer -> Fully Connected layer without batches
212 // 3) Convolution layer -> Fully Connected layer with batches
213 // 4) Fully Connected layer -> Fully Connected layer with batches
214
215 const ITensorInfo *weights_to_use = weights;
216
217 // Check if we have a fully connected layer with batches
218 const bool is_batched_fc_layer = output->dimension(1) > 1;
219 if (is_batched_fc_layer)
220 {
221 is_fc_after_conv = (TensorShape::num_max_dimensions >= 4) &&
222 (std::equal(input->tensor_shape().cbegin() + 3, input->tensor_shape().cend(),
223 output->tensor_shape().cbegin() + 1));
224 }
225 else
226 {
227 is_fc_after_conv = input->num_dimensions() > 1 && input->dimension(1) > 1;
228 }
229 ARM_COMPUTE_RETURN_ERROR_ON_MSG(is_fc_after_conv,
230 "CLFullyConnectedHybridLayer does not support after conv");
231
232 if (!weights_reshaped)
233 {
234 // Validate reshape weights kernel
235 ARM_COMPUTE_RETURN_ON_ERROR(CLTranspose::validate(weights_to_use, &reshaped_weights));
236 weights_to_use = &reshaped_weights;
237 }
238
239 // Validate Scale factor kernel
240 const ITensorInfo &scale_factor =
241 TensorInfo(TensorShape{output->dimension(1)}, 1, input->data_type());
242 ARM_COMPUTE_RETURN_ON_ERROR(CLScaleFactorSymm8Kernel::validate(input, &scale_factor));
243
244 // Validate quantization symm8 kernel
245 const ITensorInfo &quantized_input = TensorInfo(
246 input->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::QASYMM8_SIGNED));
247 ARM_COMPUTE_RETURN_ON_ERROR(
248 CLQuantizationSymmetricKernel::validate(input, &scale_factor, &quantized_input));
249
250 // Fully Connected layer after a Fully Connected Layer without batches
251 ARM_COMPUTE_RETURN_ERROR_ON(input->dimension(0) != weights_to_use->dimension(1));
252
253 // Validate matrix multiply kernel
254 const ITensorInfo &gemmlowp_output = TensorInfo(
255 output->clone()->set_is_resizable(true).reset_padding().set_data_type(DataType::S32));
256 ARM_COMPUTE_RETURN_ON_ERROR(validate_mm(quantized_input, *weights_to_use, gemmlowp_output));
257
258 // Multiply scale
259 ARM_COMPUTE_RETURN_ON_ERROR(
260 CLMultiplyScaleFactorKernel::validate(&gemmlowp_output, &scale_factor, output));
261
262 return Status{};
263}

References arm_compute::CLGEMMMatrixAccumulateBiasesKernel::validate(), arm_compute::CLScaleFactorSymm8Kernel::validate(), arm_compute::CLMultiplyScaleFactorKernel::validate(), and arm_compute::CLQuantizationSymmetricKernel::validate().

Referenced by configure(), and operator=().
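
Both configure() and validate() reject the "after conv" case with the same shape test: when the output's dimension 1 is greater than 1 (a batched fully connected layer), the input is considered to come from a convolution if its dimensions from index 3 onward match the output's dimensions from index 1 onward. A hedged standalone rendition of that test, with shapes as plain index vectors (dimension 0 first, as in TensorShape):

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Sketch of the is_fc_after_conv detection. CLFullyConnectedHybridLayer
// raises an error whenever this returns true.
bool fc_after_conv(const std::vector<std::size_t> &in_shape,
                   const std::vector<std::size_t> &out_shape)
{
  const bool is_batched = out_shape.size() > 1 && out_shape[1] > 1;
  if (is_batched)
  {
    // Batch dimensions of a 4D conv output (indices 3+) must reappear as
    // indices 1+ of the 2D batched output for this to be "after conv".
    return in_shape.size() >= 4 &&
           std::equal(in_shape.cbegin() + 3, in_shape.cend(),
                      out_shape.cbegin() + 1);
  }
  // Non-batched case: a 2D+ input with height > 1 implies a conv-style input.
  return in_shape.size() > 1 && in_shape[1] > 1;
}
```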


The documentation for this class was generated from the following files:
  • CLFullyConnectedHybridLayer.h
  • CLFullyConnectedHybridLayer.cpp