How does QuantizeWithMinMax work?
We categorize tensors into four groups:
- Activation: feature maps (both const and non-const)
- Weights: const tensors of specific Ops (Conv, FC, ...)
- Bias: const tensors of specific Ops (Conv, FC, ...)
- Others: padding value, one_hot value, axis, ...
Activations are quantized in different ways:
- Non-constant activations are quantized using recorded min/max.
- Constant activations are quantized using the min/max of their values.
- For some Ops (e.g., pad_v2), the output qparam is used as the input qparam (backward propagation).
- For some Ops (e.g., reshape), the input qparam is used as the output qparam (forward propagation).
- For some Ops (e.g., tanh), the output qparam has pre-defined values.
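The min/max-based quantization above can be sketched with the usual asymmetric uint8 scheme: a scale and zero-point are derived from the recorded range, then values are rounded and clamped. This is a generic illustration (assuming uint8, a range nudged to contain zero, and min < max), not luci's exact implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantization parameters derived from a recorded min/max range.
struct QParam
{
  float scale;
  int32_t zero_point;
};

// Generic asymmetric uint8 qparam computation (assumes min < max).
inline QParam qparam_from_minmax(float min, float max)
{
  // Nudge the range to contain zero so 0.0f maps to an exact integer.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);
  QParam qp;
  qp.scale = (max - min) / 255.0f;
  qp.zero_point = static_cast<int32_t>(std::round(-min / qp.scale));
  qp.zero_point = std::max(0, std::min(255, qp.zero_point));
  return qp;
}

// Quantize one float value: round to the nearest step, shift, clamp.
inline uint8_t quantize(float x, const QParam &qp)
{
  int32_t q = static_cast<int32_t>(std::round(x / qp.scale)) + qp.zero_point;
  return static_cast<uint8_t>(std::max(0, std::min(255, q)));
}
```

With a recorded range of [0, 255], the scale is exactly 1.0 and the zero-point is 0, so values quantize to themselves.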
Weights are quantized using the min/max of their values.
Bias is quantized using the input scale (s_i) and the weights scale (s_w).
- Therefore, activations and weights must be quantized before bias.
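The bias dependency above can be made concrete: bias is typically quantized to int32 with scale s_i * s_w and zero-point 0, which is exactly why both the input scale and the weights scale must already be known. A minimal sketch, assuming that convention (not luci's exact code):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a bias value to int32 with scale = input_scale * weights_scale
// and zero-point 0. Illustrative convention; not luci's implementation.
inline int32_t quantize_bias(float bias, float input_scale, float weights_scale)
{
  const float bias_scale = input_scale * weights_scale;
  return static_cast<int32_t>(std::round(bias / bias_scale));
}
```

For example, with s_i = 0.5 and s_w = 0.5, the bias scale is 0.25, so a bias of -0.25 quantizes to -1.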
Overall Quantization Steps
1. Quantize Activation
  - Quantize using recorded min/max (QuantizeActivation)
  - Insert Quantize Ops for mixed-precision quantization (InsertQuantizeOp)
  - Remove redundant Quantize Ops (RemoveRedundantQuantizePass)
  - Propagate qparam backward (PropagateQParamBackwardPass)
  - Quantize const inputs (QuantizeConstInputActivation)
  - Quantize using pre-defined values (QuantizeSpecialActivation)
  - Propagate qparam forward (PropagateQParamForwardPass)
2. Quantize Weights
3. Quantize Bias
4. Set input dtype
5. Set output dtype
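The ordering constraints above can be encoded as a fixed sequence, which makes them easy to check mechanically (weights and activations before bias, dtypes last). This list is purely illustrative driver code, not the real pass pipeline:

```cpp
#include <cassert>
#include <string>
#include <vector>

// The documented quantization order, expressed as a fixed sequence.
// Names mirror the steps above; this is not the actual pass driver.
inline std::vector<std::string> quantization_steps()
{
  return {
    "QuantizeActivation",           // recorded min/max
    "InsertQuantizeOp",             // mixed-precision boundaries
    "RemoveRedundantQuantizePass",  // clean up inserted ops
    "PropagateQParamBackwardPass",  // e.g. concat, pad_v2
    "QuantizeConstInputActivation", // const inputs
    "QuantizeSpecialActivation",    // pre-defined qparam (e.g. tanh)
    "PropagateQParamForwardPass",   // e.g. reshape, transpose
    "QuantizeWeights",
    "QuantizeBias",                 // needs input & weights scales
    "SetInputType",
    "SetOutputType",
  };
}
```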
Why was the quantization sequence determined as above?
- Activations and weights must be quantized before bias (1->2->3). Input/output dtypes are updated after all the other nodes are quantized (4->5).
- During activation quantization:
  - Backward propagation is performed before forward propagation, so a backward-propagated qparam can be overwritten during forward propagation. We made this decision because propagating qparams forward (reshape, transpose) is more common than propagating them backward (concat, pad_v2, ...).
  - QuantizeSpecialActivation is called before forward propagation to ensure that the pre-defined qparam values are propagated.
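The overwrite behavior described above can be sketched with a toy qparam table. Suppose a tensor t is both the output of a reshape (forward propagation copies the input's qparam onto it) and the input of a pad_v2 (backward propagation copies the output's qparam onto it). Because backward runs first, forward wins. Tensor names and the single-float "qparam" are purely illustrative:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model: tensor name -> scale. Shows why forward propagation, run
// second, overwrites a backward-propagated qparam. Illustrative only.
inline float final_scale_of_t()
{
  std::map<std::string, float> q{{"reshape_in", 0.5f}, {"pad_out", 0.25f}};
  // Backward propagation (pad_v2): t takes its consumer's output qparam.
  q["t"] = q["pad_out"];
  // Forward propagation (reshape): t takes its producer's input qparam,
  // overwriting the backward-propagated value.
  q["t"] = q["reshape_in"];
  return q["t"];
}
```

The forward-propagated scale (0.5) survives, matching the stated rationale that forward propagation is the more common case and should take precedence.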
Implements logo::Pass.
Definition at line 561 of file QuantizeWithMinMaxPass.cpp.
(Source listing abridged; the recoverable flow of the pass body is:)
- Log "QuantizeWithMinMaxPass Start".
- Resolve each node's target dtype and granularity from the layer-info map, falling back to _ctx->output_model_dtype and _ctx->granularity.
- Insert Quantize Ops (InsertQuantizeOp) where a node's dtype differs from _ctx->output_model_dtype.
- Run RemoveRedundantQuantizePass.
- Run PropagateQParamForwardPass(_ctx->TF_style_maxpool).
- Call set_input_type(g) and set_output_type(g).
- Run RemoveRedundantQuantizePass again.
- If not _ctx->save_min_max, drop the recorded min/max data.
- Log "QuantizeWithMinMaxPass End" and return false.
References loco::active_nodes(), loco::all_nodes(), INFO, luci::layer_info_map(), LOGGER, loco::must_cast(), luci::must_cast(), luci::CircleNode::name(), loco::output_nodes(), logo::Saturate, and luci::warn_accuracy_with_range().