ONE - On-device Neural Engine
luci::QuantizeWithMinMaxPass Class Reference

Pass to quantize activation, weights, and bias. More...

#include <QuantizeWithMinMaxPass.h>

Collaboration diagram for luci::QuantizeWithMinMaxPass:

Data Structures

struct  Context
 

Public Member Functions

 QuantizeWithMinMaxPass (std::unique_ptr< Context > &&ctx)
 
virtual const char * name (void) const
 
bool run (loco::Graph *graph)
 
- Public Member Functions inherited from logo::Pass
virtual ~Pass ()=default
 

Detailed Description

Pass to quantize activation, weights, and bias.

Definition at line 34 of file QuantizeWithMinMaxPass.h.

Constructor & Destructor Documentation

◆ QuantizeWithMinMaxPass()

luci::QuantizeWithMinMaxPass::QuantizeWithMinMaxPass ( std::unique_ptr< Context > &&  ctx)
inline

Definition at line 50 of file QuantizeWithMinMaxPass.h.

50 : _ctx{std::move(ctx)}
51 {
52 // DO NOTHING
53 }

Member Function Documentation

◆ name()

virtual const char * luci::QuantizeWithMinMaxPass::name ( void  ) const
inlinevirtual

Reimplemented from logo::Pass.

Definition at line 55 of file QuantizeWithMinMaxPass.h.

55{ return "luci::QuantizeWithMinMaxPass"; }

◆ run()

bool luci::QuantizeWithMinMaxPass::run ( loco::Graph *  g)
virtual

How does QuantizeWithMinMax work?

We categorize tensors into four groups:

  • Activation: Feature maps (both Const/Non-const)
  • Weights: Const tensors of specific Ops (Conv, FC, ...)
  • Bias: Const tensors of specific Ops (Conv, FC, ...)
  • Others: padding value, one_hot value, axis, ...

Activation is quantized in different ways:

  1. For non-constant activation, quantize using recorded min/max
  2. For constant activation, quantize using min/max of its value
  3. For some Ops (ex: pad_v2), output qparam is used as input qparam (backward propagation)
  4. For some Ops (ex: reshape), input qparam is used as output qparam (forward propagation)
  5. For some Ops (ex: tanh), output qparam has pre-defined values
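
Case (1) can be illustrated with the standard asymmetric (affine) uint8 scheme, in which a scale and zero-point are derived from the recorded [min, max] range. The sketch below is illustrative math under that common convention, not luci's actual implementation; the struct and function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative sketch only: standard asymmetric uint8 affine quantization.
// Names are hypothetical, not luci API.
struct QParam
{
  float scale;
  int32_t zero_point;
};

// Derive scale/zero-point from a recorded [min, max] range.
QParam qparam_from_minmax(float min, float max)
{
  // Extend the range to include 0 so that 0.0f is exactly representable.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);

  QParam qp;
  qp.scale = (max - min) / 255.0f; // uint8 covers [0, 255]
  qp.zero_point = static_cast<int32_t>(std::round(-min / qp.scale));
  return qp;
}

uint8_t quantize(float x, const QParam &qp)
{
  auto q = static_cast<int32_t>(std::round(x / qp.scale)) + qp.zero_point;
  return static_cast<uint8_t>(std::min(255, std::max(0, q))); // clamp to uint8
}
```

For example, a recorded range of [-1.0, 3.0] gives scale = 4/255 and zero_point = 64, so 0.0f maps exactly to 64. Case (2) is the same math applied to the min/max of a constant's own values rather than recorded statistics.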

Weights are quantized using the min/max of their values.

Bias is quantized using the input scale (s_i) and the weights scale (s_w).

  • Therefore, activation and weights should be quantized before bias
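
The bias rule can be made concrete with a small numeric sketch (illustrative math under the common int32-bias convention, not luci's code; the helper name is hypothetical). Bias has no independently chosen scale: it reuses bias_scale = s_i * s_w, which is why both the input activation and the weights must be quantized first.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Illustrative only: bias is quantized to int32 with a scale fixed to
// input_scale * weights_scale, so both scales must already be known.
int32_t quantize_bias(float bias, float input_scale, float weights_scale)
{
  const float bias_scale = input_scale * weights_scale;
  return static_cast<int32_t>(std::round(bias / bias_scale));
}
```

With s_i = 0.1 and s_w = 0.05, a float bias of 0.5 becomes round(0.5 / 0.005) = 100.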

Overall Quantization Steps

  1. Quantize Activation
    • Quantize using recorded min/max (QuantizeActivation)
    • Insert Quantize Ops for mixed-precision quantization (InsertQuantizeOp)
    • Remove redundant Quantize Ops (RemoveRedundantQuantizePass)
    • Propagate qparam backward (PropagateQParamBackwardPass)
    • Quantize const inputs (QuantizeConstInputActivation)
    • Quantize using pre-defined values (QuantizeSpecialActivation)
    • Propagate qparam forward (PropagateQParamForwardPass)
  2. Quantize Weights
  3. Quantize Bias
  4. Set input dtype
  5. Set output dtype

Why was the quantization sequence determined as above?

  • Activation and weights should be quantized before bias (1->2->3). Input/Output dtype is updated after all the other nodes are quantized (4->5).
  • During activation quantization,
    • Backward propagation is performed earlier than forward propagation. This allows a backward-propagated qparam to be overwritten during forward propagation. We made this decision because it is more common to propagate qparam forward (reshape, transpose) than backward (concat, pad_v2, ...).
    • QuantizeSpecialActivation is called before forward propagation to make sure that the pre-defined qparam values are propagated.
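
The ordering argument can be sketched with a toy two-step propagation (purely illustrative, not luci code: a std::map of node name to scale stands in for real qparams, and the real logic lives in PropagateQParamBackwardPass / PropagateQParamForwardPass).

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy sketch of the pass ordering: when a node receives a qparam from both
// directions, the forward-propagated value wins because forward runs last.
using ScaleMap = std::map<std::string, float>; // node name -> scale

float final_scale(ScaleMap &scale)
{
  // 1. Backward propagation first: a consumer (e.g. concat) pushes its
  //    output scale back to its input node.
  scale["node"] = scale["consumer"];

  // 2. Forward propagation last: a shape-only Op (e.g. reshape) copies its
  //    input's scale forward, overwriting the backward-propagated value.
  scale["node"] = scale["producer"];

  return scale["node"];
}
```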

Implements logo::Pass.

Definition at line 561 of file QuantizeWithMinMaxPass.cpp.

562{
563 LOGGER(l);
564 INFO(l) << "QuantizeWithMinMaxPass Start" << std::endl;
565
566 auto info_by_name = layer_info_map(g, _ctx->layers_info);
567
568 auto quantize_dtype = [&](const luci::CircleNode *node) {
569 auto iter = info_by_name.find(node->name());
570
571 // Return designated quantization dtype
572 if (iter != info_by_name.end())
573 return iter->second.dtype;
574
575 // Return default quantization dtype
576 return _ctx->output_model_dtype;
577 };
578
579 auto quantize_granularity = [&](const luci::CircleNode *node) {
580 auto iter = info_by_name.find(node->name());
581
582 // Return designated quantization granularity
583 if (iter != info_by_name.end())
584 return iter->second.granularity;
585
586 // Return default quantization granularity
587 return _ctx->granularity;
588 };
589
590 // Quantize activation
591 // Why all_nodes?
592 // Models can have inactive (unused) inputs.
593 // We do not reject such models, but quantize them too
594 for (auto node : loco::all_nodes(g))
595 {
596 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
597 QuantizeActivation qa(quantize_dtype(circle_node));
598 circle_node->accept(&qa);
599 }
600
601 // Insert Quantize Op
602 for (auto node : loco::active_nodes(loco::output_nodes(g)))
603 {
604 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
605 auto op_dtype = quantize_dtype(circle_node);
606 if (op_dtype != _ctx->output_model_dtype)
607 {
608 InsertQuantizeOp iqo(_ctx->output_model_dtype, op_dtype);
609 circle_node->accept(&iqo);
610 }
611 }
612
613 // Remove redundant Quantize Op
614 {
615 logo::Phase phase;
616
617 phase.emplace_back(std::make_unique<luci::RemoveRedundantQuantizePass>());
618
619 ProgressReporter prog(g, logo::PhaseStrategy::Saturate);
620 logo::PhaseRunner<logo::PhaseStrategy::Saturate> phase_runner{g};
621 phase_runner.attach(&prog);
622 phase_runner.run(phase);
623 }
624
625 // Backward propagation of activation qparam
626 {
627 PropagateQParamBackwardPass pqbp(_ctx->output_model_dtype);
628 pqbp.run(g);
629 }
630
631 // Quantize const input activation
632 for (auto node : loco::active_nodes(loco::output_nodes(g)))
633 {
634 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
635 QuantizeConstInputActivation qcia(quantize_dtype(circle_node));
636 circle_node->accept(&qcia);
637 }
638
639 // Update qparam of output of special Ops
640 for (auto node : loco::active_nodes(loco::output_nodes(g)))
641 {
642 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
643
644 // At this point, all activations have to be quantized.
645 // Un-quantized nodes are not the quantization target (ex: int32 tensor),
646 // so we skip them
647 if (circle_node->quantparam() == nullptr)
648 continue;
649
650 QuantizeSpecialActivation qsa(quantize_dtype(circle_node));
651 circle_node->accept(&qsa);
652 }
653
654 // Forward propagation of activation qparam
655 logo::Phase phase;
656
657 phase.emplace_back(std::make_unique<luci::PropagateQParamForwardPass>(_ctx->TF_style_maxpool));
658
659 ProgressReporter prog(g, logo::PhaseStrategy::Saturate);
660 logo::PhaseRunner<logo::PhaseStrategy::Saturate> phase_runner{g};
661 phase_runner.attach(&prog);
662 phase_runner.run(phase);
663
664 // Quantize weights
665 for (auto node : loco::active_nodes(loco::output_nodes(g)))
666 {
667 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
668 QuantizeWeights qw(_ctx->input_model_dtype, quantize_dtype(circle_node),
669 quantize_granularity(circle_node));
670 circle_node->accept(&qw);
671 }
672
673 // Quantize bias
674 for (auto node : loco::active_nodes(loco::output_nodes(g)))
675 {
676 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
677 QuantizeBias qb(_ctx->input_model_dtype, quantize_dtype(circle_node),
678 quantize_granularity(circle_node));
679 circle_node->accept(&qb);
680 }
681
682 // Update output dtype
683 auto graph_outputs = g->outputs();
684 for (auto node : loco::output_nodes(g))
685 {
686 auto circle_node = loco::must_cast<luci::CircleOutput *>(node);
687 if (static_cast<luci::CircleNode *>(circle_node->from())->dtype() == _ctx->output_model_dtype)
688 {
689 circle_node->dtype(_ctx->output_model_dtype);
690 auto graph_output = graph_outputs->at(circle_node->index());
691 graph_output->dtype(_ctx->output_model_dtype);
692 }
693 }
694
695 // Set input type
696 set_input_type(g);
697
698 // Set output type
699 set_output_type(g);
700
701 // Remove redundant Quantize Op
702 {
703 logo::Phase phase;
704
705 phase.emplace_back(std::make_unique<luci::RemoveRedundantQuantizePass>());
706
707 ProgressReporter prog(g, logo::PhaseStrategy::Saturate);
708 logo::PhaseRunner<logo::PhaseStrategy::Saturate> phase_runner{g};
709 phase_runner.attach(&prog);
710 phase_runner.run(phase);
711 }
712
713 if (not _ctx->save_min_max)
714 {
715 // Remove min/max values
716 for (auto node : loco::all_nodes(g))
717 {
718 auto circle_node = loco::must_cast<luci::CircleNode *>(node);
719 if (auto qparam = circle_node->quantparam())
720 {
721 warn_accuracy_with_range(circle_node);
722 qparam->min.clear();
723 qparam->max.clear();
724 }
725 }
726 }
727
728 INFO(l) << "QuantizeWithMinMaxPass End" << std::endl;
729 return false; // one time run
730}
#define LOGGER(name)
Definition Log.h:65
#define INFO(name)
Definition Log.h:68
std::set< Node * > all_nodes(Graph *)
Enumerate all the nodes in a given graph.
Definition Graph.cpp:59
std::set< loco::Node * > active_nodes(const std::vector< loco::Node * > &roots)
Enumerate all the nodes required to compute "roots".
std::vector< Node * > output_nodes(Graph *)
Definition Graph.cpp:101
std::vector< std::unique_ptr< Pass > > Phase
Definition Phase.h:31
void warn_accuracy_with_range(luci::CircleNode *n)
LayerInfoMap layer_info_map(loco::Graph *g, std::vector< LayerInfo > &layers_info)
Class to propagate quantization parameters of an operator's output to input.
Quantize non-const activation using recorded min/max values.
QuantizeBias quantizes tensors for bias.
Quantize non-const activation using pre-defined scale/zp for special Ops.
QuantizeWeights quantizes tensors for weights.

References loco::active_nodes(), loco::all_nodes(), INFO, luci::layer_info_map(), LOGGER, luci::CircleNode::name(), loco::output_nodes(), luci::PropagateQParamBackwardPass::run(), logo::Saturate, and luci::warn_accuracy_with_range().

Referenced by package.infer.session::inference().


The documentation for this class was generated from the following files:

  • QuantizeWithMinMaxPass.h
  • QuantizeWithMinMaxPass.cpp