How does QuantizeWithMinMax work?
We categorize tensors into four groups:
- Activation: feature maps (both const and non-const)
- Weights: const tensors of specific Ops (Conv, FC, ...)
- Bias: const tensors of specific Ops (Conv, FC, ...)
- Others: padding value, one_hot value, axis, ...
Activations are quantized in different ways:
- Non-constant activations are quantized using recorded min/max.
- Constant activations are quantized using the min/max of their values.
- For some Ops (e.g., pad_v2), the output qparam is used as the input qparam (backward propagation).
- For some Ops (e.g., reshape), the input qparam is used as the output qparam (forward propagation).
- For some Ops (e.g., tanh), the output qparam has pre-defined values.
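The min/max-based quantization above can be sketched with the usual asymmetric uint8 scheme: a scale and zero-point are derived from the recorded range, then values are rounded and clamped. This is a generic illustration (assuming uint8, a range nudged to contain zero, and min < max), not luci's exact implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantization parameters derived from a recorded min/max range.
struct QParam
{
  float scale;
  int32_t zero_point;
};

// Generic asymmetric uint8 qparam computation (assumes min < max).
inline QParam qparam_from_minmax(float min, float max)
{
  // Nudge the range to contain zero so 0.0f maps to an exact integer.
  min = std::min(min, 0.0f);
  max = std::max(max, 0.0f);
  QParam qp;
  qp.scale = (max - min) / 255.0f;
  qp.zero_point = static_cast<int32_t>(std::round(-min / qp.scale));
  qp.zero_point = std::max(0, std::min(255, qp.zero_point));
  return qp;
}

// Quantize one float value: round to the nearest step, shift, clamp.
inline uint8_t quantize(float x, const QParam &qp)
{
  int32_t q = static_cast<int32_t>(std::round(x / qp.scale)) + qp.zero_point;
  return static_cast<uint8_t>(std::max(0, std::min(255, q)));
}
```

With a recorded range of [0, 255], the scale is exactly 1.0 and the zero-point is 0, so values quantize to themselves.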
Weights are quantized using the min/max of their values.
Bias is quantized using the input scale (s_i) and the weights scale (s_w).
- Therefore, activations and weights must be quantized before bias.
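The bias dependency above can be made concrete: bias is typically quantized to int32 with scale s_i * s_w and zero-point 0, which is exactly why both the input scale and the weights scale must already be known. A minimal sketch, assuming that convention (not luci's exact code):

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Quantize a bias value to int32 with scale = input_scale * weights_scale
// and zero-point 0. Illustrative convention; not luci's implementation.
inline int32_t quantize_bias(float bias, float input_scale, float weights_scale)
{
  const float bias_scale = input_scale * weights_scale;
  return static_cast<int32_t>(std::round(bias / bias_scale));
}
```

For example, with s_i = 0.5 and s_w = 0.5, the bias scale is 0.25, so a bias of -0.25 quantizes to -1.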
Overall Quantization Steps
1. Quantize Activation
  - Quantize using recorded min/max (QuantizeActivation)
  - Insert Quantize Ops for mixed-precision quantization (InsertQuantizeOp)
  - Remove redundant Quantize Ops (RemoveRedundantQuantizePass)
  - Propagate qparam backward (PropagateQParamBackwardPass)
  - Quantize const inputs (QuantizeConstInputActivation)
  - Quantize using pre-defined values (QuantizeSpecialActivation)
  - Propagate qparam forward (PropagateQParamForwardPass)
2. Quantize Weights
3. Quantize Bias
4. Set input dtype
5. Set output dtype
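The ordering constraints above can be encoded as a fixed sequence, which makes them easy to check mechanically (weights and activations before bias, dtypes last). This list is purely illustrative driver code, not the real pass pipeline:

```cpp
#include <cassert>
#include <string>
#include <vector>

// The documented quantization order, expressed as a fixed sequence.
// Names mirror the steps above; this is not the actual pass driver.
inline std::vector<std::string> quantization_steps()
{
  return {
    "QuantizeActivation",           // recorded min/max
    "InsertQuantizeOp",             // mixed-precision boundaries
    "RemoveRedundantQuantizePass",  // clean up inserted ops
    "PropagateQParamBackwardPass",  // e.g. concat, pad_v2
    "QuantizeConstInputActivation", // const inputs
    "QuantizeSpecialActivation",    // pre-defined qparam (e.g. tanh)
    "PropagateQParamForwardPass",   // e.g. reshape, transpose
    "QuantizeWeights",
    "QuantizeBias",                 // needs input & weights scales
    "SetInputType",
    "SetOutputType",
  };
}
```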
Why was the quantization sequence determined as above?
- Activations and weights must be quantized before bias (1->2->3). Input/output dtypes are updated after all the other nodes are quantized (4->5).
- During activation quantization:
  - Backward propagation is performed before forward propagation, so a backward-propagated qparam can be overwritten during forward propagation. We made this decision because propagating qparams forward (reshape, transpose) is more common than propagating them backward (concat, pad_v2, ...).
  - QuantizeSpecialActivation is called before forward propagation to ensure that the pre-defined qparam values are propagated.
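The overwrite behavior described above can be sketched with a toy qparam table. Suppose a tensor t is both the output of a reshape (forward propagation copies the input's qparam onto it) and the input of a pad_v2 (backward propagation copies the output's qparam onto it). Because backward runs first, forward wins. Tensor names and the single-float "qparam" are purely illustrative:

```cpp
#include <cassert>
#include <map>
#include <string>

// Toy model: tensor name -> scale. Shows why forward propagation, run
// second, overwrites a backward-propagated qparam. Illustrative only.
inline float final_scale_of_t()
{
  std::map<std::string, float> q{{"reshape_in", 0.5f}, {"pad_out", 0.25f}};
  // Backward propagation (pad_v2): t takes its consumer's output qparam.
  q["t"] = q["pad_out"];
  // Forward propagation (reshape): t takes its producer's input qparam,
  // overwriting the backward-propagated value.
  q["t"] = q["reshape_in"];
  return q["t"];
}
```

The forward-propagated scale (0.5) survives, matching the stated rationale that forward propagation is the more common case and should take precedence.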
Implements logo::Pass.
Definition at line 561 of file QuantizeWithMinMaxPass.cpp.
(Source listing abridged; the recoverable flow of the pass body is:)
- Log "QuantizeWithMinMaxPass Start".
- Resolve each node's target dtype and granularity from the layer-info map, falling back to _ctx->output_model_dtype and _ctx->granularity.
- Insert Quantize Ops (InsertQuantizeOp) where a node's dtype differs from _ctx->output_model_dtype.
- Run RemoveRedundantQuantizePass.
- Run PropagateQParamForwardPass(_ctx->TF_style_maxpool).
- Call set_input_type(g) and set_output_type(g).
- Run RemoveRedundantQuantizePass again.
- If not _ctx->save_min_max, drop the recorded min/max data.
- Log "QuantizeWithMinMaxPass End" and return false.
References loco::active_nodes(), loco::all_nodes(), INFO, luci::layer_info_map(), LOGGER, loco::must_cast(), luci::must_cast(), luci::CircleNode::name(), loco::output_nodes(), logo::Saturate, and luci::warn_accuracy_with_range().