ONE - On-device Neural Engine
#include <KVCache.h>
Public Member Functions
| size_t | element_size () const |
| NNFW_TYPE | to_nnfw_type () const |
| bool | is_valid () const |
| int64_t | pos () const |
| void | set_pos (int pos) |
| void | reset_pos () |
| void | advance_pos () |
| void | init (const ggma::GGMAConfig &cfg, int cache_size) |
| void | transpose (bool is_k_cache, const char *perm, size_t seq_len, size_t num_heads, size_t head_dim) |
| | Transpose cache with "0213" permutation [0,2,1,3]. |
Data Fields
| KVCacheDataType | data_type |
| std::vector< std::vector< uint8_t > > | k |
| std::vector< std::vector< uint8_t > > | v |
| int64_t | _pos = 0 |
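Member Function Documentation
| void ggma::KVCache::advance_pos () | inline |
| size_t ggma::KVCache::element_size () const | inline |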
Definition at line 51 of file KVCache.h.
References data_type, ggma::FLOAT32, and ggma::UINT8.
Referenced by init() and transpose().
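The references above suggest that element_size() maps data_type to a byte width. A minimal sketch of such a mapping follows, assuming FLOAT32 is stored as a 4-byte float and UINT8 as a single byte. The `sketch` namespace and the free-function form are illustrative stand-ins, not the code in KVCache.h.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

namespace sketch
{

// Stand-in for ggma::KVCacheDataType (hypothetical); only the two
// enumerators named in the references are modelled.
enum class KVCacheDataType
{
  FLOAT32,
  UINT8
};

// Bytes occupied by one cache element of the given type (assumed widths).
inline std::size_t element_size(KVCacheDataType type)
{
  switch (type)
  {
    case KVCacheDataType::FLOAT32:
      return sizeof(float);
    case KVCacheDataType::UINT8:
      return sizeof(std::uint8_t);
  }
  throw std::invalid_argument("unsupported KV cache data type");
}

} // namespace sketch
```

Because k and v are stored as raw uint8_t buffers, a byte width like this is what lets callers convert element counts into buffer sizes.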
| void ggma::KVCache::init (const ggma::GGMAConfig & cfg, int cache_size) |
Definition at line 100 of file KVCache.cc.
References data_type, element_size(), ggma::ModelConfig::hidden_size, k, ggma::GGMAConfig::kv_cache_type, ggma::GGMAConfig::model, ggma::ModelConfig::n_layers, and v.
Referenced by ggma::Context::Context().
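The fields that init() references (n_layers, hidden_size, element_size(), k and v) are consistent with allocating one byte buffer per layer for each of K and V. The sketch below illustrates that idea only. The sizing formula `cache_size * hidden_size * bytes_per_elem`, the zero fill, and the simplified `ModelConfig` and `KVCache` types are assumptions, not the code in KVCache.cc.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch
{

// Hypothetical stand-in holding only the two model fields named above.
struct ModelConfig
{
  int n_layers;
  int hidden_size;
};

struct KVCache
{
  std::vector<std::vector<std::uint8_t>> k; // one buffer per layer
  std::vector<std::vector<std::uint8_t>> v; // one buffer per layer

  // bytes_per_elem plays the role of element_size() in the real class.
  void init(const ModelConfig &model, int cache_size, std::size_t bytes_per_elem)
  {
    // Assumed sizing: cache_size positions, hidden_size elements per position.
    const std::size_t bytes = static_cast<std::size_t>(cache_size) *
                              static_cast<std::size_t>(model.hidden_size) * bytes_per_elem;
    const std::size_t layers = static_cast<std::size_t>(model.n_layers);
    k.assign(layers, std::vector<std::uint8_t>(bytes, 0));
    v.assign(layers, std::vector<std::uint8_t>(bytes, 0));
  }
};

} // namespace sketch
```

With 2 layers, a hidden size of 8, a cache size of 4 and 4-byte elements, each per-layer buffer in this sketch holds 4 * 8 * 4 = 128 bytes.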
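| bool ggma::KVCache::is_valid () const | inline |
| int64_t ggma::KVCache::pos () const | inline |
| void ggma::KVCache::reset_pos () | inline |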
Definition at line 94 of file KVCache.h.
References _pos.
Referenced by ggma::Context::generate().
| void ggma::KVCache::set_pos (int pos) | inline |
| NNFW_TYPE ggma::KVCache::to_nnfw_type () const | inline |
Definition at line 65 of file KVCache.h.
References data_type, ggma::FLOAT32, NNFW_TYPE_TENSOR_FLOAT32, NNFW_TYPE_TENSOR_UINT8, and ggma::UINT8.
Referenced by ggma::Context::prefill().
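The references above indicate that to_nnfw_type() translates the cache's data_type into the matching runtime tensor type. A minimal sketch of that translation follows. To stay self-contained it declares local stand-ins for both enums, so the enumerator values here are arbitrary and are not those of the real NNFW_TYPE.

```cpp
#include <stdexcept>

namespace sketch
{

// Stand-in for ggma::KVCacheDataType (hypothetical).
enum class KVCacheDataType
{
  FLOAT32,
  UINT8
};

// Stand-in for the runtime's NNFW_TYPE; only the two enumerators named in
// the references are declared, with arbitrary values.
enum NNFW_TYPE
{
  NNFW_TYPE_TENSOR_FLOAT32,
  NNFW_TYPE_TENSOR_UINT8
};

// Map a cache storage type to the corresponding tensor type.
inline NNFW_TYPE to_nnfw_type(KVCacheDataType type)
{
  switch (type)
  {
    case KVCacheDataType::FLOAT32:
      return NNFW_TYPE_TENSOR_FLOAT32;
    case KVCacheDataType::UINT8:
      return NNFW_TYPE_TENSOR_UINT8;
  }
  throw std::invalid_argument("unsupported KV cache data type");
}

} // namespace sketch
```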
| void ggma::KVCache::transpose (bool is_k_cache, const char * perm, size_t seq_len, size_t num_heads, size_t head_dim) |
Transpose cache with "0213" permutation [0,2,1,3].
Parameters
| is_k_cache | true for K cache, false for V cache |
| perm | Permutation string (must be "0213") |
| seq_len | Sequence length dimension |
| num_heads | Number of attention heads |
| head_dim | Head dimension |
Definition at line 68 of file KVCache.cc.
References element_size(), k, and v.
Referenced by ggma::Context::generate().
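A "0213" permutation swaps the two middle axes of a 4-D tensor. The sketch below shows one way to apply it to a raw byte buffer, assuming a batch of 1, a source layout of [1, seq_len, num_heads, head_dim], and a destination layout of [1, num_heads, seq_len, head_dim]. The function name `transpose_0213`, the `elem_bytes` parameter, and the out-of-place copy are illustrative choices; the actual transpose() may use the opposite source layout or work in place.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch
{

// Reorder [1, seq_len, num_heads, head_dim] into [1, num_heads, seq_len, head_dim].
// elem_bytes is the byte width of one element (4 for float, 1 for uint8).
inline std::vector<std::uint8_t> transpose_0213(const std::vector<std::uint8_t> &src,
                                                std::size_t seq_len, std::size_t num_heads,
                                                std::size_t head_dim, std::size_t elem_bytes)
{
  // The innermost axis keeps its position, so each head vector moves as one block.
  const std::size_t row_bytes = head_dim * elem_bytes;
  assert(src.size() == seq_len * num_heads * row_bytes);

  std::vector<std::uint8_t> dst(src.size());
  for (std::size_t s = 0; s < seq_len; ++s)
  {
    for (std::size_t h = 0; h < num_heads; ++h)
    {
      const std::uint8_t *from = src.data() + (s * num_heads + h) * row_bytes;
      std::uint8_t *to = dst.data() + (h * seq_len + s) * row_bytes;
      std::copy_n(from, row_bytes, to);
    }
  }
  return dst;
}

} // namespace sketch
```

Applying the same permutation again with seq_len and num_heads exchanged restores the original layout, since "0213" only swaps two axes.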
Field Documentation
| int64_t ggma::KVCache::_pos = 0 |
Definition at line 48 of file KVCache.h.
Referenced by advance_pos(), pos(), reset_pos(), and set_pos().
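The four accessors that touch _pos form a small cursor API. The sketch below mirrors their signatures from the member list. The bodies are assumptions: in particular, advance_pos() is taken to step by one position and reset_pos() to return to zero. `PosCursor` is a hypothetical stand-in for the relevant part of KVCache.

```cpp
#include <cstdint>

namespace sketch
{

struct PosCursor
{
  std::int64_t _pos = 0; // current write position in the cache

  std::int64_t pos() const { return _pos; }
  void set_pos(int pos) { _pos = pos; }
  void reset_pos() { _pos = 0; }   // assumed: back to the start
  void advance_pos() { ++_pos; }   // assumed: one position per call
};

} // namespace sketch
```

Under these assumptions, a generation loop would call advance_pos() once per produced token and reset_pos() before starting a new sequence.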
| KVCacheDataType ggma::KVCache::data_type |
Definition at line 45 of file KVCache.h.
Referenced by element_size(), init(), and to_nnfw_type().
| std::vector<std::vector<uint8_t> > ggma::KVCache::k |
Definition at line 46 of file KVCache.h.
Referenced by init(), is_valid(), ggma::Context::prefill(), and transpose().
| std::vector<std::vector<uint8_t> > ggma::KVCache::v |
Definition at line 47 of file KVCache.h.
Referenced by init(), is_valid(), ggma::Context::prefill(), and transpose().