ONE - On-device Neural Engine
Functions
| GGMA_STATUS | ggma_generate (ggma_context *context, ggma_token *tokens, size_t n_tokens, size_t n_tokens_max, size_t *n_tokens_out) |
| | Generates a sequence of tokens based on the provided prompt tokens. |
GGMA_STATUS ggma_generate (struct ggma_context *context,
                           ggma_token *tokens,
                           size_t n_tokens,
                           size_t n_tokens_max,
                           size_t *n_tokens_out)
Generates a sequence of tokens based on the provided prompt tokens.
This function performs the core inference step, taking an initial sequence of prompt tokens and generating new tokens autoregressively.
Parameters
| [in] | context | The GGMA context to use for generation. |
| [in,out] | tokens | An array of input prompt tokens. The generated tokens are written to this same buffer. |
| [in] | n_tokens | The number of tokens in the input tokens array. This also often specifies the maximum number of tokens to generate. |
| [in] | n_tokens_max | The maximum number of tokens that the tokens buffer can hold. |
| [out] | n_tokens_out | A pointer to a variable that receives the number of elements in the tokens buffer after generation. |
Returns
GGMA_STATUS_NO_ERROR on success, or an appropriate error code on failure.
Definition at line 22 of file ggma_generate.cc.
References GGMA_RETURN_ERROR_IF_NULL.
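The buffer contract described by the parameters above (one in/out array, a capacity, and an output count) can be illustrated with a minimal C sketch. Everything other than the ggma_generate signature and GGMA_STATUS_NO_ERROR is an assumption made so the example compiles on its own: the typedefs, the GGMA_STATUS_ERROR name, and the helper generate_into_buffer are hypothetical, and the ggma_generate body here is a stub that runs no model. Whether the real function keeps the prompt at the front of the buffer is not stated in this page; the stub assumes that it does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ASSUMPTION: stand-ins for declarations that normally come from the GGMA
 * headers. The token width and the GGMA_STATUS_ERROR name are hypothetical. */
typedef enum { GGMA_STATUS_NO_ERROR = 0, GGMA_STATUS_ERROR = 1 } GGMA_STATUS;
typedef int32_t ggma_token;
typedef struct ggma_context { int unused; } ggma_context;

/* ASSUMPTION: stub with the documented signature so the sketch links without
 * the library. It runs no model: it rejects NULL arguments, then pads the
 * buffer with a placeholder token up to n_tokens_max. */
GGMA_STATUS ggma_generate(ggma_context *context, ggma_token *tokens, size_t n_tokens,
                          size_t n_tokens_max, size_t *n_tokens_out)
{
  if (context == NULL || tokens == NULL || n_tokens_out == NULL)
    return GGMA_STATUS_ERROR;
  if (n_tokens > n_tokens_max)
    return GGMA_STATUS_ERROR;
  size_t n = n_tokens;
  while (n < n_tokens_max)
    tokens[n++] = 0; /* placeholder token id, not real model output */
  *n_tokens_out = n;
  return GGMA_STATUS_NO_ERROR;
}

/* Hypothetical helper: copies the prompt into a caller-owned buffer, calls
 * ggma_generate, and checks the reported count against the capacity.
 * Returns 0 on success and -1 on any failure. */
int generate_into_buffer(ggma_context *context, const ggma_token *prompt, size_t n_prompt,
                         ggma_token *buffer, size_t capacity, size_t *n_out)
{
  assert(buffer != NULL && n_out != NULL); /* caller contract of this helper */
  if (n_prompt > capacity)
    return -1; /* prompt does not fit in the buffer */
  if (n_prompt > 0)
    memcpy(buffer, prompt, n_prompt * sizeof(ggma_token));
  *n_out = 0;
  if (ggma_generate(context, buffer, n_prompt, capacity, n_out) != GGMA_STATUS_NO_ERROR)
    return -1;
  /* Never trust a count larger than the buffer that was handed over. */
  return (*n_out <= capacity) ? 0 : -1;
}
```

A caller would allocate the buffer with room for the prompt plus the tokens it expects back, pass its capacity as n_tokens_max, and read only the first n_tokens_out elements after a successful return.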