
AI Entity Extract tool

The AI Entity Extract tool analyses a block's source text with a large language model and records two kinds of stand-off annotation: named entities (people, organizations, products, and locations, as well as dates, times, currencies, and measurements) and terminology candidates (domain-specific terms that would benefit from a termbase entry). Each entity carries a suggested do-not-translate flag; each term candidate carries a category and a translatability classification (do-not-translate, consistent, or free). The tool is read-only with respect to content: it writes annotations only and never changes the source or target.
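To make the two annotation kinds concrete, here is a minimal sketch of how they could be modelled. The class names, field names, and entity-type labels are assumptions for illustration only; the tool's actual annotation schema is not described on this page.

```python
from dataclasses import dataclass
from enum import Enum


class Translatability(Enum):
    # The three classifications named in the description above
    DO_NOT_TRANSLATE = "do-not-translate"
    CONSISTENT = "consistent"
    FREE = "free"


@dataclass(frozen=True)
class EntityAnnotation:          # hypothetical shape
    text: str
    entity_type: str             # e.g. "product", "date" (hypothetical labels)
    start: int                   # character offset into the source text
    end: int                     # exclusive end offset
    do_not_translate: bool       # suggested flag; a proposal, not a decision


@dataclass(frozen=True)
class TermCandidate:             # hypothetical shape
    text: str
    category: str                # hypothetical free-form category label
    translatability: Translatability


source = "Acme Cloud launches on 3 March 2025."
annotations = [
    EntityAnnotation("Acme Cloud", "product", 0, 10, do_not_translate=True),
    # Dates need locale-specific formatting, so the flag is not set
    EntityAnnotation("3 March 2025", "date", 23, 35, do_not_translate=False),
]
term = TermCandidate("Acme Cloud", "product-name", Translatability.DO_NOT_TRANSLATE)

# Stand-off: offsets point into the source, which is never modified
for a in annotations:
    assert source[a.start:a.end] == a.text
```

Because the annotations only reference offsets, the source string stays untouched, which is what "stand-off" means here.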

Extraction can optionally combine the LLM with a NER provider for fast entity detection; the LLM classification is preferred where the two overlap. Blocks can be analysed one at a time or grouped into batches sent in a single structured call, and batches can run concurrently. Known terms already in the termbase can be supplied so they are not re-proposed. A provider and, for hosted providers, credentials are required.
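The effect of supplying known terms can be pictured as a filter over the proposed candidates. This is only a sketch of the outcome: the function name and the case-insensitive matching are assumptions, and the tool may well achieve the same result differently (for example by passing the known terms to the model in the request).

```python
def drop_known_terms(candidates: list[str], known_terms: list[str]) -> list[str]:
    """Return the candidates that are not already in the termbase.

    Matching is case-insensitive here; that is an assumption of this sketch.
    """
    known = {term.casefold() for term in known_terms}
    return [c for c in candidates if c.casefold() not in known]


remaining = drop_known_terms(
    ["translation memory", "Stand-off annotation", "termbase"],
    ["Termbase", "translation memory"],
)
```

In this example only "Stand-off annotation" remains, since the other two candidates are already known.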

ID: ai-entity-extract
Source: Built-in
Category: analysis
Cardinality: monolingual
Requires: credentials
Tags: ai-powered

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| apiKey | string | | API key for the AI provider |
| batchConcurrency | integer | 1 | Number of concurrent batch calls (0 or 1 = sequential) |
| batchSize | integer | 1 | Number of blocks per LLM call (0 or 1 = one block per call) |
| engine | string | llm | llm (AI provider; default) / ner (local on-device model; nothing leaves the machine) / hybrid (both) |
| knownTerms | string[] | | Terms to exclude from extraction (already in termbase) |
| locale | string | | Locale of the source content |
| model | string | | AI model name |
| provider | string | anthropic | AI provider |

Configure these parameters interactively and copy the flow-step YAML on the Tool Reference.

Examples

Extract entities and terms with Anthropic

Analyse source blocks one at a time with an Anthropic model.

provider: anthropic
locale: en-US

Batched extraction

Analyse blocks in batches of 20, four batches at a time.

provider: openai
batchSize: 20
batchConcurrency: 4
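The interaction between batchSize and batchConcurrency can be sketched as follows. This is an illustration under stated assumptions, not the tool's implementation: `make_batches`, `extract_batch`, and `run` are hypothetical names, and `extract_batch` is a stand-in for one structured LLM call.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(blocks, batch_size):
    """Group blocks into consecutive batches."""
    size = max(batch_size, 1)  # 0 or 1 means one block per call
    return [blocks[i:i + size] for i in range(0, len(blocks), size)]


def extract_batch(batch):
    # Hypothetical stand-in for a single structured LLM call
    return [f"annotations for {block}" for block in batch]


def run(blocks, batch_size=1, batch_concurrency=1):
    batches = make_batches(blocks, batch_size)
    workers = max(batch_concurrency, 1)  # 0 or 1 means sequential
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, even if batches finish out of order
        results = list(pool.map(extract_batch, batches))
    # Flatten back to one result per block
    return [item for batch in results for item in batch]


blocks = [f"block-{i}" for i in range(50)]
batch_sizes = [len(b) for b in make_batches(blocks, 20)]
results = run(blocks, batch_size=20, batch_concurrency=4)
```

With 50 blocks and the settings above, the blocks fall into three batches (20, 20, and 10), so only three of the four concurrent slots are used. Higher concurrency shortens wall-clock time but makes it easier to hit a hosted provider's rate limits.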

Processing notes

  • Operates on translatable blocks with non-empty source; other parts pass through unchanged.

  • Read-only with respect to content: writes entity and term-candidate annotations and never modifies source or target.

  • When both an LLM and a NER provider produce an entity at the same span, the LLM classification is kept.

  • Dates, times, currencies, and measurements are not defaulted to do-not-translate, since they need locale-specific formatting.
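The precedence rule for entities found by both engines can be sketched as a merge keyed on the span. The dictionary shape, the `origin` field, and the function name are assumptions for illustration; the sketch treats "same span" as identical start and end offsets, and the tool's handling of partial overlaps is not described here.

```python
def merge_entities(llm_entities, ner_entities):
    """Merge two entity lists; on identical spans the LLM entity is kept.

    Each entity is a dict with 'start', 'end', and 'type' keys (hypothetical shape).
    """
    # Start with the NER results, keyed by (start, end)
    merged = {(e["start"], e["end"]): {**e, "origin": "ner"} for e in ner_entities}
    # LLM results overwrite any NER result at the same span
    for e in llm_entities:
        merged[(e["start"], e["end"])] = {**e, "origin": "llm"}
    return sorted(merged.values(), key=lambda e: (e["start"], e["end"]))


ner = [
    {"start": 0, "end": 4, "type": "organization"},
    {"start": 10, "end": 16, "type": "location"},
]
llm = [{"start": 0, "end": 4, "type": "product"}]
result = merge_entities(llm, ner)
```

Here the span 0 to 4 keeps the LLM's "product" classification, while the NER-only entity at 10 to 16 is carried through unchanged.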

Limitations

  • Requires a provider and, for hosted providers, valid credentials; hosted providers make billed, rate-limited network calls.

  • The NER provider is optional and supplied programmatically; with no NER provider, extraction is LLM-only.

  • Entity and term suggestions (including the do-not-translate flag and translatability) are model proposals and should be reviewed before acting on them.
