Tools

Transliterate Text

Converts text from one script to another without translating it, preserving the phonetic representation. For example, it converts Arabic or Cyrillic text to Latin script, or romanized Arabic text to standard Arabic characters.

Analyze Sentiment

Determines sentiment in text at the document level and optionally at the entity level. Returns sentiment labels (pos, neu, neg) with confidence scores. Entity-level analysis identifies entities and determines the sentiment expressed toward each one.
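As a minimal sketch of consuming such a result, the snippet below picks the label with the highest confidence. The dictionary shape and the name `top_sentiment` are assumptions made for illustration; only the labels (pos, neu, neg) come from the description above, and the real response format may differ.

```python
from typing import Dict, Tuple


def top_sentiment(scores: Dict[str, float]) -> Tuple[str, float]:
    """Return the (label, confidence) pair with the highest confidence."""
    if not scores:
        raise ValueError("scores must not be empty")
    label = max(scores, key=scores.get)
    return label, scores[label]


# Hypothetical document-level confidences, not real tool output.
document_scores = {"pos": 0.72, "neu": 0.20, "neg": 0.08}
print(top_sentiment(document_scores))
```

The same helper could be applied per entity if entity-level results carry the same kind of label-to-confidence mapping.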

Analyze Morphology

Performs morphological analysis on text, returning part of speech tags, lemmas (dictionary form), compound components, and Han readings for each token. Can return all features at once or individual features.

Detect Language

Identifies the language of a given text. Supports detection across 55 languages. Can optionally use multilingual detection mode to identify language regions within the same document, useful when text contains multiple languages.

Translate Name

Translates a name from one language to another, using knowledge of language-specific naming conventions. Recognizes when to transliterate a name phonetically and when to translate its meaning (e.g., titles). Supports translation from 13 source languages into English, and between Chinese, Japanese, and Korean.

Compare Addresses

Compares two addresses to determine whether they refer to the same location. Accepts both structured addresses (with individual fields) and unstructured addresses (as plain text strings). The two addresses in a comparison do not have to use the same format: one can be structured and the other unstructured.

Extract Topics

Identifies the central keywords and concepts in text. Unlike categorization or entity extraction, topic extraction is not constrained by a predefined list. It returns "keyphrases" (exact terms) and "concepts" (broader ideas), ranked by relative importance.

Compare Names

Compares two names and returns a similarity score between 0 and 1. Accounts for typographical errors, phonetic spelling variations, transliteration differences, initials, nicknames, and cross-language variations. Supports PERSON, LOCATION, and ORGANIZATION entity types.

Deduplicate Names

Identifies and groups duplicate names from a list, accounting for linguistic variations across languages and scripts. Useful for cleaning databases with duplicate records, merging contact lists, or consolidating name data.
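To illustrate the grouping idea only, here is a small sketch that clusters names whose pairwise similarity meets a threshold, using a union-find structure. The similarity function is a stand-in built on Python's `difflib`; it is purely character-based and does not handle cross-script or cross-language variation the way the tool is described as doing. The names `similarity`, `group_duplicates`, and the 0.85 threshold are all hypothetical.

```python
from difflib import SequenceMatcher
from typing import List


def similarity(a: str, b: str) -> float:
    """Stand-in similarity in [0, 1]; NOT the tool's name-matching logic."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def group_duplicates(names: List[str], threshold: float = 0.85) -> List[List[str]]:
    """Group names whose pairwise similarity is at least `threshold`."""
    parent = list(range(len(names)))

    def find(i: int) -> int:
        # Follow parent links to the root, compressing the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if similarity(names[i], names[j]) >= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    # Keep the smaller index as root so groups stay in input order.
                    parent[max(ri, rj)] = min(ri, rj)

    groups: dict = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ["Jon Smith", "Maria Garcia", "John Smith", "Maria Garcya"]
for group in group_duplicates(names):
    print(group)
```

Because grouping is transitive here, two names can land in the same group through a chain of near matches even if they are not similar to each other directly; whether the tool behaves this way is not stated in the description.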

Split Sentences

Splits text into individual sentences. Useful for preprocessing text before further NLP analysis, or for breaking large documents into sentence-level units.

Categorize Text

Classifies text content under topic categories drawn from the IAB Quality Assurance Guidelines Taxonomy (Tier 1 contextual categories). Useful for automatically tagging documents, web pages, or articles by topic.

Compare Records

Compares structured data records with multiple fields to determine similarity. Each record can contain up to 5 fields, with supported field types including name, address, date, number, boolean, and string. Fields can be individually weighted to control their impact on the final match score. Records are compared pairwise: each left record is compared against the corresponding right record.
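The effect of field weights can be pictured with a weighted average of per-field similarity scores. This is a sketch under an assumption: the description does not say how the tool combines field scores, so the weighted-average formula, the function name `weighted_match_score`, and the sample numbers are illustrative only.

```python
from typing import Dict


def weighted_match_score(field_scores: Dict[str, float],
                         weights: Dict[str, float]) -> float:
    """Combine per-field similarities (each 0..1) into one weighted average."""
    total_weight = sum(weights[name] for name in field_scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_sum = sum(field_scores[name] * weights[name] for name in field_scores)
    return weighted_sum / total_weight


# Hypothetical per-field similarities for one left/right record pair.
scores = {"name": 0.9, "address": 0.5, "dob": 1.0}
# Giving the name field twice the weight of the others (assumed values).
weights = {"name": 2.0, "address": 1.0, "dob": 1.0}
print(round(weighted_match_score(scores, weights), 3))
```

With these numbers the name field pulls the combined score toward 0.9; setting all weights equal would reduce the formula to a plain mean.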

Get Text Embedding

Generates a numerical vector representation (embedding) of text for semantic similarity computation. Transforms text ranging from a single word to an entire document into a vector in semantic space. Supports cross-lingual semantic comparison without translation.
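Once two embeddings are available, one common way to compare them is cosine similarity. The description does not name the similarity measure to use, so treat cosine as an assumption; the vectors below are made-up three-dimensional examples, far shorter than real embeddings.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must be non-zero")
    return dot / (norm_a * norm_b)


# Made-up vectors standing in for embeddings of two texts.
embedding_a = [0.2, 0.7, 0.1]
embedding_b = [0.25, 0.6, 0.05]
print(cosine_similarity(embedding_a, embedding_b))
```

Values near 1 indicate vectors pointing in nearly the same direction, and values near 0 indicate unrelated directions.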

Get Syntax Dependencies

Provides syntactic dependency parse trees showing grammatical relationships between tokens in sentences. Each token is annotated with its dependency role (e.g., subject, object, modifier) relative to its head word.
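A dependency parse of this kind can be represented as a list of tokens, each pointing at its head. The sketch below uses a hand-built parse of "The cat chased mice" to show how a subject could be looked up. The `Token` class, the field names, and the Universal Dependencies style role labels (`nsubj`, `obj`, `det`, `root`) are assumptions for illustration and may not match the tool's actual output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    text: str
    head: Optional[int]  # index of the head token; None marks the root
    role: str            # dependency role relative to the head


def dependents(tokens: List[Token], head_index: int,
               role: Optional[str] = None) -> List[Token]:
    """Return tokens attached to `head_index`, optionally filtered by role."""
    return [
        t for t in tokens
        if t.head == head_index and (role is None or t.role == role)
    ]


# Hand-built parse of "The cat chased mice" (not real tool output).
sentence = [
    Token("The", head=1, role="det"),
    Token("cat", head=2, role="nsubj"),
    Token("chased", head=None, role="root"),
    Token("mice", head=2, role="obj"),
]

# Locate the root, then list its subject(s).
root_index = next(i for i, t in enumerate(sentence) if t.head is None)
subjects = dependents(sentence, root_index, role="nsubj")
print([t.text for t in subjects])
```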

Extract Relationships

Extracts relationships between entities in text. Identifies the grammatical and semantic connections between two entities, recognizing the action or predicate that connects them. Uses a combination of deep learning and semantic rules.

Tokenize Text

Splits text into individual tokens (words, numbers, punctuation) using statistical modeling. Particularly useful for languages such as Chinese, Japanese, and Thai, where word boundaries are not marked by spaces.

Extract Entities

Extracts named entities from text, identifying up to 18 entity types (PERSON, LOCATION, ORGANIZATION, PRODUCT, etc.) across 20 languages. Optionally links entities to Wikidata, DBpedia, or Refinitiv PermID knowledge bases for disambiguation. Can also calculate salience and confidence scores.