The Types table

A text corpus is a sequence of elements such as words, numbers and punctuation marks, separated by whitespace. These elements are called tokens. Orthographically identical tokens are instances of the same type. dlexDB's Types table contains every type that occurs at least once in the corpus. Types are defined case-sensitively, i.e., singt, Singt and SINGT are three distinct types.
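
The token/type distinction can be illustrated with a minimal Python sketch. It assumes plain whitespace tokenization of a made-up example string; the function name type_frequencies and the sample sentence are hypothetical and are not part of dlexDB, whose actual tokenization pipeline may differ.

```python
from collections import Counter


def type_frequencies(text):
    """Count occurrences of each case-sensitive type in a text.

    Assumption: tokens are separated by whitespace, as described above.
    This is an illustration only, not dlexDB's own tokenizer.
    """
    tokens = text.split()   # every whitespace-separated element is a token
    return Counter(tokens)  # identical tokens collapse into one type


# Hypothetical mini-corpus; punctuation is already separated by whitespace.
corpus = "Singt er ? Er singt , sie singt , SINGT !"
freqs = type_frequencies(corpus)

n_tokens = sum(freqs.values())  # total number of tokens
n_types = len(freqs)            # number of distinct types

# Case-sensitive: singt, Singt and SINGT are counted separately.
for t in ("singt", "Singt", "SINGT"):
    print(t, freqs[t])
```

In this sketch, each key of the counter corresponds to one type, and its count is that type's frequency in the toy corpus.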

The Types table provides type-related information like type frequency, familiarity, regularity, initial letter/bigram/trigram frequencies, neighborhood measures and more.

For variants of these measures that have been computed case-insensitively, please refer to the downcased types table. For properties related to annotation or morphosyntactic analysis, please refer to the Annotated types table instead.

The Types table offers the following filters or variables for output: