The Types table
A text corpus is a sequence of elements like words, numbers and punctuation, separated by whitespace. These elements are called tokens. Orthographically identical tokens can be seen as instances of the same type. dlexDB's Types table contains every type that occurs at least once in the corpus. Types are defined case-sensitively, i.e., singt, Singt and SINGT are three distinct types.
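The token/type distinction can be illustrated with a minimal sketch. The mini-corpus and variable names below are made up for illustration and are not part of dlexDB; the sketch only assumes whitespace-separated tokens and case-sensitive comparison, as described above.

```python
from collections import Counter

# Hypothetical mini-corpus; tokens are separated by whitespace.
corpus = "Singt er ? Er singt , sie singt , alle SINGT ."

tokens = corpus.split()       # every occurrence counts as one token
type_freq = Counter(tokens)   # identical strings collapse into one type (case-sensitive)

n_tokens = len(tokens)        # number of tokens in the corpus
n_types = len(type_freq)      # number of distinct types

print(n_tokens, n_types)
print(type_freq["singt"], type_freq["Singt"], type_freq["SINGT"])
```

Here singt, Singt and SINGT are counted separately, so each gets its own entry and its own frequency.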
The Types table provides type-related information such as type frequency, familiarity, regularity, initial letter/bigram/trigram frequencies, neighborhood measures and more.
For the case-insensitive variants of these measures, please refer to the downcased types table. For properties related to annotation and/or morphosyntactic analysis, please refer to the Annotated types table instead.
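As a rough illustration of what a case-insensitive variant of a frequency measure could look like, the sketch below merges case variants by lowercasing the surface form and summing their frequencies. This is an assumption made for illustration; the frequencies are invented, and the way dlexDB actually computes its downcased measures may differ.

```python
from collections import Counter

# Hypothetical case-sensitive type frequencies (not real dlexDB data).
type_freq = {"singt": 2, "Singt": 1, "SINGT": 1, "Er": 1, "er": 1}

# Assumed aggregation: case variants share one lowercased entry,
# and their frequencies are summed.
downcased_freq = Counter()
for surface, freq in type_freq.items():
    downcased_freq[surface.lower()] += freq

print(dict(downcased_freq))
```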
The Types table offers the following filters or variables for output:
- Surface filters
- Frequency filters
- Numerical filters
- Familiarity
- Regularity
- Document frequency
- Sentence frequency
- Cumulative syllable corpus frequency
- Cumulative syllable lexicon frequency
- Cumulative character corpus frequency
- Cumulative character lexicon frequency
- Cumulative character bigram corpus frequency
- Cumulative character bigram lexicon frequency
- Cumulative character trigram corpus frequency
- Cumulative character trigram lexicon frequency
- Initial letter
- Initial bigram
- Initial trigram
- Uniqueness point (orth.) prefix length
- Uniqueness point (orth.) neg. offs.
- Uniqueness point (lemma) prefix length
- Uniqueness point (lemma) neg. offs.
- Avg. cond. prob., in bigrams
- Avg. inf. cont., in bigrams
- Avg. cond. prob., in trigrams
- Avg. inf. cont., in trigrams
- Neighborhood measures
- Neighbors Coltheart higher freq., cum. freq.
- Neighbors Coltheart higher freq., count
- Neighbors Coltheart all, cum. freq.
- Neighbors Coltheart all, count
- Neighbors Levenshtein higher freq., cum. freq.
- Neighbors Levenshtein higher freq., count
- Neighbors Levenshtein all, cum. freq.
- Neighbors Levenshtein all, count
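The neighborhood measures listed above can be sketched as follows. The definitions used here are the conventional ones and are assumptions as far as dlexDB is concerned: a Coltheart neighbor has the same length and differs by exactly one substituted character, a Levenshtein neighbor is at edit distance 1 (one substitution, insertion or deletion), and "higher freq." is read as "neighbors whose frequency exceeds that of the target". The lexicon, its frequencies and all function names are hypothetical.

```python
def is_coltheart_neighbor(a: str, b: str) -> bool:
    """Same length and exactly one substituted character."""
    return len(a) == len(b) and sum(x != y for x, y in zip(a, b)) == 1


def is_levenshtein_neighbor(a: str, b: str) -> bool:
    """Edit distance of exactly 1 (substitution, insertion or deletion)."""
    if len(a) == len(b):
        return is_coltheart_neighbor(a, b)
    if abs(len(a) - len(b)) != 1:
        return False
    shorter, longer = sorted((a, b), key=len)
    # Deleting one character from the longer string must yield the shorter one.
    return any(longer[:i] + longer[i + 1:] == shorter for i in range(len(longer)))


def neighborhood(target, lexicon, is_neighbor, higher_only=False):
    """Return (count, cumulative frequency) of the target's neighbors."""
    freqs = [
        freq
        for word, freq in lexicon.items()
        if is_neighbor(target, word)
        and (not higher_only or freq > lexicon[target])
    ]
    return len(freqs), sum(freqs)


# Hypothetical type frequencies (not real dlexDB data).
lexicon = {"singt": 50, "sinkt": 20, "ringt": 80, "singe": 30,
           "sing": 10, "singst": 5, "Haus": 200}

coltheart_all = neighborhood("singt", lexicon, is_coltheart_neighbor)
coltheart_higher = neighborhood("singt", lexicon, is_coltheart_neighbor, higher_only=True)
levenshtein_all = neighborhood("singt", lexicon, is_levenshtein_neighbor)
levenshtein_higher = neighborhood("singt", lexicon, is_levenshtein_neighbor, higher_only=True)
```

Under these assumptions, every Coltheart neighbor is also a Levenshtein neighbor, so the Levenshtein counts and cumulative frequencies are never smaller than their Coltheart counterparts.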
Current version
- 0.3
- New tables: all measures are also available in a case-insensitive variant.