Class ParsedDocumentMetadata
- Namespace
- Textkernel.Tx.Models
- Assembly
- Textkernel.Tx.SDK.dll
Metadata about a parsed document.
public class ParsedDocumentMetadata
- Inheritance
- ParsedDocumentMetadata
Properties
DocumentCulture
A culture code that represents the cultural context of the document regarding formatting of
numbers, dates, character symbols, etc. This value is usually a simple concatenation of the
language and country codes, such as en-US for US English. However, culture
can be set independently of language and country to achieve fine-tuned cultural control over parsing,
so if you use this value, do not assume that it always matches the language and country.
public string DocumentCulture { get; set; }
Property Value
- string
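Because the culture can differ from the language and country, one reasonable pattern is to resolve DocumentCulture on its own and fall back to the invariant culture when the value is missing or unrecognized. The following is a minimal sketch under that assumption. CultureExample and ResolveCulture are hypothetical names that are not part of the SDK, and the helper takes the property value as a plain string so that it compiles without the SDK.

```csharp
using System;
using System.Globalization;

public static class CultureExample
{
    // Hypothetical helper (not part of the SDK): turns a DocumentCulture value
    // into a CultureInfo, falling back to the invariant culture when the value
    // is empty or is not a culture name known to the runtime.
    public static CultureInfo ResolveCulture(string documentCulture)
    {
        if (string.IsNullOrWhiteSpace(documentCulture))
        {
            return CultureInfo.InvariantCulture;
        }

        try
        {
            return CultureInfo.GetCultureInfo(documentCulture);
        }
        catch (CultureNotFoundException)
        {
            // Unknown culture names are treated as "no specific culture".
            return CultureInfo.InvariantCulture;
        }
    }
}
```

A caller might pass metadata.DocumentCulture to ResolveCulture and hand the result to decimal.Parse or DateTime.Parse when re-reading numbers or dates from the document text. Which culture names are recognized depends on the runtime and its globalization data.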
DocumentLanguage
An ISO 639-1 code that represents the primary language of the parsed text. When the
language could not be automatically determined, it is reported as the special value
iv (invariant/unknown). Note that the two-letter ISO codes reported by the
Parser - such as zh for Chinese - do not differentiate between language
variants, such as Mandarin and Cantonese.
public string DocumentLanguage { get; set; }
Property Value
- string
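Code that branches on the language usually needs to treat the special value iv as "unknown" instead of as a real language. The sketch below shows one way to do that. LanguageExample and IsLanguageKnown are hypothetical names, and the case-insensitive comparison is a defensive assumption, not documented behavior.

```csharp
using System;

public static class LanguageExample
{
    // The special value reported when the language could not be determined.
    public const string InvariantLanguage = "iv";

    // Hypothetical helper (not part of the SDK): returns true only when a
    // DocumentLanguage value is present and is not the invariant/unknown marker.
    public static bool IsLanguageKnown(string documentLanguage)
    {
        if (string.IsNullOrWhiteSpace(documentLanguage))
        {
            return false;
        }

        return !string.Equals(
            documentLanguage.Trim(),
            InvariantLanguage,
            StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, a caller could check IsLanguageKnown(metadata.DocumentLanguage) before choosing a language-specific post-processing step. A result of true for zh still says nothing about whether the text is Mandarin or Cantonese.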
DocumentLastModified
The last-revised/last-modified date that was provided for the document. This date was used to calculate the important metrics about skills and jobs.
public DateTime DocumentLastModified { get; set; }
Property Value
- DateTime
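If you compute your own date-based figures from a parsed document, it can make sense to measure them against DocumentLastModified and not against the current date, so that they line up with the date the parser worked from. The sketch below is illustrative only and is not the parser's own calculation. RevisionDateExample and WholeMonthsBefore are hypothetical names.

```csharp
using System;

public static class RevisionDateExample
{
    // Hypothetical helper (not part of the SDK): counts the whole calendar
    // months from 'earlier' up to 'revisionDate', comparing day-of-month only.
    // Returns 0 when 'earlier' is not before 'revisionDate'.
    public static int WholeMonthsBefore(DateTime earlier, DateTime revisionDate)
    {
        if (earlier >= revisionDate)
        {
            return 0;
        }

        int months = (revisionDate.Year - earlier.Year) * 12
                     + (revisionDate.Month - earlier.Month);

        // The final month is incomplete if the day-of-month has not been reached.
        if (revisionDate.Day < earlier.Day)
        {
            months--;
        }

        return months;
    }
}
```

A caller might pass a job's end date as the first argument and metadata.DocumentLastModified as the second to see how long before the last revision that job ended.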
ParserSettings
The full parser settings that were used during parsing.
public string ParserSettings { get; set; }
Property Value
- string
PlainText
The plain text that was used for parsing.
public string PlainText { get; set; }