Class ParsedDocumentMetadata

Namespace
Textkernel.Tx.Models
Assembly
Textkernel.Tx.SDK.dll

Metadata about a parsed document.

public class ParsedDocumentMetadata
Inheritance
ParsedDocumentMetadata

Properties

DocumentCulture

An RFC 3066 language tag that represents the cultural context of the document: the formatting of numbers, dates, character symbols, and so on. This value is usually a simple concatenation of the language and country codes, such as en-US for US English. However, culture can be set independently of language and country to fine-tune how the document is parsed, so do not assume that this value always matches the language and country.

public string DocumentCulture { get; set; }

Property Value

string
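
Because the culture value is a standard language tag, one plausible use is to turn it into a .NET CultureInfo for formatting or parsing values taken from the document. The following is a minimal sketch, not part of the SDK: the CultureExample class and the ResolveCulture helper are hypothetical names, and the fallback to the invariant culture is an assumption about what a caller might want.

```csharp
using System;
using System.Globalization;

public static class CultureExample
{
    // Hypothetical helper (not part of the SDK): converts a DocumentCulture
    // string into a CultureInfo, falling back to the invariant culture when
    // the value is missing or is not recognized by the runtime.
    public static CultureInfo ResolveCulture(string documentCulture)
    {
        if (string.IsNullOrWhiteSpace(documentCulture))
        {
            return CultureInfo.InvariantCulture;
        }

        try
        {
            return CultureInfo.GetCultureInfo(documentCulture);
        }
        catch (CultureNotFoundException)
        {
            // Assumption: an unrecognized tag should not abort processing.
            return CultureInfo.InvariantCulture;
        }
    }
}
```

A caller could pass metadata.DocumentCulture to this helper and use the result with DateTime.Parse or decimal.ToString. Since the culture is not guaranteed to match DocumentLanguage, the two values are best resolved separately.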

DocumentLanguage

An ISO 639-1 code that represents the primary language of the parsed text. When the language could not be automatically determined, it is reported as the special value iv (invariant/unknown). Note that the two-letter ISO codes reported by the Parser, such as zh for Chinese, do not differentiate between language variants such as Mandarin and Cantonese.

public string DocumentLanguage { get; set; }

Property Value

string
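
Code that branches on the language should account for the special iv value described above. A minimal sketch follows; the LanguageExample class and IsLanguageKnown helper are hypothetical names, and treating null or empty input as unknown, as well as comparing case-insensitively, are assumptions rather than documented behavior.

```csharp
using System;

public static class LanguageExample
{
    // The special value reported when the language could not be determined.
    private const string UnknownLanguage = "iv";

    // Hypothetical helper (not part of the SDK): returns true only when the
    // reported language is a usable code rather than the "iv" placeholder.
    // Assumption: null or empty input is also treated as unknown.
    public static bool IsLanguageKnown(string documentLanguage)
    {
        return !string.IsNullOrEmpty(documentLanguage)
            && !string.Equals(documentLanguage, UnknownLanguage, StringComparison.OrdinalIgnoreCase);
    }
}
```

For example, a caller might skip language-specific post-processing when IsLanguageKnown(metadata.DocumentLanguage) returns false.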

DocumentLastModified

The last-revised/last-modified date that was provided for the document. The Parser used this date when calculating the metrics it reports for skills and jobs.

public DateTime DocumentLastModified { get; set; }

Property Value

DateTime

ParserSettings

The full parser settings that were used during parsing.

public string ParserSettings { get; set; }

Property Value

string

PlainText

The plain text that was used for parsing.

public string PlainText { get; set; }

Property Value

string