Using a neural net to determine the language in which a text is written

Lyon, C. and Matthews, C. (1995) Using a neural net to determine the language in which a text is written. [Report]

Natural language exhibits statistical patterns in its letter sequences, and different languages have different characteristic patterns. These patterns can be used to determine the language in which a text is written. They are captured with a single-layer, feed-forward neural net trained in supervised mode. The sequential dependencies between letters are modelled by taking adjacent letter pairs and letter triples. Training and test data are converted to sets of these tuples, which are the basic elements classified by the network. This approach is supported by information-theoretic results on the entropy of letter sequences for English. The network architecture is shown to be appropriate for data with the characteristics of natural language letter sequences. For 3 languages, over 99% of test strings are classified correctly. For 4 languages, including Dutch and German, which are similar, over 92% are classified correctly.
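As a rough illustration of the idea in the abstract, the sketch below converts strings to sets of adjacent letter pairs and triples and feeds them to a single-layer network with one output unit per language. It is not the report's implementation. The abstract does not state the learning rule, so the perceptron-style update used here is an assumption, as are the function names (`letter_tuples`, `features`, `train`, `predict`), the decision to treat spaces as characters, and the four made-up example sentences.

```python
def letter_tuples(text, n):
    """Return every run of n adjacent characters in text, in order."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def features(text):
    """Set of adjacent letter pairs and letter triples for one string."""
    return set(letter_tuples(text, 2)) | set(letter_tuples(text, 3))


def predict(weights, feats):
    """Pick the language whose output unit has the highest summed weight."""
    return max(weights, key=lambda lang: sum(weights[lang].get(f, 0.0) for f in feats))


def train(samples, languages, epochs=20):
    """Single-layer net: one weight per (language, tuple) pair.

    Assumed update rule (perceptron style): on a mistake, raise the weights
    of the correct language and lower those of the wrongly chosen one.
    """
    weights = {lang: {} for lang in languages}
    for _ in range(epochs):
        errors = 0
        for text, label in samples:
            feats = features(text)
            guess = predict(weights, feats)
            if guess != label:
                errors += 1
                for f in feats:
                    weights[label][f] = weights[label].get(f, 0.0) + 1.0
                    weights[guess][f] = weights[guess].get(f, 0.0) - 1.0
        if errors == 0:  # every training string classified correctly
            break
    return weights


# Hypothetical toy data, invented for this sketch only.
SAMPLES = [
    ("the cat sat on the mat", "en"),
    ("der hund ist im haus", "de"),
    ("where is the house", "en"),
    ("wo ist das haus", "de"),
]

weights = train(SAMPLES, ["en", "de"])
for text, label in SAMPLES:
    print(text, "->", predict(weights, features(text)))
```

With realistic training corpora, the same structure would need far more data per language, and the accuracy figures quoted in the abstract should not be expected from a toy example of this size.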

Item Type	Report
Date Deposited	15 May 2025 15:58
Last Modified	21 Oct 2025 23:02

File	CSTR+212.pdf