Inside Out

Notes on seeking wisdom and crafting software

Understanding vocabularies


Today we will talk about languages, their importance and their motivation, from a human perspective. This is the first in a series of posts on knowledge organization, communication and vocabularies.


According to Wiktionary, language is:

(countable) A body of words, and set of methods of combining them (called a grammar), understood by a community and used as a form of communication.

(uncountable) The ability to communicate using words.

First and foremost, a language is used for communication. What does communication entail? The exchange of information between an Individual and one or more other Individuals, we may say. Does everything else that makes up a language simply serve that end? Let’s enquire further.

A language comprises a set of words. What are words?

The smallest unit of language that has a particular meaning and can be expressed by itself; the smallest discrete, meaningful unit of language.

I love this definition. A word is the building block of communication. It cannot be broken down further without losing its meaning. Each word carries a notion, a meaning of its own. To share information, we first decompose it into a set of words, and then combine them using something called a grammar.

A grammar is defined as:

A system of rules and principles for speaking and writing a language.

Why is a grammar required? Go back to the definition of communication: it is the exchange of information between two different parties. The source of information could be an Individual, and the receiving end could be an Individual or a group. A shared set of rules ensures that both ends of the communication interpret the spoken or written idea in the same manner.

Summary so far:

  1. What is communication? The exchange of information between an Individual and one or more Individuals
  2. How do we communicate? We use a language: a set of words and rules for exchanging ideas
  3. What is a word? The smallest unit of language that carries a meaning of its own
  4. How do we compose words? We use a grammar: a set of rules that gives a sequence of words a single interpretation


Let’s take a slight detour here and introduce Vocabulary.

Vocabulary is often defined as the set of words in a language. Creating another abstraction on top of the fundamental unit (Word) allows us to bring a notion of context into this discussion. For example:

  1. Vocabulary can be defined in the context of a specific field of study, e.g. Medicine Vocabulary or Legal Vocabulary.

  2. Wikipedia also classifies Vocabulary by use, e.g. reading, writing, listening and speaking.

  3. Guess what, someone thought we could further slice and dice words into sets by theme, frequency and so on, to help with mastering a language.

Vocabulary is controlled. No random person can introduce a new Word into the English Vocabulary, or into the Medicine Vocabulary for that matter. It represents the sum total of concepts in a domain.

Vocabulary doesn’t change every day. Knowledge (or the concepts thereof) evolves through radical changes in thinking; it takes years for a new branch of science to emerge. In our times, we have seen a few of them emerge in Computer Science, e.g. data science and artificial intelligence.


How does learning work in the real world?

  1. We start with a source of knowledge (a book, for example).
  2. We use our understanding of grammar (e.g. English Grammar) to begin interpreting it.
  3. The interpretation of Words is aided by the Vocabulary (e.g. the English Vocabulary, or a domain-specific one like the Medicine Vocabulary).

Learning uses the unchanging (the domain concepts and grammar) to describe the changing (the state of affairs in the domain).

Books and reference works in a Domain are numerous and ever growing. Vocabulary and Grammar, on the other hand, are relatively constant and controlled. So learning a relatively small set of concepts helps in understanding significant aspects of the domain.

How do machines learn? Do they use a similar principle to interpret?

We will answer these questions and understand Vocabularies from a machine perspective in the next post.