Bnf Ebnf Homework Clipart

Defining a language

A grammar defines a language.

In computer science, the most common type of grammar is the context-free grammar, and these grammars will be the primary focus of this article.

Context-free grammars have sufficient richness to describe the recursive syntactic structure of many (though certainly not all) languages.

I'll discuss grammars beyond context-free at the end.

Components of a context-free grammar

A set of rules is the core component of a grammar.

Each rule has two parts: (1) a name and (2) an expansion of the name.

For instance, if we were creating a grammar to handle English text, we might add a rule like:

noun-phrase may expand into article noun.

from which we could ultimately deduce that "the dog" is a noun-phrase.

Or, if we were describing a programming language, we could add a rule like:

expression may expand into expression + expression

If we're working with grammars as mathematical objects, then instead of writing "may expand into," we'd simply write $\rightarrow$:

noun-phrase $\rightarrow$ article noun
expression $\rightarrow$ expression + expression

As an example, consider the classic unambiguous expression grammar:

\[
\begin{aligned}
\mathit{expr} &\rightarrow \mathit{term}\; \mathtt{+}\; \mathit{expr} \\
\mathit{expr} &\rightarrow \mathit{term} \\
\mathit{term} &\rightarrow \mathit{term}\; \mathtt{*}\; \mathit{factor} \\
\mathit{term} &\rightarrow \mathit{factor} \\
\mathit{factor} &\rightarrow \mathtt{(}\;\mathit{expr}\;\mathtt{)} \\
\mathit{factor} &\rightarrow \mathit{const} \\
\mathit{const} &\rightarrow \mathit{integer}
\end{aligned}
\]

So, how do we know that 3 * 7 is a valid expression?

expr may expand into term;
which may expand into term * factor;
which may expand into factor * factor;
which may expand into const * factor;
which may expand into const * const;
which may expand into 3 * const;
which may expand into 3 * 7.
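The derivation above is nothing more than repeated rewriting, which can be sketched in a few lines of Python. This is a minimal illustration, not a parser: the names `RULES`, `STEPS`, `rewrite_leftmost` and `derive` are invented for the example, the target string 3 * 7 is assumed for illustration, and the rule const $\rightarrow$ integer is approximated by accepting any string of digits.

```python
# The math-style expression grammar as (name, expansion) pairs.
RULES = [
    ("expr", ["term", "+", "expr"]),
    ("expr", ["term"]),
    ("term", ["term", "*", "factor"]),
    ("term", ["factor"]),
    ("factor", ["(", "expr", ")"]),
    ("factor", ["const"]),
]


def is_rule(name, expansion):
    """Check that `name -> expansion` is allowed by the grammar."""
    if name == "const":
        # Assumption: const -> integer accepts any string of digits.
        return len(expansion) == 1 and expansion[0].isdigit()
    return (name, expansion) in RULES


def rewrite_leftmost(form, name, expansion):
    """Replace the leftmost occurrence of `name` in `form` with `expansion`."""
    if not is_rule(name, expansion):
        raise ValueError(f"no rule {name} -> {' '.join(expansion)}")
    position = form.index(name)  # raises ValueError if `name` is absent
    return form[:position] + expansion + form[position + 1:]


def derive(start, steps):
    """Apply each step in order and return every intermediate form."""
    form = [start]
    history = [form]
    for name, expansion in steps:
        form = rewrite_leftmost(form, name, expansion)
        history.append(form)
    return history


# The rule applications used to derive 3 * 7, one per line of the derivation.
STEPS = [
    ("expr", ["term"]),
    ("term", ["term", "*", "factor"]),
    ("term", ["factor"]),
    ("factor", ["const"]),
    ("factor", ["const"]),
    ("const", ["3"]),
    ("const", ["7"]),
]
```

Calling `derive("expr", STEPS)` returns the list of intermediate forms, beginning with `["expr"]` and ending with `["3", "*", "7"]`. A step that names a rule outside the grammar raises `ValueError`.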

Backus-Naur Form (BNF) notation

When describing languages, Backus-Naur form (BNF) is a formal notation for encoding grammars intended for human consumption.

Many programming languages, protocols or formats have a BNF description in their specification.

Every rule in Backus-Naur form has the following structure:

$\mathit{name}$ ::= $\mathit{expansion}$

The symbol ::= means "may expand into" and "may be replaced with."

In some texts, a name is also called a non-terminal symbol.

Every name in Backus-Naur form is surrounded by angle brackets, < >, whether it appears on the left- or right-hand side of the rule.

An $\mathit{expansion}$ is an expression containing terminal symbols and non-terminal symbols, joined together by sequencing and choice.

A terminal symbol is a literal (like "+" or "(") or a class of literals (like integer).

Simply juxtaposing expressions indicates sequencing.

A vertical bar, |, indicates choice.

For example, in BNF, the classic expression grammar is:

<expr> ::= <term> "+" <expr> | <term>
<term> ::= <factor> "*" <term> | <factor>
<factor> ::= "(" <expr> ")" | <const>
<const> ::= integer
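One way to see what this BNF grammar accepts is to write one function per name, with sequencing becoming consecutive steps and choice becoming a branch. The sketch below is a hand-written recognizer under stated assumptions: `integer` is taken to mean one or more decimal digits, whitespace between tokens is ignored, and all function names are invented for the example.

```python
import re

# Assumption: tokens are integers or one of the four literal symbols.
TOKEN = re.compile(r"\s*(\d+|[+*()])")


def tokenize(text):
    """Split text into tokens, raising ValueError on anything unexpected."""
    text = text.rstrip()
    tokens, position = [], 0
    while position < len(text):
        match = TOKEN.match(text, position)
        if match is None:
            raise ValueError(f"unexpected character at offset {position}")
        tokens.append(match.group(1))
        position = match.end()
    return tokens


def parse_expr(tokens, i):
    """<expr> ::= <term> "+" <expr> | <term>; returns next index or None."""
    i = parse_term(tokens, i)
    if i is not None and i < len(tokens) and tokens[i] == "+":
        return parse_expr(tokens, i + 1)
    return i


def parse_term(tokens, i):
    """<term> ::= <factor> "*" <term> | <factor>"""
    i = parse_factor(tokens, i)
    if i is not None and i < len(tokens) and tokens[i] == "*":
        return parse_term(tokens, i + 1)
    return i


def parse_factor(tokens, i):
    """<factor> ::= "(" <expr> ")" | <const>"""
    if i < len(tokens) and tokens[i] == "(":
        i = parse_expr(tokens, i + 1)
        if i is not None and i < len(tokens) and tokens[i] == ")":
            return i + 1
        return None
    if i < len(tokens) and tokens[i].isdigit():
        return i + 1  # <const> ::= integer
    return None


def is_expression(text):
    """True when the whole of `text` can be derived from <expr>."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    return parse_expr(tokens, 0) == len(tokens)
```

With these definitions, `is_expression("(1 + 2) * 3")` is true, while `is_expression("3 +")` is false because nothing follows the plus sign.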

Naturally, we can define a grammar for rules in BNF:

$\mathit{rule}$ $\rightarrow$ $\mathit{name}$ ::= $\mathit{expansion}$
$\mathit{name}$ $\rightarrow$ < $\mathit{identifier}$ >
$\mathit{expansion}$ $\rightarrow$ $\mathit{expansion}$ $\mathit{expansion}$
$\mathit{expansion}$ $\rightarrow$ $\mathit{expansion}$ | $\mathit{expansion}$
$\mathit{expansion}$ $\rightarrow$ $\mathit{name}$
$\mathit{expansion}$ $\rightarrow$ $\mathit{terminal}$

We might define identifiers using a regular expression such as [-A-Za-z_0-9]+.

A terminal could be a quoted literal (like "+", "*" or "(") or the name of a class of literals (like integer).

The name of a class of literals is usually defined by other means, such as a regular expression or even prose.
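The grammar for BNF rules can itself be turned into a small reader. The sketch below splits one rule into its name and a list of alternatives, each alternative being a sequence of symbols. It rests on assumptions that are not part of any BNF standard: identifiers match `[-A-Za-z_0-9]+`, quoted literals contain no double quotes, a bare word names a class of literals, and `parse_rule` is a name invented for this example.

```python
import re

# One symbol: a <name>, a "quoted literal", a bare class name, or the bar.
SYMBOL = re.compile(
    r'\s*(<[-A-Za-z_0-9]+>|"[^"]*"|[A-Za-z_][-A-Za-z_0-9]*|\|)'
)
NAME = re.compile(r"<[-A-Za-z_0-9]+>")


def parse_rule(line):
    """Parse '<name> ::= expansion' into (name, list of alternatives)."""
    name, separator, expansion = line.partition("::=")
    name = name.strip()
    if not separator or NAME.fullmatch(name) is None:
        raise ValueError("expected '<name> ::= expansion'")

    expansion = expansion.rstrip()
    alternatives = [[]]  # choice splits the expansion into alternatives
    position = 0
    while position < len(expansion):
        match = SYMBOL.match(expansion, position)
        if match is None:
            raise ValueError(f"unexpected text at offset {position}")
        symbol = match.group(1)
        if symbol == "|":
            alternatives.append([])          # start a new alternative
        else:
            alternatives[-1].append(symbol)  # sequencing by juxtaposition
        position = match.end()
    return name, alternatives
```

For the first rule of the expression grammar, `parse_rule('<expr> ::= <term> "+" <expr> | <term>')` yields the name `<expr>` with two alternatives: the three-symbol sequence and the single symbol `<term>`.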

Extended BNF (EBNF) notation

Extended Backus-Naur form (EBNF) is a collection of extensions to Backus-Naur form.

Not all of these are strictly a superset, as some change the rule-definition relation ::= to =, while others remove the angled brackets from non-terminals.

More important than the minor syntactic differences between the forms of EBNF are the additional operations it allows in expansions.


In EBNF, square brackets around an expansion, [ expansion ], indicate that this expansion is optional.

For example, the rule:

<term> ::= [ "-" ] <factor>

allows factors to be negated.
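In a hand-written parser, an optional expansion usually becomes a single `if`. The following sketch handles the rule above under a simplifying assumption: a factor is just one integer token. The function name and the `(value, next_index)` return shape are choices made for the example.

```python
def parse_term(tokens, i=0):
    """<term> ::= [ "-" ] <factor>, returning (value, next_index).

    Assumption for this sketch: a <factor> is a single integer token.
    """
    negate = False
    if i < len(tokens) and tokens[i] == "-":  # the optional [ "-" ]
        negate = True
        i += 1
    if i >= len(tokens) or not tokens[i].isdigit():
        raise ValueError("expected a factor")
    value = int(tokens[i])
    return (-value if negate else value), i + 1
```

Both `parse_term(["5"])` and `parse_term(["-", "5"])` succeed; the second returns the negated value.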


In EBNF, curly braces indicate that the expression may be repeated zero or more times.

For example, the rule:

<args> ::= <arg> { "," <arg> }

defines a conventional comma-separated argument list.
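Repetition maps just as directly onto a loop. In this sketch of the argument-list rule, an argument is assumed to be any single token other than a comma, and the names are invented for the example.

```python
def parse_args(tokens, i=0):
    """<args> ::= <arg> { "," <arg> }, returning (args, next_index).

    Assumption for this sketch: an <arg> is any one token except ",".
    """
    def parse_arg(position):
        if position >= len(tokens) or tokens[position] == ",":
            raise ValueError("expected an argument")
        return tokens[position], position + 1

    first, i = parse_arg(i)
    args = [first]
    # The braces { "," <arg> } become a loop that runs zero or more times.
    while i < len(tokens) and tokens[i] == ",":
        arg, i = parse_arg(i + 1)
        args.append(arg)
    return args, i
```

A single argument passes through the loop zero times; a trailing comma with nothing after it raises `ValueError`.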


To indicate precedence, EBNF grammars may use parentheses, ( ), to explicitly define the order of expansion.

For example, the rule:

<expr> ::= <term> ("+" | "-") <expr>

defines an expression form that allows both addition and subtraction.


In some forms of EBNF, the comma operator (,) explicitly denotes concatenation, rather than relying on juxtaposition.

Augmented BNF (ABNF) notation

Protocol specifications often use Augmented Backus-Naur Form (ABNF).

For example, RFC 5322 (email) uses ABNF.

RFC 5234 defines ABNF.

ABNF is similar to EBNF in principle, except that its notations for choice, option and repetition differ.

ABNF also provides the ability to specify byte values exactly, a detail that matters in protocols.


  • choice is /; and
  • option uses square brackets: [ ]; and
  • repetition is prefix *; and
  • repetition n or more times is prefix n*; and
  • repetition n to m times is prefix n*m.

EBNF's { expansion } becomes *(expansion) in ABNF.
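ABNF's prefix repetition corresponds closely to the counted quantifiers of regular expressions. As a rough illustration, the hypothetical helper below translates a repetition prefix into the equivalent Python `re` quantifier. It covers only the prefix forms defined in RFC 5234 and is not part of any library.

```python
import re


def abnf_repeat_to_regex(prefix):
    """Translate an ABNF repetition prefix into a regex quantifier.

    '*' -> '{0,}', '1*' -> '{1,}', '1*2' -> '{1,2}', '4' -> '{4}'
    """
    match = re.fullmatch(r"(\d*)\*(\d*)", prefix)
    if match:
        low = match.group(1) or "0"  # the minimum defaults to zero
        high = match.group(2)        # an empty maximum means unbounded
        return "{" + low + "," + high + "}"
    if prefix.isdigit():
        return "{" + prefix + "}"    # nElement: exactly n repetitions
    raise ValueError(f"not a repetition prefix: {prefix!r}")
```

For instance, ABNF's `1*2DIGIT` corresponds to the pattern `"[0-9]" + abnf_repeat_to_regex("1*2")`, which matches one or two digits.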

Here's a definition of a date and time format taken from RFC 5322.

date-time = [ day-of-week "," ] date time [CFWS]
day-of-week = ([FWS] day-name) / obs-day-of-week
day-name = "Mon" / "Tue" / "Wed" / "Thu" / "Fri" / "Sat" / "Sun"
date = day month year
day = ([FWS] 1*2DIGIT FWS) / obs-day
month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun" / "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
year = (FWS 4*DIGIT FWS) / obs-year
time = time-of-day zone
time-of-day = hour ":" minute [ ":" second ]
hour = 2DIGIT / obs-hour
minute = 2DIGIT / obs-minute
second = 2DIGIT / obs-second
zone = (FWS ( "+" / "-" ) 4DIGIT) / obs-zone
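Because none of these rules is recursive, the date-time rule can be transcribed into a single regular expression. The sketch below is a deliberate simplification rather than a conforming RFC 5322 parser: it drops every obsolete `obs-` alternative and the trailing `[CFWS]`, and it treats `FWS` as a plain run of spaces or tabs without line folding.

```python
import re

FWS = r"[ \t]+"  # simplification: folding white space without line folds
DAY_NAME = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"

DATE_TIME = re.compile(
    rf"(?:(?:{FWS})?{DAY_NAME},)?"                # [ day-of-week "," ]
    rf"(?:{FWS})?[0-9]{{1,2}}{FWS}"               # day  = [FWS] 1*2DIGIT FWS
    rf"{MONTH}"                                   # month
    rf"{FWS}[0-9]{{4,}}{FWS}"                     # year = FWS 4*DIGIT FWS
    rf"[0-9]{{2}}:[0-9]{{2}}(?::[0-9]{{2}})?"     # hour ":" minute [ ":" second ]
    rf"{FWS}[+-][0-9]{{4}}"                       # zone = FWS ( "+" / "-" ) 4DIGIT
)


def is_date_time(text):
    """True when `text` matches the simplified date-time rule."""
    return DATE_TIME.fullmatch(text) is not None
```

A string such as `"Fri, 21 Nov 1997 09:55:06 -0600"` matches, as does the same string without the leading day of the week. Dropping the comma after the day name does not match.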

Regular extensions to BNF

It's common to find regular-expression-like operations inside grammars.

For instance, the Python lexical specification uses them.

In these grammars:

  • postfix * means "repeated 0 or more times"
  • postfix + means "repeated 1 or more times"
  • postfix ? means "0 or 1 times"

The definition of floating point literals in Python is a good example of combining several notations:

floatnumber ::= pointfloat | exponentfloat
pointfloat ::= [intpart] fraction | intpart "."
exponentfloat ::= (intpart | pointfloat) exponent
intpart ::= digit+
fraction ::= "." digit+
exponent ::= ("e" | "E") ["+" | "-"] digit+

It does not use angle brackets around names (like many EBNF notations and ABNF), yet does use ::= (like BNF). It mixes regular operations like + for non-empty repetition with EBNF conventions like [ ] for option.
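Since this fragment uses only sequencing, choice, option and repetition without recursion, each rule can be transcribed into a regular expression built from the ones before it. The sketch below follows the grammar exactly as quoted here; the variable names mirror the rule names, and `is_float_literal` is a helper invented for the example.

```python
import re

DIGIT = r"[0-9]"
INTPART = rf"{DIGIT}+"                    # intpart  ::= digit+
FRACTION = rf"\.{DIGIT}+"                 # fraction ::= "." digit+
EXPONENT = rf"[eE][+-]?{DIGIT}+"          # exponent ::= ("e" | "E") ["+" | "-"] digit+

# pointfloat ::= [intpart] fraction | intpart "."
POINTFLOAT = rf"(?:(?:{INTPART})?{FRACTION}|{INTPART}\.)"

# exponentfloat ::= (intpart | pointfloat) exponent
EXPONENTFLOAT = rf"(?:(?:{INTPART}|{POINTFLOAT}){EXPONENT})"

# floatnumber ::= pointfloat | exponentfloat
FLOATNUMBER = re.compile(rf"(?:{POINTFLOAT}|{EXPONENTFLOAT})")


def is_float_literal(text):
    """True when the whole of `text` is a floatnumber under this grammar."""
    return FLOATNUMBER.fullmatch(text) is not None
```

Under this transcription `"3.14"`, `"10."`, `".001"` and `"1e100"` are all float literals, while `"42"` is not, because it has neither a fraction nor an exponent.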

The grammar for the entire Python language uses a slightly different (but still regular) notation.

Grammars in mathematics

Even when grammars are not an object of mathematical study themselves, in texts that deal with discrete mathematical structures, grammars appear to define new notations and new structures.

For more on this, see my article on translating math into code.

Beyond context-free grammars

Regular expressions sit just beneath context-free grammars in descriptive power: you could rewrite any regular expression into a grammar that represents the strings matched by the expression. But the reverse is not true: not every grammar can be converted into an equivalent regular expression.
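A standard illustration of the gap, not taken from this article, is the language of balanced parentheses. The two-alternative grammar `<balanced> ::= "(" <balanced> ")" <balanced> | ""` describes it, yet no regular expression in the formal sense can, because matching requires unbounded counting. A sketch of a recognizer that mirrors the grammar directly (the function names are invented for the example):

```python
def match_balanced(text, i=0):
    """Match <balanced> starting at index i.

    <balanced> ::= "(" <balanced> ")" <balanced> | ""
    Returns the index just past the match, or None on failure.
    """
    if i < len(text) and text[i] == "(":
        i = match_balanced(text, i + 1)        # inner <balanced>
        if i is None or i >= len(text) or text[i] != ")":
            return None                        # missing closing parenthesis
        return match_balanced(text, i + 1)     # trailing <balanced>
    return i                                   # the empty alternative


def is_balanced(text):
    """True when the whole of `text` is derivable from <balanced>."""
    return match_balanced(text) == len(text)
```

The recursion is what a regular expression lacks: each nested call remembers one pending closing parenthesis.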

To go beyond the expressive power of context-free grammars, one needs to allow a degree of context-sensitivity in the grammar.

Context-sensitivity means that terminal symbols may also appear in the left-hand sides of rules.

Consider the following contrived grammar:

<top> ::= <a> ")"
<a> ::= "(" <exp>
"(" <exp> ")" ::= 7

<top> may expand into <a> ")";
which may expand into "(" <exp> ")";
which may expand into 7.
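Mechanically, the only change from the context-free case is that a rule's left-hand side is now a sequence of symbols to find and replace, rather than a single name. The following sketch encodes the three rules of the contrived grammar as pairs of tuples and rewrites until no rule applies. The representation and the function names are assumptions made for illustration, and for other grammars this loop might never terminate.

```python
# Each rule maps a sequence of symbols to a replacement sequence.
RULES = [
    (("<top>",), ("<a>", ")")),
    (("<a>",), ("(", "<exp>")),
    (("(", "<exp>", ")"), ("7",)),  # terminals appear on the left-hand side
]


def rewrite_once(form, rules):
    """Apply the first applicable rule at its leftmost match, else None."""
    for left, right in rules:
        for start in range(len(form) - len(left) + 1):
            if form[start:start + len(left)] == left:
                return form[:start] + right + form[start + len(left):]
    return None


def derive(start, rules):
    """Rewrite from `start` until no rule applies; return every form."""
    form = (start,)
    history = [form]
    while True:
        next_form = rewrite_once(form, rules)
        if next_form is None:
            return history
        form = next_form
        history.append(form)
```

Starting from `<top>`, `derive("<top>", RULES)` passes through the same forms as the derivation above and stops at the single symbol `7`.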

While this change appears small, it makes grammars equivalent to Turing machines in terms of the languages they can describe.

By restricting the rules so that the left-hand side never has more symbols than its expansion on the right, context-sensitive grammars become equivalent to (decidable) linear-bounded automata.

Even though some languages are context-sensitive, context-sensitive grammars are rarely used for describing computer languages.

For instance, C is slightly context-sensitive because of the way it handles identifiers and types, but this context-sensitivity is resolved by a special convention, rather than by introducing context-sensitivity into the grammar.


This article covered the process of interpreting grammars and common notations.

A closely related topic is parsing.

Parsing takes a grammar and a string and answers two questions:

  1. Is that string in the language of the grammar?
  2. What is the structure of that string relative to the grammar?
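To make the second question concrete, here is a minimal sketch of a parser for the BNF expression grammar that returns a parse tree instead of a yes-or-no answer. It assumes the input has already been split into tokens, represents tree nodes as nested tuples labeled with rule names, and returns `None` when the tokens are not in the language. All of these are choices made for the example, not a standard interface.

```python
def parse_expr(tokens, i):
    """<expr> ::= <term> "+" <expr> | <term>"""
    left, i = parse_term(tokens, i)
    if i < len(tokens) and tokens[i] == "+":
        right, i = parse_expr(tokens, i + 1)
        return ("expr", left, "+", right), i
    return ("expr", left), i


def parse_term(tokens, i):
    """<term> ::= <factor> "*" <term> | <factor>"""
    left, i = parse_factor(tokens, i)
    if i < len(tokens) and tokens[i] == "*":
        right, i = parse_term(tokens, i + 1)
        return ("term", left, "*", right), i
    return ("term", left), i


def parse_factor(tokens, i):
    """<factor> ::= "(" <expr> ")" | <const>"""
    if i < len(tokens) and tokens[i] == "(":
        inner, i = parse_expr(tokens, i + 1)
        if i >= len(tokens) or tokens[i] != ")":
            raise ValueError("expected ')'")
        return ("factor", "(", inner, ")"), i + 1
    if i < len(tokens) and tokens[i].isdigit():
        return ("factor", ("const", tokens[i])), i + 1
    raise ValueError("expected '(' or an integer")


def parse(tokens):
    """Return the parse tree, or None if tokens are not an <expr>."""
    try:
        tree, i = parse_expr(tokens, 0)
    except ValueError:
        return None
    return tree if i == len(tokens) else None
```

A result other than `None` answers the first question, and the shape of the returned tuple answers the second: for the tokens `["3", "*", "7"]` the tree has an `expr` node containing a single `term` node whose children are a factor, the `*` symbol, and another term.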

For a comprehensive treatment of parsing techniques, I recommend Grune and Jacobs, Parsing Techniques: A Practical Guide.

As an aside, if you think you've invented a new parsing technique, you need to check this book first. Your peer reviewers will check it.

My own articles on parsing may also serve as a useful reference.
