{"id":17,"date":"2009-04-01T12:51:56","date_gmt":"2009-04-01T11:51:56","guid":{"rendered":"http:\/\/paguilar.org\/?page_id=17"},"modified":"2009-04-01T12:51:56","modified_gmt":"2009-04-01T11:51:56","slug":"writing-command-line-interfaces-for-embedded-devices","status":"publish","type":"page","link":"https:\/\/paguilar.org\/?page_id=17","title":{"rendered":"Writing Command Line Interfaces for Embedded Devices"},"content":{"rendered":"<p><font size=\"4\"><br \/>\n<strong>Table of contents<\/strong><br \/>\n<\/font><\/p>\n<p><font size=\"4\">1. Introduction<\/font><br \/>\n<font size=\"4\">2. Background<\/font><br \/>\n<font size=\"4\">3. CLI Design<\/font><\/p>\n<ul>\n<ol>\n<font size=\"3\">3.1 Lexical analysis<\/font><br \/>\n<font size=\"3\">3.2 Syntax analysis<\/font><br \/>\n<font size=\"3\">3.3 Data structure<\/font>\n<\/ol>\n<\/ul>\n<p><font size=\"4\">4. CLI Development<\/font><\/p>\n<ul>\n<ol>\n<font size=\"3\">4.1 Scanner<\/font><br \/>\n<font size=\"3\">4.2 Parser<\/font><br \/>\n<font size=\"3\">4.3 The graph and the command execution<\/font>\n<\/ol>\n<\/ul>\n<p><font size=\"4\">5. Conclusions<\/font><br \/>\n<font size=\"4\">6. References<\/font><\/p>\n<p><font size=\"4\"><br \/>\n<strong>1. Introduction<\/strong><br \/>\n<\/font><\/p>\n<p>The configuration, maintenance a monitoring of embedded devices or dedicated servers that need human interaction at least once, such as routers, switchers and firewalls, can provide several interfaces that allow these operations. These interfaces usually are of two types: Command Line Interface (CLI) and Graphical User Interface (GUI). <\/p>\n<p>A GUI, given its user-friendly nature, it&#8217;s becoming a common tool for interacting with an embedded system, specially when the device is a home appliance such as a Set Top Box or a mobile telephone.<\/p>\n<p>However, the default and usually the most powerful tool is the CLI. Most embedded devices provide it given that is extremely useful during the development stages of the system and usually allows more advanced operations that are not present in the GUI. Virtually all telecommunications devices provide a CLI for management. Another very useful case of a CLI in which is the only way for interacting with the system is a bootloader.<\/p>\n<p>For the above reasons, the CLI becomes an important topic during the development cycle of an embedded system or a server.<\/p>\n<p>A CLI can be designed and implemented in many different ways. The CLI described in this article comprehends an organized, efficient and dynamic approach that takes advantage of scanners, parsers and graphs.<\/p>\n<p>Normally, embedded devices provide a standard shell such a bash or ash (used in Busybox), but the administrators that will have access to the final product must not have a shell access to the system. They must have access only to a reduced set of operations that include the control of specific peripherals such as the management of a microwave channel in a long-haul radio bridge.<\/p>\n<p>Notice that a CLI is not the same as a shell. The later is a different and very powerful tool that comprehends a command language interpreter that may support concepts such as shell variables, redirection and aliases. A CLI provides just a set of required commands for executing a specific task.<br \/>\nHowever, there are some similarities between a CLI a shell. For example, both have to take control of the terminal, both have a set of reserved words and both have a specific grammar that allows them to interpret the given commands.<\/p>\n<p>The examples in this guide are based on the <a href=\"http:\/\/sourceforge.net\/projects\/tarasca\">Tarasca Project<\/a>, a proof-of-concept (alpha stage) and Open Source project and that is completely written in C, lex and yacc for *nix. It contains the infrastructure for creating a CLI with user-defined commands that supports privileged menus, tab completion, command completion, commands history and some other features.<\/p>\n<p><font size=\"4\"><br \/>\n<strong>2. Background<\/strong><br \/>\n<\/font><\/p>\n<p>A CLI can be designed and developed without any specific knowledge in a given area, but the design and development steps can be accomplished faster and can be better structured with a basic knowledge of compiler theory and a programming language.<\/p>\n<p>Notice that a CLI can be designed and developed  without the use of any scanner or parser, but as we will see later, this tools are quite helpful when the programmer needs to implement a considerable amount of commands for configuring, monitoring and maintaining a complex system such as a router or a long-haul microwave bridge.<\/p>\n<p>Compiler theory is large and complex, but there are only two concepts from this theory that we apply to CLIs in a simplified manner: lexical analysis and syntax analysis. Lexical analysis is the processing of an input sequence of characters to produce, as output, a sequence of symbols called tokens. Syntax analysis is the process of analyzing a sequence of tokens in order to determine its grammatical structure with respect to a given formal grammar. Sections 3.1 and 3.2 give a brief and, hopefully, a clear explanation.<br \/>\nThis <a href=\"http:\/\/www.cs.man.ac.uk\/~pjj\/farrell\/compmain.html\">site<\/a> and this book [1] give a further explanation.<\/p>\n<p>The last useful element for a working CLI is the knowledge of a programming language. There is no need for knowing everything about it, but some experience with data structures and pointers is definitely useful.  Also, there is no need of a particular programming language, a CLI can be developed in the language of your choice such as C, Perl, Python or Lisp.<br \/>\nSince the Tarasca Project is written in C, we will use this programming language for developing the CLI.<\/p>\n<p><font size=\"4\"><br \/>\n<strong>3. CLI design<\/strong><br \/>\n<\/font><\/p>\n<p>We can design and develop a CLI in many different ways. The selected choice, that it is used in the Tarasca Project, allows to:<br \/>\nDefine the commands in configuration files that can be part of the filesystem or compiled into the executable.<\/p>\n<ul>\n<li>Process the commands until a convenient data structure is generated.<\/li>\n<li>Accept input from a terminal that supports add-on functionality such as tab-completion and command history.<\/li>\n<li>Match the given input from the terminal to the data structure in memory.<\/li>\n<li>Execute the command.<\/li>\n<li>Send the output to the terminal.<\/li>\n<\/ul>\n<p>An illustration explains better the chosen design. Figure 1 shows the building blocks of the CLI:<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.paguilar.org\/images\/figure01.png\" alt=\"Building block of the CLI\" \/><\/p>\n<p>In this article we will focus on the core issues, the red-highlighted block, for building a CLI. We will address other topics, such as the terminal control and add-on functionality in further articles.<br \/>\nIn order to analyze the highlighted blocks of Figure 1, we will assume that we already have the other blocks working properly.<br \/>\nAn example of a quick and simple CLI&#8217;s command is the following: suppose that we have a radio bridge with multiple communication channels in which we want the command<\/p>\n<pre lang=\"bash\">\n> config <channel_number> <bandwidth><\/code><\/pre>\n<p>to set the bandwidth of the given channel number.<br \/>\nThis command, called <em>config<\/em>, receives two parameters, also called arguments. The first argument, <em>channel_number<\/em>, is an integer number greater than 0; and the second argument, <em>bandwidth<\/em>, is an integer number in the range [0 \u2013 4096]. The bandwidth is measured in KB. Any other value is considered an error.<\/p>\n<p><font size=\"3\"><br \/>\n<strong>3.1 Lexical analysis<\/strong><br \/>\n<\/font><\/p>\n<p>Lexical analysis is the processing of an input sequence of characters to produce, as output, a sequence of symbols called tokens. A scanner is the tool used for this procedure. For example, a typical scanner that receives as input the character sequence <em>config 2 2000<\/em>, will produce as output the tokens <em>config<\/em>, <em>2<\/em> and <em>2000<\/em>. These tokens become the input of the parser.<br \/>\nA dictionary allows the characters to be in any given sequence and to follow a specific order. This is, we need to define the characters that will be accepted by our CLI and we must order them in a particular manner.<br \/>\nFor example, we can decide that the allowed characters will be the digits <em>[0-9]<\/em>, all letters from the English alphabet <em>[a-zA-Z]<\/em> and the characters  -,  _, space, tab and enter. We can decide also, that all characters can start and end with a sequence of letters, and can have zero or more digits, a \u2013 or a _ in the middle of the letter sequences. The spaces and tabs are discarded, but do not produce any error as it would be with any other character such as the &#038; or the %. The enter tells the parser that the sequence is over and that it has to produce a token for the scanner.<\/p>\n<p>Using regular expressions syntax, the allowed syntax would be:<\/p>\n<pre lang=\"bash\">\n[^s*[a-zA-Z][-|_|[0-9]*][a-zA-Z]s*$]\n<\/pre>\n<p>Therefore the following two sequences are valid: <\/p>\n<pre lang=\"bash\">\npwd-enable\nt activate\n<\/pre>\n<p>but the following two sequences are not valid:<\/p>\n<pre lang=\"bash\">\n_meminfo         (the sequence cannot start with a _)\nifup_0              (the sequence cannot end with a digit)\n<\/pre>\n<p>A finite state machine, that graphically can be seen as a connected graph, is the best way for designing a scanner.<br \/>\nThe graph in Figure 2 shows several vertices, also called nodes, that are connected through unidirectional edges. Each node represents a state and each edge represents an input character. Thus, for producing the token <em>pwd-enable<\/em> when receiving the sequence<\/p>\n<pre lang=\"bash\">\npwd-enabletenter        ( t means tab )\n<\/pre>\n<p>we have to move through the directed graph in the following way:<br \/>\nWe always start from the initial state 0, then when we receive the character <em>p<\/em> we move to state 1 because when a letter is received in state 0, the matching edge takes us to state 1. When we receive the characters <em>w<\/em> and <em>d<\/em> we remain in state 1 because of the edge that points to itself. Then , when we receive the character  \u2013, we move to state 2, then when we receive the character <em>e<\/em> we return to state 1 and remain there for the characters <em>n<\/em>, <em>a<\/em>, <em>b<\/em>, <em>l<\/em> and <em>e<\/em> . Then the <em>t<\/em> or tab is discarded. Finally, when we receive the enter we check if we are in an acceptable state or in an error state, this is, we check if it is a valid sequence of characters. If it is valid, we produce the token, otherwise we produce an error.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.paguilar.org\/images\/figure02.png\" alt=\"Finite state machine for a scanner\" \/><\/p>\n<p><font size=\"3\"><br \/>\n<strong>3.2 Syntax analysis<\/strong><br \/>\n<\/font><\/p>\n<p>Syntax analysis is the process of analyzing a sequence of tokens in order to determine its grammatical structure with respect to a given formal grammar.<br \/>\nA parser is the tools used for this procedure and their input are the output tokens produced by the scanner.<\/p>\n<p>Defining a formal grammar could be a complex process, the dragon book [1] explains it in great detail. Here, we will just describe the necessary concepts for developing the CLI.<\/p>\n<p>A formal grammar is the precise description of a set of strings. There are two main categories of formal grammars: an analytic grammar describes how to recognize when strings are members in the set, whereas a generative grammar only describes how to write those strings in the set.<\/p>\n<p>Since we are interested in writing a set of strings, we will use a generative grammar.<br \/>\nThese kinds of grammars are classified in several types known as the <a href=\"http:\/\/en.wikipedia.org\/wiki\/Chomsky_hierarchy\">Chomsky hierarchy<\/a>. The most relevant for our purposes are the context-free grammars. The languages that can be described with such a grammar are called context-free languages in which it must be possible to tell how to parse any portion of an input string with just a single token of look-ahead. Strictly speaking, that is a description of an LR grammar.<\/p>\n<p>In the formal grammatical rules for a language, each kind of syntactic unit or grouping is named by a symbol.  Those which are built by grouping smaller constructs according to grammatical rules are called nonterminal symbols whilst those which cannot be subdivided are called terminal symbols or token types.  An input corresponding to a single terminal symbol is called token, and a piece corresponding to a single nonterminal symbol is called grouping.<\/p>\n<p>For example, the tokens of C are identifiers, constants, the various keywords such as <em>if<\/em> and <em>for<\/em>, arithmetic operators and punctuation marks.  So the terminal symbols of a grammar for C include <em>identifier<\/em>, <em>number<\/em>, <em>string<\/em>, one symbol for each keyword, operator or punctuation mark. Examples of terminal symbols are <em>if<\/em>, <em>return<\/em>, <em>static<\/em>, <em>int<\/em>, the plus-sign, the semi-colon and others.<\/p>\n<p>The following C function subdivided into tokens explains the above paragraphs:<\/p>\n<pre lang=\"c\">\n  int                   \/* keyword int *\/\n  sum (int x, int y)    \/* identifier, open-parenthesis, keyword int, identifier,\n                         * coma, keyword int, identifier, close-parenthesis *\/\n  {                     \/* open-brace *\/\n    return x + y;       \/* keyword return, identifier, plus sign, identifier,\n                         * semicolon *\/\n  }                     \/* close-brace *\/\n<\/pre>\n<p><font size=\"1\">Listing 1. The tokens of a simple C function<\/font><\/p>\n<p>The syntactic groupings of C include the expression, the statement, the declaration, and the function definition.  These are represented in the grammar of C by nonterminal symbols <em>expression<\/em>, <em>statement<\/em>, <em>declaration<\/em> and <em>function definition<\/em>.  The full grammar uses many additional language constructs, each with its own nonterminal symbol, in order to express the meanings of these four.<br \/>\nFor example, Listing 1 is a function definition that contains one declaration and one statement.  In the statement there are three expressions, the first one is <em>x + y<\/em> that is composed of the two expression <em>x<\/em> and <em>y<\/em>.<\/p>\n<p>The <em>start symbol<\/em> is the only nonterminal symbol that must be distinguished as the special one which<br \/>\ndefines a complete utterance in the language.<br \/>\nIn C, the start symbol is the <em>sequence of definitions and declarations<\/em>.<\/p>\n<p>Each time a terminal or a nonterminal symbol is validated, the parser calls a function that will add an element to the data structure that the CLI will use. Therefore, at the end of the syntax analysis, if there were no errors, we will have a validated group of tokens (the terminal and nonterminal symbols) and the data structure for the CLI. The creation and the description of the elements of this data structure is explained in the following section.<\/p>\n<p><font size=\"3\"><br \/>\n<strong>3.3 Data structure<\/strong><br \/>\n<\/font><\/p>\n<p>In order to execute a command, we must create the necessary structures for finding, executing and eventually extracting information from these commands.<br \/>\nThe way we  chose for creating the CLI structure is quite fast and organized, although there could be several other ways.<br \/>\nThis structure can be created on the fly at the same time at which the syntax analysis takes place. Once the parser has validated and identified the token type, it calls a specific function that will create a specific part of this structure.<\/p>\n<p>Before we describe the creation of this structure and the way its elements interact between them, we identify and describe these elements:<\/p>\n<p><strong>Modes<\/strong><br \/>\nThe modes are used for classifying in levels the different types of commands. For example, the  commands that configure a radio channel could be inside the mode <em>channel-conf<\/em>, whilst the command that change the privileged password could be inside the mode <em>privileged<\/em>. Between one mode and another there could be a password that allows a privileged user to execute certain commands.<\/p>\n<p><strong>Commands<\/strong><br \/>\nThe commands describe an instruction that will be executed. For example, the command <em>activate<\/em> could execute the action of activating a radio channel. A command has several properties, the most important ones are the function it calls when it is entered, which actually executes it, and  its description that could be used for generating its help. Each command belongs to a specific mode, but it can be exported to other modes becoming available to several modes at the same time. This is useful for generic commands such as <em>help<\/em> and <em>quit<\/em> that must be available at any moment.<\/p>\n<p><strong>Arguments<\/strong><br \/>\nThe argument is a parameter that is given to a command. For example, the command <em>config<\/em> has the argument <em>channel_number<\/em> that identifies a specific channel and the argument <em>bandwidth<\/em> that sets the desired bandwidth. Depending on the command, some arguments can be mandatory, and others can be optional.<\/p>\n<p><strong>Prompt<\/strong><br \/>\nIdentifies the functional level in which we are at any given moment. It could be a static or a dynamic string. For example, using the host name as a prompt would be a dynamic string.<\/p>\n<p><strong>The graph<\/strong><br \/>\nThe structure that puts all these elements together in an organized and efficient manner is a graph.<br \/>\nThis graph contains 3 types of single-linked lists:<\/p>\n<ul>\n<li>List 1: The first list is composed of modes, each node in the list represent a mode. For example, the first node could be used as a <em>user<\/em> mode for basic commands, second node could be used as <em>privileged<\/em> mode for sensitive commands such as the device configuration operations, third mode could be used as privileged mode for the configuration of radio channels only, and so on. There is only one list of modes.\n<\/li>\n<li>List 2: Since each mode can have several commands, each node of the mode list has a list of  commands. For example, typical commands for the third node would be <em>activate<\/em> channel and <em>lockout<\/em> channel that are typical commands for configuring a network interface and only a privileged user can have access to them.<br \/>\nSome other commands could allow us to move from one mode to another.<br \/>\nFor example, the command <em>enable<\/em> could move us from the <em>user<\/em> mode to the <em>privileged<\/em> mode, an the command <em>exit<\/em> could move us to the previous mode.<\/li>\n<li>List 3: Since each command can have several mandatory and\/or optional arguments, each node of the command list has a list of arguments. For example, the command <em>config<\/em> channel needs as arguments the radio <em>channel<\/em> number and its <em>bandwidth<\/em>.<\/li>\n<\/ul>\n<p>Figure 3 show the graph as the main data structure for the CLI with its three types of lists: modes, commands and arguments.<\/p>\n<p><img decoding=\"async\" src=\"https:\/\/www.paguilar.org\/images\/figure03.png\" alt=\"CLI graph with modes, commands and arguments\" \/><\/p>\n<p>Having a graph of this kind facilitates significantly how to execute a command, how to extract its data, or simply how to move from one element to another.<\/p>\n<p>For example, taking as reference Figure 3, if we are in the node that represents the <em>channel-conf<\/em> mode and from the CLI  we enter the <em>config 2 2000<\/em> command, we have to move through the list of commands until we find the node of the <em>config<\/em> command. If the command is not found an error is issued. If it is found, we have to move along the arguments list of this command for finding each one of its arguments. If at least one of the arguments is not found, an error is issued. First we search for the <em>channel_number<\/em> argument. If we found it, we move ahead for searching the <em>bandwidth<\/em> argument. If we found it, we stop, otherwise we issue an error.<br \/>\nAfter this procedure, we can execute the command or extract information about it and its arguments.<br \/>\nNotice that one of the properties of a command node is the function that must be called when this command is given. <\/p>\n<p>The following sections describe how the user can implement the scanner, the parser and how to create the graph and implement the functions of each command.<\/p>\n<p><font size=\"4\"><br \/>\n<strong>4 CLI development<\/strong><br \/>\n<\/font><\/p>\n<p><font size=\"3\"><br \/>\n<strong>4.1 Scanner<\/strong><br \/>\n<\/font><\/p>\n<p>A scanner is a tool for performing lexical analysis, this is, a scanner processes an input sequence of characters to produce, as output, a sequence of tokens.<br \/>\nThe Tarasca Project uses the widespread open source scanner called <a href=\"http:\/\/flex.sourceforge.net\/manual\/\">Flex<\/a>, therefore we will use it for explaining how to generate our tokens. The code in this section is taken from the file <em>rhal\/rhal_lexer.l<\/em><br \/>\nA Flex input file consists of three sections, separated by a line with <em>%%<\/em> in it:<\/p>\n<pre lang=\"lex\">\ndefinitions\n%%\nrules\n%%\nuser code\n%%\n<\/pre>\n<p><font size=\"1\">Listing 2. The structure of a Flex file<\/font><\/p>\n<p>The definitions section contains declarations of simple name  definitions  to  simplify  the  scanner  specification and declarations of start conditions. Listing 3 shows the definitions of the start and end identifiers, quoted string, comments, space and new line character that are the elements that our scanner needs for the CLI:<\/p>\n<pre lang=\"lex\">\nidentifier_start  [A-Za-z_]\nidentifier_end    [\u000055]*[A-Za-z_0-9$]\nidentifier        {identifier_start}{identifier_end}*|[?]\nqstr              \"[^\"n]*[\"rn]\ncomment           #.*\nspace             [ tr]+\nnl                [n]\n<\/pre>\n<p><font size=\"1\">Listing 3. Scanner definitions for the CLI using Flex<\/font><\/p>\n<p>The <em>identifier_start<\/em> identifier can only be composed of the upper and lower case letter of the alphabet plus the underscore. The <em>comment<\/em> identifier is composed of the hash character plus zero or more characters of any type except the new line character. And so on.<\/p>\n<p>The second section of the Flex input , called rules section, contains a series of rules of the form<\/p>\n<pre lang=\"lex\">\n           pattern   action\n<\/pre>\n<p>where the pattern must be unindented and the action must begin on the same line. The patterns in the input are written using an extended set of regular expressions. The CLI does not need complex patterns, mainly reserved words. A reserved word is used for uniquely identifying a given token. In C, the tokens <em>if<\/em> and <em>for<\/em> are reserved words that cannot be used for anything else. The same happens with our CLI, the <em>mode<\/em> and <em>prompt<\/em> tokens are reserved words.<\/p>\n<p>The action is an arbitrary C statement. Listing 4 shows the pattern-action section of the CLI and we can divide it in three parts. At the top of the listing we see that the spaces and comments are eaten up and that the quoted strings <em>qstr<\/em> defined above in the <em>definitions<\/em> section, must contain the double quote character. At the bottom of the listing, the <em>nl<\/em> characters are counted for displaying the line number when there is a parser error, and the &#8216;={};&#8217; characters along with the invalid character represented by the &#8216;.&#8217; character are returned verbatim.<br \/>\nThe middle of the list contains all the tokens that are matched against a reserved word; thus, if we match the token <em>mode<\/em> we will return the <em>MODE<\/em> value, and so on. The <em>identifier<\/em> token is treated similarly to the <em>qstr<\/em> token.<br \/>\nThese pattern-action lines in the middle of the listing call the function <em>SymLook()<\/em> that receives as parameters the pattern matched and returns the name and type of the matched token inside a struct if there was a match, or NULL if it was not found. Notice that all tokens are stored in the Flex global character pointer <em>yytext<\/em> and their length in the global integer <em>yyleng<\/em>. <\/p>\n<p>The returned values of Flex are the input of the scanner. This is, the scanner will receive the <em>yytext<\/em> variable when there is a <em>return yytext<\/em> sentence, or a <em>MODE<\/em>, <em>PROMPT<\/em> or some other value of a reserved word. Notice that these values are defined in the scanner file because it is responsible for validating the tokens created by Flex.<\/p>\n<pre lang=\"lex\">\n{space}\t\/* Eat up spaces *\/\n{comment}\t\/* Eat up comments *\/\n{qstr}\t{\n\t      if(yytext[yyleng - 1] != '\"') {\n\t\t\tlog_error(LOG_PARSER, \"Flex: Error while scanning, \n\t\t          string not quoted: %s\", yytext);\n\t\t\tfprintf(FD_OUT, \"Flex: Error while parsing, \n\t\t\t    string not quoted: %sn\", yytext);\n\t\t   }\n               else {\n                  rhal_lval.string = yytext;\n                  return QSTRING;\n               }\n            }\n\n\nmode        { rhal_lval.stptr = SymLook(yytext); return MODE; } \n...         \/* Reserved words are here *\/\nmandatory   { rhal_lval.stptr = SymLook(yytext); return MANDATORY; }\n{identifier}{ rhal_lval.string = yytext; return IDENTIFIER; }\n\n\n{nl}        { line_count++; }\n[={};]      { return yytext[0]; }\n.           { return yytext[0]; }\n<\/pre>\n<p><font size=\"1\">Listing 4. A fragment of the rules section<\/font><\/p>\n<p>The user code section is simply copied to the output C file verbatim, but it&#8217;s not used by the CLI.  In other cases is useful as companion  routines  which call or are called by the scanner.  The presence of this section is optional; if it is missing, the second <em>%%<\/em> in the input file may  be skipped, too, as in the case of our example.<\/p>\n<p>Notice that the generated C file is the real scanner, Flex is just the utility for generating the scanner that is compiled into the final program<\/p>\n<p><font size=\"3\"><br \/>\n<strong>4.2 Parser<\/strong><br \/>\n<\/font><\/p>\n<p>The Tarasca Project uses the widespread open source parser called <a href=\"http:\/\/www.gnu.org\/software\/bison\/manual\/\">Bison<\/a>, therefore we will use it for explaining how to validate our tokens.<br \/>\nBison  is  a parser generator in the style of yacc, it should be upwardly compatible with input files designed for yacc and it&#8217;s definitely more Flexible. However, in the Tarasca Project, the way in which bison is used is compatible with the yacc functionality. Bison is a complex tool; thus, we will only outline its most important aspects.<br \/>\nBison reads a sequence of tokens as its input, and groups the tokens using the grammar rules.  If the input is valid, the end result is that the entire token sequence is reduced to a single grouping whose symbol is the grammar&#8217;s start symbol. If the input is not valid, the parser reports a syntax error. In our CLI, the entire input must be a sequence of modes, commands and arguments declarations.<br \/>\nThe structure of a Bison file that defines the grammar contains several parts that are shown in Listing 5. Once Bison processes this file, it generates a C file as output that is compiled for generating the final executable. Notice that the generated C file is the real parser, Bison is just the utility for generating the parser that is compiled into the final program.<\/p>\n<pre lang=\"yacc\">\n     %{\n       Prologue\n     %}\n     Bison declarations\n     %%\n     Grammar rules\n     %%\n     Epilogue\n<\/pre>\n<p><font size=\"1\">Listing 5. The structure of a Bison file<\/font><\/p>\n<p>The Prologue section contains C macro definitions and declarations of functions and variables that are used in the actions in the grammar rules. This C code is copied verbatim to the output C file.<br \/>\nThe Bison declarations section contains declarations that define terminal and nonterminal symbols, specify precedence, and so on.  This section defines the symbols used in formulating the grammar and the data types of semantic values. All token type names must be declared here, except for the single char literal tokens such as the plus sign.<br \/>\nThe grammar rule section has the general form:<\/p>\n<pre lang=\"yacc\">\n     Result: Components...\n             ;\n\n<\/pre>\n<p>where Result is the nonterminal symbol that this rule describes and Components are various terminal and nonterminal symbols that are put together by this rule.<br \/>\nThe Epilogue section, as the Prologue, contains plain C code that is copied verbatim to the output C file. The last %%  are removed if this section is empty.<\/p>\n<p>The code fragments of Listing 6 shows a Bison file for the CLI. This code is taken from the file rhal\/rhal_parser.y of the Tarasca Project. The file extension .y is a standard for yacc grammars.<\/p>\n<pre lang=\"yacc\">\n%{\n#include <stdio.h>\n#include <string.h>\n#include \"rhal.h\"\n\n...                  \/* Other include statements or data\/functions declarations *\/\n\nextern CLIMatrix        matrix;\n%}\n\n%union\n{\n    char        *string;\n    SymTable    stptr;\n}\n\n%token <string> IDENTIFIER\n%token <string> QSTRING\n%type <string> cstring\n%type <string> name\n%type <stptr> value_expr\n%type <stptr> prio_expr\n\n%token <stptr> MODE PROMPT COMMAND FUNC AVAIL ARG NAME PRIORITY VALUE PARENT DESC\n%token <stptr> BOOL NUMBER STRING HEX IPADDR MACADDR RANGE OPTIONAL MANDATORY\n\n%%\n\nmode_sections: mode_section\n        | mode_sections mode_section\n;\n\nmode_section: MODE name { matrix = graph_mode_insert($2); }\n                        '{' mode_attrs '}'\n;\n\nmode_attrs: prompt_section\n        | cmd_sections\n        | prompt_section cmd_sections\n;\n\nprompt_section: PROMPT '=' cstring { matrix = graph_mode_insert_prompt($3); } ';'\n;\n\ncmd_sections: cmd_section\n        | cmd_sections cmd_section\n;\n\ncmd_section: COMMAND name { matrix = graph_cmd_insert($2); }\n                          '{' cmd_attrs '}'\n;\n\n\n...              \/* Other grammar rules and actions follow such as cmd_attrs *\/\n\n\nvalue_expr: BOOL\n        | STRING\n        | NUMBER\n        | HEX\n        | IPADDR\n        | MACADDR\n        | RANGE         { $$ = $1; }\n;\n\nprio_expr: OPTIONAL\n        | MANDATORY     { $$ = $1; }\n;\n\nparent_section: \/* empty *\/\n        | PARENT '=' name { matrix = graph_arg_insert_parent($3); } ';'\n;\n\nname: IDENTIFIER { $$ = $1; }\n;\n\ncstring: QSTRING { $$ = $1; }\n;\n\ndesc_section: \/* empty *\/\n        | DESC '=' cstring { matrix = graph_insert_desc($3); } ';'\n;\n%%\n\nvoid enable_yacc_debug() {\n    #if YYDEBUG != 0\n        yydebug = 1;\n    #endif\n}\n<\/pre>\n<p><font size=\"1\">Listing 6. A fragment of the Bison file<\/font><\/p>\n<p>The <em>prologue<\/em> section contains includes and global variables that are copied verbatim. The include files contain the definitions of the functions used in the rules section such as <em>graph_mode_insert()<\/em>.<br \/>\nIn this Bison file we see two types of rules in the rules section. The first type and simpler one does not contain recursion and has the form<\/p>\n<pre lang=\"yacc\">\nexp:      exp '+' exp\n             ;\n<\/pre>\n<p>and says that two groupings of type <em>exp<\/em>, with a + token in between, can be combined into a larger grouping of type <em>exp<\/em>. Example of this rule are the <em>name<\/em> and <em>cstring<\/em> rules at the end of Listing 6.<br \/>\nThe second type of rules are the recursive rules that are the only way to define a sequence of any number of a particular thing. These rules have the form<\/p>\n<pre lang=\"yacc\">\nexpseq:  exp\n             | expseq ',' exp\n             ;\n<\/pre>\n<p>and says that a group of type <em>expseq<\/em> can be composed of a group of type <em>exp<\/em> or by two groupings of type <em>expseq<\/em> and <em>exp<\/em> with a &#8216;+&#8217; token in between. This is called left recursion because the recursive use of <em>expseq<\/em> is the leftmost symbol in the right hand side of the rule. Left recursion is preferable over right recursion because it can parse a sequence of any number of elements with bounded stack space.<\/p>\n<p>The epilogue in this case contains a function that indicates Bison to display debugging information if the flag <em>YYDEBUG<\/em> is active. If there was not an epilogue, the <em>%%<\/em> would not be necessary.<\/p>\n<p>This last function is interesting because it shows one of the Bison predefined variables: <em>yydebug<\/em>. Bison has some predefined variables and functions that start with the prefix <em>yy<\/em>, although it can be changed when invoking it from the shell or a makefile. These functions and variables are highly customizable, some of the can even be overwritten to fit specific needs. However, in their default form, they are responsible for doing the real parsing.<br \/>\nThe most important predefined functions that are worth mentioning are:<br \/>\n<em>yyparse()<\/em>:  This function reads tokens, executes actions, and returns when it encounters end-of-input or an unrecoverable syntax error. This function is called, in our case, from the CLI in the function <em>rhal_parse_files()<\/em> in the file <em>rhal\/rhal.c<\/em>.<br \/>\n<em>yylex()<\/em>: It recognizes tokens from the input stream and returns them to the parser. In other words, <em>yylex()<\/em> is called from the scanner produced by Flex, and the output is sent to the parser. This function provides the glue between the scanner and the parser.<br \/>\n<em>yyerror()<\/em>: This function is called when the Bison parser finds a syntax or parse error. This function is normally overwritten for keeping it aligned with the rest of the program&#8217;s error printing functions. In our CLI, <em>yyerror()<\/em> is overwritten in the file <em>rhal\/rhal.c<\/em>.<\/p>\n<p><font size=\"3\"><br \/>\n<strong>4.3 The graph and the command execution<\/strong><br \/>\n<\/font><\/p>\n<p>As we described in Section 3.3, the main data structure of the CLI is a graph that contains three types of single-linked lists: a list of modes, lists of commands and lists of arguments.<\/p>\n<p>Listing 7 shows an example of a simple CLI with three modes called user, privileged, and channel_conf. Each mode has one or more commands. Each command can have zero or more arguments.<br \/>\nThis Listing is the Configuration Files block of Figure 1. It is the input of the scanner.<br \/>\nFigure 3 illustrates how Listing 7 would look like after been processed by the scanner and parser, and after we create the graph with its lists.<\/p>\n<pre lang=\"bash\">\nmode \u201cuser\u201d {\n\tcommand \u201cquit\u201d {\n\t\tdescription: Quit the shell\n\t}\n\tcommand \u201chelp\u201d {\n\t\tdescription: \"Display the syntax of the given command\"\n\t\targument \u201ccmd\u201d {\n\t\t\tPriority: mandatory\n\t\t\tDescription: \"Name of the command\"\n\t\t}\n\t}\n\tcommand \u201cenable\u201d {\n\t\tdescription: \"Change to privileged mode\"\n    \t}\n}\n\nmode \u201cprivileged\u201d {\n    \tcommand configure {\n\t\tdescription: \"Configure a system's module\"\n\t\targument \u201cinterface\u201d {\n\t\t\tpriority: mandatory\n\t\t\tdescription: \"Sub-system to be configured\"\n\t\t}\n    \t}\n\tcommand \u201cpwd-enable\u201d {\n\t\tdescription: \"Modify enable password\"\n    \t}\n}\n\nmode \u201cchannel_conf\u201d {\n\tcommand \u201cactivate\u201d {\n\t\tdescription: \"Configure the given radio channel\"\n\t\targument \u201cnumber\u201d {\n\t\t\tpriority: mandatory;\n\t\t\tdescription: \"Channel number\"\n      \t\t}\n\tcommand \u201clockout\u201d {\n\t\tdescription: \"Sent radio channel to protection mode\"\n\t\targument \u201cnumber\u201d {\n\t\t\tpriority: mandatory;\n\t\t\tdescription: \"Channel number\"\n        \t}\n\t}\n\tcommand \u201cconfig\u201d {\n\t\tdescription: \"Configure a radio channel\"\n\t\targument \u201cnumber\u201d {\n\t\t\tpriority: mandatory;\n\t\t\tdescription: \"Channel number\"\n        \t}\n\t\targument \u201cbandwidth\u201d {\n\t\t\tpriority: mandatory;\n\t\t\tdescription: \"Channel bandwidth\"\n        \t}\n\t}\n\tcommand \u201cinfo\u201d {\n\t\tdescription: \"Display channel information\";\n\t\targument \u201citem\u201d {\n\t\t\tpriority: mandatory\n\t\t\tdescription: \"Display basic channel info\"\n\t\t}\n    \t}\n}\n<\/pre>\n<p><font size=\"1\">Listing 7. Configuration Files of the CLI<\/font><\/p>\n<p>In order to build the graph and move along its different lists efficiently, there is a global struct that contains pointers to each one of the three types of lists. These pointers will move to the mode, command and argument that are been entered at any given moment.<\/p>\n<p>The commands and arguments entered in the terminal are compared against the commands and arguments in this graph. If  a match is found, the user-defined function associated to the command is executed. It is simply a callback function.<\/p>\n<p>The functions <em>rhal_validate_input_list()<\/em>, <em>match_args()<\/em> and <em>rhal_exec_cmd()<\/em> in the file<em> rhal\/rhal.c<\/em> of the Tarasca Project validate the input and match it against the graph, if a match is found the command is executed.<br \/>\nThe file that actually contains the calls to these functions is <em>cli\/crish.c<\/em>. The functions in this file handle part of the terminal control and the command matching and execution.<\/p>\n<p><font size=\"4\"><br \/>\n<strong>5. Conclusions<\/strong><br \/>\n<\/font><\/p>\n<p>With the support of a scanner and a parser, we designed and developed a Command Line Interface in which the user can define the commands and their organization and concentrate only in their functionality, leaving aside the tedious details of parsing and validating the terminal input. This allows us to reuse the CLI for other projects in which we will only have to focus in the new commands and their functionality.<\/p>\n<p><font size=\"4\"><strong>6. References<\/strong><\/font><br \/>\n<font size=\"1\"><br \/>\n[1] The \u201cdragon book\u201d:<br \/>\n\t\tAho V. Alfred, Sethi Ravi, Ullman D. Jeffrey<br \/>\n\t\tCompilers: Principles, Techniques, and Tools<br \/>\n\t\tAddison Wesley;<br \/>\n\t\tJanuary 1, 1986.<br \/>\n<\/font><\/p>\n<p>Go back to the <a href=\"https:\/\/paguilar.org\/?page_id=18\">articles<\/a> page.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Table of contents 1. Introduction 2. Background 3. CLI Design 3.1 Lexical analysis 3.2 Syntax analysis 3.3 Data structure 4. CLI Development 4.1 Scanner 4.2 Parser 4.3 The graph and the command execution 5. Conclusions 6. References 1. Introduction The configuration, maintenance a monitoring of embedded devices or dedicated servers that need human interaction at\u2026 <span class=\"read-more\"><a href=\"https:\/\/paguilar.org\/?page_id=17\">Read More &raquo;<\/a><\/span><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":18,"menu_order":0,"comment_status":"open","ping_status":"open","template":"","meta":{"footnotes":""},"class_list":["post-17","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/paguilar.org\/index.php?rest_route=\/wp\/v2\/pages\/17","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/paguilar.org\/index.php?rest_route=\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/paguilar.org\/index.php?rest_route=\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/paguilar.org\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/paguilar.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=17"}],"version-history":[{"count":0,"href":"https:\/\/paguilar.org\/index.php?rest_route=\/wp\/v2\/pages\/17\/revisions"}],"up":[{"embeddable":true,"href":"https:\/\/paguilar.org\/index.php?rest_route=\/wp\/v2\/pages\/18"}],"wp:attachment":[{"href":"https:\/\/paguilar.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=17"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}