EE443_L05_CtypeReview
Özdemirel
Lecture 5
Review of C Data Types
The following examples give the formal declarations of the integer data types and
the corresponding numeric ranges:
signed char MySbyte; // -128..+127
unsigned char MyUbyte; // 0..+255
signed short int MySSint; // -32,768..+32,767
unsigned short int MyUSint; // 0..+65,535
signed long int MySLint; // -2,147,483,648..+2,147,483,647
unsigned long int MyULint; // 0..+4,294,967,295
C compilers accept declarations of integer data types that omit one or more of
these keywords. In programming projects for embedded systems, we often try to
minimize memory usage and maximize processing speed. For this reason, we should
use the formal declarations to choose the optimum data type that serves our purpose.
The standard C language has no specific data type to store strings or data in basic
text format. Character strings are stored in simple arrays of the char data type. The
standard C library functions declared in the <string.h> header file provide the
necessary set of string processing tools (e.g. comparison, search, concatenation,
etc.). All of these string processing functions operate on arrays of characters.
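A minimal sketch of string handling with a plain char array and two <string.h> functions is given below. The function name BuildGreeting and its parameters are hypothetical and are used only for illustration.

```c
#include <string.h>

// Hypothetical helper: builds "Hello, <Name>" in the caller's char array.
// Returns the length of the resulting string, or 0 if it does not fit.
unsigned short int BuildGreeting(char *Buffer, unsigned short int BufferSize,
                                 const char *Name)
{
    const char Prefix[] = "Hello, ";   // array of 8 chars including '\0'
    size_t Needed = strlen(Prefix) + strlen(Name) + 1;   // +1 for '\0'

    if (Needed > BufferSize)
        return 0;                      // refuse to write past the array

    strcpy(Buffer, Prefix);            // copy the prefix with its terminator
    strcat(Buffer, Name);              // append Name after the prefix
    return (unsigned short int)strlen(Buffer);
}
```

A caller would typically declare the storage itself, for example char Text[16];, and pass sizeof Text as the second argument.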
Note that if a function declares a local variable with the same name as a global
variable, then a second variable is created. The main program given below
declares Counter as a local variable. Every access to Counter in the main program
involves the local variable only, which is independent of the global Counter.
signed short int Counter = 0; // declare global variable
int main()
{   // Local variables of main function:
    signed short int Counter;
    signed short int Limit;
    unsigned char Abyte, Bbyte;
    . . .
    if (Counter == Limit)
    {   Abyte = 0;
        . . .
    }
}
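The independence of the two variables can be checked with a short sketch. The function names ReadGlobalCounter and UseLocalCounter and the initial values are hypothetical.

```c
signed short int Counter = 5;          // global variable

// No local variable named Counter here, so the global one is read.
signed short int ReadGlobalCounter(void)
{
    return Counter;
}

// The local declaration creates a second variable that hides the global
// Counter inside this function only.
signed short int UseLocalCounter(void)
{
    signed short int Counter = 100;    // independent local variable
    Counter = Counter + 1;             // modifies the local variable only
    return Counter;
}
```

After a call to UseLocalCounter, the global Counter still holds its original value.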
Groups of functions that perform the tasks related to a certain purpose or target
hardware are usually written as library modules in separate source files. Dividing a
big programming project into several source files has several advantages compared
to keeping all of the source code in a single long file:
1. Shorter source files can be edited more efficiently without wasting time scrolling
through a single big file.
2. Variables and other critical data related to a specific group of tasks can be
isolated from the other parts of the program.
3. When there are several engineers working on a project, each engineer can work
on a separate module.
The C compiler processes the source files of a project separately, and it creates
an object code file corresponding to each source file. In the next build step, a
linker utility makes the necessary connections between all object codes and creates
a single executable code for the project.
Global variables declared in a source file can be made accessible in the other
files of a project by using the extern attribute. In the following example, the main
function and UpdateCounter are written in separate files. Counter is declared as a
global variable in the file CounterSource.c. Specification of Counter with the extern
attribute in MainSource.c does not create a global variable. It just tells the compiler
that the Counter variable is declared in another source file of the project. Similarly, the
prototype declaration of UpdateCounter specifies the parameters of this function and
what it returns at the end. All of this information is used to establish the necessary
links between the object codes when the linker creates the executable code.
// Source file: MainSource.c
// Specify Counter as an external variable declared in another source file:
extern signed short int Counter;
// Prototype of UpdateCounter function defined in another source file:
void UpdateCounter(signed short int Increment);
int main()
{   // Local variables of main function:
    signed short int Limit;
    unsigned char Abyte, Bbyte;
    . . .
    if (Counter == Limit)
    {   Abyte = 0;
        . . .
    }
}
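The sketch below places both sides in a single listing so that it can be compiled on its own. In a real project, the definitions at the bottom would be in CounterSource.c. The body of UpdateCounter and the caller CountUpTo are assumptions made for illustration.

```c
// Declarations only, as they would appear in MainSource.c.
// No storage is reserved for Counter by the extern declaration.
extern signed short int Counter;
void UpdateCounter(signed short int Increment);

// Hypothetical caller: increments Counter until it reaches Limit.
signed short int CountUpTo(signed short int Limit)
{
    while (Counter < Limit)
        UpdateCounter(1);
    return Counter;
}

// Definitions, as they would appear in CounterSource.c.
// This is where the storage for Counter is actually reserved.
signed short int Counter = 0;

void UpdateCounter(signed short int Increment)   // assumed function body
{
    Counter += Increment;
}
```

If the two parts are split into separate files, the compiler produces one object code file for each, and the linker resolves the references to Counter and UpdateCounter.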
5.2 Pointers in C
Pointers are stored in memory and accessed through the symbols we define,
just like any other variable. The numeric value stored in a pointer serves as an
address, and the C compiler gives us the necessary means to access data utilizing
this address information. Pointers are declared by placing an asterisk (*) before the
variable name:
signed char *SbytePtr; // pointer to signed char
unsigned short int *USintPtr; // pointer to unsigned short int
signed long int *LintPtr; // pointer to signed long int
Each pointer stores the address of the first byte of the sequential memory locations
reserved for a particular data type. The numeric variables declared in the previous
section occupy different numbers of bytes depending on the data type. MySbyte
requires one byte, MyUSint requires two bytes and MySLint requires four bytes. All
of the pointers declared above require the same amount of memory, which depends on
the computer system or the microprocessor we are programming. A typical
microprocessor with a 16-bit address bus uses two bytes of memory to store the
address in a pointer. Similarly, a C program compiled for another system with 32-bit
addressing would require four bytes of memory for each pointer.
All pointers use the same amount of physical memory regardless of the data
type they are declared for. We are still required to specify the data type in pointer
declarations because the compiler determines the operations in executable code
according to the data type for a pointer. In other words, this is the way we tell the
compiler what to do with the data accessed through a pointer as we will see in the
following sections.
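A small sketch can make both points visible on a given system: the pointers themselves have the same size, while the declared data type determines how many bytes the compiler steps over. The function names are hypothetical. Equal pointer sizes are typical of common systems rather than a guarantee of the C language.

```c
#include <stddef.h>

// Returns 1 if the three pointer types occupy the same number of bytes
// on the system this code is compiled for.
int PointerSizesMatch(void)
{
    return sizeof(signed char *) == sizeof(unsigned short int *) &&
           sizeof(unsigned short int *) == sizeof(signed long int *);
}

// Returns the number of bytes between two consecutive elements as seen
// through a pointer to unsigned short int.
size_t StepOfShortPtr(void)
{
    unsigned short int Data[2];
    unsigned short int *USintPtr = Data;

    // USintPtr + 1 advances by one element, not by one byte.
    return (size_t)((char *)(USintPtr + 1) - (char *)USintPtr);
}
```

On a system where short int is two bytes, StepOfShortPtr returns 2 even though the pointer itself may occupy two, four or eight bytes.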
Similarly, we can read the pointer itself or the data addressed by a pointer on the
right hand side of an assignment:
1. Read the address stored by the pointer: The <target> must be an address
variable (i.e. another pointer) compatible with the pointer data type:
<target> = MyPointer;
2. Read the data at the memory location addressed by the pointer: The
<target> must be compatible with the data type addressed by the pointer:
<target> = *MyPointer;
The asterisk character used before a pointer in the left or right hand side of
executable statements means "access through" the address stored in that pointer.
The ampersand (&) character allows us to obtain the address of a variable in a C
program, reversing the function of "*" in a sense. It simply means "address of"
when it precedes a symbol defined in the program. Consider the group of
declarations given below:
unsigned char Byte1, Byte2; // two 1-byte integers
unsigned char *BytePtr; // pointer to unsigned char
The following three statements,
BytePtr = &Byte1; // copy address of Byte1 into BytePtr
*BytePtr = 99; // 99 goes to Byte1, through BytePtr
Byte2 = *BytePtr; // Byte1 is copied to Byte2, through BytePtr
produce the same result in Byte1 and Byte2 as the direct assignments
Byte1 = 99; and Byte2 = Byte1; would.
5.3.1 Unions
Unions allow a memory location to be accessed in different ways. Syntax rules for
declaration and usage of unions are similar to those of structures, but the resultant
memory organization is not the same. Members of a structure have their individual
memory locations. Members of a union, on the other hand, share the same memory
location.
The following example defines a structure of two bytes, and then a union of an
unsigned integer with the defined two-byte structure. The first member of the union,
USint, is accessible as a single unsigned number. The same two-byte memory
location is accessible as two detached bytes through the two-byte structure.
typedef struct // type definition of a structure of two bytes
{ unsigned char L;
unsigned char H;
} TwoBytes;
typedef union // type definition of a short int <=> TwoBytes union
{ unsigned short int USint;
TwoBytes bytes;
} USint_2byte;
USint_2byte Number; // declare a union of short int <=> TwoBytes
Number.USint = 0x1A2B; // write a short integer to the union
PORTA = Number.bytes.H; // send out the high byte = 0x1A
PORTA = Number.bytes.L; // send out the low byte = 0x2B
The total memory reserved for the Number union is two bytes in this example. The
first statement after the declaration writes a two-byte integer to the union memory.
Other statements send the same data to PORTA one byte at a time.
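The sketch below repeats the type definitions and wraps the byte access in a function so that it can be tried on any system. Which structure member holds the high byte depends on the byte order of the processor, so the sketch checks this first. The functions LowByteComesFirst and SplitBytes are hypothetical, and the code assumes that unsigned short int occupies two bytes.

```c
typedef struct                 // structure of two bytes
{   unsigned char L;
    unsigned char H;
} TwoBytes;

typedef union                  // short int <=> TwoBytes union
{   unsigned short int USint;
    TwoBytes bytes;
} USint_2byte;

// Returns 1 if the first byte in memory is the low order byte.
int LowByteComesFirst(void)
{
    USint_2byte Probe;
    Probe.USint = 0x0001;
    return Probe.bytes.L == 0x01;
}

// Hypothetical helper: splits Value into its high and low bytes
// through the union, without any shift operation.
void SplitBytes(unsigned short int Value,
                unsigned char *High, unsigned char *Low)
{
    USint_2byte Number;
    Number.USint = Value;
    if (LowByteComesFirst())
    {   *High = Number.bytes.H;
        *Low  = Number.bytes.L;
    }
    else                       // members are mapped in the opposite order
    {   *High = Number.bytes.L;
        *Low  = Number.bytes.H;
    }
}
```

In embedded code written for one known processor, the byte order check is usually dropped and the structure members are simply declared in the order that matches the hardware.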
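[Figure: memory contents of the Number union. The byte 0x1A is accessed as
Number.bytes.H and the byte 0x2B is accessed as Number.bytes.L. The two bytes
together are accessed as Number.USint.]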
The same output can be obtained with the following code that uses right shift
operations:
unsigned short int Dout; // declare a short int
Dout = 0x1A2B;
PORTA = Dout >> 8; // send out the high byte = 0x1A
// this will require many instructions
PORTA = Dout; // send out the low byte = 0x2B
Although this alternative source code looks simpler, the resultant executable code
may implement repeated right shift operations, probably in a loop that will take a
much longer time to complete, depending on the compiler and the target processor.
The first code example using the union is longer only because of the type
definitions. The type definitions are handled by the compiler, and the
resultant executable code is more efficient. Unions in the C language provide
flexible and efficient usage of memory. The same efficiency can be obtained in
assembly language, but with some difficulty in programming.
Input data from an ADC, a timer or another peripheral unit are read one byte at a
time. If there is any buffering requirement, then the input data is stored as a
sequence of bytes.
Data sent to a DAC, a PWM controller or a similar output device are written one
byte at a time, and/or buffered as a sequence of bytes.
Different data types or data structures are stored in a single memory block that is
allocated as a sequence of bytes.
Type casting can also be applied to memory access through pointers, which
is the main focus of this section. The following statements copy four bytes starting at
the address given by BufferPtr into the variable Linteger or into the memory
location pointed to by LongPtr:
Linteger = *(long int *)BufferPtr;
or
*LongPtr = *(long int *)BufferPtr;
The figure given below summarizes the logic behind the type casting operation.
Basically, we tell the C compiler to treat BufferPtr as a pointer to a group of four
bytes when we cast it into a new type with "(long int *)".
[Figure: *LongPtr = *(long int *)BufferPtr; LongPtr is a pointer to long int.
BufferPtr is a pointer to char that becomes a pointer to long int after type
casting. The right hand side is a 4-byte long int addressed through the pointer.]
The statements given above can be used for transferring data from an input byte
stream received from another computer system. The reverse operation is possible by
applying the same type casting operation on the left hand side of the assignment.
The following statements can be used for copying four bytes into an output buffer that
will be sent out:
*(long int *)BufferPtr = Linteger;
or
*(long int *)BufferPtr = *LongPtr;
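The cast statements assume that the processor can access a long int at whatever address BufferPtr holds. The sketch below is an alternative, not the method used in this lecture: it performs the same two transfers with the standard memcpy function, which works at any buffer address. The function names ReadLong and WriteLong are hypothetical. The number of bytes copied is sizeof(long int), which is four on the systems considered here but may be larger elsewhere.

```c
#include <string.h>

// Hypothetical helper: copies one long int out of a byte buffer.
signed long int ReadLong(const unsigned char *BufferPtr)
{
    signed long int Linteger;
    memcpy(&Linteger, BufferPtr, sizeof Linteger);   // byte by byte copy
    return Linteger;
}

// Hypothetical helper: copies one long int into a byte buffer.
void WriteLong(unsigned char *BufferPtr, signed long int Linteger)
{
    memcpy(BufferPtr, &Linteger, sizeof Linteger);
}
```

Both versions copy the bytes in the order in which they are stored in memory, so neither of them changes the byte order of the data.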
The same type casting operations can be used for the structure data type
defined in the previous section:
// Copy 9 bytes from the buffer starting at BufferPtr:
WFparam1 = *(WaveformStruct *)BufferPtr; // directly into data storage
*WFparPtr = *(WaveformStruct *)BufferPtr; // through a pointer to data
Pointers and the related operations described in this document are powerful
features of the C programming language. Like any power that needs to be
controlled, they come with serious responsibility. Programmers should be careful about
the memory requirements, especially when they are working on embedded systems
where memory and other resources are limited. The following statement copies 9
bytes into the buffer starting at the memory location pointed to by BufferPtr, as we
have seen previously:
*(WaveformStruct *)BufferPtr = WFparam1;
It is the programmer's responsibility to make sure that there is enough space left in
the buffer every time this statement is executed. If BufferPtr points to the last 3
bytes of the memory area originally reserved for the buffer, then the remaining 6
bytes will be written outside the reserved area. These 6 bytes originating from
WFparam1 will overwrite the other variables stored in the data memory space with
irrelevant values. The C compiler cannot detect this kind of programming error, which
causes unpredictable program behavior during execution because of the corrupted
memory contents.
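One way to meet this responsibility is to check the remaining space before every write. The sketch below is a minimal example under assumed names: Buffer, BUFFER_SIZE and SafeBufferWrite are hypothetical, and the source data is passed as a generic block of bytes.

```c
#include <string.h>

#define BUFFER_SIZE 32                 // hypothetical buffer length in bytes
unsigned char Buffer[BUFFER_SIZE];

// Hypothetical helper: copies Length bytes from Source into Buffer starting
// at Offset, but only if all of them fit inside the reserved area.
// Returns 1 on success, or 0 if the write would run past the end.
int SafeBufferWrite(unsigned short int Offset, const void *Source,
                    unsigned short int Length)
{
    if (Offset > BUFFER_SIZE || Length > BUFFER_SIZE - Offset)
        return 0;                      // not enough space left, write nothing
    memcpy(&Buffer[Offset], Source, Length);
    return 1;
}
```

With a 9-byte record and an offset that leaves only 3 bytes, the function returns 0 and the memory after the buffer is left untouched.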
Default Settings
At the beginning of this lecture, it was mentioned that the formal declarations,
without omitting optional keywords, should be used to choose the optimum data type.
Avoiding compatibility problems is another reason why we should use formal
declarations. If the "int" keyword is used without the "short" or "long" specification
to generate code for a 32-bit processor, then the compiler is likely to treat it as
a 32-bit integer by default. If the "int" keyword is used alone on an 8-bit processor,
then it will probably give a 16-bit integer. Therefore, it is good practice to use
formal declarations not only for embedded code, but also in programs running on
other computers that communicate with embedded systems.
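The default chosen by a particular compiler can be checked with the sizeof operator. The following sketch reports the widths in bits. The function names are hypothetical, and CHAR_BIT is the number of bits per byte defined in the standard <limits.h> header file.

```c
#include <limits.h>

// Number of bits the compiler uses for a plain "int" on the target system.
unsigned int BitsInPlainInt(void)
{
    return (unsigned int)(sizeof(int) * CHAR_BIT);
}

// Number of bits used for "short int" on the target system.
unsigned int BitsInShortInt(void)
{
    return (unsigned int)(sizeof(short int) * CHAR_BIT);
}
```

The two functions may return different values on a 32-bit system and the same value on an 8-bit system, which is exactly the compatibility problem described above.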
The big endian order sounds logical, since we write numbers on paper
starting with the most significant digit. On the other hand, we should remember that
we start with the least significant digit when we add numbers, just like an 8-bit
processor that starts with the least significant byte while adding 16 or 32-bit integers.
Both byte order schemes are common in systems of all sizes. If two
computer systems use different byte orders, then the low and high order bytes of all
multiple byte numbers must be swapped when they exchange data. Note that the
hardware of an 8-bit processor may support both little endian and big endian byte
orders. A programmer should check for available assembler or compiler directives to
change the byte order before attempting to solve this incompatibility problem by
writing byte swap procedures.
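When no such directive is available, a byte swap procedure can be written with shift and mask operations. The sketch below is one possible form. The function names SwapBytes16 and SwapBytes32 are hypothetical, and SwapBytes32 processes only the lower 32 bits of its argument.

```c
// Swaps the low and high order bytes of a 16-bit value.
unsigned short int SwapBytes16(unsigned short int Value)
{
    return (unsigned short int)(((Value & 0x00FFu) << 8) |
                                ((Value >> 8) & 0x00FFu));
}

// Reverses the order of the four bytes of a 32-bit value.
unsigned long int SwapBytes32(unsigned long int Value)
{
    return ((Value & 0x000000FFul) << 24) |
           ((Value & 0x0000FF00ul) << 8)  |
           ((Value & 0x00FF0000ul) >> 8)  |
           ((Value >> 24) & 0x000000FFul);
}
```

For example, SwapBytes16(0x1A2B) returns 0x2B1A, and applying the same function twice gives back the original value.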
Similar examples can be given for 32-bit memory word size: Four-byte long
int variables should be aligned with the memory addresses that are integer multiples
of 4. Similarly, 2-byte integers should have addresses that are integer multiples of 2.
Compilers follow a set of rules when reserving storage for data structures to optimize
memory access according to the specifications of computer systems. Refer to the
article on "data structure alignment" on http://en.wikipedia.org for more information.
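Whether a given address satisfies such a requirement can be tested with a remainder operation. The sketch below assumes that converting a pointer to the integer type uintptr_t from <stdint.h> gives the byte address, which holds on common systems with a flat address space. The function name IsAligned is hypothetical.

```c
#include <stdint.h>

// Hypothetical helper: returns 1 if Address is an integer multiple of
// Multiple (e.g. 2 for a 2-byte integer, 4 for a 4-byte long int),
// and 0 otherwise.
int IsAligned(const void *Address, unsigned int Multiple)
{
    return ((uintptr_t)Address % Multiple) == 0;
}
```

For two consecutive bytes of a char array, exactly one of the two addresses is an integer multiple of 2.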
The insertion of unused bytes for optimized memory alignment is called data
structure padding. Consider the following structure definition for example:
typedef struct
{ unsigned char UC;
signed short int SSI;
signed char SC;
signed long int SLI;
} BadStruct;
If the alignment requirements are ignored, this structure requires 1+2+1+4=8 bytes of
memory. On a typical 32-bit system, the C compiler will replace this structure
definition with the following structure after the data structure padding required for
alignment:
typedef struct
{ unsigned char UC;
char DummyA; // padding to align SSI
signed short int SSI;
signed char SC;