Saturday, November 10, 2012

Symbol Resolution, Weak Symbols, How the Linker Resolves Multiple Global Symbols


What Is Symbol Resolution

The linker needs to find the definition of every global symbol before it can generate an executable (.exe or .out) file. This process of finding a definition and associating each global symbol with it is called symbol resolution (speaking loosely, and leaving shared libraries aside).

For local symbols, resolution is straightforward: a local symbol must be defined in the module that uses it. Static variables, whether local or at file scope, are just as easy: the compiler requires them to be defined in the same compilation unit, i.e. the same file.

However, resolving references to global symbols is trickier. When the compiler encounters a global symbol that is not defined in the current module, it makes an entry in the symbol table and leaves it for the linker, on the assumption that the symbol is defined in one of the other modules.

If the linker cannot find a definition in any of its input modules, it reports an undefined symbol error.

There is one more complication in global symbol resolution: what if the same global symbol is defined in more than one of the modules being linked?



How the Linker Resolves Multiply Defined Global Symbols, Strong/Weak Linking

At compile time, the compiler exports each global symbol to the assembler as either strong or weak. The assembler records this information in the symbol table.

What goes as Strong symbols
All function definitions, all initialized global variables.

What goes as Weak symbols
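Uninitialized global variables. (This is the traditional Unix behaviour. GCC 10 and later default to -fno-common, which treats uninitialized globals as ordinary definitions, so the examples below only link there when compiled with -fcommon.)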

Rules of resolving multiply defined global symbols
1) Multiple strong symbols with the same name are not allowed; the linker reports an error.
2) Given a strong symbol and one or more weak symbols, the linker chooses the strong symbol.
3) Given multiple weak symbols and no strong one, the linker is free to choose any of them.

So be very careful with rules 2 and 3: they are a blind spot, because the linker will not stop or report any error, and you may get unexpected results.

For example, consider this program for Rule 2.

/* file1.c */
#include <stdio.h>

void file2Fun(void);   /* defined in file2.c */

int glob = 100;        /* initialized: strong symbol */
int main(void)
{
   file2Fun();
   printf("%d", glob);
   return 0;
}



/*file2.c*/

int glob;              /* uninitialized: weak symbol */

void file2Fun(void)
{
   glob = 200;
}


If you know rule 2, you will no longer expect the program to print 100: it prints 200, because file2.c effectively accesses the glob defined in file1.c.



Now consider this example for Rule 3.

/* file1.c */
#include <stdio.h>

void file2Fun(void);   /* defined in file2.c */

int glob;              /* uninitialized: weak symbol */
int main(void)
{

   glob = 100;
   file2Fun();
   printf("%d", glob);
   return 0;
}


/*file2.c*/

int glob;              /* uninitialized: weak symbol */

void file2Fun(void)
{
   glob = 200;
}


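In this case both definitions of glob are weak, so the linker is free to choose either one, and it does so silently. Whichever it picks, both files end up sharing a single glob, so the assignment in file2Fun() overwrites the 100 stored by main() without any warning.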

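Even nastier bugs await if glob is defined as int in one file and as double in the other. The two files then disagree about the size and layout of the same storage: a write through the double definition stores 8 bytes (on typical platforms) where the other file expects a 4-byte int, so the int may read back as garbage, and memory next to glob may be overwritten if the linker reserved only enough room for the int.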

1 comment:

Unknown said...

Hi, it's weird: when I change the Rule 3 program so that one weak symbol is int and the other is double, the output on my MacBook (clang) is 0.
```
#include <stdio.h>
#include
int glob;
void file2Fun();
int main(void)
{
glob = 100.0;
file2Fun();
printf("glob is %d\n", glob);
printf("glob is %f\n", glob);
return 0;
}
```

```
/*file2.c*/
double glob;

void file2Fun()
{
glob = 200.0;
}

```

Could you please tell me why? Thank you very much.