Electronic Pages |
Die Homepage der Familie Beis |

**The CORDIC Algorithm**

**The "normal way" and the solution without multiplier to understand**

( **Deutsche Version: Der CORDIC-Algorithmus)**

Practically nobody needs the CORDIC algorithm anymore. In all high-level languages, all computers and all smartphones, the sine and cosine function is available as a ready-made function as a matter of course. For many, many years. Who still wants to struggle with it?

Ok, I have done it. Twice, in fact. In my DASG (Digital Audio Sine Wave Generator) I realized it almost 10 years ago in an FPGA with VHDL. And because I had simply forgotten how to do it in the meantime (which annoyed me), I wanted to know again. That's the only reason why I looked into it again, otherwise I wouldn't have an application. Well, the fact that an article came out of it was worth something after all.

I don't even know if I realized it at the time: there are two ways of calculating sine and cosine values:

- One version, which I call the
**Sincos table based**version, requires a sine and a cosine table with some entries and a multiplier to calculate the results. It is a little easier to understand. - The other version, which I call the
**Arctan table based**version, requires an arctangent table with some entries, but no multiplier, only shift and add routines. However, you have to think a bit outside the box to understand it. On the other hand, the many necessary multiplications, whose realization in software with the Sincos table-based version not only requires more code (or hardware), but above all much more time.

In principle, this article is superfluous. Everything is shown more than completely in Wikipedia. However, it took me a very long time to understand at least the most important points of the very scientific article. I don't want to go into the criticism of comprehensibility any further here, but I still recommend the Wikipedia article because it also contains interesting historical aspects, among other things.

So I had resolved to do better in terms of comprehensibility and comprehensibility. After I thought I understood the algorithms, I tried it out in Visual Basic to see if I really got it. And when I did, the "supreme discipline": to implement the Arctan table based version in assembler on a µC that has no hardware multiplier. Finally, I will summarize my experiences in this hopefully comprehensible article.

The CORDIC algorithm is used to iteratively calculate the corresponding sine and cosine values for a given angle α (alpha). I will try 1. to describe this in words, 2. to illustrate it with a graphic and 3. to describe the exact calculation algorithm with a flow diagram.

In the **Sincos table-based version**, this is shown in the following graphic with the **red** vectors for α = +66.66° and works as follows:

- For example, you start with an angle β (beta) = 0. There cos(β) = 1 and sin(β) = 0. This is a vector with the coordinates X = 1 and Y = 0 and the length = 1.
- Then determine in which direction β must be rotated to get closer to α.
- The rotation angle
`Φ`(Phi) of the first rotation is large at the beginning: 90° or alternatively 45°. (Here: 90°, so that the calculation for the entire 360° can be made in one run). - β = β +/-
`Φ`: Rotation is performed by this angle: +90° or -90° (alternatively 45°), in this graphic +90°. (With coordinate rotations, the lengths of the vectors remain constant, here = 1).

- The angle β achieved in this way can be greater or smaller than the target α after each step.
- Now the angle of rotation
`Φ`is halved, i.e. 45° (alternatively 22.5°), and the process continues with step 4 in a loop. In the process, β gets closer and closer to α, and the values for X and Y correspond more and more precisely to the cosine and sine of α.

It makes sense to repeat this loop 4 to 6 until the rotation angles `Φ` are so small that the accuracy does not increase any further because numbers are only calculated with finite resolution. The cosine and sine values of the rotation angles `Φ` are required for the coordinate rotation. These must be available in advance in a sine and cosine table. Multiplications and additions or subtractions are also carried out.

So much for an attempt to provide a rough overview of this CORDIC algorithm in words. The example in the following graphic may make it easier to understand. In the upper half, the vectors of the iteration steps for the Sincos table based version are shown in red with the start value (X, Y) = (1 ,0) and the angles β for the target α = +66.66°, whereby the start value and the first 6 iteration steps are numbered 0 to 6:

The **Arctan table based version** is not completely different, but there are clear differences. This is shown in **green** for the target α = -66.66° in the lower half of the graph:

- For example, you start again at an angle β (beta) = 0, but the length of the start vector that is rotated is significantly smaller than 1 at the beginning and reaches the required length of 1 towards the end of the iteration. It is therefore not a pure coordinate rotation.
- The exact length of this start vector depends on the number of iteration steps, is called K and is approx. 0.60725.
- The first rotation
`Φ`is only 45°, which means that the final result = α is only slightly more than +/-90°, so that a simple additional calculation must be made for +/-180°. - The angle of rotation
`Φ`is only in the first step halved to 22.5°; in all subsequent steps, it is always slightly greater than half of the previous step. As a result, this method can in principle be used to calculate up to approx. +/-99.88°. - Instead of two tables for the sine and one cosine values, only one table is required, which contains the corresponding arctangent values of the rotation angles
`Φ`for the respective iteration steps.

I have created the two algorithms in detail with a less suitable, but not completely unsuitable program:

The sine and cosine tables are calculated from:

** SinTab(I) = sin(90°/2^I)
CosTab(I) = cos(90°/2^I)**

with I = 0 up to the number of iteration steps N - 1 (= N table entries).

The arctangent table is calculated from:

** AtnTab(I) = atn(1/2^I)**

with I = 0 up to the number of iteration steps N - 1 (= N table entries).

The arctangent table for 32 iteration steps then looks like this:

I AtnTab(I)0 0,78539816339745

1 0,46364760900081

2 0,24497866312686

3 0,12435499454676

4 0,06241880999596

5 0,03123983343027

6 0,01562372862048

7 0,00781234106010

8 0,00390623013197

9 0,00195312251648

10 0,00097656218956

11 0,00048828121119

12 0,00024414062015

13 0,00012207031189

14 0,00006103515617

15 0,00003051757812

16 0,00001525878906

17 0,00000762939453

18 0,00000381469727

19 0,00000190734863

20 0,00000095367432

21 0,00000047683716

22 0,00000023841858

23 0,00000011920929

24 0,00000005960464

25 0,00000002980232

26 0,00000001490116

27 0,00000000745058

28 0,00000000372529

29 0,00000000186265

30 0,00000000093132

31 0,00000000046566

The factor K corrects the change in length of the vector over the course of the iteration. K is the reciprocal of the product of all cos(`Φ`) from 0 to the last iteration step applied. For N > 24, **K = 0.607252935008881** or hexadecimal with a value range of 0 to 2^32 corresponding to 0 to 1, **K = 0x9B74EDA8**.

For the 32-bit algorithm in a µC or in other binary computing platforms, the AtnTab(I) would look like this:

.long 0x20000000 ; Φ = atn(2^ 0) = 45°

.long 0x12E4051E ; Φ = atn(2^-1) = 26,565051177078°

.long 0x09FB385B ; Φ = atn(2^-2) = 14,0362434679265°

.long 0x051111D4 ; Φ = atn(2^-3) = 7,1250163489018°

.long 0x028B0D43 ; Φ = atn(2^-4) = 3,57633437499735°

.long 0x0145D7E1 ; Φ = atn(2^-5) = 1,78991060824607°

.long 0x00A2F61E ; Φ = atn(2^-6) = 0,895173710211074°

.long 0x00517C55 ; Φ = atn(2^-7) = 0,447614170860553°

.long 0x0028BE53 ; Φ = atn(2^-8) = 0,223810500368538°

.long 0x00145F2F ; Φ = atn(2^-9) = 0,111905677066207°

.long 0x000A2F98 ; Φ = atn(2^-10) = 5,59528918938037E-02°

.long 0x000517CC ; Φ = atn(2^-11) = 2,79764526170037E-02°

.long 0x00028BE6 ; Φ = atn(2^-12) = 0,013988227142265°

.long 0x000145F3 ; Φ = atn(2^-13) = 6,99411367535292E-03°

.long 0x0000A2FA ; Φ = atn(2^-14) = 3,49705685070401E-03°

.long 0x0000517D ; Φ = atn(2^-15) = 1,74852842698045E-03°

.long 0x000028BE ; Φ = atn(2^-16) = 8,7426421369378E-04°

.long 0x0000145F ; Φ = atn(2^-17) = 4,37132106872335E-04°

.long 0x00000A30 ; Φ = atn(2^-18) = 2,18566053439348E-04°

.long 0x00000518 ; Φ = atn(2^-19) = 1,09283026720071E-04°

.long 0x0000028C ; Φ = atn(2^-20) = 5,46415133600854E-05°

.long 0x00000146 ; Φ = atn(2^-21) = 2,73207566800489E-05°

.long 0x000000A3 ; Φ = atn(2^-22) = 1,36603783400252E-05°

.long 0x00000051 ; Φ = atn(2^-23) = 6,83018917001272E-06°

.long 0x00000029 ; Φ = atn(2^-24) = 3,41509458500637E-06°

.long 0x00000014 ; Φ = atn(2^-25) = 1,70754729250319E-06°

.long 0x0000000A ; Φ = atn(2^-26) = 8,53773646251594E-07°

.long 0x00000005 ; Φ = atn(2^-27) = 4,26886823125797E-07°

.long 0x00000003 ; Φ = atn(2^-28) = 2,13443411562898E-07°

.long 0x00000001 ; Φ = atn(2^-29) = 1,06721705781449E-07°

.long 0x00000001 ; Φ = atn(2^-30) = 5,33608528907246E-08°

.long 0x00000000 ; Φ = atn(2^-31) = 2,66804264453623E-08°

It can be seen that the last iteration step with I = 31 would no longer make sense, as the rotation angle `Φ` is 0. In principle, the same also applies to a calculation with 16-bit or 8-bit numbers for the last step, i.e. for I = 15 or I = 7.

With fewer iteration steps or low resolution, e.g. with 16 iterations and 16-bit calculation accuracy, only the first 16 bits of both the table and K need to be used, sensibly rounded.

It is obvious that the accuracy of the results depends on the resolution (number of digits) used for the calculation and the number of iteration steps. As rounding errors accumulate with each iteration, it must be expected that the last digit will be incorrect by several counters. In an experiment with 32-bit numbers and 32 iteration steps, I observed deviations of a maximum of +/-6 (out of two billion!) in the calculated sine and cosine values for approx. 30 different angles. I find this plausible.

It is advisable to use the same number of bits or digits for the input value α as for the tables, the intermediate registers and the output values.

The entire 16- or 32-bit numerical range for 0 to 360° is of course used as the angle input. In other words: 360° = 0xFFFF+1.

The output is a little "awkward": It is usual to understand the number range from 0 to 65535 with a sign as -32768 (0x8000) to +32767 (0x7FFF) with e.g. 16 bits. 0x8000 scaled as -1 for the sine calculation would be ok, but +1 would then not be displayable. It would be conceivable to scale 0x8001 to 0x7FFF as -1 to +1.

In my implementation, I have scaled 0x4000 as +1 and 0xC000 as -1. The number range 0x4001 to 0xBFFF then remains unused. So 1 bit sign, one bit before the decimal point, 14 bits after the decimal point. This also means that you do not have to start with K = 0x9B74EDA8, but with a quarter of it, i.e. K = 0x26DD3B6A.

Of course, I didn't just want to try all this out with simulations. A meaningful demonstration should also run with a µC with the Arctan table based version, i.e. without a multiplier. The µC is an MSP430Gxxxx, a 16 bit µC from TI. A sinusoidal signal was to be generated and its MSByte converted to analog using the R2R network and displayed on the oscilloscope. And I wanted to find out what the calculation times were for the 16-bit and 32-bit variants. Result: At 16 MHz processor clock, a calculation takes

- with 16 bits approx. 63 µs

- with 32 bits approx. 220 µs

Most of the time is needed for the shift instructions. In the 32 bit version there are approx. 500 of these, each of which requires 7 processor cycles in a loop with loop counter and jump, which results in approx. 500*7/16 MHz = 220 µs. A barrel shifter would be very helpful, but this is not available in the MSP430. At least you can take shortcuts, because shift operations greater than 8 or 16 can be partially replaced by byte or word swapping and deletion, so a total time of approx. 220 µs can be achieved. With 63 µs for a 16-bit calculation, even sinusoidal signals up to several kHz are possible in real time, e.g. for frequency modulation.

For the demonstration, I have set the step size for each sample to 0x0055 (16-bit version) or 0x00550055 (32-bit version). A 360° cycle therefore requires ~771.011 calculations and thus takes a correspondingly long time. To avoid re-quantizing a digital scope for screenshots, I took photos with long exposures from the analog scope. First the 16 bit version, then the 32 bit version. The cursor lines show the operating voltage range, of which only about 50% is utilized due to the +1 problem mentioned above.

I am not publishing the code for the MSP430 here. Not because it is secret, but because it is not well maintained. But in this case, "The main thing is that it works" was enough for me for once. Otherwise, I consider this saying to be an admission of the lowest claim a developer can make of himself and his work, which is a great indictment. But for the estimated 0 people who would be interested in it, I didn't want to make the effort, and I don't need it anymore myself. However, if anyone is interested and is not afraid of anything - let me know.

Last update: September 4^{th}, 2024 |
Questions? Suggestions? Email Me! | Uwe Beis |