• Print

Author Topic: Precalculated SIN/COS vs QB64 SIN/COS  (Read 201 times)

TerryRitchie

  • Hero Member
  • *****
  • Posts: 2264
  • FORMAT C:\ /Q /U /AUTOTEST (How to repair Win8)
    • Email
Precalculated SIN/COS vs QB64 SIN/COS
« on: March 10, 2013, 11:25:44 AM »
I have a little time today to program. I've often seen or read that many games use precalculated sine and cosine tables to speed things up so I though I would whip something up to see the time savings vs the two. The code below is what I came up with but curiously with my computer (2.4Ghz Q6600) I am only saving about .3 seconds over ~62 million calculations. It would seem to myself anyway that the increase in speed should be much, much greater? Is there something with my code that is horribly inefficient? Is QB64's implementation of SIN/COS really this efficient? Any thoughts would be appreciated. Thanks.

Code: [Select]
DIM Dummy!
DIM Count%
DIM Count2!
DIM s!
DIM e!

Dummy! = NSIN!(1) '  force function to precalc sine table
Dummy! = NCOS!(1) '  force function to precalc cosine table

s! = TIMER(.001)
FOR Count% = 1 TO 10000
    FOR Count2! = 0 TO 6.283 STEP .001
        Dummy! = NSIN!(Count2!)
        Dummy! = NCOS!(Count2!)
    NEXT Count2!
NEXT Count%
e! = TIMER(.001) - s!
PRINT e!

s! = TIMER(.001)
FOR Count% = 1 TO 10000
    FOR Count2! = 0 TO 6.283 STEP .001
        Dummy! = SIN(Count2!)
        Dummy! = COS(Count2!)
    NEXT Count2!
NEXT Count%
e! = TIMER(.001) - s!
PRINT e!

'----------------------------------------------------------------------------------------------------------------------

FUNCTION NCOS! (radian!) STATIC

'** New COSine function

DIM Cosine!(6283)
DIM Calc%
DIM Count%

IF NOT Calc% THEN
    Calc% = -1
    FOR Count% = 0 TO 6283
        Cosine!(Count%) = COS(Count% / 1000)
    NEXT Count%
END IF
NCOS! = Cosine!(radian! * 1000)

END FUNCTION

'----------------------------------------------------------------------------------------------------------------------

FUNCTION NSIN! (radian!) STATIC

'** New SINe function

DIM Sine!(6283)
DIM Calc%
DIM Count%

IF NOT Calc% THEN
    Calc% = -1
    FOR Count% = 0 TO 6283
        Sine!(Count%) = SIN(Count% / 1000)
    NEXT Count%
END IF
NSIN! = Sine!(radian! * 1000)

END FUNCTION

'----------------------------------------------------------------------------------------------------------------------

Clippy

  • Hero Member
  • *****
  • Posts: 16431
  • I LOVE π = 4 * ATN(1)    Use the QB64 WIKI >>>
    • Pete's Qbasic Site
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #1 on: March 10, 2013, 11:37:03 AM »
It's not that the calculations are faster in QB64, it is the fact that every thing else is too. Qbasic needed pre-calculations to save time when doing something else. Calculations just slowed it down more. 
QB64 WIKI: Main Page
Download Q-Basics Code Demo: Q-Basics.zip
Download QB64 BAT, IconAdder and VBS shortcuts: QB64BAT.zip
Download QB64 DLL files in a ZIP: Program64.zip

TerryRitchie

  • Hero Member
  • *****
  • Posts: 2264
  • FORMAT C:\ /Q /U /AUTOTEST (How to repair Win8)
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #2 on: March 10, 2013, 11:50:01 AM »
So precalculating values is something that was needed in older dialects of BASIC and possibly other legacy languages and therefore not needed in today's modern languages?

That makes sense but still seems counter-intuitive to me. The math behind calculating SIN/COS and even SQR is pretty intense. I have no idea what goes on in the math portion of today's processors but it would seem that the Intel engineers have really made these calculations as efficient as possible then.  Perhaps they already incorporate some sort of precalculated tables for "standard" values?

Thanks for the response Clippy.

TerryRitchie

  • Hero Member
  • *****
  • Posts: 2264
  • FORMAT C:\ /Q /U /AUTOTEST (How to repair Win8)
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #3 on: March 10, 2013, 12:05:52 PM »
Holy smokes! Here is the outcome I was looking for. I eliminated the use of functions and now achieve 2.2 seconds versus 7.4 seconds.

So my next obvious question is why are functions running so slowly?

Code: [Select]
DIM Cosine!(6283)
DIM Sine!(6283)
DIM Dummy!
DIM Count%
DIM Count2!
DIM s!
DIM e!

FOR Count% = 0 TO 6283
    Cosine!(Count%) = COS(Count% / 1000)
    Sine!(Count%) = SIN(Count% / 1000)
NEXT Count%

s! = TIMER(.001)
FOR Count% = 1 TO 10000
    FOR Count2! = 0 TO 6.283 STEP .001
        Dummy! = Sine!(Count2! * 1000)
        Dummy! = Cosine!(Count2! * 1000)
        'Dummy! = NSIN!(Count2!)
        'Dummy! = NCOS!(Count2!)
    NEXT Count2!
NEXT Count%
e! = TIMER(.001) - s!
PRINT e!

s! = TIMER(.001)
FOR Count% = 1 TO 10000
    FOR Count2! = 0 TO 6.283 STEP .001
        Dummy! = SIN(Count2!)
        Dummy! = COS(Count2!)
    NEXT Count2!
NEXT Count%
e! = TIMER(.001) - s!
PRINT e!
« Last Edit: March 10, 2013, 12:18:29 PM by TerryRitchie »

TerryRitchie

  • Hero Member
  • *****
  • Posts: 2264
  • FORMAT C:\ /Q /U /AUTOTEST (How to repair Win8)
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #4 on: March 10, 2013, 12:17:03 PM »
Subroutines are a bit faster than functions, but not much. I get 6 seconds vs 7.4 here.

Code: [Select]
DIM Cosine!(6283)
DIM Sine!(6283)
DIM Dummy!
DIM Count%
DIM Count2!
DIM s!
DIM e!

FOR Count% = 0 TO 6283
    Cosine!(Count%) = COS(Count% / 1000)
    Sine!(Count%) = SIN(Count% / 1000)
NEXT Count%

s! = TIMER(.001)
FOR Count% = 1 TO 10000
    FOR Count2! = 0 TO 6.283 STEP .001
        NSIN Dummy!, Count2!
        NCOS Dummy!, Count2!
        'Dummy! = Sine!(Count2! * 1000)
        'Dummy! = Cosine!(Count2! * 1000)
        'Dummy! = NSIN!(Count2!)
        'Dummy! = NCOS!(Count2!)
    NEXT Count2!
NEXT Count%
e! = TIMER(.001) - s!
PRINT e!

s! = TIMER(.001)
FOR Count% = 1 TO 10000
    FOR Count2! = 0 TO 6.283 STEP .001
        Dummy! = SIN(Count2!)
        Dummy! = COS(Count2!)
    NEXT Count2!
NEXT Count%
e! = TIMER(.001) - s!
PRINT e!

'----------------------------------------------------------------------------------------------------------------------

SUB NCOS (result!, radian!)

'** New COSine function

SHARED Cosine!()

result! = Cosine!(radian! * 1000)

END SUB

'----------------------------------------------------------------------------------------------------------------------

SUB NSIN (result!, radian!)

'** New SINe function

SHARED Sine!()

result! = Sine!(radian! * 1000)

END SUB

'----------------------------------------------------------------------------------------------------------------------

Clippy

  • Hero Member
  • *****
  • Posts: 16431
  • I LOVE π = 4 * ATN(1)    Use the QB64 WIKI >>>
    • Pete's Qbasic Site
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #5 on: March 10, 2013, 12:40:20 PM »
Try Darth's fast math library.

Calculations will always slow down program progress a little, you just don't notice a big difference.

Do those calculations during the splash screen...  ;)
QB64 WIKI: Main Page
Download Q-Basics Code Demo: Q-Basics.zip
Download QB64 BAT, IconAdder and VBS shortcuts: QB64BAT.zip
Download QB64 DLL files in a ZIP: Program64.zip

SMcNeill

  • Hero Member
  • *****
  • Posts: 2414
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #6 on: March 10, 2013, 02:06:58 PM »
Another reason to use tables is (if you're like me) you like to work with degrees over radians.  To swap degree to radian, calculate the sin/cos, and then return a value is a lot of math steps.  Do all the work from 0-359 at the start, calculate the figures, and then simply look up the values in a mem block array, and you're working in degrees without much drag at all for the math portion.  It's not much work to precalculate the figures, toss them in an array or memblock, and it actually uses very little memory on today's systems, and the speed increase is well worth it, if speed is a concern of yours.   :)
http://bit.ly/TextImage -- Library of QB64 code to manipulate text and images, as a BM library.
http://bit.ly/Color32 -- A set of color CONST for use in 32 bit mode, as a BI library.

http://bit.ly/DataToDrive - A set of routines to quickly and easily get data to and from the disk.  BI and BM files

mcalkins

  • Hero Member
  • *****
  • Posts: 1269
    • qbasicmichael.com
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #7 on: March 10, 2013, 11:53:41 PM »
non-inline function calls inherently have overhead. QB64 functions have additional overhead.

For SIN, COS, TAN, and ATN, QB64 uses the CRT functions: sin, cos, tan, and atan respectively.

sin and cos are implemented in statically linked code from libmingwex.a, specifically from lib32_libmingwex_a-sin.o and lib32_libmingwex_a-cos.o.

tan and atan are implemented in msvcrt.dll.

I've disassembled and traced through the relevant code:
http://qb64offtopic.freeforums.org/dissassembly-related-to-sin-cos-tan-and-atan-t41.html

I intend to provide alternate functions based with much less overhead and no error checking. If I get lazy, I'll just provide a static library built with Nasm. Otherwise, I'll try to provide inline functions using GCC-style inline assembly.

Why not just stick with radians, and not bother with degrees? It would seem more efficient...

Regards,
Michael
The QBASIC Forum Community: http://www.network54.com/index/10167 Includes off-topic subforums.
QB64 Off-topic subforum: http://qb64offtopic.freeforums.org/

OlDosLover

  • Hero Member
  • *****
  • Posts: 3859
  • OlDosLover
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #8 on: March 11, 2013, 12:17:56 AM »
Hi all,
    I've found gosubs to be pretty fast and rarely use functions. QB64 seems a bit quirky.
OlDosLover.

Mrwhy

  • Hero Member
  • *****
  • Posts: 2890
  • My Dad called me Mr Why when I was 5.
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #9 on: March 11, 2013, 12:40:01 AM »
QB64 calcs are so fast the slowest part is always getting to see the results on the screen!
That is why the "loops of millions" are so fast - you did not bother with intermediate results and left them unrecorded.

DarthWho

  • Hero Member
  • *****
  • Posts: 3853
  • Timelord of the Sith
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #10 on: March 11, 2013, 01:03:27 AM »
I find that figuring out the problem domain leas me to the best results;
EX:
Problem domain:
Continuous data; specific interval; needs to be decently accurate leads to this; SIN/COS:
Code: [Select]
Aa = 0.001
s! = TIMER(.001)
FOR Count% = 1 TO 10000
    ss = 0
    Cc = 1
    FOR Count2! = 0 TO 6.283 STEP .001
        Dummy! = Cs
        Dummy! = ss
        Cc = Cc - ss * Aa
        ss = ss + Cc * Aa
    NEXT Count2!
NEXT Count%
m! = TIMER(.001) - s!
PRINT m!
always ask yourself what your problem domain is that is the best advice I can give for optimization.

but anyway Terry built in functions have been getting faster at a rate exceeding the speed that of memory calls of any kind; thus again solving for the problem domain is a big boon.
Rassilon: My lord Doctor; My lord Master; My lord DarthWho
The Doctor and the master at the same time :WHAT!?!?!

FastMath 1.1.0 released: http://dl.dropbox.com/u/12359848/fastmath.h

mcalkins

  • Hero Member
  • *****
  • Posts: 1269
    • qbasicmichael.com
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #11 on: March 11, 2013, 03:56:19 AM »
You might give this a go:

custtrig.h
Code: [Select]
// I do not guarantee the correctness of this code.

// compile with -O2 or -O3 to get function inlining.

inline double alt_sin(double x) {
 asm ("fsin"         : "+t" (x));
 return x;
}

inline double alt_cos(double x) {
 asm ("fcos"         : "+t" (x));
 return x;
}

inline double alt_tan(double x) {
 asm ("fptan    \n"              // nasm documentation 0.99.02 fails to mention that fptan also pushes 1 onto x87 stack
      "ffreep %%st(0)" : "+t" (x)
                      :
                      : "st(7)"); // I'm not sure if I'm doing this right.
 return x;
}

inline double alt_atan(double x) {
 asm ("fld1    \n"
      "fpatan"        : "+t" (x)
                      :
                      : "st(7)"); // I'm not sure if I'm doing this right.
 return x;
}

// http://gcc.gnu.org/onlinedocs/gcc-4.7.2/gcc/Extended-Asm.html#Extended-Asm

Code: [Select]
$CHECKING:OFF

'add:
'  -O3 -fno-strict-aliasing -fno-strict-overflow
'to the compiler command line in recompile.bat

DECLARE LIBRARY "custtrig"
 FUNCTION alt_sin# (BYVAL x#)
 FUNCTION alt_cos# (BYVAL x#)
 FUNCTION alt_tan# (BYVAL x#)
 FUNCTION alt_atan# (BYVAL x#)
END DECLARE

DIM s AS SINGLE, e AS SINGLE
DIM count AS LONG
DIM count2 AS DOUBLE, dummy AS DOUBLE

PRINT alt_sin(1), SIN(1)
PRINT alt_cos(1), COS(1)
PRINT alt_tan(1), TAN(1)
PRINT alt_atan(1), ATN(1)
PRINT alt_sin(2), SIN(2)
PRINT alt_cos(2), COS(2)
PRINT alt_tan(2), TAN(2)
PRINT alt_atan(2), ATN(2)

s = TIMER(.001)
FOR count = 1 TO 10000
 FOR count2 = 0 TO 6.283# STEP .001#
  dummy = SIN(count2)
  dummy = COS(count2)
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e, "sin/cos"

s = TIMER(.001)
FOR count = 1 TO 10000
 FOR count2 = 0 TO 6.283# STEP .001#
  dummy = TAN(count2)
  dummy = ATN(count2)
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e, "tan/atn"

s = TIMER(.001)
FOR count = 1 TO 10000
 FOR count2 = 0 TO 6.283# STEP .001#
  dummy = alt_sin(count2)
  dummy = alt_cos(count2)
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e, "alt_sin/alt_cos"

s = TIMER(.001)
FOR count = 1 TO 10000
 FOR count2 = 0 TO 6.283# STEP .001#
  dummy = alt_tan(count2)
  dummy = alt_atan(count2)
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e, "alt_tan/alt_atan"

s = TIMER(.001)
FOR count = 1 TO 10000
 FOR count2 = 0 TO 6.283# STEP .001#
  dummy = 0
  dummy = 0
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e, "empty"

PRINT "press any key": SLEEP: SYSTEM

Please compare it with and without modifying recompile.bat.

Regards,
Michael

P.S. Here's a C++ version, for comparison.

Code: [Select]
#include <windows.h>
#include <stdio.h>
#include <math.h>

#include "custtrig.h"

int main () {
 DWORD s, e;
 UINT count;
 double count2;
 volatile double dummy;  // volatile

 s = GetTickCount();
 for (count = 1; count <= 10000; ++count) {
  for (count2 = 0; count2 <= 6.283; count2 += 0.001) {
   dummy = sin(*(volatile double*) &count2);     // cast through pointer to volatile double
   dummy = cos(*(volatile double*) &count2);
  }
 }
 e = GetTickCount();
 printf("%u\t sin/cos\n", e - s);

 s = GetTickCount();
 for (count = 1; count <= 10000; ++count) {
  for (count2 = 0; count2 <= 6.283; count2 += 0.001) {
   dummy = tan(*(volatile double*) &count2);
   dummy = atan(*(volatile double*) &count2);
  }
 }
 e = GetTickCount();
 printf("%u\t tan/atan\n", e - s);

 s = GetTickCount();
 for (count = 1; count <= 10000; ++count) {
  for (count2 = 0; count2 <= 6.283; count2 += 0.001) {
   dummy = alt_sin(*(volatile double*) &count2);
   dummy = alt_cos(*(volatile double*) &count2);
  }
 }
 e = GetTickCount();
 printf("%u\t alt_sin/alt_cos\n", e - s);

 s = GetTickCount();
 for (count = 1; count <= 10000; ++count) {
  for (count2 = 0; count2 <= 6.283; count2 += 0.001) {
   dummy = alt_tan(*(volatile double*) &count2);
   dummy = alt_atan(*(volatile double*) &count2);
  }
 }
 e = GetTickCount();
 printf("%u\t alt_tan/alt_atan\n", e - s);

 s = GetTickCount();
 for (count = 1; count <= 10000; ++count) {
  for (count2 = 0; count2 <= 6.283; count2 += 0.001) {
   dummy = 0;
   dummy = 0;
  }
 }
 e = GetTickCount();
 printf("%u\t empty\n", e - s);

 return 0;
}

It lacks a lot of the overhead of the QB64 code. However, I had to use volatile to prevent the trig from being optimized away.

Typical results for me are approximately:

QB64 normal
12.8
26
11.9
16.1
1.7

QB64 with -O3 -fno-strict-aliasing -fno-strict-overflow
12.4
26
9.5
13.4
1

C++ without -O3
10.3
23.9
10.1
14.1
.4

C++ with -O3
10.5
23.5
7.4
11.1
.2
« Last Edit: March 11, 2013, 05:19:32 AM by mcalkins »
The QBASIC Forum Community: http://www.network54.com/index/10167 Includes off-topic subforums.
QB64 Off-topic subforum: http://qb64offtopic.freeforums.org/

codeguy

  • Hero Member
  • *****
  • Posts: 3552
  • what the h3ll did i name that code?
    • stuff at dkm
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #12 on: March 11, 2013, 08:55:48 AM »
the overhead for array access is what causes the minimal difference. back in the old days, arithmetic units were not nearly as fast as they are now. qb45 had a purposely introduced delay to allow time for this unit so software would not run faster than calculations could be made. the only time it makes sense now is if there are many transcendental functions involved (roots, exponentiation, etc).
http://denteddisk.forums-free.com/make-an-appointment-with-the-resident-code-guru-f34.html

SkyCharger001

  • Hero Member
  • *****
  • Posts: 1594
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #13 on: March 11, 2013, 09:07:26 AM »
I was thinking: most of the array-overhead could be avoided by using _MEM.

mcalkins

  • Hero Member
  • *****
  • Posts: 1269
    • qbasicmichael.com
    • Email
Re: Precalculated SIN/COS vs QB64 SIN/COS
« Reply #14 on: March 11, 2013, 05:03:18 PM »
Arrays and _MEM should be about the same. Both use indexing off of a base address. The slow part is accessing memory, especially if there is a cache miss.

Just to avoid confusion, none of my previous tests were using look up tables.

Here's a modification of Terry's look up table test:

Code: [Select]
$CHECKING:OFF
DIM Cosine(6283) AS SINGLE
DIM Sine(6283) AS SINGLE
DIM Dummy AS SINGLE
DIM Count AS LONG
DIM Count2 AS LONG
DIM s AS SINGLE
DIM e AS SINGLE

FOR Count = 0 TO 6283
 Cosine(Count) = COS(Count) ' / 1000)
 Sine(Count) = SIN(Count) ' / 1000)
NEXT Count

s = TIMER(.001)
FOR Count = 1 TO 10000
 FOR Count2 = 0 TO 6283
  Dummy = Sine(Count2)
  Dummy = Cosine(Count2)
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e

s = TIMER(.001)
FOR Count = 1 TO 10000
 FOR Count2 = 0 TO 6283
  Dummy = SIN(Count2)
  Dummy = COS(Count2)
 NEXT
NEXT
e = TIMER(.001) - s
PRINT e

I tend to get about:
2.2
11.7

and with  -O3 -fno-strict-aliasing -fno-strict-overflow , about:
1
11.5

Although I ordinarily prefer DOUBLEs, SINGLEs are probably better for large lookup tables, to reduce virtual memory usage, page faults, and possibly cache misses for values relatively near each other.

Unless you have gigantic lookup tables, how are you going to beat the precision of the x87?

SSE2 might have some trig capability also. However, SSE2 calculates using DOUBLEs, whereas x87 calculates using extended precision _FLOATs, and then rounds to DOUBLE upon request. See:
http://en.wikipedia.org/wiki/SSE2#Differences_between_x87_FPU_and_SSE2

Regards,
Michael
The QBASIC Forum Community: http://www.network54.com/index/10167 Includes off-topic subforums.
QB64 Off-topic subforum: http://qb64offtopic.freeforums.org/

  • Print