
Topic: data type discussion

mcalkins

data type discussion
« on: July 20, 2012, 01:19:33 PM »
This is in response to a question in the Beginner's forum.

Quote from: thacket on July 19, 2012, 12:07:10 PM
Hello to All,

I am having a great time learning QB and QB64. Well done on the interface and the stability. I do have a question about functions called from C concerning numeric data types. The signed data types in C are: SHORT=2 bytes, INT=4 bytes (sometimes), LONG=4 bytes (sometimes), LONG LONG=8 bytes, FLOAT=4 bytes, DOUBLE=8 bytes, LONG DOUBLE=8 bytes (sometimes).

If I read the QB64 documentation correctly signed data types are INTEGER=2 bytes, LONG=4 bytes, _INTEGER64=8 bytes, SINGLE=4 bytes, DOUBLE=8 bytes, _FLOAT=32 bytes using 80 bits of that.

When calling C Functions from QB64, I can see where you could get into trouble if you called one type from C and used an incorrect type from QB64 to hold or modify values. I listed the signed data types but bit shifting large unsigned integers or computing normal floating point operations could be a real mess if you got it wrong. Also, if a function called from C returns a LONG value and you accidentally add a QB64 suffix symbol different from the return type in C, what happens?

Am I blowing this out of proportion?
Does the linker/compiler check for these kinds of differences, and if so, does it give errors?
Would documenting the differences and common traits between the C data types and QB64 data types be of any use?

I appreciate your patience and apologize for the lengthy text. Any information would be appreciated.

All the Best,
John

Perhaps it would be good to give a specific example demonstrating your concern.

Are you focusing on one platform specifically, or are you trying to stay multiplatform?

To clarify, are you calling QB64 functions from C/C++, or just calling C/C++ functions from QB64? Most of my discussion assumes the latter. I wrote most of this post before realizing that you might have meant the former.

This thread has some useful information, but it is specific to the Windows platform:
http://www.qb64.net/forum/index.php?topic=4527.0

In all cases, in both QB64 and C/C++, the unsigned types are the same size as the signed types.

In QB64:
_BYTE is 1 byte
INTEGER is 2 bytes
LONG is 4 bytes
_INTEGER64 is 8 bytes
SINGLE is 4 bytes
DOUBLE is 8 bytes
_FLOAT is 32 bytes. How many of those bytes are actually used depends on the implementation; currently 10 (80 bits).
_OFFSET is the size of a memory offset (near pointer) on the particular platform/architecture/mode. This is currently either 4 bytes or 8 bytes.



The C/C++ standards are deliberately vague about the sizes of some of the data types:
http://en.wikipedia.org/wiki/C_data_types
http://msdn.microsoft.com/en-us/library/58b0106a-0406-4b74-a430-7cbd315c0f89%28v=vs.90%29

However, as that MSDN page and this one:
http://msdn.microsoft.com/en-us/library/s3f49ktz%28v=vs.90%29.aspx
show, the Microsoft Visual C++ 2008 compiler gives these data types fixed sizes, on both 32 bit and 64 bit platforms:

bool is 1 byte
char is 1 byte
short is 2 bytes
int is 4 bytes
long is 4 bytes
long long is 8 bytes
float is 4 bytes
double is 8 bytes
long double is 8 bytes

mingw-w32 on the 32 bit platform, which is the compiler QB64 uses, is identical to that, with the exception of long double. In mingw-w32, long double is 12 bytes, although by default only 8 bytes are used. Galleon (QB64's author) had to go out of his way to get it to use 10 bytes. ( http://www.qb64.net/forum/index.php?topic=5123.0 )

You can verify with whatever compiler you use:
Code: [Select]
#include <stdio.h>
#include <stdbool.h> /* bool needs this in C; harmless in C++ */
int main(void) {
 /* sizeof yields a size_t, so cast to int to match %d */
 printf("bool        %d\n", (int)sizeof(bool));
 printf("char        %d\n", (int)sizeof(char));
 printf("short       %d\n", (int)sizeof(short));
 printf("int         %d\n", (int)sizeof(int));
 printf("long        %d\n", (int)sizeof(long));
 printf("long long   %d\n", (int)sizeof(long long));
 printf("float       %d\n", (int)sizeof(float));
 printf("double      %d\n", (int)sizeof(double));
 printf("long double %d\n", (int)sizeof(long double));
 return 0;
}

Also, note that the Windows platform headers create a bunch of types with uppercase names. Some of them are listed here:
http://msdn.microsoft.com/en-us/library/aa383751(v=vs.85).aspx
Note, for example, that lowercase char is a built-in type in C++, while uppercase CHAR is a type created by the Windows headers. Similarly, bool is 1 byte, BOOL is 4 bytes, and BOOLEAN is 1 byte.

You might take a look at "internal\c\common.cpp".



Trying to get a little more practical:

There is a matter of source code compatibility and binary compatibility.

In the matter of source code compatibility, you have a certain amount of compiler enforcement, although compilers sometimes allow implicit or explicit casts (perhaps involving zero extension or sign extension) ( http://msdn.microsoft.com/en-us/library/aetzh118%28v=vs.90%29 ). When it comes to source code, the particular compiler matters: for example, whether a long double is 8 bytes or 12 bytes depends entirely on the compiler. int could be 16 bits, 32 bits, 64 bits, or more, at the compiler's discretion (the standard only requires at least 16 bits).

When it comes to binary compatibility, you could be dealing with code compiled by two separate compilers. Generally, there is no enforced type safety whatsoever. However, as long as the two sides match, you are fine. For example, suppose you are linking to a Microsoft DLL that uses a DWORD, which you know is a 4 byte unsigned integer. You can use whatever 4 byte unsigned integer type is available in your compiler, and it will work. (If you are using the platform headers, then the headers will enforce the types.)

DECLARE DYNAMIC LIBRARY and DECLARE CUSTOMTYPE LIBRARY both entirely bypass all type safety. Type safety becomes entirely your responsibility.



You asked what happens if you mismatch the types.

Let's assume that you mismatch the types for memory access. Let's say that you have a function that wants to write a 4 byte DWORD value to an address that you specify. If you give it the address of a QB64 LONG, all is well. If you give it the address of an _INTEGER64 variable, it will write the first 4 bytes, and leave the second 4 bytes unchanged. If you give it the address of a QB64 INTEGER variable, it will write 4 bytes, clobbering whatever was in the 2 bytes after the variable.

Code: [Select]
'minimum Windows versions: XP SP1 / 2003
DECLARE DYNAMIC LIBRARY "kernel32"
 FUNCTION GetCurrentProcess%& ()
 FUNCTION GetProcessHandleCount& (BYVAL hProcess%&, BYVAL pdwHandleCount%&)
 FUNCTION GetLastError~& ()
END DECLARE

'pdwHandleCount is supposed to be the _OFFSET() of a 4 byte unsigned integer.

DIM me AS _OFFSET
DIM i AS LONG

DIM TwoByteVar(0 TO 3) AS _UNSIGNED INTEGER
DIM FourByteVar(0 TO 3) AS _UNSIGNED LONG
DIM EightByteVar(0 TO 3) AS _INTEGER64

me = GetCurrentProcess

FOR i = 0 TO 3
 TwoByteVar(i) = &H1111
 FourByteVar(i) = &H11111111
 EightByteVar(i) = &H1111111111111111
NEXT

IF 0 = GetProcessHandleCount(me, _OFFSET(TwoByteVar(1))) THEN PRINT "Failed. 0x" + hexd(GetLastError)
IF 0 = GetProcessHandleCount(me, _OFFSET(FourByteVar(1))) THEN PRINT "Failed. 0x" + hexd(GetLastError)
IF 0 = GetProcessHandleCount(me, _OFFSET(EightByteVar(1))) THEN PRINT "Failed. 0x" + hexd(GetLastError)

PRINT "Two byte variable array:"
FOR i = 0 TO 3
 PRINT hexw(TwoByteVar(i)); ",";
NEXT
PRINT: PRINT

PRINT "Four byte variable array:"
FOR i = 0 TO 3
 PRINT hexd(FourByteVar(i)); ",";
NEXT
PRINT: PRINT

PRINT "Eight byte variable array:"
FOR i = 0 TO 3
 PRINT hexq(EightByteVar(i)); ",";
NEXT
END

'$include:'hexx.bi'
' http://www.qb64.net/forum/index.php?topic=4491.msg58252#msg58252

You can see that it is only by supplying the address of a smaller than expected variable that you get memory corruption.



What if you get a mismatch in the return value or the parameter list? That depends very much on what platform you are on. It depends very much on the exact calling convention that is used for the function calls. I can partially speak to how it works for integers on the x86 Win32 platform, but I'm uncertain how it works for floating point types, and I have no knowledge of how it works for the other platforms.

Integers are returned in a CPU register: 1 byte in AL, 2 bytes in AX, 4 bytes in EAX, and 8 bytes in EDX:EAX (high half in EDX, low half in EAX), if I recall correctly. If you specify a smaller type than what the function returns, no memory gets overwritten; you simply read only part of the register. I don't recall if register preservation would be an issue in these cases, although as far as I know EAX, ECX, and EDX are scratch registers in the usual Win32 calling conventions, so a function may clobber EDX even when it isn't returning a value in it. If you specify a larger type, then the extra space will just contain garbage.

Code: [Select]
DECLARE DYNAMIC LIBRARY "kernel32"
 FUNCTION pretendItReturnsOneByte~%% ALIAS "GetCurrentProcessId" ()
 FUNCTION pretendItReturnsTwoBytes~% ALIAS "GetCurrentProcessId" ()
 FUNCTION GetCurrentProcessId~& ()
 FUNCTION pretendItReturnsEightBytes~&& ALIAS "GetCurrentProcessId" ()
END DECLARE
' the return value is supposed to be a 4 byte unsigned integer.

PRINT hexb(pretendItReturnsOneByte)
'this might be unsafe if the compiler expects AH to be preserved. I don't know.

PRINT hexw(pretendItReturnsTwoBytes)
'this might be unsafe if the compiler expects the high 2 bytes of EAX to be preserved. I don't know.

PRINT hexd(GetCurrentProcessId)

PRINT hexq(pretendItReturnsEightBytes)
'this is safe, but the high 4 bytes are garbage
END

'$include:'hexx.bi'
' http://www.qb64.net/forum/index.php?topic=4491.msg58252#msg58252

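Floating point return values are handled differently: on 32 bit x86, the usual calling conventions return float, double, and long double in the x87 floating point register ST(0) rather than in EAX/EDX. So if you declare an integer return type for a function that actually returns a floating point value (or vice versa), you read garbage rather than a truncated value.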

What about directly in the parameter list? Parameters are placed on the stack, and (on 32 bit x86) always take up a multiple of 4 bytes. So, even a parameter that is supposed to be a 1 or 2 byte value will take up 4 bytes on the stack, the extra space being considered garbage. In other words, stack parameters on Win32 are always aligned on a 4 byte boundary.

delme.h
Code: [Select]
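#include <stdio.h>
#include <windows.h>

void ParamTest(int n0, int n1, int n2)
{
 char b[29]; /* "xxxxxxxx, xxxxxxxx, xxxxxxxx" is 28 characters plus the NUL */
 sprintf(b, "%08x, %08x, %08x", (unsigned)n0, (unsigned)n1, (unsigned)n2);
 MessageBoxA(0, b, "Parameters:", MB_OK);
}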

Code: [Select]
DECLARE CUSTOMTYPE LIBRARY "delme"
 SUB ParamTest (BYVAL n0&, BYVAL n1&, BYVAL n2&)
 SUB OneTwoNothing ALIAS "ParamTest" (BYVAL n0%%, BYVAL n1%)
 SUB EightFour ALIAS "ParamTest" (BYVAL n0&&, BYVAL n1&)
END DECLARE

_DELAY 1
PRINT "normal"
ParamTest 1, 2, 3
PRINT "normal"
ParamTest -1, -2, -3
PRINT
PRINT "notice that the third parameter is garbage. in this case, the compiler zero extended the first two parameters, but you shouldn't count on it. however, notice that the 2nd parameter is still correctly aligned."
OneTwoNothing 4, 5
PRINT
PRINT "notice that the one 8 byte value spans the first two parameters"
EightFour &H1122334455667788, &HAAAAAAAA
END

You can get away with a variable number of parameters when you are using __cdecl functions. With __cdecl, removing the parameters from the stack is the caller's responsibility, which is why __cdecl functions like printf can accept a variable number of parameters. However, QB64 has a known bug: with DECLARE DYNAMIC LIBRARY, it doesn't clear the parameters for __cdecl functions, effectively causing a stack leak. ( http://www.qb64.net/forum/index.php?topic=4566.0 )

You could make a __cdecl function like printf() read garbage by making it read more parameters than you supplied. You could even crash the program that way, if it tries to read past the top of the stack and causes an access violation.

Code: [Select]
DECLARE CUSTOMTYPE LIBRARY
 FUNCTION sprintf& (BYVAL buffer%&, BYVAL format%&)
END DECLARE

DIM buffer AS STRING * 8192
DIM format AS STRING
DIM n AS LONG

format = CHR$(0)
DO
 format = "%x," + format
 n = sprintf(_OFFSET(buffer), _OFFSET(format))
 PRINT LEFT$(buffer, n)
 PRINT
 PRINT
 _DELAY .2
LOOP UNTIL n >= 4096
END

__stdcall (also known as WINAPI or CALLBACK) functions are not flexible like that. With __stdcall functions, the function itself clears the parameters off the stack. For this reason, the total number of bytes taken up by the parameters is hard coded into its return instruction. If you supply too many parameters, the function will only clear some of them, leaking the rest. This probably won't crash your program, unless you run out of stack space. If you supply too few parameters, the function clears too much data off the stack, which could very easily crash the program.

Code: [Select]
DECLARE DYNAMIC LIBRARY "kernel32"
 FUNCTION GetStdHandle%& (BYVAL nStdHandle~&)
 FUNCTION ZeroBytes%& ALIAS "GetStdHandle" ()
 FUNCTION EightBytes%& ALIAS "GetStdHandle" (BYVAL nStdHandle~&, BYVAL notNeeded&)
END DECLARE
'GetStdHandle will clear 4 bytes of parameters off the stack when it returns.

PRINT GetStdHandle(0)
PRINT EightBytes(0, 0) 'this leaks 4 bytes of stack
PRINT "we're still alive..."
_DELAY 1
PRINT ZeroBytes
PRINT "crash maybe? (or just recover the 4 bytes we leaked...)"
DO
 _DELAY 1
 PRINT ZeroBytes
 PRINT "keep doing it until something very bad happens..."
LOOP
END



In all practicality:

On the Win32 platform, using mingw-w32:

C++ --- QB64
bool --- _BYTE
char --- _BYTE
short --- INTEGER
int --- LONG
long --- LONG
long long --- _INTEGER64
float --- SINGLE
double --- DOUBLE
there is not an exact correspondence for _FLOAT
void * --- _OFFSET (_OFFSET can also be used for pointer sized integers.)

All code examples in this post are public domain.

Regards,
Michael


P.S. I said: "notice that the third parameter is garbage. in this case, the compiler zero extended the first two parameters, but you shouldn't count on it."
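I got to thinking afterwards that it might have been sign extended, but I can't be bothered to check right now. (Since 4 and 5 are both positive, zero extension and sign extension would give the same result there, so that test can't tell them apart anyway.)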
« Last Edit: August 28, 2012, 06:25:42 PM by mcalkins »
The QBASIC Forum Community: http://www.network54.com/index/10167 Includes off-topic subforums.
QB64 Off-topic subforum: http://qb64offtopic.freeforums.org/

OlDosLover

Re: data type discussion
« Reply #1 on: July 21, 2012, 01:04:49 AM »
Hi all,
    Thank you mcalkins for an insightful and enlightening description in this post. I must admit that I always learn from you. Greatly appreciated.
OlDosLover.

Clippy

Re: data type discussion
« Reply #2 on: July 21, 2012, 08:23:40 AM »
What command line would I need to run this code in the QB64 folder with the compiler?

Code: [Select]
#include <stdio.h>
#include <stdbool.h> /* bool needs this in C; harmless in C++ */
int main(void) {
 /* sizeof yields a size_t, so cast to int to match %d */
 printf("bool        %d\n", (int)sizeof(bool));
 printf("char        %d\n", (int)sizeof(char));
 printf("short       %d\n", (int)sizeof(short));
 printf("int         %d\n", (int)sizeof(int));
 printf("long        %d\n", (int)sizeof(long));
 printf("long long   %d\n", (int)sizeof(long long));
 printf("float       %d\n", (int)sizeof(float));
 printf("double      %d\n", (int)sizeof(double));
 printf("long double %d\n", (int)sizeof(long double));
 return 0;
}
QB64 WIKI: Main Page
Download Q-Basics Code Demo: Q-Basics.zip
Download QB64 BAT, IconAdder and VBS shortcuts: QB64BAT.zip
Download QB64 DLL files in a ZIP: Program64.zip

mcalkins

Re: data type discussion
« Reply #3 on: July 21, 2012, 11:46:19 AM »
Thanks, OlDosLover.

Clippy:

internal\c\bin\g++ -s delme.cpp -o delme.exe

-s strips the symbols, making the executable smaller.
delme.cpp is the input file.
-o delme.exe specifies the output file.

Code: [Select]
Microsoft Windows XP [Version 5.1.2600]
(C) Copyright 1985-2001 Microsoft Corp.

C:\Documents and Settings\Owner>cd \q\qb64

C:\q\qb64>notepad delme.cpp

C:\q\qb64>internal\c\bin\g++ -s delme.cpp -o delme.exe

C:\q\qb64>delme
bool        1
char        1
short       2
int         4
long        4
long long   8
float       4
double      8
long double 12

C:\q\qb64>internal\c\bin\g++ --version
g++ (GCC) 4.6.1 20110626 (prerelease)
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.


C:\q\qb64>

There are a few things I want to double check about my post, and a few things I should add, but I'm feeling very lazy at the moment, so I'll try to get to it later.

Regards,
Michael

Clippy

Re: data type discussion
« Reply #4 on: July 21, 2012, 12:31:48 PM »
Is there any way to pause the screen output? I piped it from the EXE file as it ran too fast.

thacket

Re: data type discussion
« Reply #5 on: August 28, 2012, 09:38:42 AM »
Hello Mcalkins,

Thanks for the great explanation. To answer your questions:

1. I am not concerned with multiplatform for now; maybe later.
2. I am calling 'C' functions from QB64 only. I will try to go the other way when I get more comfortable.

I have not printed out the code for a thorough examination yet; when I do, I will probably have more questions. Thanks again and take care.

All the Best,
John

Johny B.

Re: data type discussion
« Reply #6 on: September 01, 2012, 07:01:30 PM »
Clippy - add the following just before the return 0; line:
Code: [Select]
printf("Press enter to continue...");
getchar();

That should make the program wait for you to press the Enter key before exiting. Note that this is the C way of doing it rather than the C++ way, but it will still work when compiled as C++.
"Time is an illusion; Lunchtime doubly so." - Douglas Adams
