Author Topic: QBDbase v1.4  (Read 2215 times)

SMcNeill

  • Hero Member
  • *****
  • Posts: 2414
    • Email
Re: QBDbase v1.1a (Important Update)
« Reply #30 on: October 15, 2012, 10:13:56 AM »
Update v1.1a -- IMPORTANT UPDATE

--Updated and fixed the bug I mentioned in the previous post, as well as one in the SetQDBRecordsMax subroutine.
-- Removed the EXE files from the rar.  They're easy enough to compile in QB64, and will reduce bandwidth usage for people on dial-up or metered limits.

No new functions or routines added yet.  This is nothing more than a bug fix, but it does address several important issues with custom-length databases.  If you were running with the default database limit before, then nothing probably seemed wrong to you.  If you tried to use a custom database with a non-standard limit of records, it was impossible to get it up and going.  This is the fix for that.  (And I don't think I broke anything else by fixing what I did.  At least I hope I didn't!  ;) )

**************
User Request:

I've noticed one drawback that the database has at the moment is in sorting HUGE databases.  It'll do it, but it does it rather slowly once a database gets a ton of entries.  (It takes me 8-11 hours to sort a database with 708,215 records.)  The _MEM process is going as quickly as possible, but all it's using at the moment is a "bubble sort" algorithm.  (It basically takes the first record and compares it against all other records for placement, then takes the 2nd record and compares it against all remaining records, and so on.)  It's easy to code, but not the most efficient method for sorting.

If someone has a really nice and FAST sorting method, and if they don't mind sharing it, I'd love to see it.  I know there are faster methods to sort out there -- I just need to find one that would be easy to plug into the SortData routine and give it a boost.  QB64 and QBDbase can do better than 8 hours to sort a database that size!  I know they can!!  It just needs a more effective sort routine plugged in, instead of the one it has at the moment.

(Side note:  I'm still VERY impressed with QB64's limits!!  Several of the Microsoft tools refuse to even try and sort something with that many records.  A slow sort of 700k+ records is better than NO sort of that many.  ;) )
http://bit.ly/TextImage -- Library of QB64 code to manipulate text and images, as a BM library.
http://bit.ly/Color32 -- A set of color CONST for use in 32 bit mode, as a BI library.

http://bit.ly/DataToDrive - A set of routines to quickly and easily get data to and from the disk.  BI and BM files

Coolman

  • Jr. Member
  • **
  • Posts: 81
Re: QBDbase v1.1a (Important Update)
« Reply #31 on: October 15, 2012, 11:50:56 AM »
Congratulations on your work. There are several sorting algorithms; see here:

http://docvb.free.fr/vbplus/Tris/Tri.php

Sorry, it is in French, but quite understandable.

http://en.wikipedia.org/wiki/Sorting_algorithm
http://rosettacode.org/wiki/Sorting_algorithms/Insertion_sort

Good luck
*** Excuse my English, I use google translate ***

Clippy

  • Hero Member
  • *****
  • Posts: 16440
  • I LOVE π = 4 * ATN(1)    Use the QB64 WIKI >>>
    • Pete's Qbasic Site
    • Email
Re: QBDbase v1.1a (Important Update)
« Reply #32 on: October 15, 2012, 01:00:18 PM »
Hey! This forum is for FINISHED LIBRARIES...you are on page 3...  ;)
QB64 WIKI: Main Page
Download Q-Basics Code Demo: Q-Basics.zip
Download QB64 BAT, IconAdder and VBS shortcuts: QB64BAT.zip
Download QB64 DLL files in a ZIP: Program64.zip

TerryRitchie

  • Hero Member
  • *****
  • Posts: 2264
  • FORMAT C:\ /Q /U /AUTOTEST (How to repair Win8)
    • Email
Re: QBDbase v1.1a (Important Update)
« Reply #33 on: October 15, 2012, 01:03:13 PM »
Nothing is ever truly finished when it comes to code ;)

Clippy

Re: QBDbase v1.1a (Important Update)
« Reply #34 on: October 15, 2012, 01:14:04 PM »
I know, and told him so on page 1...  :D

Billbo

  • Sr. Member
  • ****
  • Posts: 286
    • Email
Re: QBDbase v1.1a (Important Update)
« Reply #35 on: October 15, 2012, 01:56:10 PM »
SMcNeill,

With over 700 thousand records, try Excel 2007 or above,
if delimited like your airport file. Over 1 million rows instead
of the 64 thousand of old. It has sorted a few hundred
thousand rows at a time for me, and not in 8 hours. But
you probably want to do everything with QB64. Don't we
all.

Bill

SMcNeill

Re: QBDbase v1.1a (Important Update)
« Reply #36 on: October 15, 2012, 02:25:42 PM »
Quote from: Coolman on October 15, 2012, 11:50:56 AM
Congratulations on your work. There are several sorting algorithms; see here:

http://docvb.free.fr/vbplus/Tris/Tri.php

Sorry, it is in French, but quite understandable.

http://en.wikipedia.org/wiki/Sorting_algorithm
http://rosettacode.org/wiki/Sorting_algorithms/Insertion_sort

Good luck

Thanks for the links, Coolman.  I'll look into them and see about getting a better routine up and going sometime later for QBDbase.  It'll probably be sometime next month though, as things are in high gear here for Halloween at the moment, so I hope everyone will bear with it and manage to make do with what we've got at the moment.  LOL - like QB64, it works -- but it can work better!  Galleon is doing an overhaul of QB64 to make it better, and I'll do an overhaul of the sorting algorithm later as well.

As it is, it runs fine for most purposes.  It just doesn't want to work the fastest with HUGE datasets.  (An easy solution would be to simply break it down to 100 databases of 7000 records each, and then sort them.  Then do an insert routine to add the data back to the single database in proper order.)  The only thing is, a method like that would work better with writing to temp files on the disk and cleaning them up afterwards, and I've kind of made it a personal goal to try and keep all this running through _MEM usage, and I don't want to cause too large a burden on the OS with all the extra memory such a method would use.
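A minimal sketch of the "sort the pieces, then merge them back in order" idea, kept entirely in memory.  This is not QBDbase code: it assumes two small LONG arrays that are already sorted, and the MergeRuns name and the sample values are made up for illustration.

```basic
' Hypothetical example: merge two already-sorted runs into one sorted array.
DIM a(2) AS LONG, b(3) AS LONG, merged(6) AS LONG
DIM n AS LONG

a(0) = 1: a(1) = 4: a(2) = 9
b(0) = 2: b(1) = 3: b(2) = 7: b(3) = 10

MergeRuns a(), b(), merged()

FOR n = 0 TO UBOUND(merged)
    PRINT merged(n);
NEXT
PRINT
END

SUB MergeRuns (a() AS LONG, b() AS LONG, result() AS LONG)
DIM i AS LONG, j AS LONG, k AS LONG

' Take the smaller head element from either run until one run is used up.
DO WHILE i <= UBOUND(a) AND j <= UBOUND(b)
    IF a(i) <= b(j) THEN
        result(k) = a(i): i = i + 1
    ELSE
        result(k) = b(j): j = j + 1
    END IF
    k = k + 1
LOOP

' Copy whatever is left in the run that still has elements.
DO WHILE i <= UBOUND(a)
    result(k) = a(i): i = i + 1: k = k + 1
LOOP
DO WHILE j <= UBOUND(b)
    result(k) = b(j): j = j + 1: k = k + 1
LOOP
END SUB
```

Each merge makes one pass over the data, so the extra cost on top of sorting the small pieces is modest; the price is the second array, which is the memory burden mentioned above.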

It'll be a challenge when I delve into it fully, but I've never been afraid to tackle something challenging.  Honestly, I'm looking forward to seeing how quick a sorting algorithm I can get up and going.  8 hours is a starting benchmark for me, with this file.  Let's see how long the same sort takes a month from now.  ;)

TerryRitchie

Re: QBDbase v1.1a (Important Update)
« Reply #37 on: October 15, 2012, 02:30:26 PM »
Quote from: Clippy on October 15, 2012, 01:14:04 PM
I know, and told him so on page 1...  :D

Well this is good stuff he's producing here so I won't mind when I'm reading about new features on page 9!

SMcNeill

Re: QBDbase v1.1a (Important Update)
« Reply #38 on: October 15, 2012, 03:31:54 PM »
Quote from: TerryRitchie on October 15, 2012, 02:30:26 PM
Quote from: Clippy on October 15, 2012, 01:14:04 PM
I know, and told him so on page 1...  :D

Well this is good stuff he's producing here so I won't mind when I'm reading about new features on page 9!

I won't mind writing about new features on page 9.  The problem is all the bug fixes that I seem to be writing about between here and there!  LOL!

Actually, there have been fewer issues than I would have thought, from past experience writing code.  It's easy to write code for yourself; much more challenging to write for the public.  There have been a few hiccups along the way, but it's been a smoother ride than I would have thought, honestly.  :)

Maybe if things settle down error-wise, I can get back to expanding and optimizing things once again.  ;)

TerryRitchie

Re: QBDbase v1.1a (Important Update)
« Reply #39 on: October 15, 2012, 05:40:48 PM »
What I found works best for me when writing libraries for others to use is to create programs that use them along the way.  I'll usually catch shortcomings that turn into new features and LOTS of bugs to fix along the way.  Many times then I'll simplify the programs to become the small examples I include with the libraries.

If for any reason when writing my examples I need to dig into the "meat" of the code to get something to work, say for instance make a change inside an array, I'll turn this into a command. I want the end user as far from the inner workings of the code as possible so they can just focus on the commands and command line options.

I find the hardest (and most tedious) part of writing a library for others to use is the documentation.  Sometimes it takes me longer to write the docs than it did to write the library.

SMcNeill

Re: QBDbase v1.1a (Important Update)
« Reply #40 on: October 16, 2012, 01:41:41 AM »
I found a fairly simple sorting algorithm that seems to have decent enough improvements to make a SLIGHT difference.  ;)

Compare these methods:

Code:
limit = 100000

DIM x(limit) AS _INTEGER64
RANDOMIZE TIMER
PRINT "Initializing Data"
FOR i = 0 TO limit
    x(i) = RND(1) * 1234567890987654321
NEXT


t# = TIMER(0.001)


PRINT "Bubble Sorting Data": bubblesort x()
'PRINT "Comb Sorting Data" : combsort x()

t1# = TIMER(0.001)
PRINT USING "Data Sorted in ##,###.##### seconds."; t1# - t#
SLEEP
stepper = 20

FOR i = 0 TO limit - stepper STEP stepper
    ' Print one block of "stepper" values; the - 1 keeps blocks from overlapping
    FOR j = i TO i + stepper - 1
        PRINT x(j)
    NEXT
    'SLEEP
NEXT


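SUB combsort (array() AS _INTEGER64)
' Comb sort: compare elements a shrinking "gap" apart, finishing with gap = 1.
DIM gap AS LONG, i AS LONG, swapped AS LONG
gap = UBOUND(array) - LBOUND(array)

DO
    gap = INT(gap / 1.247330950103979)
    IF gap < 1 THEN gap = 1
    i = LBOUND(array)
    ' "true" and "false" are not QB64 keywords (both would read as 0), so use -1 and 0
    swapped = 0
    DO WHILE i + gap <= UBOUND(array)
        IF array(i) > array(i + gap) THEN
            SWAP array(i), array(i + gap)
            swapped = -1
        END IF
        i = i + 1
    LOOP
LOOP UNTIL gap = 1 AND swapped = 0 ' stop after a gap-1 pass that made no swaps
END SUB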

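SUB bubblesort (array() AS _INTEGER64)
' Simple exchange sort: every element is compared against all later elements.
DIM nr AS LONG, k AS LONG, i AS LONG
nr = UBOUND(array)
' Start at LBOUND so element 0 is sorted too (the old loop began at 1)
FOR k = LBOUND(array) TO nr - 1
    FOR i = k + 1 TO nr
        IF array(k) > array(i) THEN SWAP array(k), array(i)
    NEXT
NEXT
END SUB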

Try it first as it is, and be patient.  Bubblesort takes about 3 minutes on my machine, and I've got a fast set-up here.  Start it, and then go eat a cookie...

See how long it takes?  This is with ONLY 100,000 elements.  Move it up to 1,000,000 and the time goes up quadratically: ten times the data means roughly a hundred times the comparisons!  A very inefficient way to sort (but it's so darn simple to code!)
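To put rough numbers on that growth, here is a back-of-the-envelope sketch.  It assumes a nested-loop sort like the one above makes about n * (n - 1) / 2 comparisons for n elements; the variable names are just for illustration.

```basic
' Rough comparison counts for a nested-loop (exchange) sort.
DIM n AS _INTEGER64, comparisons AS _INTEGER64

n = 100000
comparisons = n * (n - 1) / 2
PRINT n; "elements ->"; comparisons; "comparisons"

n = 1000000
comparisons = n * (n - 1) / 2
PRINT n; "elements ->"; comparisons; "comparisons"
```

That works out to about 5 billion comparisons for 100,000 elements and about 500 billion for 1,000,000, which is why the wait grows so much faster than the data does.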

Now remark out the line that prints "Bubble Sorting Data", and unremark the line below it.  Give it a run.

See any difference in speed?

_MEM might run faster than disk access, but a good routine runs even faster!   This seems like a nice improvement to sort times, but if someone wants to code me something that seems even faster, I'll be more than happy to look at it.

Expect to see this change  on the sort routine, in the next update.  (Unless someone finds me something just as simple, but even faster!  ;) )  I can't promise when that'll be, but hopefully it won't take as long as I was thinking.  This IS an important upgrade to our database, and it's something I'd like to implement as soon as possible for everyone to take advantage of.  :D

SMcNeill

Re: QBDbase v1.1a (Important Update)
« Reply #41 on: October 16, 2012, 10:50:33 AM »
GREAT NEWS!!!   RAAAHRR!!

I've managed to get a small improvement on my sorting routine.   I've now gone from 8 - 11 hours sorting 708,215 records, down to 8.94 seconds....

It's going to take a few days to fully implement however, as at the moment I've only got it sorting string types.  Each data type will need to be plugged in separately, and it's going to add probably a thousand lines of code in the end, but I think it might be worth it.  10 hours... 10 seconds...   Just a slight difference in those times there.  ;)


A few things I've learned from this:

IF ...  Then is the devil.    Select Case is 100 times faster!!

Code:
                SELECT CASE sortmethod
                    CASE 2
                        IF VAL(temp$(1)) < VAL(temp$(2)) THEN SWAPEM fileon, first, second, el: swapped = -1
                    CASE 3
                        IF VAL(temp$(1)) > VAL(temp$(2)) THEN SWAPEM fileon, first, second, el: swapped = -1
                    CASE 0
                        IF temp1$ < temp2$ THEN SWAPEM fileon, first, second, el: swapped = -1
                    CASE 1
                        IF temp1$ > temp2$ THEN SWAPEM fileon, first, second, el: swapped = -1
                END SELECT

Compare the above, with the below:

Code:
          IF sortmethod = 2 AND VAL(temp$(1)) < VAL(temp$(2)) THEN SWAPEM fileon, first, second, el: swapped = -1
          IF sortmethod = 3 AND  VAL(temp$(1)) > VAL(temp$(2)) THEN SWAPEM fileon, first, second, el: swapped = -1
          IF sortmethod = 0 AND  temp1$ < temp2$ THEN SWAPEM fileon, first, second, el: swapped = -1
          IF sortmethod = 1 AND  temp1$ > temp2$ THEN SWAPEM fileon, first, second, el: swapped = -1

Believe it or not, simply changing from the second form (the chained IFs) to the first (SELECT CASE) took me from 180 seconds down to 8.9 seconds.
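One likely reason for the gap (an assumption based on how AND works, not something measured against QBDbase): AND in QB64 is a bitwise operator and does not short-circuit, so every chained IF still evaluates its VAL() or string comparison even when sortmethod does not match.  The sketch below uses a made-up Expensive% function and a call counter to show the effect.

```basic
' Hypothetical demo: count how often an "expensive" test actually runs.
DIM SHARED calls AS LONG
DIM sortmethod AS LONG

sortmethod = 0 ' does not match the 2 tested below

' Chained form: both sides of AND are evaluated before the IF decides.
calls = 0
IF sortmethod = 2 AND Expensive%(5) THEN PRINT "swap"
PRINT "IF ... AND ran the test"; calls; "time(s)"

' SELECT CASE form: the test only runs inside the matching CASE.
calls = 0
SELECT CASE sortmethod
    CASE 2
        IF Expensive%(5) THEN PRINT "swap"
END SELECT
PRINT "SELECT CASE ran the test"; calls; "time(s)"
END

FUNCTION Expensive% (n AS LONG)
' Stand-in for a costly comparison such as VAL(a$) < VAL(b$)
calls = calls + 1
Expensive% = (n > 0)
END FUNCTION
```

With four chained IFs per comparison, that would mean four sets of conversions and comparisons where SELECT CASE does one, so nesting the test inside a plain IF sortmethod = 2 THEN block should give a similar saving.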

Also, counter-intuitively, adding code runs faster.

first = S + i * el     <--- Our first record.   In the old routine, I just plugged the formula in to where we use it.   I decided to try it like this, and once again it was a NOTICEABLE change.  There's only 2 spots where it figures the math in the routine, but it's quite a bit faster to calculate it once and look it up via a variable, than it is to calculate it twice.  Go figure!
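A tiny sketch of that change.  Only the formula itself comes from the routine; the values for S (start offset), el (record length) and i (record number) are made up for illustration.

```basic
' Hypothetical values; only "first = S + i * el" is from the sort routine.
DIM S AS LONG, el AS LONG, i AS LONG, first AS LONG
S = 1000: el = 75

FOR i = 0 TO 2
    ' Before: the formula is repeated at each place it is needed
    PRINT "read at"; S + i * el; "write at"; S + i * el

    ' After: work it out once per record, then reuse the variable
    first = S + i * el
    PRINT "read at"; first; "write at"; first
NEXT
```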

Also I'm swapping out the way the whole sort routine works.

In the past it was ...
 DO for each record in database
     see what type of record it is
          sort fields
 LOOP

Now it's going to be....
    See what type of record it is
          DO for each record in database
               sort fields
          LOOP

It's a lot more code, as the record type is now falling outside the loop and not in it (which forces a loop for each type separate), but it's also one of those speed increasing performance boosts.
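The restructuring described above can be sketched like this.  It is only an illustration on a five-element LONG array with a hypothetical sortmethod flag (0 = ascending, 1 = descending), not the actual SortData code.

```basic
' Hypothetical example: decide the comparison once, outside the loops.
DIM d(4) AS LONG
DIM k AS LONG, i AS LONG, sortmethod AS LONG

d(0) = 3: d(1) = 1: d(2) = 5: d(3) = 2: d(4) = 4
sortmethod = 1 ' made-up flag: 0 = ascending, 1 = descending

SELECT CASE sortmethod
    CASE 0 ' ascending: this pair of loops never re-checks sortmethod
        FOR k = 0 TO 3
            FOR i = k + 1 TO 4
                IF d(k) > d(i) THEN SWAP d(k), d(i)
            NEXT
        NEXT
    CASE 1 ' descending: same loops, opposite comparison
        FOR k = 0 TO 3
            FOR i = k + 1 TO 4
                IF d(k) < d(i) THEN SWAP d(k), d(i)
            NEXT
        NEXT
END SELECT

FOR i = 0 TO 4
    PRINT d(i);
NEXT
PRINT
```

The loops are duplicated once per case, which is where the extra code comes from, but the type or method test runs once per sort instead of once per comparison.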

I was happy with the way this data library performed before.  I'm absolutely thrilled with the difference this is going to make.  :)

*************************

Since this is going to be a major overhaul of the sort routine, does anyone want to sign up to test it before I upload for the public?  If you've already made a few QBDbase files, don't mind making copies to back them up and preserve them, and are willing to try it out, let me know.  I don't want to corrupt anything someone has already, and it's hard for me to test for every possible data field / database setup.  I don't THINK any of the changes here should harm previous databases, but it's always nice to test as much as possible to be certain.  :)


EDIT:  4.5 seconds sort time now for 708,215 records.   I'm cutting it down a bit at a time.  :P
« Last Edit: October 16, 2012, 10:56:29 AM by SMcNeill »

Clippy

Re: QBDbase v1.1a (Important Update)
« Reply #42 on: October 16, 2012, 11:05:22 AM »
GO TIGER! Sounds pretty good! Why 708,215?  ;)

SMcNeill

Re: QBDbase v1.1a (Important Update)
« Reply #43 on: October 16, 2012, 11:17:17 AM »
That's how many words I have for my spell checker.  It was just a nice long list of data that I thought might be useful (I'm going to see if I can generate a spell-checker-as-you-type program), but it wasn't sorted like I wanted it to be.  So I converted, sorted, and was VERY disappointed in the wait.  (Of course, I couldn't get Word or my old version of Excel to even sort it at all.)  It was nice it did it, but it needed to do it faster.   And it's now doing that.  A LOT faster.

The newer versions of Excel limit you to ~ 1 million rows of data.  This won't.   If you have memory for it, you can use it....  I don't know when I'll need more than a million data records, but it's nice to know that if I ever do, I'll be able to work with them.  :)  (Besides, with this I don't have to buy M$ stuff!  :P )

If it does three quarters of a million records in less than 5 seconds, imagine how fast it'll sort the data for something the size most people will use on a regular basis...

(From a little testing I've been doing: 10,000 records sort in 0.04 seconds.   100,000 records sort in 0.51 seconds.  And these times are with 75 character strings.   I imagine shorter fields should go even quicker.)

EDIT:  I've got it sorting _UNSIGNED _BYTES.   Since I use those in my database (for word length), I can check how long it takes to sort them as well.  1.5 seconds for 700k+ records sorted by word length, instead of word spelling.  I'm thinking it's fast enough I might be able to use it now...  ;)
« Last Edit: October 16, 2012, 11:36:28 AM by SMcNeill »

SMcNeill

Re: QBDbase (Speed-Up Beta Released)
« Reply #44 on: October 16, 2012, 12:23:25 PM »
QBDbase Speed-Up Beta is available!

It's also got the QBD file I was using for testing purposes to check the sort routine.  SpellList.QBD is nothing more than a massive list of 708,215 English language words, names, and abbreviations.  It's got words that range from 1 letter to 60 letters in it, so it's a pretty comprehensive list and good for what most people would ever need.  (It's also got room for another 100,000 words in the database, if anyone wants to add some of their own sometime later.)

I'm leaving the old set of files up, in case anyone wants to grab these and compare how massive the changes are to our SortData and SwapEm routines.  (And also in case this new set causes issues for people somehow with existing QBD files.  I don't think it will, but I prefer to be safe until it's been tested more.)

I was thinking it'd take longer to change everything over, but the process wasn't as bad as I feared.  Once I got the first couple of variables sorted out, all I had to do is cut, copy, paste, and make a few changes to get the others working as well.

Grab it, Try it, and Tell me if it explodes badly on you, or manages to impress you with the difference in how it performs.  :)

Note:  I also tossed in the spellcheck.bas file I was using for speed checking as I altered the sort routine.  Feel free to play around with it all you want, or to use it as a start for any project you have that might make use of a nice spelling list.
