I posted this QuickBASIC/QBasic MD5 implementation a few days ago at forum.qbasicnews.com. I wanted to post it here also, but was afraid the forum formatting would make the code unreadable. If it's OK, here's a link to the original thread:
Thanks for the link, qbguy! There are some nice time- and space-saving ideas in there. I don't think I could compete on speed, but perhaps incremental hashing could be adapted.
Using VARPTR without VARSEG is dangerous. And using either of them on strings is not portable to QB7.1, in which you need to use SADD and SSEG instead.
Pointers are ugly in DOS, and QBasic isn't a pointer-friendly language. Avoid them if you can.
*Ceteris paribus*, the "unrolled" implementation that you used might actually be faster. But I think my speed hacks make a huge difference.
If you want me to explain any magic in my code, just say the word...
Yes, I tried to minimize the use of variable-length strings and to limit high-level modification of them to "in-place" operations only (no reallocations). As you know, that may not be enough to prevent QB from invalidating pointers, and of course there's nothing to stop a user from using variable-length strings extensively. I don't have PDS 7.1, so I can't really design for it.
My original rotation procedure used a loop similar to yours, but contained many ^'s. I spent a little time trying to get the addition to behave like yours as well, but after a short time I gave up. Now that I see a working implementation it does make sense. This is probably my biggest bottleneck, and I can see huge advantages to at least inlining the round operations and folding ^ constant operands where possible. Thanks for your comments.
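For anyone following along, here is a minimal sketch of 32-bit wrapping addition and left rotation on signed LONGs that avoids both Overflow errors and the ^ operator. It is not taken from either implementation in this thread: the helper names Add32& and RotL32& are hypothetical, and it goes through DOUBLE arithmetic for clarity rather than speed.

```basic
DECLARE FUNCTION Add32& (a AS LONG, b AS LONG)
DECLARE FUNCTION RotL32& (x AS LONG, n AS INTEGER)

DIM v AS LONG, w AS LONG
v = &H7FFFFFFF: w = 1
PRINT HEX$(Add32&(v, w))     ' wraps around instead of raising Overflow
v = &H80000000
PRINT HEX$(RotL32&(v, 1))    ' the top bit rotates into bit 0

' Hypothetical helper: (a + b) modulo 2^32, returned as a signed LONG.
FUNCTION Add32& (a AS LONG, b AS LONG)
    DIM s AS DOUBLE
    s = CDBL(a) + CDBL(b)    ' exact, a DOUBLE holds integers up to 2^53
    IF s > 2147483647# THEN s = s - 4294967296#
    IF s < -2147483648# THEN s = s + 4294967296#
    Add32& = s
END FUNCTION

' Hypothetical helper: rotate x left by n bits, for 1 <= n <= 31.
FUNCTION RotL32& (x AS LONG, n AS INTEGER)
    DIM u AS DOUBLE, p AS DOUBLE, hi AS DOUBLE, r AS DOUBLE
    DIM i AS INTEGER
    u = x
    IF u < 0 THEN u = u + 4294967296#   ' reinterpret as unsigned
    p = 1
    FOR i = 1 TO 32 - n                 ' p = 2^(32-n) by repeated doubling
        p = p * 2
    NEXT i
    hi = INT(u / p)                     ' the n bits that wrap around
    r = (u - hi * p) * (4294967296# / p) + hi
    IF r > 2147483647# THEN r = r - 4294967296#   ' back to signed range
    RotL32& = r
END FUNCTION
```

Since MD5 uses fixed rotation amounts, the doubling loop could be replaced by constants once the round operations are inlined.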
The issue is that using VARPTR() on a string relies on undocumented behaviour, and that behaviour changed between QB4.5 and QB7.1.
In QB4.5, VARPTR() gets you the address of the actual string data whereas in QB7.1 it gets the address of a string descriptor (within which can be found a pointer to the actual string data). This change was done to enable strings to live in far memory, thus increasing the amount of memory available to the program.
This affects both fixed-length and variable-length strings.
Even if you take this into account (by using SADD instead of VARPTR), the strings CAN be in far memory which means you need to mind both their segment and offset when working with them. Although I have a feeling that locally-declared fixed-length strings are still allocated on the stack and hence will be in DGROUP.
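As a rough illustration of minding both segment and offset, here is a small sketch for QBasic / QB 4.5 near strings. It assumes the string data sits in the segment reported by VARSEG; under PDS 7.1 with far strings, SSEG would take the place of VARSEG, as mentioned earlier in the thread. The variable names are made up for the example.

```basic
DIM s AS STRING, ofs AS LONG, b AS INTEGER
s = "ABC"

ofs = SADD(s)                        ' offset of the character data
IF ofs < 0 THEN ofs = ofs + 65536    ' normalise in case the 16-bit offset shows up negative
DEF SEG = VARSEG(s)                  ' near strings: assumed to share this segment
b = PEEK(ofs)                        ' read the first byte before any string operation
DEF SEG                              ' restore the default segment

PRINT b; ASC(s)                      ' the two values should match
```

The byte is copied into a numeric variable before anything else happens, so the address is never used after a statement that might move string data.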
Well, the code was designed to run in QBasic 1.1, though I still tested it with QuickBASIC 4.5 (interpreted and compiled), both of which seem to return the descriptor of variable-length strings using VarPtr (including string parameters of course, see MD5GetStringDataPtr%). As far as I can tell any non-static fixed-length string character data is indeed stored on the stack. It's my understanding that both will also automatically reorganize variable-length string character data as memory fragmentation increases over time, though I do not know specific details.
From Ethan Winer's BASIC Techniques and Utilities:
"In QuickBASIC programs and BASIC PDS when far strings are not specified, all strings are stored in an area of memory called the *near heap*. The string data in this memory area is frequently shuffled around, as new strings are assigned and old ones are abandoned."
I do not know the breadth of the behavior that is undocumented in QBasic/QuickBASIC, but their documentation does leave a bit to be desired in any case. As I don't have PDS 7.1, I can't, and didn't/don't, guarantee anything about the code's stability on that platform (just as I could not guarantee its stability in Python or Haskell; I doubt it would even compile ;D). Anyone may feel free to modify the code to support that and other systems as they see fit.
My memory failed me for a bit there. Yes, all versions of QuickBasic and QBasic use string descriptors for variable-length strings, and fixed-length strings have no descriptors.
The "undocumented behaviour" is the contents of these descriptors. In QBasic, QB 4.5, and QB 7.1 compiled with near strings, these are all the same.
In QB 7.1, compiled with far strings, OR interpreted, the format of the string descriptors is different. The gory details are in Winer Chapter 2.
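To make the descriptor point concrete, here is a hedged sketch that peeks at a near-string descriptor in QBasic / QB 4.5. The layout it assumes (a 2-byte length followed by a 2-byte data offset) is the undocumented part, so treat it as an assumption that will not hold for far strings.

```basic
DIM s AS STRING
DIM d AS LONG, slen AS LONG, dataOfs AS LONG, sa AS LONG
s = "Hello"

d = VARPTR(s)                              ' address of the descriptor, not of the text
IF d < 0 THEN d = d + 65536
DEF SEG = VARSEG(s)
slen = PEEK(d) + 256& * PEEK(d + 1)        ' assumed layout: bytes 0-1 hold the length
dataOfs = PEEK(d + 2) + 256& * PEEK(d + 3) ' assumed layout: bytes 2-3 hold the data offset
DEF SEG

sa = SADD(s)                               ' documented way to get the data offset
IF sa < 0 THEN sa = sa + 65536

PRINT slen; LEN(s)                         ' should agree if the assumed layout holds
PRINT dataOfs; sa
```

If the two pairs of numbers disagree, the descriptor format is not the near-string one, which is exactly the case where SADD (and SSEG in 7.1) should be used instead of reading the descriptor by hand.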
For the record, I tried compiling your program in 7.1 with near strings. Didn't crash.
My preferred solution is to steer clear of strings, generally by using arrays. You *can* portably use fixed-length strings if you wrap them up in a TYPE (as suggested by Winer in Chapter 12) or are careful to ONLY pass their address, never the actual string, as an argument to a procedure.
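A minimal sketch of the TYPE-wrapping idea, with a hypothetical Block64 record and field name chosen just for illustration. The point is that VARPTR and VARSEG are applied to the record variable, so no string descriptor is involved.

```basic
TYPE Block64
    txt AS STRING * 64
END TYPE

DIM blk AS Block64
DIM ofs AS LONG, b AS INTEGER
blk.txt = "abc"                      ' padded with spaces to 64 characters

ofs = VARPTR(blk)                    ' address of the record, which starts with txt
IF ofs < 0 THEN ofs = ofs + 65536
DEF SEG = VARSEG(blk)
b = PEEK(ofs)                        ' first byte of blk.txt
DEF SEG

PRINT b; ASC(blk.txt)                ' the two values should match
```

A procedure can then take the whole record as a parameter (for example, a parameter declared AS Block64) rather than the string itself.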
>It's my understanding that both will also automatically reorganize variable-length string character data as memory fragmentation increases over time, though I do not know specific details.
Only if you do anything that requires the creation of a string (that is, string assignment, concatenation, or using functions that return strings like CHR$() or MID$() or MD5IReturnAString$; ASC() and LEN() are fine). If none of these things happens in between getting the address of string data and using the address, then there's nothing to worry about.
>If none of these things happens in between getting the address of string data and using the address, then there's nothing to worry about.
Yes, this is what I assumed would be the case. Though, in my experimentation, seemingly innocuous statements like "print 420" may also cause the garbage collector to run.
>In QB 7.1, compiled with far strings, OR interpreted, the format of the string descriptors is different.
I guess my best recourse, then, is simply to state which tool (and which compiler/interpreter settings) the code is designed for.
>For the record, I tried compiling your program in 7.1 with near strings. Didn't crash.
Well, that's half the battle right there! I wonder if it output correct digests? Thanks for trying it out, anyway.