serial port programming woes        


AUGH!! i'm trying to add fault tolerance to a serial communications program that seems to break for no good reason under win98.

i put a lot of time last month into creating send and receive functions that were generic and logical; the code is nice and clean, and has three #ifdef'd lines for the Win32 serial api and three #ifdef'd lines for the POSIX serial api.  the code ran flawlessly under both winnt and linux.  and then i moved the code to a win98 machine (which supposedly supports win32, right?), and i get errors that really, by all logic, shouldn't happen.  when i ask the system for the error code, the two strings i get back (seemingly in random order) are "No error," and "The operation completed successfully."  well, if the fucking operation completed successfully, then a) why did you give me a fucking error code, and b) why isn't it fucking working fucking correctly!?!?!?!?  DAMN WINBLOWS!  AUGH!  ...sorry about that...
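the original send and receive code isn't shown here, but a minimal sketch of the same idea might look like this.  it assumes the port has already been opened and configured elsewhere (CreateFile/SetCommState/SetCommTimeouts on Win32, open/tcsetattr on POSIX).  port_t, port_send and port_recv are hypothetical names, not the actual functions from my program; the only platform-specific parts are the few calls inside the #ifdef blocks.

```c
/* hypothetical portable send/receive sketch; the port is assumed to be
   opened and configured elsewhere */
#define _POSIX_C_SOURCE 200809L  /* expose POSIX declarations under strict C */
#include <stddef.h>
#ifdef _WIN32
#include <windows.h>
typedef HANDLE port_t;
#else
#include <errno.h>
#include <unistd.h>
typedef int port_t;
#endif

/* write exactly len bytes; returns 0 on success, -1 on any failure */
int port_send(port_t port, const unsigned char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
#ifdef _WIN32
        DWORD n = 0;
        if (!WriteFile(port, buf + done, (DWORD)(len - done), &n, NULL))
            return -1;
#else
        ssize_t n = write(port, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;
        }
#endif
        if (n == 0)
            return -1;      /* no progress (e.g. write timeout): report it */
        done += (size_t)n;
    }
    return 0;
}

/* read exactly len bytes; returns 0 on success, -1 on error, timeout or EOF */
int port_recv(port_t port, unsigned char *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
#ifdef _WIN32
        DWORD n = 0;
        if (!ReadFile(port, buf + done, (DWORD)(len - done), &n, NULL))
            return -1;
#else
        ssize_t n = read(port, buf + done, len - done);
        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted by a signal: try again */
            return -1;
        }
#endif
        if (n == 0)
            return -1;      /* timeout or EOF: never hand back a short buffer */
        done += (size_t)n;
    }
    return 0;
}
```

because both loops treat a zero-byte transfer as a failure, a response that stops arriving partway through gets reported to the caller instead of being silently passed along as a half-filled buffer.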

the most aggravating part is that the errors happen at random, possibly even caused by line noise (which is hard enough to detect without the windows serial driver being so unhelpful), which means i can't fake the error input for testing.  i have to go through a real test --- compile the code, start up the test, and then wait five to thirty minutes for the thing to fail.

and all this came about because i had two bad data points in a file of 2080 data points.  *sigh*   that's less than one percent!!  it wouldn't be so bad, except that a) the errors happen completely randomly, so in theory i could just as easily get no bad data points as all bad data points, and b) the values of the bad data points are completely random as well (because the previous code didn't think this error *could* happen, so it simply bailed, leaving the later code to process garbage), which makes them hard to detect and impossible to fix.

the only way i can get around it is to detect on the communications level that an error occurred, then propagate that error back to the calling layer (no trivial task in this code) so the calling layer knows to retry its query, since the response was hosed the first time.
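one hedged way to structure that propagation, assuming each query/response exchange can be wrapped in a single function that reports whether the link layer saw a problem, is a retry loop that hands the caller an error code instead of a possibly-garbled buffer.  exchange_fn, query_with_retry, struct flaky and flaky_exchange are all hypothetical names for illustration.  flaky_exchange is just a stand-in that fails a fixed number of times, which also gives a repeatable way to exercise the retry path without waiting half an hour for real line noise.

```c
#include <stddef.h>

/* hypothetical link-layer callback: performs one query/response exchange,
   returns 0 on success or a negative code if a problem was detected */
typedef int (*exchange_fn)(void *ctx, unsigned char *resp, size_t resp_len);

/* retry the exchange up to max_tries times; returns 0 on success,
   otherwise the last error code so the caller knows the data is bad */
int query_with_retry(exchange_fn exchange, void *ctx,
                     unsigned char *resp, size_t resp_len, int max_tries)
{
    int err = -1;
    for (int attempt = 0; attempt < max_tries; attempt++) {
        err = exchange(ctx, resp, resp_len);
        if (err == 0)
            return 0;       /* good response: done */
    }
    return err;             /* every attempt failed: propagate, don't guess */
}

/* test stand-in for a real serial link: fails the first fail_count calls */
struct flaky {
    int fail_count;
    int calls;
};

static int flaky_exchange(void *ctx, unsigned char *resp, size_t resp_len)
{
    struct flaky *f = ctx;
    f->calls++;
    if (f->calls <= f->fail_count)
        return -2;          /* pretend the link layer saw a bad response */
    for (size_t i = 0; i < resp_len; i++)
        resp[i] = (unsigned char)i;  /* fill with a known pattern */
    return 0;
}
```

the point of this shape is that the calling layer only ever sees either a complete response or an error code, never the garbage that the old code passed along after bailing out.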

i do not like workarounds.  i like fixes.  i cannot fix this.  therefore, it is driving me crazy.

and sitting here, listening to this persistent and noisy little robot is like chinese water torture...

Fri Apr 2 15:33:47 EST 1999


to add insult to injury, i later found out what was causing the problem that made me swear so much in the message above --- Norton Anti-Virus.  there was absolutely nothing wrong with my code, and all the debugging and searching proved completely fruitless.  everything worked perfectly when it was going slowly, and all the measures i took to be pedantic and picky to avoid errors did nothing but bloat my code.  then i noticed all the errors were aligned on DWORD boundaries --- and that seemed quite suspicious.  and i recalled that many of the win95 machines in the office had been completely crippled by the anti-virus software checking all of the compilers' internal and temporary files....  what if the anti-virus software was interfering somehow with the internals of the serial port operation?   is that even possible?  it certainly seemed likely --- i had an error while shutting down the system for the weekend after the debugging session, which brought up a strange error window that wouldn't let me clear it; something to the effect of "Windows cannot close down because the following program is still running: Windows. please close the application before shutting down Windows."  hm...   well, possible or not, once i'd removed the anti-virus software from the Win98 machine, not only did the system become much more stable and stop having these bizarre errors, it was no longer so terribly sluggish, and, of course, all of the serial port errors stopped happening.  completely.

*sigh*

Sun Sep 12 17:54:28 EDT 1999