Discussion:
Scary Windows issues
Esa Ilari Vuokko
2006-08-22 00:37:32 UTC
Hi,

There are a few annoying issues I'm running into in my mingw32 builds,
and it would be nice if they got fixed for the release.

One weirdness is that I've been getting this sort of error (this one is
from the testsuite, not failing on bling-builds):

! ghc.exe: panic! (the 'impossible' happened)
! (GHC version 6.5 for i386-unknown-mingw32):
! loadObj: failed
!
! Please report this as a GHC bug: http://www.haskell.org/ghc/reportabug
!
! ghc.exe: Unknown PEi386 section name `/4' (while processing: TH_spliceD1_Lib.o


I have no idea what section that is, and I don't think the linker needs to
crash on it. I quickly checked whether there's something special in the
asm, but nothing looked suspicious. I will update my mingw
tomorrow, the next time I do a clean build; maybe it goes away. But I
wouldn't be surprised if it stays.



The other one is the various concurrency tests that fail. The machine
on which I run these is a P4 with hyperthreading, and in my experience
hyperthreading does bring out threading-related bugs that won't appear
on normal single-core machines.

conc012(threaded2)
Exception (stack overflow) in a thread. This actually plain crashes.

conc014(normal,opt,optasm,prof,profasm,ghci,threaded1,threaded2)
conc015(normal,opt,optasm,prof,profasm,ghci,threaded1,threaded2)
conc017(normal,opt,optasm,prof,profasm,ghci,threaded1,threaded2)
Async exceptions. These crash in different ways, depending on WAY.

conc023(normal,ghci,threaded1,threaded2)
QSemN. This one corrupts the heap somehow and crashes in different ways -
I have seen various crashes, from panics to plain reads/writes outside
the heap.
conc053(threaded1,threaded2)
QSemN+stm

conc037(threaded1,threaded2)
conc038(threaded1,threaded2)
These just use an FFI import that Windows doesn't have: sleep. Is adding
CPP, and conditionally importing windows.h/Sleep, a good way to fix
them?

conc056(threaded1,threaded2)
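
The conditional import asked about for conc037 and conc038 might look roughly like the sketch below. It is only an illustration, not the code of the actual tests: the name sleepSeconds is hypothetical, and it assumes GHC's CPP macro mingw32_HOST_OS to select the Windows branch. Win32 Sleep takes milliseconds while POSIX sleep takes seconds, so the wrapper hides the unit difference.

```haskell
{-# LANGUAGE CPP, ForeignFunctionInterface #-}

#if defined(mingw32_HOST_OS)
import Data.Word (Word32)

-- Win32 Sleep takes a duration in milliseconds and returns nothing.
-- stdcall matches 32-bit Windows; 64-bit GHC treats it as ccall with a warning.
foreign import stdcall safe "windows.h Sleep"
  c_Sleep :: Word32 -> IO ()

-- | Hypothetical wrapper: block the calling OS thread for n seconds.
sleepSeconds :: Int -> IO ()
sleepSeconds n = c_Sleep (fromIntegral (max 0 n) * 1000)
#else
import Foreign.C.Types (CUInt (..))

-- POSIX sleep takes whole seconds and returns the number left unslept.
foreign import ccall safe "unistd.h sleep"
  c_sleep :: CUInt -> IO CUInt

-- | Hypothetical wrapper: block the calling OS thread for n seconds.
sleepSeconds :: Int -> IO ()
sleepSeconds n = c_sleep (fromIntegral (max 0 n)) >> return ()
#endif
```

Both imports are marked safe so that, in the threaded RTS, other Haskell threads can keep running while the foreign call blocks.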


The async exception and QSemN problems are somewhat beyond me to debug.
If everyone else is busy, I'll try to debug those, but I'd appreciate it if
someone more experienced would look into them. I've been running some of
the exception failures in debug mode, using various RTS flags and
reading the code, but I can't spot anything wrong in the code.

Let me know if I can help somehow.

Best regards,
--Esa
Simon Marlow
2006-08-22 08:59:38 UTC
Hi Esa,

I'll update my Windows build and see if I can reproduce these.

Cheers,
Simon
Simon Marlow
2006-08-30 14:11:49 UTC
Post by Esa Ilari Vuokko
The other one is the various concurrency tests that fail. The machine
in which I run these is P4 with hyperthreading, and in my experience
hyperthreading does bring out threading-related bugs that won't appear
on normal single core machines.
I have fixed a few things that were affecting these tests, and disabled some
tests that were inappropriate. I'm left with

conc014(ghci,threaded1,threaded2)
conc015(ghci,threaded1)
conc017(ghci,threaded1,threaded2)

these all fail because in the threaded RTS on Windows, threadDelay is not
interruptible like it is on Unix. I do intend to fix this (but not for 6.6), so
I should probably mark these as expected failures.

Now, the worrying thing is I'm left with some tests that do this:

conc007.exe: getMBlocks: VirtualAlloc MEM_COMMIT failed: The parameter is incorrect.

(I improved the error message). With a given RTS build it is always the same
tests that fail in this way, but if I tweak the RTS or use a debugging RTS, the
error goes away or moves to a different test. I haven't managed to reproduce it
with a debugging RTS at all, which is frustrating. It appears to have nothing
to do with -threaded, though - these failures happen with the normal way too.

conc007 & conc012 seem to trigger it more than other tests.

Esa - do you see this? Any ideas?

Cheers,
Simon
Esa Ilari Vuokko
2006-08-30 19:46:24 UTC
Simon Marlow wrote:

[snip: fixed and analysed test failures]
Thanks!
Post by Simon Marlow
conc007.exe: getMBlocks: VirtualAlloc MEM_COMMIT failed: The parameter
is incorrect.
Iiik! This means that the address given to VirtualAlloc is either not
reserved yet, or that the size is too big (i.e. the block to be committed
isn't inside one VirtualAlloc MEM_RESERVE block).
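
The condition described above can be written down as a small pure model. This is not RTS code: the Reserved type and the commitFits function are hypothetical names, and the sketch only encodes the rule as stated here, namely that a commit request must lie entirely inside a single reserved range.

```haskell
import Data.Word (Word64)

-- | Hypothetical model of one MEM_RESERVE region: base address and size in bytes.
data Reserved = Reserved { resBase :: Word64, resSize :: Word64 }

-- | True when the range [addr, addr + size) lies entirely inside ONE reserved
-- region. The comparisons are arranged so that no sum can overflow.
commitFits :: [Reserved] -> Word64 -> Word64 -> Bool
commitFits regions addr size = any inside regions
  where
    inside (Reserved base len) =
      addr >= base && size <= len && addr - base <= len - size
```

For example, with two adjacent 1 MB reservations, `commitFits [Reserved 0x100000 0x100000, Reserved 0x200000 0x100000] 0x180000 0x100000` is False: every byte of the request is reserved, but the request straddles two separate reservations.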
Post by Simon Marlow
(I improved the error message). With a given RTS build it is always the
same tests that fail in this way, but if I tweak the RTS or use a
debugging RTS, the error goes away or moves to a different test. I
haven't managed to reproduce it with a debugging RTS at all, which is
frustrating. It appears to have nothing to do with -threaded, though -
these failures happen with the normal way too.
conc007 & conc012 seem to trigger it more than other tests.
Esa - do you see this? Any ideas?
I don't see random crashes. I ran fast-mode and full concurrency-dir tests
several times.

But...this made me check the MBlock allocator once more...there appears
to be a bug in my code. Attached is a patch with a fix, although I have a hard
time guessing how this bug could cause the failure - it's as if something were
freeing memory. Well, I hope it's this bug...in which case sorry for the
trouble.

(The patch doesn't cause additional failures for fast, but it doesn't
fix any for me, either.)

Failures I'm seeing in concurrency-dir, on a machine with two cores (real, not
HT), but not using testsuite THREADS as the machine had extra load:
conc023(normal,profasm,ghci,threaded2)
This is still random; different WAYS fail on different runs.
conc037(threaded2)
conc039(threaded1)
conc058(normal,opt,optasm,prof,profasm)
"I'm Interruptible"

Best regards,
--Esa
Simon Marlow
2006-08-31 15:32:47 UTC
Post by Esa Ilari Vuokko
[snip: fixed and analysed test failures]
Thanks!
Post by Simon Marlow
conc007.exe: getMBlocks: VirtualAlloc MEM_COMMIT failed: The parameter
is incorrect.
Iiik! This means that the address given to VirtualAlloc is either not
reserved yet, or that the size is too big (i.e. the block to be committed
isn't inside one VirtualAlloc MEM_RESERVE block).
Post by Simon Marlow
(I improved the error message). With a given RTS build it is always the
same tests that fail in this way, but if I tweak the RTS or use a
debugging RTS, the error goes away or moves to a different test. I
haven't managed to reproduce it with a debugging RTS at all, which is
frustrating. It appears to have nothing to do with -threaded, though -
these failures happen with the normal way too.
conc007 & conc012 seem to trigger it more than other tests.
Esa - do you see this? Any ideas?
I don't see random crashes. I ran fast-mode and full concurrency-dir tests
several times.
But...this made me check the MBlock allocator once more...there appears
to be a bug in my code. Attached is a patch with a fix, although I have a hard
time guessing how this bug could cause the failure - it's as if something were
freeing memory. Well, I hope it's this bug...in which case sorry for the
trouble.
(The patch doesn't cause additional failures for fast, but it doesn't
fix any for me, either.)
It doesn't fix the failures I'm seeing. conc007, conc012 and conc030 all fail
for me at the moment, although the compiler itself seems pretty stable.

Should your patch be committed or not?
Post by Esa Ilari Vuokko
Failures I'm seeing in concurrency-dir, on a machine with two cores (real, not
HT):
conc023(normal,profasm,ghci,threaded2)
This is still random; different WAYS fail on different runs.
I haven't seen conc023 fail, except the ghci way:

conc023: failed to create OS thread: Not enough storage is available to process
this command.

which is reasonable. I'm not running on an HT or MT machine though, so the
other conc023 failures might only show up when there are multiple hardware threads.
Post by Esa Ilari Vuokko
conc037(threaded2)
what happened here?
Post by Esa Ilari Vuokko
conc039(threaded1)
I believe this one fails occasionally due to non-fatal reasons (performGC_:
interrupted). I should fix the test really.
Post by Esa Ilari Vuokko
conc058(normal,opt,optasm,prof,profasm)
"I'm Interruptible"
conc058 shouldn't be failing now.

Cheers,
Simon
Esa Ilari Vuokko
2006-08-31 23:59:40 UTC
Post by Simon Marlow
conc023: failed to create OS thread: Not enough storage is available to
process this command.
That's the error I get as well.
Post by Simon Marlow
which is reasonable. I'm not running on an HT or MT machine though, so
the other conc023 failures might only show up when there are multiple
hardware threads.
I think Windows has a boot-time constant for the maximum number of threads,
and either my machine has more software running or yours is "server"-tuned
and has more room for them.
Post by Simon Marlow
Post by Esa Ilari Vuokko
conc037(threaded2)
what happened here?
Ah, it seems a newline is interleaved in a funny way.
It happens every time for me; here's the output diff:
! newThread started
! mainThread
! newThread back again
! 1 sec later
!
! shutting down
--- 1,6 ----
! newThread startedmainThread
!
! newThread back again
! 1 sec later
!
! shutting down
Post by Simon Marlow
Post by Esa Ilari Vuokko
conc039(threaded1)
I believe this one fails occasionally due to non-fatal reasons
(performGC_: interrupted). I should fix the test really.
Okay. That's the way it fails for me as well, and it seems to fail on each
run.
Post by Simon Marlow
Post by Esa Ilari Vuokko
conc058(normal,opt,optasm,prof,profasm)
"I'm Interruptible"
conc058 shouldn't be failing now.
Right, it isn't now (unexpectedly, that is).

That leaves me with no unexplained (concurrency/) failures. I've now run the
tests conc007, conc012 and conc030 lots of times, trying to get different
conditions, but they just won't fail on my machines (one dual core, one
HT).

Reading the allocation code once more, I found one more bug. It shouldn't
affect the failures you see in any way. Attached is a fix, along with my previous
patch. They introduced no regressions for fast and full concurrency/.
They both fix border-case bugs: one would lead to a fatal error (the first
patch fixes that one) and the other might just lose reserved memory (although
currently that might never happen).

Anyway, if there are yet more bugs in the MBlock allocation code, I am probably
blind to them.

Grasping at straws for debugging the MBlock allocator, I tried changing the
direction in which it gets addresses from the OS, but that didn't affect the tests.
If this change introduces an error, it points to a bug in the MBlock allocator.
336c336
< VirtualAlloc(NULL, rec->size, MEM_RESERVE, PAGE_READWRITE);
---
> VirtualAlloc(NULL, rec->size, MEM_RESERVE | MEM_TOP_DOWN,
> PAGE_READWRITE);

Best regards,
--Esa
Bulat Ziganshin
2006-09-01 09:15:53 UTC
Hello Esa,
Post by Esa Ilari Vuokko
Ah, it seems to be a newline is interleaved in a funny way.
! newThread startedmainThread
!
It's because hPutStrLn is not a primitive operation: it's implemented as
hPutStr followed by hPutChar '\n', where hPutStr and hPutChar each lock the
handle only during their own internal operation, so another thread can write
in between the two.
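
One way a test could sidestep this interleaving is to serialise whole-line writes with an explicit lock. The sketch below is an assumption about how that might be done, not a change proposed in this thread; OutputLock, newOutputLock, atomicPutStrLn and demo are hypothetical names.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
  (MVar, newEmptyMVar, newMVar, putMVar, takeMVar, withMVar)
import System.IO (Handle, hPutStrLn, stdout)

-- | Hypothetical lock shared by every thread that writes to the same handle.
newtype OutputLock = OutputLock (MVar ())

newOutputLock :: IO OutputLock
newOutputLock = OutputLock <$> newMVar ()

-- | Write the text and its newline while holding the lock, so no other
-- thread using the same lock can write between the two.
atomicPutStrLn :: OutputLock -> Handle -> String -> IO ()
atomicPutStrLn (OutputLock lock) h s = withMVar lock $ \_ -> hPutStrLn h s

-- | Two threads print through the same lock; the order of the two lines is
-- still up to the scheduler, but each line comes out whole.
demo :: IO ()
demo = do
  lock <- newOutputLock
  done <- newEmptyMVar
  _ <- forkIO (atomicPutStrLn lock stdout "newThread started" >> putMVar done ())
  atomicPutStrLn lock stdout "mainThread"
  takeMVar done
```

This only helps if every writer goes through the same lock; a thread calling hPutStrLn directly can still interleave with the others.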
--
Best regards,
Bulat mailto:***@gmail.com
Simon Marlow
2006-09-01 15:11:28 UTC
Post by Esa Ilari Vuokko
That leaves me with no unexplained (concurrency/) failures. I've now
run the tests conc007, conc012 and conc030 lots of times, trying to
get different conditions, but they just won't fail on my machines (one
dual core, one HT).
These also are failing for me with the MEM_COMMIT error in my latest
run:

memo001(opt,optasm,threaded2)
ioref001(normal,threaded2)
list003(opt,optasm,threaded2)

and probably more.

Cheers,
Simon
Simon Marlow
2006-09-01 15:17:34 UTC
Post by Simon Marlow
Post by Esa Ilari Vuokko
That leaves me with no unexplained (concurrency/) failures. I've now
run the tests conc007, conc012 and conc030 lots of times, trying to
get different conditions, but they just won't fail on my machines (one
dual core, one HT).
These also are failing for me with the MEM_COMMIT error in my latest
run:
memo001(opt,optasm,threaded2)
ioref001(normal,threaded2)
list003(opt,optasm,threaded2)
and probably more.
BTW, many of the tests I see failing have something in common: they have +RTS
-Ksomething, which means they probably allocate a multi-megabyte block in one
go. See also stableptr004, for example.

Cheers,
Simon
