[info-mcl] process crash
Alexander Repenning
ralex at cs.colorado.edu
Mon Aug 3 13:02:00 CDT 2009
I have not done much testing, but for the simple cases it seems to work.
Thanks Gary!
Would people mind/object/protest/welcome if RMCL were checked into
Google Code? It would make updating and tracking RMCL (via svn) quite
a bit simpler. I doubt there will be many changes, but even just a
handful of updates would be tricky to track as diff fragments
distributed in various emails.
Alex
On Aug 1, 2009, at 1:09 PM, Gary Byers wrote:
> I'm fairly sure that adding a couple of lines near the end of the LAP
> function %SAVE-STACK-GROUP-CONTEXT (in
> "ccl:level-1;PPC;ppc-stack-groups.lisp") will fix the problem that
> causes PROCESS-RUN-FUNCTION to sometimes crash, as suggested by the
> following diff:
>
> Index: level-1/PPC/ppc-stack-groups.lisp
> ===================================================================
> --- level-1/PPC/ppc-stack-groups.lisp (revision 223)
> +++ level-1/PPC/ppc-stack-groups.lisp (working copy)
> @@ -1685,6 +1685,8 @@
> (svset data sg.ts-overflow-limit sg t)
> ; Prevent stack overflow when we reenter this code (may not be necessary)
> (set-global rzero cs-overflow-limit)
> + (set-global rzero vs-overflow-limit)
> + (set-global rzero ts-overflow-limit)
> (set-global rzero db-link))
> (blr))
>
> The problem (at least the problem that I'm aware of) has to do with how
> stack-overflow is detected in RMCL. An (R)MCL thread has 3 stacks, where
> the thread's "control stack" is the hardware stack (addressed by r1 on
> the PPC) and the "value" and "temp" stacks are used for (roughly)
> fixed- and variable-sized lisp objects. In native MCL, overflow on the
> temp and value stacks could be detected by write-protecting some guard
> pages at the end of the stack and handling the resulting exception.
> Since exceptions don't work under Rosetta, in RMCL it's necessary to
> check for overflow in software (by comparing the appropriate stack
> pointer to a global variable and signaling a stack-overflow if the
> stack pointer is "unsigned less than" the limit).
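The software check described above can be sketched in Python. This is only an illustration, not RMCL code: the function name, the 32-bit word mask, and the sample addresses are assumptions made for the example.

```python
# Illustrative model of a software stack-overflow check.
# Assumption: 32-bit words, as on the PPC targets discussed here.
WORD_MASK = 0xFFFFFFFF


def stack_overflowed(stack_pointer: int, overflow_limit: int) -> bool:
    """Return True when the stack pointer is 'unsigned less than' the limit.

    Both values are masked to 32 bits so the comparison is unsigned,
    whatever the sign of the Python integers passed in.
    """
    return (stack_pointer & WORD_MASK) < (overflow_limit & WORD_MASK)
```

With a limit of 0 the check can never fire, since no unsigned value is less than 0. That is what makes 0 usable as a "no limit set yet" value.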
>
> Only one thread can run at a time in (R)MCL, and part of
> context-switching between threads (stack-groups, actually ...) involves
> copying some global state into the outgoing thread, making the global
> state "thread neutral", and then copying the incoming thread's state to
> the global variables. (These global variables include the current
> thread's stack overflow limits.)
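A rough model of that save / neutralize / restore sequence, restricted to the three overflow limits named in the diff, might look like the following. The `Runtime` class, its method names, and the use of plain dicts as saved thread state are hypothetical; the real code is PPC LAP and handles much more state.

```python
# Names of the globals taken from the diff; everything else is hypothetical.
LIMIT_NAMES = ("cs-overflow-limit", "vs-overflow-limit", "ts-overflow-limit")
THREAD_NEUTRAL = 0  # no stack pointer is "unsigned less than" 0


class Runtime:
    """Hypothetical stand-in for the runtime's global variables."""

    def __init__(self):
        self.limits = {name: THREAD_NEUTRAL for name in LIMIT_NAMES}

    def save_context(self, outgoing: dict) -> None:
        # Copy the globals into the outgoing thread, then neutralize them.
        for name in LIMIT_NAMES:
            outgoing[name] = self.limits[name]
            self.limits[name] = THREAD_NEUTRAL

    def restore_context(self, incoming: dict) -> None:
        # Assumption: a brand-new thread has saved nothing yet, so its
        # limits stay thread-neutral until it computes its own.
        for name in LIMIT_NAMES:
            self.limits[name] = incoming.get(name, THREAD_NEUTRAL)

    def switch(self, outgoing: dict, incoming: dict) -> None:
        self.save_context(outgoing)
        self.restore_context(incoming)
```

Switching from a running thread to a new one leaves every limit at 0 in this model, and switching back restores the first thread's saved limits.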
>
> One of the first things that a new thread does is to try to determine
> what its stack overflow limits should be. Until it's done this (and set
> the appropriate global variables), any software stack-overflow checks
> that the new thread does have to use the "thread-neutral" value 0 (no
> stack pointer can be "unsigned less-than" 0.) Because the outgoing
> thread neglected to zero out the global temp- and value-stack limits
> (which is what the +-prefixed lines in the patch above do), the first
> few stack-overflow checks in a new thread compared the stack pointer to
> the previously active thread's limit. This is completely wrong, but it
> has a 50% chance of being accidentally right (depending on the relative
> addresses of the outgoing thread's stack and the incoming thread's.)
> Roughly half the time, the new thread would get a spurious stack
> overflow before it'd even finished initializing itself and this
> generally led to an immediate crash.
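To make the failure mode concrete, here is a small sketch of what a new thread's first check sees with and without the two added lines. The helper names, the short keys `cs`/`vs`/`ts`, and all addresses are invented for illustration.

```python
def limits_after_save(limits: dict, patched: bool) -> dict:
    """Model the global limits left behind by the outgoing thread.

    Unpatched: only the control-stack limit ("cs") is zeroed.
    Patched: the value ("vs") and temp ("ts") limits are zeroed as well.
    """
    zeroed = {"cs", "vs", "ts"} if patched else {"cs"}
    return {name: (0 if name in zeroed else value)
            for name, value in limits.items()}


def spurious_overflow(new_stack_pointer: int, leftover_limit: int) -> bool:
    """The new thread's check: stack pointer 'unsigned less than' the limit."""
    return new_stack_pointer < leftover_limit


# Hypothetical limits of the previously running thread.
previous = {"cs": 0x7000, "vs": 0x5000, "ts": 0x6000}
unpatched = limits_after_save(previous, patched=False)
patched = limits_after_save(previous, patched=True)
```

In this model a new thread whose value stack sits below the stale limit (say at `0x3000`) overflows immediately, one whose stack sits above it (say at `0x9000`) passes by accident, and with the patched limits neither can fail.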
>
> AFAICT, this doesn't have anything to do with event-processing per se,
> but it may be the case that switching to a new thread from the event
> thread would fail and switching to the new thread from (e.g.) the
> listener thread would succeed, and this has to do with the
> more-or-less arbitrary relative addresses of the incoming and outgoing
> threads' stacks.
>
> I haven't seen PROCESS-RUN-FUNCTION fail since making the 2-line change
> above. That's not conclusive (since I hadn't seen it fail until Andrew
> told me about the discussion on this list a few weeks ago), but the
> explanation above seems to be consistent with the (somewhat
> unpredictable) behavior that people here have reported, and the 2-line
> fix above seems to fix the problem.
>
>
> On Sat, 1 Aug 2009, Terje Norderhaug wrote:
>
>> On Jul 20, 2009, at 4:22 PM, Alexander Repenning wrote:
>>> Just wondering... has anybody found any workaround for the process
>>> problems in RMCL? At least so far I have not been able to use
>>> process-run-function in a way that is NOT causing a crash.
>>
>> The enclosed patch attempts to work around the process problems in
>> RMCL, as a potential remedy until we have a proper fix. It exploits
>> the observation that processes can be started in the event handler
>> without crashes. The patch advises process-run-function and should not
>> require changes to any other code.
>>
>> -- Terje Norderhaug
>>
Prof. Alexander Repenning
University of Colorado
Computer Science Department
Boulder, CO 80309-430
vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf