[info-mcl] process crash

Alexander Repenning ralex at cs.colorado.edu
Mon Aug 3 13:02:00 CDT 2009


I have not done much testing, but for the simple cases it seems to work.
Thanks Gary!

Would people mind/object/protest/welcome if RMCL were checked into
Google Code? It would make updating and tracking RMCL (via svn) quite
a bit simpler. I doubt there will be many changes, but even just a
handful of updates would be tricky to track as diff fragments
distributed across various emails.

Alex


On Aug 1, 2009, at 1:09 PM, Gary Byers wrote:

> I'm fairly sure that adding a couple of lines near the end of the LAP
> function %SAVE-STACK-GROUP-CONTEXT
> (in "ccl:level-1;PPC;ppc-stack-groups.lisp")
> will fix the problem that causes PROCESS-RUN-FUNCTION to sometimes
> crash, as suggested by the following diff:
>
> Index: level-1/PPC/ppc-stack-groups.lisp
> ===================================================================
> --- level-1/PPC/ppc-stack-groups.lisp	(revision 223)
> +++ level-1/PPC/ppc-stack-groups.lisp	(working copy)
> @@ -1685,6 +1685,8 @@
>      (svset data sg.ts-overflow-limit sg t)
>      ; Prevent stack overflow when we reenter this code (may not be necessary)
>      (set-global rzero cs-overflow-limit)
> +    (set-global rzero vs-overflow-limit)
> +    (set-global rzero ts-overflow-limit)
>      (set-global rzero db-link))
>    (blr))
>
> The problem (at least the problem that I'm aware of) has to do with
> how stack overflow is detected in RMCL.  An (R)MCL thread has 3
> stacks, where the thread's "control stack" is the hardware stack
> (addressed by r1 on the PPC) and the "value" and "temp" stacks are
> used for (roughly) fixed- and variable-sized lisp objects.  In native
> MCL, overflow on the temp and value stacks could be detected by
> write-protecting some guard pages at the end of the stack and
> handling the resulting exception.  Since exceptions don't work under
> Rosetta, in RMCL it's necessary to check for overflow in software (by
> comparing the appropriate stack pointer to a global variable and
> signaling a stack-overflow if the stack pointer is "unsigned less
> than" the limit).
>
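
A minimal sketch of the software check described above, written in
Python rather than PPC LAP. The function names and the addresses are
hypothetical and are not taken from the RMCL sources; the only thing
borrowed from the explanation is the rule itself: signal an overflow
when the stack pointer is "unsigned less than" the limit, which can
never happen when the limit is 0.

```python
MASK32 = 0xFFFFFFFF  # assumption: 32-bit PPC addresses, compared as unsigned


def unsigned_less_than(a, b):
    """Compare two values as unsigned 32-bit quantities."""
    return (a & MASK32) < (b & MASK32)


def stack_overflowed(stack_pointer, overflow_limit):
    """Hypothetical model of the software check: the stack grows down,
    so it has overflowed once the pointer drops below the limit."""
    return unsigned_less_than(stack_pointer, overflow_limit)


if __name__ == "__main__":
    print(stack_overflowed(0x3000, 0x2000))  # pointer above the limit
    print(stack_overflowed(0x1000, 0x2000))  # pointer below the limit
    print(stack_overflowed(0x1000, 0))       # thread-neutral limit 0
```

With a limit of 0 the check cannot fire for any stack pointer, which is
why 0 works as the "thread-neutral" value.
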
> Only one thread can run at a time in (R)MCL, and part of
> context-switching between threads (stack-groups, actually ...)
> involves copying some global state into the outgoing thread, making
> the global state "thread neutral", and then copying the incoming
> thread's state to the global variables.  (These global variables
> include the current thread's stack overflow limits.)
>
> One of the first things that a new thread does is to try to determine
> what its stack overflow limits should be.  Until it's done this (and
> set the appropriate global variables), any software stack-overflow
> checks that the new thread does have to use the "thread-neutral"
> value 0 (no stack pointer can be "unsigned less-than" 0.)  Because
> the outgoing thread neglected to zero out the global temp- and
> value-stack limits (which the +-prefixed lines in the patch above now
> do), the first few stack-overflow checks in a new thread compared the
> stack pointer to the previously active thread's limit.  This is
> completely wrong, but it has a 50% chance of being accidentally right
> (depending on the relative addresses of the outgoing thread's stack
> and the incoming thread's.)  Roughly half the time, the new thread
> would get a spurious stack overflow before it'd even finished
> initializing itself, and this generally led to an immediate crash.
>
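
The failure mode above can be modelled in a few lines of Python. This
is only an illustration under stated assumptions: the dictionary keys
"cs", "vs" and "ts" stand in for the cs-, vs- and ts-overflow-limit
globals named in the diff, and the function names and addresses are
made up for the example.

```python
def save_context(global_limits, patched):
    """Hypothetical model of the end of the context save: reset the
    global limits to the thread-neutral value 0."""
    global_limits["cs"] = 0      # was already reset before the patch
    if patched:
        global_limits["vs"] = 0  # the two lines the diff adds
        global_limits["ts"] = 0
    return global_limits


def spurious_overflow(new_sp, global_limits, stack="vs"):
    """True if a brand-new thread's first check on `stack` would fire.
    Addresses are non-negative, so < acts as an unsigned compare."""
    return new_sp < global_limits[stack]


if __name__ == "__main__":
    # Made-up limits left behind by the outgoing thread.
    outgoing = {"cs": 0x500000, "vs": 0x600000, "ts": 0x700000}

    stale = save_context(dict(outgoing), patched=False)
    print(spurious_overflow(0x400000, stale))  # new stack below old limit
    print(spurious_overflow(0x800000, stale))  # new stack above old limit

    clean = save_context(dict(outgoing), patched=True)
    print(spurious_overflow(0x400000, clean))  # limit is 0 after the patch
```

Without the two extra resets the outcome depends only on whether the
new thread's stack happens to sit below or above the old limit; with
them the check cannot fire until the new thread installs its own
limits.
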
> AFAICT, this doesn't have anything to do with event-processing per se,
> but it may be the case that switching to a new thread from the event
> thread would fail and switching to the new thread from (e.g.) the
> listener thread would succeed, and this has to do with the
> more-or-less arbitrary relative addresses of the incoming and
> outgoing threads' stacks.
>
> I haven't seen PROCESS-RUN-FUNCTION fail since making the 2-line
> change above.  That's not conclusive (since I hadn't seen it fail
> until Andrew told me about the discussion on this list a few weeks
> ago), but the explanation above seems to be consistent with the
> (somewhat unpredictable) behavior that people here have reported, and
> the 2-line fix above seems to fix the problem.
>
>
> On Sat, 1 Aug 2009, Terje Norderhaug wrote:
>
>> On Jul 20, 2009, at 4:22 PM, Alexander Repenning wrote:
>>> Just wondering... has anybody found any workaround for the process
>>> problems in RMCL? At least so far I have not been able to use
>>> process-run-function in a way that is NOT causing a crash.
>>
>> The enclosed patch attempts to work around the process problems in
>> RMCL, as a potential remedy until we have a proper fix. It exploits
>> the observation that processes can be started in the event handler
>> without crashes. The patch advises process-run-function and should
>> not require changes to any other code.
>>
>> -- Terje Norderhaug
>>
> _______________________________________________
> info-mcl mailing list
> info-mcl at clozure.com
> http://clozure.com/mailman/listinfo/info-mcl

Prof. Alexander Repenning

University of Colorado
Computer Science Department
Boulder, CO 80309-430

vCard: http://www.cs.colorado.edu/~ralex/AlexanderRepenning.vcf

