Kunagi becomes unavailable

Kunagi becomes unavailable in "silent mode". Users who are not logged in wait on a "Loading..." screen that never finishes. Users who are already logged in can keep working, but no data is saved, and they are neither informed nor aware of it. The only visible crash came after a pull to sprint; I cannot remember the exception that was shown.
We had an SPM, and an hour of work on the backlog went to waste. Really frustrating.

Version 0.21.1.
Please let me know if logs are needed. The logs contain a lot of "Requested entity not found" entries.

Statement from Kunagi Team

I have already seen this behavior in a company where Kunagi had been running for a long time without a restart. It could be caused by an OutOfMemoryError.

I suggest restarting Kunagi (Tomcat) every night to prevent this until we have a real solution.

Fixing iss728 should help with the frustration.

Status

Issue is closed for Release 0.22.3.

Comments

Thu, Nov 10, 2011, 17:27 by Witek (SM,T)

@Krzysztof: For how long was your Kunagi instance running?

It would be very nice if you could send me the logs (wi@koczewski.de). Otherwise I have no idea how to analyse the problem.

Thu, Nov 10, 2011, 17:59 by Krzysztof

The problem occurred again a few minutes ago. I restarted Tomcat an hour ago and it still came back. The logs are silent: I am still logged in and can work, but catalina.out does not grow. I will send the logs in a few minutes. Thanks for the fast response.

Fri, Nov 11, 2011, 10:27 by Witek (SM,T)

Thank you for this info. The only thing I can recommend for now is to increase the memory (heap and permgen) for your JVM.
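Editorial sketch, not part of the original thread: on the HotSpot JVMs of that period, the maximum heap is set with -Xmx and the permanent generation with -XX:MaxPermSize (PermGen was removed in Java 8). To check what a running JVM actually got, the standard java.lang.management API can report the heap and each memory pool. The class name MemoryReport is hypothetical and this is not Kunagi code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper for illustration; not part of Kunagi.
final class MemoryReport {

    /** Formats a byte count as whole mebibytes; a negative (undefined) value becomes "n/a". */
    static String mb(long bytes) {
        return bytes < 0 ? "n/a" : (bytes / (1024 * 1024)) + " MB";
    }

    /** Builds one line for the heap and one line per memory pool of this JVM. */
    static String report() {
        StringBuilder sb = new StringBuilder();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        MemoryUsage heap = memory.getHeapMemoryUsage();
        sb.append("heap: used=").append(mb(heap.getUsed()))
          .append(" max=").append(mb(heap.getMax())).append('\n');
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            sb.append(pool.getName()).append(": used=").append(mb(usage.getUsed()))
              .append(" max=").append(mb(usage.getMax())).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Pool names differ between JVM versions and garbage collectors, so the permanent generation may appear under different names. The report describes only the JVM it runs in, so to inspect Tomcat it would have to run inside the same process.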

Fri, Nov 11, 2011, 11:03 by Witek (SM,T)

@Krzysztof: Did you restart Kunagi by restarting Tomcat? Or did you just do a "redeploy"?

Please always do a full Tomcat restart. It seems we have some issues with threads that do not stop correctly.

Fri, Nov 11, 2011, 23:35 by Krzysztof

What is more, I had to kill -9 the Tomcat process, because shutdown did not work on the Tomcat running Kunagi. Before the current startup I set the JVM parameters: Xmx to 3 GB and permgen to 256 MB. We will observe the situation next week, as more people (about 15) will now be using this instance. I will set up the suggested Tomcat restart in cron next week.

I have a question: how much does Kunagi store in the user session context? Does it store all opened entities? What about displayed entity lists? I am wondering how many people can work on one Kunagi instance in parallel and what the limits are, so that we know when to think about installing a second instance. Currently about 15 people work on this instance in parallel, and this number is going to increase.

Wed, Nov 23, 2011, 14:45 by Krzysztof

Yesterday and today we had the same problem. The nightly restart was configured, but it did not help. Please advise what we can do.

Should we install more Kunagi instances? The question is how many parallel users and active projects one Kunagi instance can handle.

Tomcat shutdown failed again (I had to kill -9), and the log contains lots (>100) of the following entries:

Nov 23, 2011 2:12:37 PM org.apache.coyote.http11.Http11Protocol pause
INFO: Pausing Coyote HTTP/1.1 on http-8080
Nov 23, 2011 2:12:38 PM org.apache.catalina.core.StandardService stop
INFO: Stopping service Catalina
Nov 23, 2011 2:12:41 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/kunagi] appears to have started a thread named [ilarkesto.core.logging.Log] but has failed to stop it. This is very likely to create a memory leak.
Nov 23, 2011 2:12:41 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/kunagi] appears to have started a thread named [app:kunagi] but has failed to stop it. This is very likely to create a memory leak.
Nov 23, 2011 2:12:41 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/kunagi] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.
Nov 23, 2011 2:12:41 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/kunagi] is still processing a request that has yet to finish. This is very likely to create a memory leak. You can control the time allowed for requests to finish by using the unloadDelay attribute of the standard Context implementation.

and at the end of the file, lots of:

Nov 23, 2011 2:12:41 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
SEVERE: The web application [/kunagi] appears to have started a thread named [<no context>] but has failed to stop it. This is very likely to create a memory leak.
Nov 23, 2011 2:12:41 PM org.apache.catalina.loader.WebappClassLoader clearThreadLocalMap
SEVERE: The web application [/kunagi] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@18fdd2b]) and a value of type [ilarkesto.di.Context] (value [app:kunagi]) but failed to remove it when the web application was stopped. This is very likely to create a memory leak.
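Editorial sketch, not part of the original thread: the SEVERE entries above are Tomcat's warnings that the web application left threads running and a ThreadLocal value set when it was stopped. The general pattern for avoiding both is to interrupt background threads on shutdown and clear thread-local state in a finally block. The class below is a hypothetical illustration of that pattern and is not Kunagi or ilarkesto code. In a real web application, stop() would typically be called from a ServletContextListener's contextDestroyed method.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical background worker for illustration; not Kunagi or ilarkesto code.
final class BackgroundWorker {

    // Per-thread state, standing in for something like a context object.
    private static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Starts a loop that runs until the worker thread is interrupted. */
    void start() {
        executor.submit(() -> {
            CONTEXT.set("app:example");
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    Thread.sleep(100); // placeholder for real periodic work
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // restore the flag and fall through
            } finally {
                CONTEXT.remove(); // do not leave the value attached to the thread
            }
        });
    }

    /** Interrupts the worker and waits for it; returns true if it terminated in time. */
    boolean stop(long timeoutMillis) {
        executor.shutdownNow();
        try {
            return executor.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        BackgroundWorker worker = new BackgroundWorker();
        worker.start();
        System.out.println("stopped cleanly: " + worker.stop(2000));
    }
}
```

A worker that ignores interruption, for example one stuck in an endless loop, would still block shutdown. That would be consistent with having to kill the process, but the thread does not confirm this as the cause.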

Wed, Nov 23, 2011, 15:10 by Witek (SM,T)

I think this is a load problem. Fewer users/projects per instance should be a workaround. Just copy the complete content of kunagi-data into a new instance. Then delete half of the projects in each instance.

If you could email me your kunagi-data directory (wi@koczewski.de), I could try to reproduce the problem on my own computer. Otherwise, please tell us how much data you have on your instance: how many users, how many projects, and how big your kunagi-data/entities/ directory is.

Did you already upgrade to 0.22?

Wed, Nov 23, 2011, 15:47 by Krzysztof

I noticed that Kunagi used 50% of processor time during the period when it was "hung".
Next time it happens I will send SIGQUIT to the process running Kunagi and send you the resulting stack dump. Maybe it will help in resolving the issue.
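Editorial sketch, not part of the original thread: sending SIGQUIT (kill -3) to a HotSpot JVM makes it print a dump of all threads to its standard output. Similar information can be collected from inside the JVM with the standard ThreadMXBean API, which is useful when a monitoring page or servlet has to produce the dump. The class name ThreadDump is hypothetical and this is not Kunagi code.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for illustration; not part of Kunagi.
final class ThreadDump {

    /** Returns name, state and stack trace of every live thread in this JVM. */
    static String dump() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // false, false: skip details about locked monitors and synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

When a process uses a steady share of CPU while appearing hung, taking two or three dumps a few seconds apart and looking for a thread that stays RUNNABLE in the same frames is a common way to spot a busy loop.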

Wed, Nov 23, 2011, 20:22 by Krzysztof

It looks quite small.
Our kunagi-data directory:
11M core
12M entities
140M entities-rescue
14M files

Active daily users: ~20 of 35 registered
Projects: 3 active daily of 9 created

Wed, Nov 23, 2011, 20:25 by Krzysztof

We did not upgrade, but I will do it today.

Wed, Nov 23, 2011, 22:11 by Witek (SM,T)

Our own instance runs with more entities without problems. Based on your info about processor usage, I would say it is a problem with your data constellation. Perhaps an infinite loop is running.

Did you delete users or projects? Or remove users from projects?

Fri, Nov 25, 2011, 12:27 by Krzysztof

Yes, as far as I can remember, all of the mentioned operations were performed on this Kunagi instance. How can I check whether data consistency is preserved? Is there any way to clean up the data?

Fri, Nov 25, 2011, 14:09 by Witek (SM,T)

Kunagi "repairs" itself automatically on startup. This functionality sometimes caused infinite loops in the past.

Mon, Nov 28, 2011, 15:57 by Witek (SM,T)

We have just published a bugfix release which contains a new administrators page. It shows active tasks, threads, sessions, memory usage, etc.

Hope it helps with identifying the cause of the problem.

The new page is accessible when logged in as system administrator, either from the header bar or directly at /kunagi/admin.html

Mon, Nov 28, 2011, 22:11 by Krzysztof

I've just updated the instance to 22.2 and will watch these statistics daily.
The problem is that when Kunagi becomes unavailable, I will probably not be able to log in as admin...
So far there have been no more hangs since Nov 23.

I will let you know if I have any updates.

Tue, Nov 29, 2011, 08:27 by Witek (SM,T)

The login is a plain Java Servlet. The new admin/monitoring page is a simple Servlet too. After login, Kunagi redirects to the GWT application. If this fails, you can try to call /kunagi/admin.html directly.
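Editorial sketch, not part of the original thread: because the admin page is a plain servlet, a small probe with explicit timeouts can tell an unresponsive server apart from a stuck GWT client. The class name AdminPageProbe and the default URL (host and port) are assumptions for illustration. Only the path /kunagi/admin.html comes from the comment above.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

// Hypothetical probe for illustration; not part of Kunagi.
final class AdminPageProbe {

    /** Returns the HTTP status code, or -1 if no response arrived within the timeout. */
    static int probe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) new URL(url).openConnection();
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setRequestMethod("GET");
            return conn.getResponseCode();
        } catch (IOException e) {
            // Covers malformed URLs, refused connections and timeouts.
            return -1;
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Assumed default address; pass the real URL as the first argument.
        String url = args.length > 0 ? args[0] : "http://localhost:8080/kunagi/admin.html";
        System.out.println(url + " -> " + probe(url, 5000));
    }
}
```

Without a logged-in session the page may answer with a redirect or an error status rather than the statistics. Any status code at all still shows that the servlet container is answering requests, while -1 suggests it is not.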

Thu, Feb 23, 2012, 16:59 by Krzysztof

Just an update: we have not had any more problems like those described above. We restart every night and everything is working OK.
