I've been rather busy lately finishing my first production that was supposed to go online instead of being forgotten in the abyss of my hard drive.
Magic Fluids is an Android application with an option to use it as an animated wallpaper. It draws fluid animations based on the Navier-Stokes equations and adds particles on top of that. It's been well received by Android users, and after three weeks on Google Play the paid version is close to 3000 downloads and the free version has almost 40000.
The goal was to use graphics programming skills and make something small, while learning Android development and Google's store intricacies. Since I'm currently in the process of completing my Master's degree, it still took 5 months to finish...
Technically, it's written almost exclusively in Android native code, with a thin Java layer (UI controls and input). It makes use of multithreading, so the capabilities of new multicore devices are exploited nicely.
Of course, with Android device market fragmentation, it couldn't go without problems. There are issues on less common devices and a few people wrote in comments that the application restarts or hangs their device. I had to exclude some older phones that apparently didn't like the way I handled live wallpaper mode (using OpenGL in Android live wallpapers is quite tricky and AFAIK there is no commonly accepted way of doing this - see e.g. here).
Now, obviously this wasn't as fun as writing a game. I am hoping for a steady stream of revenue for at least a few months (where I live, $1000/month is actually more than enough for a comfortable living if you don't have any debts and don't expect to build up savings) and I'm going to spend this time writing a game that I want to release on both Android and Apple devices (and maybe Ouya?). It's really difficult not to go overboard with scope and features, but I'll try my best not to spend more than 6-8 months on it (which in game producers' language means one year at best - that would be fine too). I might post something here when I actually have something to show.
Thursday, April 4, 2013
Friday, January 4, 2013
Android: running native code on multiple CPU cores
Recently I've been trying to speed up some C++ code running on my dual-core Android device. The problem was that two threads that I used were not necessarily running on two cores in parallel (some insight here). I'm going to briefly describe my situation and how I managed to fix the problem.
My device is an Ainol Aurora II tablet. I don't know if the methods described here will work with yours. It probably depends on the CPU model and/or OS version. Maybe you won't even have such problems in the first place.
Problem
The application is realtime and needs a high, consistent framerate, so I need predictable thread behaviour. If the app runs faster thanks to multithreading 90% of the time, but for the remaining 10% the entire computation is squeezed onto one core and the framerate drops by one third, then this kind of multithreading is useless for me.
One thread is the default thread, which is called through JNI from the onDrawFrame method. Other threads (only one on my dual-core device, but possibly more) are spawned as worker threads when the application is brought to the foreground. Such a thread spends part of its time waiting on a condition variable (I'll write CV from now on) for signals from the main thread. A signal tells it that it should start working. When finished, it signals back through another CV that the work is completed and starts waiting for another piece of work.
When work pieces are reasonably big (taking 5-10 milliseconds), the two threads seem to run on separate cores most of the time. Not ALL the time (every few seconds there is an increase in frame processing time, indicating that one core is idle and the other is overloaded), but it might be acceptable.
In general, it seems that when a thread spends a significant part (~50%) of its time waiting on a CV and doesn't do huge chunks of work, but rather multiple small chunks separated by waits on a CV, the scheduler doesn't bother assigning it to a separate core for all (or at least most) of the time.
Unfortunately, the work pieces that I have happen to be small - more like half a millisecond. The most important part of the processing is an iterative algorithm, and after each iteration all threads need to be synchronized (wait for each other). When they wait using CVs, the problem described above happens.
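To make the setup above concrete, here is a minimal sketch of a CV-based handshake between the main thread and one worker. It is not the app's actual code: the `Worker` struct, its fields, and the `StartWorker` / `RunOnWorker` / `StopWorker` functions are hypothetical names invented for illustration, and the "work" is just doubling an integer.

```cpp
#include <cassert>
#include <pthread.h>

// Hypothetical per-worker state (illustrative names, not taken from the app).
struct Worker {
    pthread_mutex_t Mutex;
    pthread_cond_t  WorkReady;  // main -> worker: a new piece of work is available
    pthread_cond_t  WorkDone;   // worker -> main: the piece has been processed
    bool HasWork;
    bool Quit;
    int  Input;
    int  Result;
    pthread_t Thread;
};

static void* WorkerMain(void* arg) {
    Worker* w = static_cast<Worker*>(arg);
    pthread_mutex_lock(&w->Mutex);
    for (;;) {
        // Sleep on the CV until the main thread posts work or asks us to quit.
        while (!w->HasWork && !w->Quit)
            pthread_cond_wait(&w->WorkReady, &w->Mutex);
        if (w->Quit)
            break;
        int input = w->Input;
        pthread_mutex_unlock(&w->Mutex);
        int result = input * 2;           // stand-in for one real chunk of work
        pthread_mutex_lock(&w->Mutex);
        w->Result = result;
        w->HasWork = false;
        pthread_cond_signal(&w->WorkDone);
    }
    pthread_mutex_unlock(&w->Mutex);
    return nullptr;
}

void StartWorker(Worker* w) {
    pthread_mutex_init(&w->Mutex, nullptr);
    pthread_cond_init(&w->WorkReady, nullptr);
    pthread_cond_init(&w->WorkDone, nullptr);
    w->HasWork = false; w->Quit = false; w->Input = 0; w->Result = 0;
    int rc = pthread_create(&w->Thread, nullptr, WorkerMain, w);
    assert(rc == 0);
    (void)rc;
}

// Called from the main thread: hand over one piece of work and wait for it.
int RunOnWorker(Worker* w, int input) {
    pthread_mutex_lock(&w->Mutex);
    w->Input = input;
    w->HasWork = true;
    pthread_cond_signal(&w->WorkReady);
    while (w->HasWork)
        pthread_cond_wait(&w->WorkDone, &w->Mutex);
    int result = w->Result;
    pthread_mutex_unlock(&w->Mutex);
    return result;
}

void StopWorker(Worker* w) {
    pthread_mutex_lock(&w->Mutex);
    w->Quit = true;
    pthread_cond_signal(&w->WorkReady);
    pthread_mutex_unlock(&w->Mutex);
    pthread_join(w->Thread, nullptr);
    pthread_cond_destroy(&w->WorkDone);
    pthread_cond_destroy(&w->WorkReady);
    pthread_mutex_destroy(&w->Mutex);
}
```

With sub-millisecond chunks, every `RunOnWorker` call puts the worker to sleep and wakes it up again, which is exactly the pattern that seemed to discourage the scheduler from keeping it on its own core.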
Solution 1 - busy waiting
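The first solution, which seems to work most of the time, is very simple. Instead of waiting on condition variables between subsequent chunks of work, I made the threads do the following:

```cpp
// Spin until the main thread sets WorkToDo, checking the flag under the mutex.
while(true)
{
    pthread_mutex_lock(&t.MutexSignal);
    if(t.WorkToDo)
        break; // note: leaves the loop with MutexSignal still locked
    pthread_mutex_unlock(&t.MutexSignal);
}
```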
(I'm not sure locks are necessary here, but if I wanted to avoid them I would probably need to dive deep into CPU memory models to make sure everything is ordered properly.)
I think it's usually advised to use CVs for this kind of stuff, but well... now it works. I'm very lame in concurrency and so rather hesitant to draw conclusions, but it seems that in this case the thread is perceived by the scheduler as working all the time (as opposed to the situation where the thread waits on a CV for another piece of work), so the scheduler considers it appropriate to assign it a separate core.
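For comparison, a lock-free variant of the same spin can be sketched with C++11 atomics, where acquire/release ordering takes over the job the mutex was doing. This is an assumption-laden sketch, not what the app does: it presumes a toolchain with `<atomic>` and `<thread>`, and `SpinChannel`, `SpinWorker` and `RunSpinDemo` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical channel between the main thread and one worker.
struct SpinChannel {
    std::atomic<bool> WorkToDo{false};
    std::atomic<bool> WorkDone{false};
    int Payload = 0;  // plain data; its visibility is ordered by the flags
    int Result = 0;
};

static void SpinWorker(SpinChannel* c) {
    // Busy-wait: the acquire load pairs with the release store in the main
    // thread, so Payload is guaranteed to be visible once the flag reads true.
    while (!c->WorkToDo.load(std::memory_order_acquire)) { /* spin */ }
    c->Result = c->Payload * 2;  // stand-in for one chunk of real work
    c->WorkDone.store(true, std::memory_order_release);
}

int RunSpinDemo(int payload) {
    SpinChannel c;
    std::thread worker(SpinWorker, &c);
    c.Payload = payload;
    c.WorkToDo.store(true, std::memory_order_release);  // publish the work
    while (!c.WorkDone.load(std::memory_order_acquire)) { /* spin */ }
    worker.join();
    assert(c.WorkDone.load());  // sanity check: the worker did signal completion
    return c.Result;
}
```

Like the mutex version, this keeps the waiting thread at 100% CPU; the only difference is that no lock is taken on each poll.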
Solution 2 - thread affinity
OK, but busy-waiting is not always that good, I suppose. Additionally, this solution may not be very general. Another solution is to assign threads to separate cores manually. Given that sched_setaffinity() is apparently still not working correctly on Android, I successfully used the code given in this stackoverflow topic to set thread affinities.
Seems like a fast and easy solution, but there is a tricky part here. Initially, I only called this code when the threads were spawned. It did not work out very well and I was close to dumping the idea, but after closer examination I noticed that it was slightly better than before (the second core was used a little more often).
I then modified the affinity-setting code to run EVERY FRAME. Apparently that's important, because from then on both cores were used consistently all the time.
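The linked code is not reproduced here, but the general idea can be sketched as follows, assuming a Linux-based system where the raw `sched_setaffinity` syscall is reachable through `syscall()`. `CoreMask`, `SetCurrentThreadAffinity` and `PinThreadForThisFrame` are hypothetical helper names, and the single `unsigned long` mask is an assumption that limits the sketch to the first 32 or 64 cores.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

// Bit mask with only the given core enabled (core 0 -> 0x1, core 1 -> 0x2, ...).
unsigned long CoreMask(int core) {
    assert(core >= 0 && core < static_cast<int>(8 * sizeof(unsigned long)));
    return 1UL << core;
}

// Applies the mask to the calling thread via the raw syscall.
// Returns 0 on success, or the errno value on failure.
int SetCurrentThreadAffinity(unsigned long mask) {
    pid_t tid = static_cast<pid_t>(syscall(__NR_gettid));
    long rc = syscall(__NR_sched_setaffinity, tid, sizeof(mask), &mask);
    return rc == 0 ? 0 : errno;
}

// Meant to be called by each thread at the start of every frame, since
// setting the affinity only once at thread creation did not stick.
int PinThreadForThisFrame(int core) {
    return SetCurrentThreadAffinity(CoreMask(core));
}
```

In this sketch the main thread would call `PinThreadForThisFrame(0)` at the top of its per-frame native call and the worker would call `PinThreadForThisFrame(1)` before picking up its first chunk of the frame. The return value is worth checking, because the call can fail if the requested core is not available.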
Update
I'm working on a new application and it turns out that setting thread affinity doesn't always work. I got my hands on a Nexus 7 with a quad-core CPU. Two of the four cores are powered off most of the time, and the syscall that is supposed to set the affinity returns with an error claiming that there are only two cores available! (Well, it doesn't say exactly that, but you can infer it from the error code and the circumstances.) I have a procedure that takes ~17ms (so most of the frame time), and even though the execution is split between four threads with their affinities updated every frame, only two cores are actually used (I test it with Usemon). This still gives a nice speedup: ~9ms.
Using busy waiting as described above powers up all of the cores and the execution time drops to ~5ms. Not the perfect solution, since the usage of all cores is now 100%.
I am going to multithread a couple more functions and see if running more code in four threads will convince the scheduler to give the app more power. My last app uses the affinity-setting solution and it runs very well on the four Nexus cores, probably because things are executed on multiple threads basically all the time (which also results in almost 100% CPU usage, but at least it's time well spent :) ).
Otherwise, I might just let the user choose at runtime whether to use all available cores or save the battery.
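One way to see the "only two cores available" situation from code, and to wire in the user's choice, is to compare the number of configured cores with the number currently online. The following is only a sketch under assumptions: it relies on `sysconf` with `_SC_NPROCESSORS_CONF` and `_SC_NPROCESSORS_ONLN` (the reported values may differ between devices and OS versions), and `CoreCounts`, `QueryCoreCounts` and `ChooseWorkerCount` are hypothetical names.

```cpp
#include <cassert>
#include <unistd.h>

// Hypothetical snapshot of how many cores exist and how many are powered on.
struct CoreCounts {
    long Configured;  // cores the system knows about
    long Online;      // cores that are currently online
};

CoreCounts QueryCoreCounts() {
    CoreCounts c;
    c.Configured = sysconf(_SC_NPROCESSORS_CONF);
    c.Online = sysconf(_SC_NPROCESSORS_ONLN);
    return c;
}

// useAllCores == true  : plan for every configured core (e.g. busy-wait to
//                        keep them powered up), at the cost of battery life.
// useAllCores == false : settle for whatever is online right now.
long ChooseWorkerCount(const CoreCounts& c, bool useAllCores) {
    long n = useAllCores ? c.Configured : c.Online;
    return n < 1 ? 1 : n;  // sysconf may return -1 on error; fall back to 1
}
```

The `useAllCores` flag is where a user-facing "performance vs. battery" setting could plug in.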