Discussion:
JVM CPU queueing
Tim Crowley
2014-10-21 13:15:40 UTC
We are in the process of upgrading our ERP system from RPG to Java.
We are having some performance issues with the Java based system and I'm
using Job Watcher from iDoctor to evaluate.
The results from Job Watcher show that a very large percentage of time is
spent on CPU queueing. The breakdown is 50% Dispatched CPU, 40% CPU queueing
and 10% paging, journaling and IO. The CPU utilisation is around 30 - 40%,
sometimes spiking to 80%.
In our RPG system very little time is spent on CPU queueing.
Is it normal for JVMs to spend a large percentage of time CPU queueing?
Grattan House, Lower Mount Street, Dublin 2, Ireland | Telephone: +353-1-661 9599
The Irish Dairy Board Cooperative Limited | Registered in Dublin No. 3221R

IDB website: http://www.idb.ie | Kerrygold website: http://www.kerrygold.com
The Irish Dairy Board was proud to receive a Ruban d'Honneur for Ireland in the 2013/2014 European Business Awards
Please consider the environment before printing this e-mail The information contained in this e-mail (and any attachments ) is confidential and may not be used by anyone other than the addressee. If you are not this intended recipient please notify us immediately. We do not accept any responsibility for the accuracy or otherwise of this message.

______________________________________________________________________
This email has been scanned by the Symantec Email Security.cloud service.
For more information please visit http://www.symanteccloud.com
______________________________________________________________________
--
This is the Midrange Systems Technical Discussion (MIDRANGE-L) mailing list
To post a message email: MIDRANGE-L-Zwy7GipZuJhWk0Htik3J/***@public.gmane.org
To subscribe, unsubscribe, or change list options,
visit: http://lists.midrange.com/mailman/listinfo/midrange-l
or email: MIDRANGE-L-request-Zwy7GipZuJhWk0Htik3J/***@public.gmane.org
Before posting, please take a moment to review the archives
at http://archive.midrange.com/midrange-l.
Charles Wilt
2014-10-21 13:40:16 UTC
You might try the JAVA400 list...

I'm neither a Java nor performance guru...

But from what little I know, yeah, 40% CPU queueing doesn't sound good. I
know Java takes more memory and more CPU than RPG. In general, I'd be
hesitant to move from RPG to Java without a hardware upgrade.

What are your hardware specs and OS version?

Charles
Tim Crowley
2014-10-21 14:28:04 UTC
Charles,

We have already done the hardware upgrade to a Power 7 720 with 14,000 CPW
and 128GB memory, so resources should not be an issue at the current phase
of the project, but I am concerned about when we migrate more companies onto
the new system. We are using OS 7.1.

Mark S Waterbury
2014-10-21 15:10:34 UTC
Tim:

How many "cores" do you have licensed and activated within your IBM i
partition?

Java applications tend to use a lot of threads, so this can make a
difference ... too many active threads may cause "queuing" for the
(limited) CPU resources.
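
As a rough illustration of the thread-versus-core point, the sketch below compares the number of live threads in a JVM with the number of processors the JVM reports. The class and method names are hypothetical, and the ratio is only a crude indicator: live threads are not necessarily runnable, so a high ratio does not by itself prove CPU queueing.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: compares live JVM threads with the processors visible to the JVM.
final class ThreadsVersusCores {

    // Number of processors the JVM reports as available to it.
    static int availableCores() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Number of live threads (daemon and non-daemon) in this JVM.
    static int liveThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        return threads.getThreadCount();
    }

    public static void main(String[] args) {
        int cores = availableCores();
        int live = liveThreads();
        // Many more runnable threads than cores means some must wait for a processor.
        System.out.printf("cores=%d liveThreads=%d threadsPerCore=%.1f%n",
                cores, live, (double) live / cores);
    }
}
```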

Mark S. Waterbury
Tim Crowley
2014-10-21 15:17:17 UTC
Mark,

2 physical cores, and the LPAR has 2 virtual cores.
I have uncapped the processor, so in theory this LPAR could use 300% CPU.
I have 0 machine pool faulting and very little user pool faulting. Disk
busy is also less than 10%.
Nathan Andelin
2014-10-21 16:31:57 UTC
Post by Tim Crowley
Is it normal for JVMs to spend a large percentage of time CPU queueing?
Yes it's normal, and I believe we've seen this in virtually every published
Java benchmark on virtually every hardware / OS platform, no matter how
many threads a JVM / application server may be configured to run
concurrently. The only way for a JVM to fully utilize a multi-core system
is to run multiple JVM / application server instances, and to deploy your
applications across each instance.

IBM scientist Ronald Luijten commented in the September 2014 issue of IBM
Systems Magazine, "We also have the multi-core programming wall. All of our
computer science students learn how to program in Java, which doesn't
support multi-cores at all. Intel may put 128 cores on a chip, but people
coming out of universities are only going to use one of them."

So if you're going to switch your application architecture from the native
virtual machine to a JVM, I would recommend that you still use RPG to
implement data validation, RI constraints, and business rules - rather than
moving that kind of logic into a JVM environment. Use the JVM if you will
for browser I/O.

Nathan.
John Yeung
2014-10-21 18:36:16 UTC
Post by Nathan Andelin
The only way for a JVM to fully utilize a multi-core system
is to run multiple JVM / application server instances, and to deploy your
applications across each instance.
This isn't true in general. If this is true for JVM on IBM i, then
that is an IBM-specific thing, not a Java-specific thing.
Post by Nathan Andelin
IBM scientist Ronald Luijten commented in the September 2014
issue of IBM Systems Magazine, "We also have the multi-core
programming wall. All of our computer science students learn
how to program in Java, which doesn't support multi-cores at all.
Intel may put 128 cores on a chip, but people coming out of
universities are only going to use one of them."
I'm not sure if he was quoted out of context, or if he was talking
about Java only on the i. (And I'm still not convinced that it's even
a limitation for Java on the i.) But you sure as hell can use
multiple cores in Java:

http://embarcaderos.net/2011/01/23/parallel-processing-and-multi-core-utilization-with-java/

Note the above link is from 2011. It demonstrates how to utilize
multiple cores on a Windows PC using Java.
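
For readers without access to the linked article, here is a minimal sketch of the general technique, not the article's code. It splits a CPU-bound sum into chunks and runs them on a fixed thread pool from java.util.concurrent. The class name, method names and workload are hypothetical, chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical example: one JVM spreading CPU-bound work over several threads.
final class MultiCoreSum {

    // CPU-bound work: sum the integers in [from, to).
    static long sumRange(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i;
        }
        return total;
    }

    // Splits [0, n) into one chunk per worker and sums the chunks on a fixed pool.
    static long parallelSum(long n, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            long chunk = (n + workers - 1) / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long from = Math.min(n, w * chunk);
                final long to = Math.min(n, from + chunk);
                Callable<Long> task = () -> sumRange(from, to);
                parts.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get(); // blocks until that chunk is done
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while summing", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("worker failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int workers = Runtime.getRuntime().availableProcessors();
        System.out.println(parallelSum(1_000_000L, workers));
    }
}
```

Whether the worker threads actually land on separate cores is up to the OS scheduler; the pool only makes that many threads runnable at once.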

The quote about "people coming out of universities are only going to
use one of [the available cores]" I think is mostly true, but not
because of any limitation of Java. It's because multithreading and
multiprocessing are relatively difficult concepts, intrinsically, and
not everyone is going to pick these up in school. Not every school is
going to even bother teaching them, because there is actually a lot of
very useful, productive programming you can do without it.

There are currently relatively few languages that make it easy to write
programs which explicitly utilize multiple cores (though this number
is increasing). Java is at least average in this regard, and probably
above average. The holy grail of multicore programming is to get to a
point where compilers, CPUs, and the operating system handle all of
that for us, and we just program the way we traditionally have for
single-processor systems.

For "big" systems, like IBM midrange, I think the idea is still the
same as it always has been: The server is designed to serve multiple
users and multiple processes, and the way to increase utilization is
to serve enough users and enough "single-processor" processes that the
system is taxed (without being overtaxed).

All of which is to say that I agree with the final conclusion, which
is to keep using RPG on the i wherever practical, because RPG is very
well optimized to run on the i.

John Y.
Nathan Andelin
2014-10-21 22:21:56 UTC
Post by John Yeung
But you sure as hell can use multiple cores in Java.
Thanks for the reference, John. And I agree that Java can run pools of
parallel tasks via the "Callable" interface and "consume" CPU on multiple
cores. But it appears that even your reference illustrates the futility of
that interface.

In the example cited:

A "Task" appends a character to a string in a loop 20K times in order to
consume CPU. When running a pool of 50 tasks sequentially each instance
completes in an elapsed time of 1.27 seconds which includes 1 second of
"sleep" time. When run in parallel, each instance of the pool completes in
approximately 11 seconds.

Why would a programmer consciously "throttle" tasks which ordinarily
require essentially .27 seconds of CPU time and make them take longer
(effectively 40+ times longer) to complete, just to prove a point about
Java's ability to allocate work to multiple cores?

In the example cited, it took a pool of 50 Callable (submitted) Tasks to
drive 8-cores to 100% utilization. Why couldn't Java drive 8 cores to 100%
with a pool of just 8 Callable Tasks?

Should application programmers take responsibility for allocating work to
multi-core servers? Isn't that the responsibility of the OS?

Regarding Ronald Luijten's comment about Java not supporting multi-cores at
all, no that didn't have anything to do with IBM i. It was just an
observation about Java.

I understand that the total elapsed time to complete 50 Task instances is
greater when run sequentially, than in parallel (submitted). But how might
that apply to the question at hand, in Tim's original post?

All benchmarks of Java web workloads indicate that you must run multiple
application server instances to fully utilize multiple cores. The ratio is
pretty much one to one, even though the application server may be
configured with say 100 active threads.

Nathan.
John Yeung
2014-10-22 04:41:14 UTC
Post by Nathan Andelin
Thanks for the reference, John. And I agree that Java can run pools of
parallel tasks via the "Callable" interface and "consume" CPU on multiple
cores. But it appears that even your reference illustrates the futility of
that interface.
A "Task" appends a character to a string in a loop 20K times in order to
consume CPU. When running a pool of 50 tasks sequentially each instance
completes in an elapsed time of 1.27 seconds which includes 1 second of
"sleep" time. When run in parallel, each instance of the pool completes in
approximately 11 seconds.
Why would a programmer consciously "throttle" tasks which ordinarily
require essentially .27 seconds of CPU time and make them take longer
(effectively 40+ times longer) to complete, just to prove a point about
Java's ability to allocate work to multiple cores?
I'm not clear what you are referring to as "throttling". The 1-second
delay is a deliberate artifice to simulate a remote call. Basically,
in the real world, few applications are anywhere near 100% pure CPU.
They have to wait for I/O, they have to wait for other resources to
become available, etc. He was just trying to make the example more
realistic.

I don't know what the "effectively 40+ times longer" is supposed to
mean. Where do you get that from? 11 seconds is about 40+ times
longer than .27 seconds, but I believe he's saying that running all 50
tasks in parallel took a total of 11 seconds. Running them serially
took about 65 seconds. So using 8 cores produced a speed-up of a
little less than 6 times over using a single core.
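
The speed-up arithmetic can be checked with a few lines of Java. This is only a sketch using the rounded figures quoted above (about 65 seconds serial, about 11 seconds parallel, 8 cores); the class and method names are hypothetical.

```java
// Hypothetical helper for the speed-up arithmetic quoted in this message.
final class SpeedupCheck {

    // Speed-up is serial elapsed time divided by parallel elapsed time.
    static double speedup(double serialSeconds, double parallelSeconds) {
        return serialSeconds / parallelSeconds;
    }

    // Parallel efficiency is the speed-up divided by the number of cores used.
    static double efficiency(double speedup, int cores) {
        return speedup / cores;
    }

    public static void main(String[] args) {
        double s = speedup(65.0, 11.0);
        System.out.printf("speedup=%.2f efficiency=%.2f%n", s, efficiency(s, 8));
    }
}
```

With these inputs the speed-up is roughly 5.9, which is about 74% of the ideal 8 times for 8 cores.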
Post by Nathan Andelin
In the example cited, it took a pool of 50 Callable (submitted) Tasks to
drive 8-cores to 100% utilization. Why couldn't Java drive 8 cores to 100%
with a pool of just 8 Callable Tasks?
It absolutely could. He spent some of the article talking about how
the exact nature of the tasks affects how you'll want to configure
your pools. At the extreme of a completely CPU-bound, perfectly
8-way-parallelizable application, 8 threads for 8 cores would indeed
be the way to go.
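
One commonly cited rule of thumb for sizing such a pool, which is an assumption added here and not something taken from the article, is threads = cores x (1 + wait time / compute time). The sketch below applies it to the figures quoted in this thread (1 second of sleep, about .27 seconds of CPU per task). The class and method names are hypothetical.

```java
// Hypothetical helper applying a rule-of-thumb pool-size formula.
final class PoolSizing {

    // cores * (1 + waitSeconds / computeSeconds), rounded up to a whole thread.
    static int poolSize(int cores, double waitSeconds, double computeSeconds) {
        if (cores < 1 || waitSeconds < 0 || computeSeconds <= 0) {
            throw new IllegalArgumentException("cores >= 1, wait >= 0, compute > 0 required");
        }
        return (int) Math.ceil(cores * (1.0 + waitSeconds / computeSeconds));
    }

    public static void main(String[] args) {
        // Purely CPU-bound tasks: one thread per core.
        System.out.println(poolSize(8, 0.0, 0.27));
        // Tasks that wait 1 second for every .27 seconds of CPU need many more threads.
        System.out.println(poolSize(8, 1.0, 0.27));
    }
}
```

Under this heuristic a purely CPU-bound workload gets 8 threads on 8 cores, while the sleep-heavy tasks from the article would need a pool in the high thirties to keep all 8 cores busy.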
Post by Nathan Andelin
Should application programmers take responsibility for allocating work to
multi-core servers? Isn't that the responsibility of the OS?
The traditional way has been for the OS to do all the allocation.
It's still the most efficient if all you have on the system are
single-threaded batch jobs. Giving the application programmer the
ability to do work allocation is for flexibility and finer control
over resources. In principle, the application programmer knows what
parts of the application can run in parallel, what parts need to wait
for other parts, etc.; things which would be difficult or impossible
for the OS to know without the programmer telling it.
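
A minimal sketch of that idea, assuming Java 8's CompletableFuture: two independent steps are started in parallel, and a third step that needs both results waits for them. The step names and return values are hypothetical stand-ins, not anything from the original application.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical example: the programmer states which steps are independent
// and which step has to wait for the others.
final class DependentSteps {

    // Stand-ins for two pieces of work that do not depend on each other.
    static int loadOrderCount() {
        return 40;
    }

    static int loadBackorderCount() {
        return 2;
    }

    static int totalOpenItems() {
        // The two loads may run in parallel on the common pool.
        CompletableFuture<Integer> orders =
                CompletableFuture.supplyAsync(DependentSteps::loadOrderCount);
        CompletableFuture<Integer> backorders =
                CompletableFuture.supplyAsync(DependentSteps::loadBackorderCount);
        // The sum depends on both results, so it runs only after both complete.
        return orders.thenCombine(backorders, Integer::sum).join();
    }

    public static void main(String[] args) {
        System.out.println(totalOpenItems());
    }
}
```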
Post by Nathan Andelin
Regarding Ronald Luijten's comment about Java not supporting multi-cores at
all, no that didn't have anything to do with IBM i. It was just an
observation about Java.
Then his observation was just plain wrong.
Post by Nathan Andelin
I understand that the total elapsed time to complete 50 Task instances is
greater when run sequentially, than in parallel (submitted). But how might
that apply to the question at hand, in Tim's original post?
I don't know. I really latched onto what I found were
misunderstandings of the capabilities of Java, and wanted to correct
them.

I don't know exactly what "CPU queuing" means, in the OP's context.
Post by Nathan Andelin
All benchmarks of Java web workloads indicate that you must run multiple
application server instances to fully utilize multiple cores. The ratio is
pretty much one to one, even though the application server may be
configured with say 100 active threads.
I don't doubt this, but I also don't see the relevance to CPU queuing.
(Again, maybe I would if I knew what it was.)

John Y.
Nathan Andelin
2014-10-22 06:40:18 UTC
John,

My understanding is that CPU queueing is a wait period that occurs after a
task has been dispatched to the CPU. If dispatched then what is it waiting
for?

I'll try to be more clear about my observations about the article
referenced in your earlier message concerning Java multi-core utilization
and parallel processing.

As you know, the author performs progressive testing of a pool of 50
threads, each performing a CPU bound workload (string concatenation). He
points out that in the first test the threads run serially and use only 1
core as opposed to running parallel and using the 8 cores on the server.

The author asserts that the 1 second sleep time simulates remote
processing, but in this series of tests the sleep period evidently allows
50 threads to be instantiated concurrently, prior to having to run their
CPU bound workload.

The author records that the elapsed run-time for each thread is approximately 1.277
seconds in the first test, which includes the 1 second sleep. The total
elapsed time for the 50 threads was 65.298 seconds. 1.277 times 50 =
63.850, so the total elapsed time is consistent with a serial workload.
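
The same consistency check, written out as a small Java sketch with the figures quoted above (50 threads, 1.277 seconds each, 65.298 seconds total). The class and method names are hypothetical.

```java
// Hypothetical helper: checks the measured total against a purely serial run.
final class SerialCheck {

    // Expected elapsed time if the tasks run strictly one after another.
    static double expectedSerialSeconds(int tasks, double secondsPerTask) {
        return tasks * secondsPerTask;
    }

    // Time not accounted for by the tasks themselves.
    static double overheadSeconds(double measuredSeconds, double expectedSeconds) {
        return measuredSeconds - expectedSeconds;
    }

    public static void main(String[] args) {
        double expected = expectedSerialSeconds(50, 1.277);
        double overhead = overheadSeconds(65.298, expected);
        System.out.printf("expected=%.3f overhead=%.3f%n", expected, overhead);
    }
}
```

The difference between the measured and expected totals is about 1.4 seconds, or roughly 2% of the run.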

That begs the question. If you instantiate 50 threads in Java, why would
they run serially by default? Why wouldn't the OS dispatch them to multiple
cores?

As a relevant aside, I am aware of folks who have tried to configure Java
app servers to run hundreds of threads concurrently with the intent of
dispatching Servlet instances across multiple cores - but never got it to
work. Multi-core servers remained woefully underutilized when stressed no
matter the number of threads allocated. Cross reference those findings with
published benchmarks which show very strong correlation between the number
of app server instances and the number of cores on the benchmark platform.

That sets the stage for the author to explain additional support in
java.util.concurrent, which allows the thread pool to be dispatched to
multiple cores. That leads to the results displayed in Images 7 and 8,
which show tasks having an elapsed time of approximately 11 seconds each,
and 8 cores used at 100% each.

That begs the question, what programmer would implement an interface that
would cause a thread which normally executes in .277 seconds, to run in an
environment that extends the elapsed time to 11 seconds?

Image 11 of the Profiler shows the threads alternating between "run" states
and "wait" states over a long elapsed period before the threads complete.
That's what I meant by "throttling".

If you remove the "sleep" and just let the 50 threads run serially, they
would only use 1 core and complete in approximately 15 seconds. Contrast
that with burning 8 cores over an elapsed time of 11 seconds. Who would do
that?

Nathan.
Wilson, Jonathan
2014-10-22 11:25:26 UTC
Post by Nathan Andelin
That begs the question, what programmer would implement an interface that
would cause a thread which normally executes in .277 seconds, to run in an
environment that extends the elapsed time to 11 seconds?
Image 11 of the Profiler shows the threads alternating between "run" states
and "wait" states over a long elapsed period before the threads complete.
That's what I meant by "throttling".
If you remove the "sleep" and just let the 50 threads run serially, they
would only use 1 core and complete in approximately 15 seconds. Contrast
that with burning 8 cores over an elapsed time of 11 seconds. Who would do
that?
You also have to remove the sleep from the second test, which would
reduce the elapsed time accordingly.
From my understanding the "sleep" is just to simulate the threads "doing
other stuff" such as waiting for a DB/disk/etc. to respond.

I think the article is showing, unless I'm missing something, that while
the length of run for the 50 threads is, at a single item level, longer
the time elapsed for the total 50 threads is shorter than if each were
run one after the other.

If the speed of completion for a single thread was the most important
aspect (as other stuff was waiting for its completion so its results
could be used, i.e. other separate tasks, DB writes, etc.) then single
threading would be most applicable. If however the most important factor
was the time of work for all 50 items to complete then multi threaded
would be the way to go.

If you were serving up web pages (as a very rough example) for 50 users
and used the multi threaded system then the wait "per user" would be 11
seconds. If you used a single thread the wait for the first person would
be 1.277 seconds, 2.554 for the second and so on... If however you
initiated 50 unique programs serving up 1 page per program, forgetting
about the initial start up/destruction time, then each person could be
served in 1.277 seconds.
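
To make the single-thread case concrete, the sketch below works out the wait for each position in the queue, using the 1.277 seconds per page and 50 users from the example above. The class and method names are hypothetical.

```java
// Hypothetical helper: per-user wait when pages are served one at a time.
final class PerUserWait {

    // The user at a given 1-based position waits for every page up to and including their own.
    static double serialWait(int position, double secondsPerPage) {
        return position * secondsPerPage;
    }

    // Average wait across all users in the serial case.
    static double averageSerialWait(int users, double secondsPerPage) {
        double total = 0.0;
        for (int i = 1; i <= users; i++) {
            total += serialWait(i, secondsPerPage);
        }
        return total / users;
    }

    public static void main(String[] args) {
        System.out.printf("first=%.3f second=%.3f last=%.3f average=%.4f%n",
                serialWait(1, 1.277), serialWait(2, 1.277),
                serialWait(50, 1.277), averageSerialWait(50, 1.277));
    }
}
```

On these figures the last user waits about 63.85 seconds and the average wait is about 32.6 seconds, compared with roughly 11 seconds for every user in the multi-threaded run.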
Nathan Andelin
2014-10-22 14:31:03 UTC
Post by Wilson, Jonathan
You also have to remove the sleep from the second test, which would
reduce the elapsed time accordingly.
From my understanding the "sleep" is just to simulate the threads "doing
other stuff" such as waiting for a DB/disk/etc. to respond,
No, if you review the code in the Task class, you'll see that sleep() is
run only once at the entry point of the call() method - just prior to
executing the 20K loop. So the run / wait states illustrated in the
profiler evidently only account for the string concatenation performed in
the loop - which is very tight CPU bound code.

The sleep period evidently just enables all 50 Task thread instances to be
loaded so that they may run concurrently.

The author stated that he was using JProfiler, which is a JVM profiler,
which may not measure the actual CPU run / wait states, but rather measures
what the JVM is allowing to run, when.

The tests referenced in the article indicate that support provided by
java.util.concurrent can "force" a JVM to utilize multiple cores - but at a
HORRIBLE cost.

A Java nut might say, "Look ma, I can take a multi-threaded CPU bound
workload that would normally use 60% of one core for 15 seconds and force
it to use 100% of 8 cores for 11 seconds, and accomplish the same thing."

Then the nut would have to duck as his mother threw the rolling pin at his
head.

If this is the best Java can do, then I would have to agree with Ronald
Luijten that Java doesn't support multi-cores "at all".

Nathan.
Nathan Andelin
2014-10-22 15:36:39 UTC
I found a link to the article featuring the work of Ronald Luijten. See his
response to the question "Why were you interested in developing a
microserver?"

http://www.ibmsystemsmag.com/linuxonpower/Trends/What-s-New/pizza-box-data-center/

Nathan.
John Yeung
2014-10-22 17:38:28 UTC
Post by Nathan Andelin
Post by Wilson, Jonathan
You also have to remove the sleep from the second test, which would
reduce the elapsed time accordingly.
From my understanding the "sleep" is just to simulate the threads "doing
other stuff" such as waiting for a DB/disk/etc. to respond,
No, if you review the code in the Task class, you'll see that sleep() is
run only once at the entry point of the call() method - just prior to
executing the 20K loop.
What you have said is exactly correct, but it sounds like you've
interpreted it completely wrong. Each and every task has to wait 1
second, and then do the loop.

Individually, each task averages less than 22% CPU utilization (about
1 second spent at 0% and about .28 seconds spent at nearly 100%).

If the wait were inside the loop, then the normal runtime for each
task (taken in isolation) would be about 20,000.28 seconds. Run
serially, 50 such tasks would take about 1,000,014 seconds. Clearly,
that's an impractical (and unrealistic) test.
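
For anyone who hasn't opened the article, the task shape being discussed
looks roughly like the sketch below. This is a hypothetical
reconstruction, not the article's actual code: the class name, the "x"
payload and the cpuShare helper are my own assumptions.

```java
import java.util.concurrent.Callable;

// Hypothetical reconstruction of the task shape discussed here;
// not the article's actual code.
class TaskSketch implements Callable<Integer> {
    private final long sleepMillis;
    private final int iterations;

    TaskSketch(long sleepMillis, int iterations) {
        this.sleepMillis = sleepMillis;
        this.iterations = iterations;
    }

    @Override
    public Integer call() {
        try {
            Thread.sleep(sleepMillis);          // runs once, before the loop
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        String s = "";
        for (int i = 0; i < iterations; i++) {
            s += "x";                           // deliberately naive, CPU-bound concatenation
        }
        return s.length();
    }

    // Fraction of a task's lifetime spent on the CPU.
    static double cpuShare(double sleepSeconds, double busySeconds) {
        return busySeconds / (sleepSeconds + busySeconds);
    }

    public static void main(String[] args) {
        System.out.println(new TaskSketch(1000, 20_000).call());
        System.out.printf("CPU share per task: %.1f%%%n", 100 * cpuShare(1.0, 0.28));
    }
}
```

With 1 second asleep and about .28 seconds busy, cpuShare gives
0.28 / 1.28, which is where the "less than 22%" figure comes from.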
Post by Nathan Andelin
A Java nut might say, "Look ma, I can take a multi-threaded CPU bound
workload that would normally use 60% of one core for 15 seconds and force
it to use 100% of 8 cores for 11 seconds, and accomplish the same thing."
I didn't see where he did any test which used only 60% of one core for
15 seconds. He did tests which used about 60% of all 8 cores for
16.73, 16.21, and 15.756 seconds. This was to complete all 50 tasks.

Then he said "hey, let's try to use more than 60%" and that's when he
got it down to 10.92 seconds, again to complete all 50 tasks. In this
configuration, each task took longer (about 10.9 seconds from start to
finish), but they *all* ran in parallel, with a combination of
multiprocessing (the tasks were distributed over 8 processors) and
multitasking (each processor was swapping tasks in and out).

Note that this last configuration gets the entire workload of 50 tasks
done quicker, but each task is less efficient (not just
longer-running). The reason is that this last test forced the
greatest degree of multitasking. Multitasking incurs the overhead of
saving the state of a job being swapped out and restoring the state of
a job being swapped in.
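
A minimal way to see the effect for yourself is to push the same 50 tasks
through thread pools of different sizes. The sketch below assumes a plain
fixed-size pool from java.util.concurrent; I don't know exactly which
executor settings the article used for each run, and the class and method
names here are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class PoolRunSketch {
    // One unit of work: sleep once, then a CPU-bound concatenation loop.
    static Callable<Integer> task(long sleepMillis, int iterations) {
        return () -> {
            Thread.sleep(sleepMillis);
            String s = "";
            for (int i = 0; i < iterations; i++) {
                s += "x";
            }
            return s.length();
        };
    }

    // Runs taskCount tasks on poolSize threads; returns how many completed correctly.
    static int runAll(int taskCount, int poolSize, long sleepMillis, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Callable<Integer>> tasks = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                tasks.add(task(sleepMillis, iterations));
            }
            int done = 0;
            for (Future<Integer> f : pool.invokeAll(tasks)) {
                if (f.get() == iterations) {
                    done++;
                }
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // Pool of 1 is the serial case; pool of 50 runs every task at once.
        for (int poolSize : new int[] {1, 50}) {
            long start = System.nanoTime();
            int done = runAll(50, poolSize, 1000, 20_000);
            System.out.printf("pool=%d tasks=%d elapsed=%.2f s%n",
                    poolSize, done, (System.nanoTime() - start) / 1e9);
        }
    }
}
```

The absolute numbers will differ from machine to machine; the point is
only to compare total elapsed time between the two pool sizes.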

John Y.
John Yeung
2014-10-22 17:53:22 UTC
Post by John Yeung
Each and every task has to wait 1
second, and then do the loop.
If the wait were inside the loop, [....]
I was careless in my word choice. I should have said "sleep" instead
of "wait" in both cases above.

John Y.
Nathan Andelin
2014-10-22 18:35:08 UTC
I made a mistake in my previous message regarding the first test consuming
60% CPU. It was actually 5%. And the total elapsed time was 65 seconds -
not 15.

An analysis of the runtime numbers presented in the article, plus any
extrapolations we might make based on them, gets a little complicated. But
if you will bear with me...

The only relevance of the sleep() in my opinion is that it provides an
opportunity to test the runtime performance of the 50 threads running
concurrently by allowing the JVM to complete thread instantiation first,
before the threads go into their 20K loops.

Without sleep() the first threads instantiated would have completed their
loops before subsequent threads were instantiated.

There is some blocking (waiting) associated with the string appending
within the loop, so it is quite unlikely that 100% CPU utilization can be
attributed to a single thread's looping.

My point about removing sleep() from the first test (and this is just my
extrapolation) is that the test would likely complete in about 15 seconds
(.278 times 50) plus some wait time, and would consume something less than
100% of one CPU, perhaps say 60%.

You can compare that with later tests which "submit" the threads, using a
Java interface which consumed 100% of 8 cores for 11 seconds. The JVM is
evidently handling the multitasking at a huge CPU overhead cost, rather
than letting the OS handle the multitasking. Nobody in their right mind
would do that with a real workload.

Nathan.
John Yeung
2014-10-22 23:48:33 UTC
Post by Nathan Andelin
The only relevance of the sleep() in my opinion is that it provides an
opportunity to test the runtime performance of the 50 threads running
concurrently by allowing the JVM to complete thread instantiation first,
before the threads go into their 20K loops.
Without sleep() the first threads instantiated would have completed their
loops before subsequent threads were instantiated.
The sleep really *is* there to simulate non-CPU-bound activity, as a
real-world application would likely have; and it doesn't have anything
to do with giving the later threads a chance to be instantiated before
the earlier ones finish.

50 tasks isn't a lot to instantiate. It's definitely quicker on most
systems than 20,000 string concatenations. If this doesn't seem
intuitive to you, try it yourself. I did. I copied the code from the
article, dropped the sleep down to 1 millisecond, and threw in some
more print statements to show the instantiation of tasks. The 50th
task was instantiated before the first result came back. (This was
run on a 4-core PC.)
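
If you'd rather count than read print statements, something along these
lines does the same check. It is a sketch of the idea, not the code I
actually ran: the class name, the counters and the pool sized to the task
count are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

class InstantiationRaceSketch {
    // Returns how many tasks had been created at the moment the first task finished.
    static int createdWhenFirstFinished(int taskCount, long sleepMillis, int iterations) {
        AtomicInteger created = new AtomicInteger();
        AtomicInteger snapshot = new AtomicInteger(-1);
        ExecutorService pool = Executors.newFixedThreadPool(taskCount);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < taskCount; i++) {
                created.incrementAndGet();              // count the task as instantiated
                futures.add(pool.submit(() -> {
                    try {
                        Thread.sleep(sleepMillis);
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    String s = "";
                    for (int j = 0; j < iterations; j++) {
                        s += "x";
                    }
                    // Only the first task to finish records the count.
                    snapshot.compareAndSet(-1, created.get());
                }));
            }
            for (Future<?> f : futures) {
                f.get();                                // wait for every task
            }
            return snapshot.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("tasks created before first result: "
                + createdWhenFirstFinished(50, 1, 20_000));
    }
}
```

If the printed number equals the task count, every task existed before
the first one finished. The result depends on timing, so it can vary
between runs and machines.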
Post by Nathan Andelin
My point about removing sleep() from the first test (and this is just my
extrapolation) is that the test would likely complete in about 15 seconds
(.278 times 50) plus some wait time, and would consume something less than
100% of one CPU, perhaps say 60%.
My results were similar to the article. Each of my cores is faster,
but the guy in the article has more cores. In SerialTest, my loops
ran in the .17 second range, for a total time of about 59 seconds (for
the 1-second sleep per task), and very low CPU utilization. The heavy
CPU simulation with 1 ms sleep reduced the run time to about 8.6
seconds, as expected. Total system CPU utilization hovered around
25%, with almost all the activity hitting one core. My guess is he'd
get about 12.5% system CPU use, with one taxed core.
Post by Nathan Andelin
You can compare that with later tests which "submit" the threads, using a
Java interface which consumed 100% of 8 cores for 11 seconds. The JVM is
evidently handling the multitasking at a huge CPU overhead cost, rather
than letting the OS handle the multitasking. Nobody in their right mind
would do that with a real workload.
There are actually several levels of threading, especially these days.
You can think of them as varying levels of weight or coarseness.
Jobs, like what are submitted by SBMJOB, are pretty heavy. There are
"native" threads which are provided by the OS, and several of these
can run in a single job or process. Many languages, including Java,
provide an interface to these. So the OS is still managing that level
of threading. There are "green" threads which are provided by a VM,
and so may have different performance characteristics than native
threads (for example, on some systems green threads could be quicker
to activate and synchronize but might be slower at I/O and context
switching), and may be the only option if the OS doesn't provide
native threads.

I can assure you that Java is not particularly bad at threading. It's
probably not the best in existence, but it's actually competent. Much
more relevant to whether multithreading is going to be a successful
strategy are the nature of the workload (some are better suited to
threading than others), the OS, and the underlying hardware.

John Y.
Nathan Andelin
2014-10-23 18:21:20 UTC
John,

I have used threads in Java while writing simulators for a robotics
company. I'm pleased that you downloaded the code from the article and
tested it. That shows your interest in Java's runtime characteristics.
While I too have an interest in the runtime characteristics of virtual
machines and similar environments, I feel that we may be doing a disservice
in this thread by bantering over minutiae that are not relevant to the
original post. No matter what one's view of the interfaces provided by
java.util.concurrent might be, they don't appear to be applicable to the
original poster's concerns.

Perhaps the most practical suggestions came from Dieter Bender:

- too many concurrent threads for too few cpus.
- too few activity levels in a subsystem.

Nathan.
dkimmel
2014-10-22 02:27:54 UTC
Well that's total nonsense.

The JVM was optimized for multicore processing in Java 6 and Java 7, and the IBM J9 JVM handles concurrency very well. Look at some of my posts on configuring subsystems for Java performance in the Java400 archives.

-------- Original message --------
From: Nathan Andelin <***@gmail.com>
Date: 10/21/2014 11:32 AM (GMT-06:00)
To: Midrange Systems Technical Discussion <midrange-***@midrange.com>
Subject: Re: JVM CPU queueing

Is it normal JVM's to spend a large percentage of time CPU queueing?
Yes it's normal, and I believe we've seen this in virtually every published
Java benchmark on virtually every hardware / OS platform, no matter how
many threads a JVM / application server may be configured to run
concurrently. The only way for a JVM to fully utilize a multi-core system
is to run multiple JVM / application server instances, and to deploy your
applications across each instance.

IBM scientist Ronald Luijten commented in the September 2014 issue of IBM
Systems Magazine, "We also have the multi-core programming wall. All of our
computer science students learn how to program in Java, which doesn't
support multi-cores at all. Intel may put 128 cores on a chip, but people
coming out of universities are only going to use one of them."

So if you're going to switch your application architecture from the native
virtual machine to a JVM, I would recommend that you still use RPG to
implement data validation, RI constraints, and business rules - rather than
moving that kind of logic into a JVM environment. Use the JVM if you will
for browser I/O.

Nathan.
D*B
2014-10-22 06:16:28 UTC
... some additional remarks to the discussion:
- CPU queueing in this context means waiting for a free CPU, which might be
caused by:
-- CPUs 100% used
-- too many concurrent threads for too few CPUs
-- misconfiguration (too few activity levels in a subsystem)
- one JVM using multiple CPUs:
-- java uses native threads since 1.2 and it's up to the OS to dispatch a
thread to whatever cpu the OS wants
- one JVM or multiple JVMs:
-- good application design will use more than one JVM anyway. The UI will
need one, the core business layer might use an appserver, and maybe the
persistence layer another.
-- the database layer will run on as/400 in native jobs anyway (QZDASOINIT
and friends)
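
On the "too many concurrent threads for too few CPUs" point, a common rule
of thumb for CPU-bound work is to cap the worker pool at the number of
CPUs the JVM can see. The sketch below is only that, a rule of thumb under
my own assumptions (the class and method names are invented), and it says
nothing about subsystem activity levels, which are configured on the IBM i
side rather than in Java.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class PoolSizingSketch {
    // Rule of thumb (an assumption, not a tuning guide): never run more
    // CPU-bound workers than there are CPUs, and always run at least one.
    static int cpuBoundPoolSize(int requestedThreads, int availableCpus) {
        return Math.max(1, Math.min(requestedThreads, availableCpus));
    }

    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors(); // CPUs visible to this JVM
        int size = cpuBoundPoolSize(50, cpus);
        ExecutorService pool = Executors.newFixedThreadPool(size);
        System.out.println("cpus=" + cpus + ", pool size=" + size);
        pool.shutdown();
    }
}
```

Threads that spend most of their time waiting on the database or on I/O
are a different case and can reasonably outnumber the CPUs.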

One interesting question for the OP would be what software they are
migrating to and what its architecture looks like. I have seen quite a
lot of badly designed Java software, written by programmers thinking in RPG.
A second question would be how the new hardware was sized. Java will for
sure need much more processor resources (CPU and memory) than RPG,
and it will make efficient use of them (in other words, many problems with
response times can be solved by hardware, which is very different from RPG).

Dieter