Discussion: [DBCP] Connection pool not behaving as I expect
Shawn Heisey
2018-03-01 22:33:37 UTC
We have been having some problems lately where our MySQL server hits the
max connection limit (600) and then everything breaks.  When I look into
the problem, I find that our application servers have each made nearly a
hundred connections to the DB and haven't closed any of them for hours.

I'm also using connection pooling in my programs, with the latest DBCP
version.  Those servers don't open nearly as many connections, and have
idle eviction to keep the connection count down.  But when the limit is
reached, these programs suddenly stop working too.

Investigating these problems, I manage to get connected and kill off the
surplus of idle connections, and everything starts working.

Today, a couple of days after the last incident, I realized that we
should *NOT* be having these problems -- because we're using connection
pooling.  The application has open and idle connections to the DB server
... so why is it trying to open MORE connections (and obviously failing)
instead of using one of the perfectly good connections that's already
sitting there, unused?

I'm writing here specifically for DBCP on my programs, so I know you
guys probably can't help with Tomcat's connection pooling ... but for
either case my question stands:  Why isn't connection pooling doing its job?

Thanks,
Shawn


Gary Gregory
2018-03-01 23:46:51 UTC
I do not think this is a question I, or anyone here, can answer
generically. I can read between the lines that you must feel frustrated
and I certainly empathize with that. I think you might want to debug your
application and come up with some parameters for us to start helping you. A
reproducible example is always best but I understand it might be hard to
provide in this particular case.

Gary
Shawn Heisey
2018-03-02 03:46:00 UTC
Post by Gary Gregory
I do not think this is a question I, or anyone here, can answer
generically. I can read between that lines that you must feel frustrated
and I certainly empathize with that. I think you might want to debug your
application and come up with some parameters for us to start helping you. A
reproducible example is always best but I understand it might be hard to
provide in this particular case.
There is a lot of frustration.  Until today all of it was directed at
our developers, for creating programs and configs that make way too many
connections to the DB.

But then today, I had that small eureka moment, thinking "wait a minute
... how can this even be happening at all, if the connection pool has
connections that the DB server says are active and idle?"

Reiterating something I said before: I know you can't help me with the
pools that the Tomcat servers are creating for our webapps.  So I'll
limit the rest of the discussion to my own program, which uses DBCP, and
has the same problems.

Please tell me what information you'd like me to provide. Anything that
is in my power, I will get it to you.

This is how I set up DBCP in my code:

  /*
   * Create a datasource (connection pool) for the master database server.
   */
  ConnectionFactory cfMaster =
      new DriverManagerConnectionFactory(masterUrl, dbUser, dbPass);
  PoolableConnectionFactory pcfMaster =
      new PoolableConnectionFactory(cfMaster, null);
  pcfMaster.setValidationQuery(validationQuery);
  // validation timeout is in seconds, so convert the millisecond constant
  pcfMaster.setValidationQueryTimeout(Const.FIVE_SECONDS / 1000);
  opMaster = new GenericObjectPool<>(pcfMaster);
  opMaster.setMaxWaitMillis(Const.THIRTY_SECONDS);
  opMaster.setMaxIdle(numShards);
  opMaster.setMaxTotal(numShards * 5);
  opMaster.setNumTestsPerEvictionRun(numShards * 5);
  opMaster.setTimeBetweenEvictionRunsMillis(Const.FIVE_SECONDS);
  opMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);
  opMaster.setTestOnCreate(true);
  opMaster.setTestOnBorrow(true);
  opMaster.setTestOnReturn(true);
  opMaster.setTestWhileIdle(true);
  pcfMaster.setPool(opMaster);
  dsMaster = new PoolingDataSource<>(opMaster);

The JDBC driver we use is MySQL.  As of a few weeks ago, it was the
newest stable version available, 5.1.something.  Also at that time, I
was using the latest DBCP and POOL versions.  If any new versions have
come out very recently, I probably don't have them yet.

Typically the numShards value we're using is 6, to help with
understanding the code above.
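With numShards at 6, that works out to maxIdle = 6, maxTotal = 30, and
numTestsPerEvictionRun = 30.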

Observations: When the MySQL server has reached its connection limit, at
least one of the idle connections is from this program using DBCP.  But
when the program attempts to use the DB, it gets the "Too many
connections" error response -- which means that it must be opening a
brand new connection, despite the fact that there SHOULD be at least one
that is ready and sitting in the pool.

The code that uses the DB is basic JDBC code.  It calls getConnection()
on the dataSource, verifies that the connection is valid, creates a
statement, executes it, and if it was a query, processes the resultset. 
Then it closes any resultset, closes the statement, and closes the
connection.  As I understand it, that close should return the connection
to the pool, still open, and ready for re-use.  This all happens within
a single thread.  I went through this code pretty closely for another
issue on this mailing list.  It's possible that I missed something, but
it looks very clean.
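
Boiled down, the pattern is roughly this (a simplified sketch, not the
actual code -- the query and identifiers are made up):

  try (Connection conn = dsMaster.getConnection()) {
      if (!conn.isValid(5)) {
          throw new SQLException("connection failed validation");
      }
      try (Statement stmt = conn.createStatement();
           ResultSet rs = stmt.executeQuery("SELECT id FROM control_table")) {
          while (rs.next()) {
              // process each row
          }
      }
  }
  // conn.close() at the end of the try block hands the connection back
  // to the pool rather than closing the physical connection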

I was going to add some debug logging to my code, but I can't see any
way with PoolingDataSource to get the number of active and idle
connections, just to make SURE that the pool really has what I think it
does.

I have a code change ready to switch everything to BasicDataSource and
add the debug logging. It's generally less verbose code, and looks to
be just as configurable as PoolingDataSource. Would that change be a
good idea?
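
For reference, the BasicDataSource equivalent would look roughly like
this (a sketch, not the exact change I have staged):

  BasicDataSource bdsMaster = new BasicDataSource();
  bdsMaster.setUrl(masterUrl);
  bdsMaster.setUsername(dbUser);
  bdsMaster.setPassword(dbPass);
  bdsMaster.setValidationQuery(validationQuery);
  bdsMaster.setValidationQueryTimeout(Const.FIVE_SECONDS / 1000);
  bdsMaster.setMaxWaitMillis(Const.THIRTY_SECONDS);
  bdsMaster.setMaxIdle(numShards);
  bdsMaster.setMaxTotal(numShards * 5);
  bdsMaster.setNumTestsPerEvictionRun(numShards * 5);
  bdsMaster.setTimeBetweenEvictionRunsMillis(Const.FIVE_SECONDS);
  bdsMaster.setMinEvictableIdleTimeMillis(Const.ONE_MINUTE * 5);
  bdsMaster.setTestOnCreate(true);
  bdsMaster.setTestOnBorrow(true);
  bdsMaster.setTestOnReturn(true);
  bdsMaster.setTestWhileIdle(true);
  // BasicDataSource exposes the counts directly:
  //   bdsMaster.getNumActive(), bdsMaster.getNumIdle()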

Thanks,
Shawn


Matt Sicker
2018-03-02 03:48:52 UTC
Take a look inside commons-pool for the instrumentation (e.g., JMX). You
can also track usage on borrow and other leaks.
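
For example, something along these lines (an untested sketch against
commons-pool2, using the opMaster pool from your code):

  // The pool should already be registered as a JMX MBean (jmxEnabled
  // defaults to true in GenericObjectPoolConfig), so active/idle counts
  // are visible in a JMX console.  Abandoned-object tracking logs a
  // stack trace for connections that are borrowed but never returned:
  AbandonedConfig abandoned = new AbandonedConfig();
  abandoned.setLogAbandoned(true);
  abandoned.setRemoveAbandonedOnBorrow(true);
  abandoned.setRemoveAbandonedTimeout(300);   // seconds borrowed without use
  opMaster.setAbandonedConfig(abandoned);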

Also, Tomcat uses DBCP as it is.
--
Matt Sicker <***@gmail.com>
Shawn Heisey
2018-03-02 05:50:38 UTC
Post by Matt Sicker
Take a look inside commons-pool for the instrumentation (e.g., JMX). You
can also track usage on borrow and other leaks.
Also, Tomcat uses DBCP as it is.
I know I can get those numbers from the object pool, but at the point
where I needed them, I don't actually know *which* DataSource is being
used.  It could be one of two.  So I don't know which object pool to
look at.

Interim solution before I switch to BasicDataSource:  Log info from BOTH
pools.  I did this, and here's some info that gets logged on the dev
server where I installed the change:

WARN  - 2018-03-01 21:59:00.313;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.314;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.315;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.316;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.318;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.319;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.320;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.321;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.322;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.323;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.324;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
WARN  - 2018-03-01 21:59:00.325;   371; c; CP: main-active=0, main-idle=1, master-active=0, master-idle=1
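
The line producing that output is essentially this (a sketch -- LOG is
just my logger, and the real line includes more context):

  LOG.warn("CP: main-active=" + opMain.getNumActive()
      + ", main-idle=" + opMain.getNumIdle()
      + ", master-active=" + opMaster.getNumActive()
      + ", master-idle=" + opMaster.getNumIdle());

opMain is the GenericObjectPool behind the "main" DataSource.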

This logging happens just before a connection is obtained from the
pool.  The info logged is the same -- zero active, one or more idle. 
Except during program startup, when there are some lines where both are
zero.  And once an hour, there is a background thread that does a query
that takes nearly a minute to run, and while that's happening, I do see
main-active=1.

I did just now think of one possibility that might explain why this
program misbehaves when the problem happens, even if DBCP is working
right.  Based on the logging I've enabled, and the fact that nobody has
said "oh, we see that all the time, and the problem is probably X" ... I
think DBCP probably is working right.

====
Basic info about my program: All reads are done via the "main" pool,
which connects to a slave; if connections there are not working, reads
switch to the "master" pool for the next 1000 connections.  All writes
are done via the "master" pool.

When the master server reaches its connection limit, a completely
separate process that adds information to the monster table in the DB
cannot work.  It's not Java-based, and doesn't have connection pooling
available.

When there are no new docs in the monster table, my program doesn't have
anything to do, so it doesn't need to make any changes to its control
table -- it's not going to be using the master pool.  That probably
results in the idle connection to the master server being evicted five
minutes after the additions to the database stop.  Then because there's
now a connection available on the server, the other system can suddenly
add some documents.  So my program notices the new docs via its main
pool, processes them, and then it would need to update its control table
on the master server.  There's no idle connection because it got
evicted, so it tries to make a new one, and we get an explosion.
====
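
If that theory is right, one possible mitigation (untested) would be to
keep a warm connection in the master pool so the evictor never leaves
it empty:

  opMaster.setMinIdle(1);   // evictor maintains at least one idle connection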

I still do have the separate problem of why our app servers explode with
"Too many connections" exceptions, when the DB pools there have dozens
of active connections.  I didn't write that code.  Maybe their DB access
(which is primarily through hibernate, a library I don't know much
about) is not properly releasing connections back to the pool, so the
pool does not think they're actually idle.  I will need to see what
logging they can add.

Thanks,
Shawn



Phil Steitz
2018-03-02 03:36:46 UTC
Best to start by posting your pool config.

One thing to bear in mind is that "idle" from the DB's perspective does not mean the same thing as idle from the pool's perspective. In addition to genuinely abandoned connections, another way that applications can hog DB connections is to check them out and hold them for a long time while not using them. On the DB engine side these will show as idle, but as they are checked out to clients they are not available in the pool. When you are having the problem, what does the pool report as numIdle?
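
The check-out-and-hold case looks roughly like this (an illustrative
sketch; the method name is made up):

  // Borrowed from the pool, then held while non-database work runs.
  // MySQL's processlist shows the connection as idle ("Sleep"), but the
  // pool counts it as checked out, so nothing else can reuse it.
  Connection conn = dataSource.getConnection();
  doLongNonDatabaseWork();   // minutes pass; conn is held the whole time
  conn.close();              // only now does it go back to the pool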

Phil