Dirsync Problem Summary and Update
Summary
Dirsync is a process present in all components of the iPlanet mail
system that maintains a local copy of the enterprise LDAP directory. It
accomplishes this by querying the LDAP directory for all users once a
day, and querying for differential updates at regular intervals
throughout the day.
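The difference between the daily full pull and the differential updates can be sketched as two LDAP search filters. This is only an illustration, not dirsync's actual queries: the function names, the `inetOrgPerson` object class, and the use of the standard `modifyTimestamp` operational attribute to select changed entries are all assumptions.

```python
from datetime import datetime, timezone


def full_sync_filter(object_class="inetOrgPerson"):
    """Filter matching every user entry (hypothetical full daily pull)."""
    return f"(objectClass={object_class})"


def delta_sync_filter(since, object_class="inetOrgPerson"):
    """Filter matching only entries modified at or after `since`.

    `since` must be a timezone-aware datetime; it is rendered in the
    LDAP GeneralizedTime form YYYYMMDDHHMMSSZ (UTC).
    """
    if since.tzinfo is None:
        raise ValueError("since must be timezone-aware")
    stamp = since.astimezone(timezone.utc).strftime("%Y%m%d%H%M%SZ")
    return f"(&(objectClass={object_class})(modifyTimestamp>={stamp}))"


if __name__ == "__main__":
    last_run = datetime(2002, 1, 8, 12, 0, 0, tzinfo=timezone.utc)
    print(full_sync_filter())
    print(delta_sync_filter(last_run))
```

The first filter returns the whole user population on every run, while the second narrows the result set to entries changed since the last run. That is why the full pull is the expensive case.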
When dirsync pulls the LDAP directory, it does so using a number of
extremely broad and inefficient queries. These queries are saturating
the disk channels on our LDAP servers, and are causing a number of other
performance problems.
MST has been working with Sun/iPlanet to address what we thought was
a problem with dirsync. After several consultations, we've reached the
conclusion that dirsync is functioning as designed - it's just a
terribly inefficient and resource-intensive process. We now know that
the real problem is tuning our LDAP servers to support dirsync in
addition to our normal application load.
MST is now working on mechanisms that can support these dirsync
operations without the performance problems they're currently causing.
This will involve optimizing the LDAP servers for a different class of
queries (with extremely large result sets), which runs counter to the
types of queries we currently support. We therefore need to spend some
time on testing and experimentation to find the right balance between
optimizing for dirsync and for other applications.
Since dirsync introduces a new dynamic, we need to take a look at the
types of optimizations we can perform through software and OS tuning. If
software and OS tuning is inadequate, we'll need to look at hardware or
other configuration options. Testing is underway to determine what
software and OS tuning can achieve, and how this will relate to the
purchase of two new servers for LDAP/MMP separation.
Updates
12/27/2001 - First cut of baseline data
Baseline data has been assembled, but it showed that there still
appears to be some configuration difference between the ITE and
production. It appears that a dirsync process is running against the
secondary ITE LDAP server, which is not the case in production. The
differences between production and test must be ironed out before we
can complete our baselining.
1/8/2002 - Baseline information complete
All of our baseline information is complete and proportional between
production and test. We'll start our first round of changes tonight.
1/8/2002 - Cache Changes
Two changes implemented:
- Moved LDAP server cache to ramdisk
- Tuned LDAP server cache down to 512k and EntryCache to 0,
effectively disabling both of them.
LDAP server performance appears to have remained constant despite
these changes (a good thing). The slapd process size is far smaller and
disk I/O is greatly reduced. We still have a proportionally high
number of writes on Oin and Gloin, but this is likely because the
cache changes relieved enough pressure on the machines that usage
patterns are now showing up in the trend data (i.e., reads are low
because there is no load on the ITE).
1/10/2002 - Review
We haven't seen any negative side effects in the ITE, and machine stats
still look good. Writes are still proportionally high, but we've
verified that this is due to the lack of load in the ITE. The next step
is to plan this change for production.
1/13/2002 - Cache Changes rolled to prod LDAP cluster
We moved our cache changes to the production LDAP cluster and restarted
the LDAP processes. The machines seem to be much less loaded and disk
I/O has gone down substantially. This is probably a clear enough
indicator to let the replacement hardware order proceed.
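The "proportionally high writes" noted in the 1/8 and 1/10 entries can be illustrated with a little arithmetic: if the absolute write rate stays the same while reads drop because there is no load, the write share of total I/O rises. The numbers below are made up for illustration and are not taken from the baseline data.

```python
def write_share(reads_per_sec, writes_per_sec):
    """Fraction of disk operations that are writes (0.0 if the disk is idle)."""
    total = reads_per_sec + writes_per_sec
    if total == 0:
        return 0.0
    return writes_per_sec / total


if __name__ == "__main__":
    # Hypothetical figures: same write rate, very different read rates.
    loaded = write_share(reads_per_sec=900.0, writes_per_sec=100.0)
    unloaded = write_share(reads_per_sec=25.0, writes_per_sec=100.0)
    print(f"write share under load:    {loaded:.0%}")
    print(f"write share without load:  {unloaded:.0%}")
```

A high write share on a lightly loaded test machine therefore does not by itself indicate that the cache changes increased writes.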
Technical Resources
Machine Baseline Data
First Round of Changes (Cache Mods to ITE)
Prod Implementation (Cache Mods to prod LDAP cluster)