We load about half of our physical memory with data (30 MB), and
even though there should be plenty of memory available, we experience lots
of paging to disk. I would appreciate it if you could enlighten us as
to what is the problem.
--Antony Jankelowitz, (firm indeterminate)
My question is: Is there any way to change the time slice on Solaris?
I don't know what the default is. But for example, if it is 20ms, we
change it to 50 ms for the first process and 40ms for the second
process. By doing this, there will be less swapping and better
turnaround. Do you think this will make any difference?
--Jay, (firm indeterminate)
I ran se2.4 on both systems and it reported nothing wrong. They have
quite different profiles: one is an NFS server, one runs six Oracle
instances (DB server).
--Alexis Grandemange, (firm indeterminate)
here there are 568 switches but only 57 are involuntary.
Cumbersome, because you may have to crossmount libraries on a slow
network, or just put a separate copy on each computer. Again, the
alternative is to use just one computer for compilation and then take
the executable elsewhere. Impossible, because you cannot put each and
every runtime library on every computer, and not always is there a
network available -- just think of a laptop with limited disk space
away from the office.
In summary, dynamic linking may be better if you do not mind paying
good money for license fees, disks and networks. It is definitely
better if you are the person selling those licenses, disks and
networks. Otherwise, better take a second look at static linking.
--Hubert Meitz, (firm indeterminate)
A: It's a script, so you can make it do anything you like.
If you want it to stop beeping you can use
virtual_adrian.se 30 -
to disable the audioplay command.
Does the current edition of your book cover SunOS 5.5?
Q:
Does the current edition of your book cover SunOS 5.5?
--(name and firm indeterminate)
A:
The book was written before Solaris 2.5 was released but there were few changes
and everything still applies.
The performance changes in Solaris 2.5 are the
subject of my "Sun Home Page" performance column this month.
What flags in /etc/system are related to security?
Q:
While the article on kernel and /etc/system parameters is helpful, there are still
several things missing. In Solaris 2.4 there was a fix in /etc/system
set nfs:nfs_portmon=1
which made nfsd only accept connections from low numbered ports. This
is the same as running rpc.mountd on SunOS 4.x with a -n flag.
Unfortunately this same parameter does not exist on Solaris 2.5. What
I need is a list of parameters such as this one which have nothing to
do with performance tuning but everything to do with securing my
system. I'd like a reference source for these sorts of tuning
parameters. They are not autoconfigured based on available resources.
A:
I concentrate on performance-related parameters; I'm not a security expert.
However, you will find set nfssrv:nfs_portmon=1 works in Solaris 2.5.
There were changes due to the integration of NFS3 into the system that seem
to have rearranged the nfs modules slightly.
I found that it still existed by running /usr/ccs/bin/nm on /dev/ksyms, then
located the module that contains it in /kernel.
/usr/ccs/bin/nm /kernel/misc/nfssrv | grep portmon
[29] | 1676| 4|OBJT |LOCL |0 |3 |nfs_portmon
The manual page for nfsd(1M) also documents this change.
If the NFS_PORTMON variable is set, then clients are
required to use privileged ports (ports < IPPORT_RESERVED)
in order to get NFS services. This variable is equal to
zero by default. This variable has been moved from the
"nfs" module to the "nfssrv" module. To set the variable,
edit the /etc/system file and add this entry:
set nfssrv:nfs_portmon = 1
I need to reboot my Solaris 2.4 machines every two weeks. Why?
Q:
Hi, I have three or four SPARC 5 machines running Solaris 2.4 that have
to be rebooted every two weeks or even every week depending on what the
users are doing (e.g., every week if they continually run Matlab
simulations along with the usual programs such as text-editing, email;
longer for email and text-editing until they start running more
resource demanding programs). What happens is that the machines slow
down for a while and then completely freeze because they are paging in
and out. The users could not do anything, even to just move the mouse
from one window to another.
I ran some performance statistics using sar, ps and vmstat. On one of
the machines which is a SPARCprinter II server, I was convinced that
there was a memory shortage. So I added another 32MB RAM expanding the
total RAM to 64MB. It still slows down when it's printing, but the
system is able to recover from the paging activity unlike before when I
had to reboot the system. In this case, I did not think of a possible
kernel memory leak, so I did not collect sufficient statistics for
analysis (sar -k). Here are the statistics output and notes on the
analysis:
(Editor's note: extensive tables and statistics deleted)
I believe that all machines displaying the above-mentioned behavior
are experiencing some kind of memory leak somewhere. I tried to set
maxusers=40 in /etc/system, but that did not work. I would try setting
shared memory (shmsys), but I could not find any helpful hints on it.
Please help.
I appreciate any comments or thoughts on this issue.
--Cindy Doehr, (firm indeterminate)
A: It sounds like a kernel memory leak bug. Have you tried loading the latest
set of kernel patches for Solaris 2.4?
I think swap size doesn't affect performance.
Q:
Most of your article on performance tuning is just fine, but
I disagree when you state that so long as applications fit,
swap size doesn't affect performance. I used to think
that, too, but then I learned about fragmentation, and
observed user complaints going away when I increased their
swap.
Also, you mention ncsize and ufs_ninode. These are
discussed in Sun's old performance tuning overview. I'd
be interested in an updated discussion of these as pertains
to SunOS 5.3-5.
--Anthony D'Atri, (firm indeterminate)
A: The only mechanism I can think of is that the pageout swapfs clustering
could be limited if free swap space becomes fragmented into small
blocks. This might delay the availability of free pages a bit, but if
you are paging in and out at the same time to an overloaded disk, you
are already in a slow performance mode. Adding swap space on a separate
disk helps.
Regarding ncsize and ufs_ninode, this is discussed in the book, and I
have touched on it a few times in my Web articles.
From further discussion it seems that your experience of performance
improvement may have been on SunOS 4.x systems. Solaris 2 has a completely
different swap space layout policy that avoids this problem as far as I
can tell.
How do I improve my Web server's performance?
Q:
Just wanted to drop you a quick email to add to your no doubt groaning
emailbox, to thank you for the articles on performance on the web.
I am currently a system admin. at a small internet provider (but
growing fast) in the UK, which means I do everything ;-)
Our Web server, sparc 20, sol 2.5, has just died performance-wise
over the last week.... (seem to be loads of time_waits) and whilst
doing everything else to keep the provider running, I've got to look at
it urgently before all our customers leave :-(
Anyway, I asked for a list of /etc/system parms and was recommended
to read your articles.. I've only just started going through them, but
already I can see I have to order your book ;-)
I can see this week will be spent reading everything you've written,
that I can get my hands on, to try and solve this problem, so thought I
should at least email you to say thanks for the info.
--Keith Pritchard, (firm indeterminate)
A:
This is the subject of my March column.
In this particular case it turned out that the workload had crept up
until there was no more Internet bandwidth left to/from this system.
Increasing the speed of the Internet link fixed it.
What's the latest recommended version of Solaris for older SPARC computers?
Q:
Currently we have an installed base of Sun SPARC 4/110, 4/330
and 4/75s. I was wondering whether running Solaris 2.4/5 is
compatible with these old hardware platforms, and what are the
performance implications by running just Solaris and/or any
simple X-based application.
--Ioannis M. Kyratzoglou, Mitre
A: Solaris 2.4 is the last release that works on a 4/110, 4/260,
4/330, or 4/490. We have upgraded our lab systems to 4/600 CPU boards
with SuperSPARC modules that will run the latest OS.
All others run 2.5. The latest releases are faster and smaller than
older Solaris 2 versions. Solaris 2.5 uses more RAM than 4.X, but most
things are faster than 4.X; a few operations are slower because
extra functionality has been added.
If you have enough RAM I'd upgrade them; if any are marginal, don't
bother. 32 MB is probably the minimum: you can't waste CPU cycles
paging on a slow system, so you need more RAM to keep up with more
recent hardware.
Should I use the SSA NVSIMM and Presto NVSIMMs together?
Q:
I have heard and read conflicting views of using SSA NVSIMM and Presto
NVSIMM items together. While I understand the configuration problem
of putting one logically before the other in order to have an orderly
recovery after a system crash, I still am not clear that using a
Presto-NVSIMM with an SSA-NVSIMM gives you anything over just the
SSA-NVSIMM alone.
--Kerry P. Boomsliter, Knight-Ridder Information
A: The bandwidth to/from Presto NVRAM is perhaps 10 times that of SSA
NVRAM. The CPU does more work with Presto as the data is copied to
and from the Presto NVRAM, but doesn't bottleneck on the disk interface
like the SSA NVRAM.
That is the main difference. Either one gets you most of the
performance boost by changing disk accesses into memory accesses, but
the higher bandwidth and capacity of Presto gets better response times,
and the SSA NVRAM has higher throughput. Both together is the best option
for maximum performance. The recent introduction of 16MB SSA NVRAM (up from
4MB in older SPARCstorage Arrays) also helps performance for write-intensive
applications.
Q:
In your answer to the person whose NNTP server
had overworked disks, you said adding an NVRAM SIMM and
Legato's Prestoserve software was the best thing to do.
I had always associated Prestoserve with NFS service.
Does your advice imply these pieces of hardware and
software are for use in more than just NFS servers?
Should I consider putting them in any machine with a high
I/O load?
--(name and firm withheld)
A: Directory and inode updates are synchronous, and files are flushed
when they are closed. The NVSIMM defers and coalesces all synchronous
I/Os, and has no effect on regular writes to a local filesystem. NFS
writes are also synchronous, which is why it helps NFS. News and mail
do a lot of directory and inode updates, and create/move/delete small
files, which is why NVSIMM helps a lot.
Q:
You seem to be very free in recommending NVSIMMs, everywhere
filesystem performance is mentioned :-)
Having recently gotten my greedy little hands on ODS4.0,
I am playing with the "metatrans" device, known to the rest
of the world as journaling filesystems. It seems to speed
up filesystem access quite nicely, although I have never
had NVSIMMs to play with, to compare.
Could you provide some juicy details comparing the pros
and cons of both approaches?
--Philip Brown, (firm indeterminate)
A:
Bandwidth to an NVSIMM is greater than 100MB/s, SBus Presto is perhaps 30MB/s,
a log disk is 2-5MB/s, and random writes go at 400KB/s on a good day.
Converting random writes to any of the others is faster; NVSIMM is
the lowest latency, lowest overhead option.
Remember that the data must be written to the log (low latency as far as
the user is concerned), then must be read back and written to the filesystem
(extra overhead and throughput needed, not latency sensitive).
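To put those figures side by side, here is a rough Python sketch of how long a burst of synchronous writes would take at each quoted rate. The rates are the approximate ones from the answer above (the SBus Presto figure is a guess, and 3.5MB/s is simply the midpoint of the quoted 2-5MB/s log disk range). The 1MB burst size and the function name are assumptions made for illustration, and real latency depends on much more than raw bandwidth.

```python
# Approximate bandwidths from the answer above, in KB/s.
RATES_KB_PER_S = {
    "NVSIMM": 100 * 1024,        # "greater than 100MB/s", so a lower bound
    "SBus Presto": 30 * 1024,    # quoted with a question mark; treat as a guess
    "log disk": 3.5 * 1024,      # midpoint of the quoted 2-5MB/s (assumption)
    "random disk writes": 400,   # "on a good day"
}


def transfer_ms(size_kb, rate_kb_per_s):
    """Milliseconds needed to move size_kb at the given rate."""
    return 1000.0 * size_kb / rate_kb_per_s


burst_kb = 1024  # hypothetical 1MB burst of synchronous writes
times = {name: transfer_ms(burst_kb, rate)
         for name, rate in RATES_KB_PER_S.items()}

for name, ms in times.items():
    print(f"{name:20s} {ms:8.1f} ms")
```

Even with generous assumptions for the disk, the random-write case comes out a couple of orders of magnitude slower than the NVSIMM, which is the point of converting one into the other.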
Philip Brown replies:
All the data has to be written there?? This seems strange to me. How is it
then, that it speeds up ufs throughput as much as it does?
If it only does that.. it would seem to me, that it would only become
equal to ufs speed, not exceed it.
Adrian answers:
Only the synchronous writes go to the log, inode updates and
synch writes on NFS servers.
It speeds up allocation of blocks by forcing them to be sequential,
regardless of how many inode and indirect block updates are needed
on the way.
Any tips for Solaris X86 administrators?
Q:
(Your) performance tuning articles are excellent, but I wonder
if anyone has any performance tips or gotchas for the vagaries
of the Intel platforms running Solaris x86?
What special concerns come up on those hardware platforms?
--(name and firm indeterminate)
A:
I don't have any Solaris x86 systems (I work in SMCC remember :-)
but almost everything I say about Solaris on SPARC applies also
to Solaris x86. The SE toolkit supports Solaris x86.
Is there a Solaris 2.4 kernel tuning parameter that stops unfriendly programs from taking over a system?
Q:
Is there a Solaris 2.4 kernel tuning parameter (like maxuprc)
that would allow sysadmins to stop unfriendly programs from
taking over a system? The problem we have sometimes seen is
a poorly written program forking off infinite copies of itself
until the machine dies or hits its process limit. We want to
be able to limit a user's total to, say, 100 processes.
Is this possible under Solaris 2.4?
--Lance Nakata, Stanford University
A: The same maxuprc variable does this for you in Solaris 2.X. Add the line
set maxuprc=100
to /etc/system and reboot.
We see an "Allocation errors, kmap full?" message on SPARCstation 20 with 512 megabytes. Why?
Q:
Hello Adrian. I have read your book a dozen times and use your tools.
Excellent. I have a question about an "Allocation errors, kmap full?"
message we received last week on one of our production servers. It is
a SS20 with 512 megabytes of RAM. For some weird reason, it started
canceling telnet and rlogin connections and I have a feeling they were
during the same time we received the kmap full error messages. Could
you explain? Every once in a while we would receive mutex contention
errors as well.
I know this is working in the dark without the specs on the system,
processes running, system configuration and things like that. But, you
are an expert and I figured you could point me in the right direction. Thanks!
--Neil Greene, Sr Oracle DBA / Unix Administrator,
SHL Systemhouse
A:
If the kernel can't grab memory, it will cause a login or telnet to fail
and you will get allocation errors.
If it persists, the machine stops working, and you need a reboot to fix it,
it means the kernel got too big. To fix this, reduce maxusers to 200 or
so, set bufhwm to 4000, upgrade to 2.5 (which has more kmap on sun4m),
or upgrade to SS1000 or UltraSPARC systems that have much bigger kmap.
If it comes and goes, then the free list was empty, so there were no pages for
the kernel to grab. Set lotsfree to 512 and desfree to 256, leaving
minfree alone. Increase slowscan to 500.
This is a fairly common problem with 512MB SS20s.
Help me program asynchronous I/O.
Q:
I was wondering if you could point me in the right direction on which
Solaris 2.4 patch will enable me to do asynchronous I/Os (aio_read and
aio_write). The man page says that async. support is a future release,
but in one of your articles you mentioned a patch that would allow
async. I/O. I just installed the latest jumbo patch (101945-34) but
the routines still return -1 (errno set to ENOSYS). Any help would be
greatly appreciated.
Thanks,
--Chuck Williams,
Senior Telecommunication Systems Engineer, Loral Test & Information Systems
A: You need to look on the second CD that comes with 2.4 or in the Patches
directory on the main 2.4 CD. Kernel async I/O was shipped with the 2.4
release but was not installed by default. There is probably an updated
release of that patch to look for once you know its number.
In the meantime, the aioread calls should work with no patches; the KAIO
fast path in the patch is only really needed for Sybase on raw disks.
My guess is that you are not using the API correctly in some way.
How should I partition my hard disk?
Q: I always face hot comments when I suggest to bundle /, /usr
and /var under one large partition... I do not think that
having separate partitions is needed anymore. Am I right?
Is there any good reason to split them these days? I know
that in the past it was needed because of small disks, but
now? It is an issue I would like to close once and for all.
What are your thoughts about this?
--Benoit Gendron, (firm indeterminate)
A:
My book
Sun Performance and Tuning: SPARC and Solaris
contains my thoughts on this subject. I recommend one partition
for desktops, and keeping /var separate on servers only -- so
that /var/mail can have Prestoserve acceleration.
Also makes upgrades much easier.
Will I/O be faster on a 64-bit file system, especially on a database application like Oracle?
Q:
You have not focused on 64-bit file systems and performance
in your article. Will I/O be faster on a 64-bit file system
and especially on a database application like Oracle?
--(name and firm indeterminate)
A:
64-bit file sizes and file systems can and will be implemented on any
system. Solaris already supports 1 TB file systems, and 2 GB files.
Oracle runs best on a raw disk setup, and there are no 64-bit
features that would speed up file system accesses.
What's better for a Web server: UltraSPARC or hyperSPARC?
Q:
We have a SS20/712 as an applications layer firewall that is
(at times) completely CPU bound w/ all the http traffic
going through it. We are in the process of enhancing the
http-proxy w/ all the recommendations made on this web page.
However, until that is done we want to increase the throughput
via hardware, i.e., faster processor. We are looking at
the 100-MHz HyperSPARC setup, but don't know what the optimal
cache size would be. We have a choice of 1M or 256K. Please
help. In a networked environment what would be the preferred
(fastest) for us?
--Mike McPherson, (firm indeterminate)
A: I would spend the money on an UltraServer 1 Model 170.
Solaris 2.5 is a bit more efficient than 2.4, and the faster CPU and
system bandwidth will probably work better than a dual CPU SS20.
I don't think HyperSPARC systems run kernel code as well as
SuperSPARC systems. I found the 125-MHz 256KB HyperSPARCs were about
the same as 60-MHz SuperSPARCs for running commercial applications like
database backends that do a lot of kernel work.
Is the http proxy forking for every request? If so, a preforked or
threaded proxy would be much better -- for example, Netscape, phttpd, or
Apache, but not CERN or NCSA.
Why shouldn't I run CacheFS on a read-write filesystem?
Q: I attended a seminar you gave
at a Computer Literacy in San Jose a while back and remember you
mentioning a caveat about using CacheFS.
I remember you saying something like "it's not a good idea to use
CacheFS on a r/w filesystem." What I can't remember is WHY.
Is it because writes through CacheFS are slower, or is it because
writes through CacheFS are unreliable? Or does having an r/w fs mounted
through CacheFS cause performance of CacheFS to drop in general?
Also do you have any suggestions for CFS option settings for read-mostly
filesystems ?
--Jim Burwell, Systems/Network Admin., Broadvision
A:
Read-mostly is fine. If you only read the data once, don't bother caching it;
if you keep changing a lot of it, caching it is a waste of time.
If you have a few updates but mostly read the data, it should give a
good speedup. /var/mail is a really bad choice, /home
is usually OK, /export/local (or whatever you mount
applications on) is a good idea.
How do I interpret the w column in vmstat?
Q: We have a SPARCserver 1000E/Solaris 2.4 with four CPUs and 610 megabytes of RAM
as a dedicated Sybase server. vmstat 5 is used to monitor the
system at all times. Recently, the third column 'w' of
'procs' in vmstat's output started to report a value of around
20 and rarely changed. This value shows up again even after
rebooting the system many times during the past month. This
seems to indicate we have a memory shortage because
swapping occurred. But my questions are:
- Why does the free swap space still show a big value (i.e., 201356)
indicating we have plenty of it? (Physical swap space
is 100 megabytes)
- Why does the value in 'w' column stay the same regardless of the
load on the system? (Ten users and 300 users produce the same value.)
- Sun tech support tells me there is no problem on our system
as long as it runs OK, but Mike Loukides's book System Performance Tuning
tells me that when swapping occurs, my sysadmin needs to find the problem because
it may be the tip of the iceberg. To whom should I listen?
- Sun tech support tells me swapping and paging are the same
thing. I disagree. Who is right?
- Sun tech support tells me that we should have a swap space
at least as big as our memory size. Again, I disagree --
based on your book and my own experience. Who's right?
--(name and firm indeterminate)
A: vmstat w reports the number of processes that are currently swapped out.
Those 20 processes are all idle. This is not a performance
problem. The Loukides book is rather out of date in places, and is not
particularly relevant to Solaris 2.
If you run vmstat -S and see lots of si and so, you
might have a problem. Here's a reminder of what vmstat -S looks like:
% vmstat -S 5
 procs     memory            page             disk         faults      cpu
 r b w   swap  free  si  so pi po fr de sr f0 s0 s1 s2   in   sy   cs us sy id
 0 0 0 137392 15608   0   0  2  2  5  0 55  0  0  0  0  132  319   69  2  1 98
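If you log vmstat -S output, a small script can pull out the columns that matter for this question. The Python sketch below assumes the column order shown in the sample above (w third, si sixth, so seventh). The function name is hypothetical, and other Solaris releases or disk configurations may lay the columns out differently.

```python
VMSTAT_SAMPLE = """\
 r b w swap free si so pi po fr de sr f0 s0 s1 s2 in sy cs us sy id
 0 0 0 137392 15608 0 0 2 2 5 0 55 0 0 0 0 132 319 69 2 1 98
"""

# Column positions in the sample above. Positions are used rather than
# names because "sy" appears twice (system calls and system CPU time).
W, SI, SO = 2, 5, 6


def swap_activity(text):
    """Return a list of (w, si, so) tuples, one per data line."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Skip headers, blank lines, and short lines: data lines are
        # made up entirely of integers.
        if len(fields) <= SO or not all(f.isdigit() for f in fields):
            continue
        rows.append((int(fields[W]), int(fields[SI]), int(fields[SO])))
    return rows


for w, si, so in swap_activity(VMSTAT_SAMPLE):
    state = "swapping" if (si or so) else "no swap activity"
    print(f"w={w} si={si} so={so}: {state}")
```

A steady non-zero w with si and so at zero is the situation described in the question: processes were swapped out at some point and have stayed idle since.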
Swapping moves whole processes to the swap space, and paging is
done a page at a time. Page-outs occur in
large clusters, so the net effect is not all that different.
Swap space size is not a performance issue. If you have enough to run
your apps reliably without running out at peak loads then you should be
happy. If you want to collect crash dumps you might need more. That is
one reason why SunService recommends setting swap equal to RAM.
How do I tune the Solaris kernel?
Q:
We are veteran Interactive users and used to tuning the kernel
using kconfig (mtune and stune files). We
are now porting to Solaris x86 (Base Server) and need to be able to
make equivalent tuning changes. In particular, we need to increase the
various values associated with IPC queue (MSGMAX, MSGTQL, etc.). We
have found one cryptic way to do this by hacking at system(4). Is there
a better way and is there any comprehensive documentation source on
tuning kernel parameters under Solaris x86? Thanks.
--Ken Robbins, firm indeterminate
A: The "better way" involves editing /etc/system.
The performance manual section offers little help, but does list
some parameters. My book (Sun Performance and Tuning) contains
more details, including the algorithms that are being tuned.
Your question is a common one. I will probably address the
question "What is the list of tunables in Solaris?" in a future
column. There is no easy answer, unfortunately.
How can I time-out orphaned processes in Solaris?
Q:
At Brown & Root, we run both Solaris and AIX servers.
On all servers, we have Oracle as our database. On
occasion, some clients' Oracle processes
remain active even after they have logged off. In AIX,
we have found two parameters, tcp_keepidle and tcp_keepalive,
that help us timeout these orphaned processes. Is
there anything comparable in Solaris?
--Jacques Dejean, Brown & Root
A:
You're looking for the Solaris ndd command. You can find a description of it
and the values it can be assigned in Appendix E of
TCP/IP Illustrated, Volume 1 by W. Richard
Stevens. This book is also a complete reference to TCP/IP and how it works.
Make sure you understand the implications of any TCP tweaks. You can
easily mess up the standard algorithm if you set it up wrong.
What causes slow rlogin?
Q:
What are likely causes of extremely slow rlogin both to and from a
machine? The machine in question is seldom busy. It takes about 60
seconds to do rlogin from or to the machine. Once rlogin is completed,
response is fine.
--Mike Kelly, firm indeterminate
A:
Check for:
- Incorrect routing setup -- use ping -sRv to check route to NFS servers,
etc.
- NIS, NIS+ or DNS server problems.
- Automount or NFS server problems.
- Bad directories in set path= in .cshrc or similar files.
- Symbolic links in home directory to /net/system/somewhere.
To diagnose this problem, use etherfind or snoop (Solaris 2) on a third system,
capture all packets in and out of the slow machine, and look at the
sequence and timestamps to see which part of the sequence is taking a
long time.
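Once you have a capture, the job is to find the longest pause between consecutive packets. A minimal Python sketch of that step follows, assuming you have already reduced the trace to (seconds, summary) pairs. The trace shown is invented for illustration and is not real snoop or etherfind output, and the function name is hypothetical.

```python
def largest_gap(packets):
    """packets: list of (seconds, summary) pairs in capture order.

    Returns (gap_seconds, summary_before, summary_after) for the longest
    pause between consecutive packets, or None if there are fewer than
    two packets.
    """
    best = None
    for (t0, before), (t1, after) in zip(packets, packets[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


# Invented trace for illustration only.
trace = [
    (0.00, "client -> server  TCP SYN (rlogin)"),
    (0.01, "server -> client  TCP SYN/ACK"),
    (0.02, "server -> dns     query for client name"),
    (60.02, "server -> client  data after lookup gave up"),
    (60.05, "server -> client  login banner"),
]

gap, before, after = largest_gap(trace)
print(f"{gap:.2f}s between '{before}' and '{after}'")
```

The packet just before the long pause usually names the service that is not answering.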
I get this problem myself, normally due to routing foul-ups. It may also help
to put "files" at the start of the name server lookup path for hosts and passwd.
I use "files nisplus dns" for hosts in /etc/nsswitch.conf, as I
find that the system boots much more quickly if it looks up
system identities for its main routers and servers in the /etc/hosts file.
Q:
This is in response to your December Performance Q&A
column and the question about what causes slow rlogins.
The most common reason I see (and hear about) a slow login
is the remote site using daemons or protocol wrappers that
use the ident protocol to lookup who is trying to connect.
I use a TCP wrapper that logs user names on a daily to
weekly basis. The lookup can cause a login delay of up to
a configurable timeout (2 seconds) if the client machine
is not running an identd daemon.
Another common cause on a busy machine is when the remote
site does not have enough physical memory and must swap to
get the login daemons or the shell loaded and running.
Hope this helps...
--Michael Johnson, CS Undergrad,
Oregon State University
Any performance tuning hints for Solaris 2.5?
Q:
I'm hoping that when Solaris 2.5 comes out you can dedicate
an article or a series of articles to the improvements and
kernel /etc/system parameters that should or should not be
set for 2.5. In reading your book, you gave different
hints for different types of systems (i.e., servers vs.
hosts), and the hints varied depending upon the version of
Solaris being used. I'm guessing that when 2.5 comes out,
it'll be different from 2.4, so it'd be nice to know what
changes have been made and what performance tuning hints
are applicable for 2.5.
--Blair Zajac, firm indeterminate
A:
Since I'm writing this before Solaris 2.5 is officially released, I can't offer
much guidance yet. I'll cover tunables soon. It takes a while to
figure out how to tweak a new OS release. There are a few new NFS V3
variables. The rest is basically identical to 2.4.
Editor's Note: Solaris 2.5 was announced at the end of October. Shipment for the SPARC and Intel versions of the new OS has just begun; Solaris for PowerPC recently entered beta testing and is expected to ship early next year.
Why do some login IDs in SunOS 4.1 accounting files change?
Q:
I hope you can help me out. Lately, I had to look at the
/usr/adm/acct/fiscal/fiscrptxx files. I found that some login IDs
had two entries per file, while other login IDs had one.
Can you tell me what the problem is, or at least give
me a hint? I need to use the files for performance evaluation
purposes. Do I have to add up the entries corresponding to a given
login ID per file?
--Halim M. Khelalfa, AI Division, CERIST
A: I'm not sure, and I haven't used accounting on 4.1.3 for many years.
One guess: Perhaps some users changed their group ID, keeping the same
user ID, during the month.
Why doesn't my virtual memory monitoring program add up?
Q:
I now have a hard copy of your System Performance Monitoring
article and will read it soon. First, I am going to take advantage
of your offer to answer questions about this subject.
One of my personal monitoring programs presents physical memory
utilization, which I calculate based on the following method. I
believe it works and the assumptions are correct, but I'd like
your opinion on its accuracy.
First I get some static facts:
V = total virtual memory size (everything is in kilobytes)
R = total real (physical) memory size
Next, I get 1-second snapshots of transient facts:
A = allocated (in-use) virtual memory
F = free (available) virtual memory, in the form of free
resident memory pages (now in the physical memory)
IF ( A + F ) >= R
THEN U = 100%
ELSE U = (100% * A) / (R - F)
There are cases when A < R but I report 100 percent because of the
free pages that inhabit physical memory, forcing some allocated
pages to be swapped out. I am not concerned about this because
there will be (at least a potential for) thrashing, and that's
what 100-percent physical memory utilization is supposed to indicate.
What I am concerned about is when ( A + F ) <= R yet there is a
potential for thrashing -- and I don't know why -- because there
is something missing from my equations.
Notes:
- Sun does not present non-zero "avm" (active virtual memory) values
from a vmstat report, so I must get A from pstat -s. The
V and R values are from dmesg.
- When I asked a Sun performance person why that was
so, I got led off on a tangent about how unnecessary
my calculations were. ("Why do you want to know the
physical memory utilization?") I am hoping that your
answer is more to the point.
--Alex Vrenios, EMTEK Health Care Systems
A: Look at my Unix Insider column entitled "Help! I've lost
my memory!" Then it may become clear why your calculation does not
work. The VM system is far more complex than your simple equation presumes.
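For reference, the calculation from the question can be written out as a short Python function. This is only a transcription of the reader's formula, using the reader's letters A, F and R; the function name and the sample numbers are hypothetical. It makes plain that the result depends on just three inputs, with nothing to account for pages shared between processes or used to cache files.

```python
def reader_utilization(a_kb, f_kb, r_kb):
    """Percent physical memory utilization as defined in the question.

    a_kb: allocated (in-use) virtual memory, A
    f_kb: free resident memory pages, F
    r_kb: total physical memory size, R
    All values are in kilobytes.
    """
    if a_kb + f_kb >= r_kb:
        return 100.0
    # Here A + F < R, so R - F > A >= 0 and the division is safe.
    return 100.0 * a_kb / (r_kb - f_kb)


# Hypothetical numbers for a 64MB machine.
print(reader_utilization(a_kb=30000, f_kb=5536, r_kb=65536))  # 50.0
print(reader_utilization(a_kb=62000, f_kb=4000, r_kb=65536))  # 100.0
```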
I don't think the available data is sufficient to
model memory use. In particular, the only data available on a per-process basis
is the size of the address space for the process, and the amount that
has valid memory mappings. These values can be seen (measured in kilobytes)
via the old-style ps command, in the SZ (process size) and RSS
(process resident set size) fields:
% /usr/ucb/ps uax
USER       PID %CPU %MEM    SZ  RSS TT       S    START  TIME COMMAND
root      2026  3.0  2.1  1424 1284 pts/6    O 23:14:29  0:01 /usr/ucb/ps uax
adrianc   2021  0.7  4.1  3444 2500 ??       S 23:14:26  0:00 /usr/openwin/bin/c
adrianc   1785  0.6 11.1 10048 6840 console  S 20:50:55  1:12 /usr/openwin/bin/X
adrianc   2024  0.3  1.4   980  856 pts/6    S 23:14:27  0:00 /bin/csh
...
Unfortunately for your calculation, the RSS excludes pages that are in
memory but do not have valid mappings, and it includes pages that are shared
by other processes. Your calculation also doesn't consider the memory used by
files that are cached. To obtain this data, kernel code
would have to be written that traverses many data structures and tallies
the pages. This is not available in the base release, or in any commercial
performance tools that I am aware of.
I think it would be useful to have more information about memory usage,
and it is on my list of things I'd like to see added to Solaris.
Are kernel memory allocation errors worth worrying about?
Q:
While recently monitoring a SPARCcenter 2000E with RuleTool, I noted that
the system was regularly experiencing kernel memory allocation errors.
I tried to find some info on the seriousness of this, but wasn't
able to find much other than it possibly being caused by a
memory leak. A call to SunService seemed to indicate that as
long as the frequency was very low (it was, approx. 5 per day)
it wasn't a cause for concern.
I would like more info on this in a future article (or in your next
book Performance Tuning: The Sequel). I'd like to congratulate
you on your book and I assume it's doing well considering it fills a
void that's been around for years. (I purchased two copies myself, and
have influenced several others in purchasing the book.)
--Greg Wells, firm indeterminate
A: If the system can't grab memory when it needs it and can't wait, then
you can get problems like a stream or login attempt
failing. There are other reasons why allocation failures occur; in most
cases, the system finds a way to retry the operation and succeed.
This problem happens mostly on Solaris 2.4 and 2.5 multiprocessor systems,
not so often on Solaris 2.3 or uniprocessor systems.
If you see kmem allocation errors (sar -k 1), then increase
the free list so that it is less likely to hit the endstop.
Set lotsfree to 128 * the number of CPUs you have or set up
virtual_adrian.se
to run every time you reboot and it will set this for you.
Adding more RAM doesn't help, as the free list size is not scaled.
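As a minimal sketch of the rule of thumb above (128 times the number of CPUs), the following Python computes the value and formats it in the "set name=value" style used elsewhere in this column. The function names are made up for illustration and are not part of virtual_adrian.se.

```python
def lotsfree_for(cpus, per_cpu=128):
    """Rule of thumb from the answer above: lotsfree = 128 * number of CPUs."""
    if cpus < 1:
        raise ValueError("need at least one CPU")
    return per_cpu * cpus


def etc_system_line(cpus):
    """Format the result as a 'set' line (hypothetical helper)."""
    return f"set lotsfree={lotsfree_for(cpus)}"


if __name__ == "__main__":
    # An 8-CPU machine is used purely as an example.
    print(etc_system_line(8))
```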
As you can see below, I've had a few errors on my Ultra 1, but I take
this as a warning, not a serious worry. It is useful to track it in
case something else fails at the same time as a new kmem error.
% sar -k 1
SunOS eccles 5.5 Generic sun4u 11/05/95
23:30:18 sml_mem alloc fail lg_mem alloc fail ovsz_alloc fail
23:30:19 4046848 3611540 0 7536640 6492776 8 5373952 0
How can I improve my Web server's http performance?
Q:
My issue deals with a problem I have been seeing on more
and more Solaris 2.4 systems running as WWW servers.
First, I do realize that the http protocol was not designed
to work with TCP/IP. In fact, it butchers it, but since
it's a growing phenomenon, we need to tune the system for it!
Now, the problem I have been seeing. When dialup users
connect to these WWW servers via SLIP/PPP, Solaris apparently
drops a lot of packets, and a lot of retransmissions are occurring
as shown from the results of the netstat -s command.
What I discovered is that the default setting of 200 for
tcp_rexmit_interval_min is too low. Setting this up to 10000
finally gives good performance results. However, as you are well
aware, this will increase the amount of time the system waits
before a retransmission takes place after a packet is dropped.
Catch-22! ;)
I also noted that the listen backlog parameter set by ndd:
tcp_conn_req_max is set to 5 and allows a maximum value of 32.
How can I optimize Solaris 2.4 to perform well as a WWW server?
More and more clients are asking me to improve WWW performance
on Sun.
--Boni Bruno, Data Systems West
A: There is an excessive retransmit bug that is fixed in Solaris 2.4 patch
101945-34 (Sun's recently released kernel jumbo patch) and Solaris 2.5.
You will still see retransmit levels of 10-30% on machines with
direct Internet connections. You can reduce them by setting the initial
retransmit interval to a second or so (1000, as the units are ms).
Most packets seem to take at least a second to get to their destination
and get an acknowledgement back over the Internet! You should not set
it to much more than a second.
The limit for tcp_conn_req_max should be set to 32
in 2.4, and can be set up to 1024 with ndd in 2.5 if you have enough
memory to hold all those pending connections. A setting of 128 seems
to work well on Solaris 2.5, and is being used on some big Internet sites.
Add these lines to /etc/init.d/inetinit:
ndd -set /dev/tcp tcp_rexmit_interval_initial 1000
ndd -set /dev/tcp tcp_conn_req_max 32
We also have fast name service caching in 2.5, so DNS (Domain Name
System) lookups get cached (see the nscd man page). In general
2.5 is a much faster Internet server than 2.4, even though there are
several areas where tuning work is still underway.
Does Solaris offer a vmtune-like tool?
Q:
I've recently started using Suns. With Sequent Dynix/ptx (based on AT&T
V3.2), a vmtune utility controls virtual memory (VM) management.
- Does Solaris have a vmtune-like virtual memory tool?
- Is there a maximum resident set size for each process on Solaris?
If so, how can one modify this value?
- Is there "swapout" on Solaris? In my environment, there is no
swapping and free memory is always more than 25 megabytes. I think
Solaris was designed to avoid swapping. If Solaris doesn't avoid
swapping, when does swap occur?
--MyungSuk Yoo, Bombardier Regional Aircraft
A:
There are no controls on resident set size per process in any of the mainstream
versions of Unix. Why not? Well, it's hard to get a default behavior that works any
better than the current system over a wide range of system sizes
and workloads. Also, implementing a working set pager requires a
lot more overhead, in terms of both CPU use and kernel data storage.
In Solaris 2.4, swapouts of large idle processes occur if free memory
stays well below its normal level for several seconds.
Why are my news spool disks overloaded?
Q:
I am running a news server and I am getting very poor
performance from it. It is running on a SPARCserver 1000 with
640 megabytes of RAM. The news software (INN) resides on /opt (sd1)
and Solaris 2.4 resides on sd0. iostat -x 30 indicates that at least
one of my bottlenecks can be attributed to my disks, primarily the spool.
I am striping 3 disks (sd15 sd37 sd7)
using Online DiskSuite. The stripe has an interlace value of 16 blocks.
Below is some of the output from iostat -x 30. As you
can see, most of the load is caused by writes to the spool.
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
sd0 0.0 3.3 0.0 19.9 0.0 0.1 34.6 0 5
sd1 0.0 15.7 0.0 99.3 0.0 0.4 24.7 0 39
sd15 1.4 18.0 3.6 97.9 45.1 46.9 4737.6 44 79
sd37 0.7 16.5 3.2 99.9 10.5 7.1 1025.1 9 22
sd7 0.5 16.5 2.7 98.7 9.7 6.8 972.8 9 20
extended disk statistics
disk r/s w/s Kr/s Kw/s wait actv svc_t %w %b
sd0 0.0 3.8 0.0 24.6 0.0 0.1 38.2 0 5
sd1 0.0 15.9 0.0 101.1 0.0 0.4 23.9 0 38
sd15 1.1 17.5 2.3 100.3 14.0 54.2 3656.0 36 73
sd37 0.8 15.5 3.3 96.9 9.0 6.6 961.8 8 21
sd7 0.6 15.5 3.2 97.0 8.9 6.1 929.2 8 18
--(name and firm indeterminate)
A:
Those disks are dead meat! A slow service time is 50 milliseconds; 4737 ms is
glacier-like speed. As you can see, there are 47 active commands inside
the disk drive, and 45 commands waiting to be sent to the drive. Each new command you send to
the drive has to wait for 92 other commands to finish first. Thus it takes almost
5 seconds to service each I/O. Dividing down, 4737 ms/92 commands = 51 ms for
each I/O at the disk drive. This indicates a lot of long seeks -- probably
random seeks between inodes and data in many parts of the disk drive.
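The "dividing down" step can be sketched in a few lines of Python. The figures are taken from the sd15 row of the first iostat sample in the question; the function name is hypothetical, and the calculation is only the rough estimate used above, not something iostat itself reports.

```python
def per_io_time_ms(svc_t_ms, wait, actv):
    """Estimate time per I/O at the drive: total service time divided by
    the number of commands queued (waiting plus active)."""
    queued = wait + actv
    if queued <= 0:
        raise ValueError("no commands queued")
    return queued, svc_t_ms / queued


if __name__ == "__main__":
    # sd15: wait=45.1, actv=46.9, svc_t=4737.6 ms
    queued, each = per_io_time_ms(4737.6, 45.1, 46.9)
    print(f"{queued:.0f} commands queued, about {each:.0f} ms per I/O")
```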
The problem is lots of files being created, touched, and destroyed; lots of
inode updates; and Directory Name Lookup Cache (DNLC) activity (i.e., a busy NNTP
[Network News Transfer Protocol] server).
The best fix: Add non-volatile (NV) SIMMs and Legato's Prestoserve software. This will help
a lot more than anything else. If the disks are still too busy, you need more
of them, and you need an NVRAM disk cache. A SPARCstorage Array (SSA) with 12 or so
disks would give you a wider stripe. The SSA NVRAM is a reasonable substitute
for the Prestoserve NVSIMMs, but both together are even better. Note that you do not need the storage
capacity of 12 disk drives, but you look as if you need the random I/O
performance of them. Twelve disks may seem extreme, but so does a 4700-ms
service time!
Increasing ncsize and ufs_ninode to 34000 in /etc/system
may help a little.
With 640 megabytes of RAM, maxusers should be at 640 already, and the
caches will already be quite large. If you have set maxusers
directly to some low value then you should remove it from /etc/system
and let it size automatically.
Where can I find the HP-UX version of SymbEL?
Q: I've printed out your article to read while working on my HP-UX system.
Of course, my question is do you have a version for 9.X on a 9000/735?
--dave, (firm indeterminate)
A: It is difficult to build a useful SE language on HP-UX, AIX, or
other OSes. The trick on Solaris 2 is that the /dev/kstat interface is
a read-only, nonprivileged interface; note that vmstat etc. are no longer
setuid commands in Solaris 2. This means that you can write easy scripts
that can get at almost all the performance data, without running as
root, and without making the binary setuid.
That said, Rich Pettit has recently been looking at the performance
interfaces on other platforms. They are effectively undocumented on HP-UX
as far as we can tell, and without source code, it is hard to work out
what to do.
Overall, one of my aims was to make Solaris 2 a better OS to work with
than other OSes, so I'm not very motivated to spend time working on ports.
I also have way too many unimplemented ideas for Solaris 2 to work on.
Which is better, SPARC or Pentium?
Q: My organisation is arguing about SPARC servers versus
UNIX Pentium-based servers. What I really have to prove
to them is that the SPARC motherboard is faster, more
reliable and robust, and is a market leader.
Could you please mail me relevant information QUICKLY before
the management decides to scrap Sun? Thanks.
--Jean-Pierre, (firm indeterminate)
A:
This is really a job for your local Sun sales team to take on.
What I can say is that we have been benchmarking Solaris on Intel 166MHz
Pentiums and on Ultra 1s; for network server workloads the Ultras are two to
three times faster. The PC motherboard and Ethernet hardware do not have
the bandwidth to compete. We actually get throughput on SBus of over 100 MB/s
on an Ultra 1; the PCI bus on a Pentium PC is rated at 100 MB/s or so in theory,
but in practice you are lucky if you get over 10 MB/s.
The PC hardware is designed down to a price, and most of the I/O adaptors
are also not designed for demanding applications.
Some Web server benchmarks backing this up should be put up on www.sun.com
soon, but they are not available yet.
The other issue is support, and since the hardware and software are from
the same company, we can give better integrated support with less
finger-pointing between hardware, OS, and I/O card suppliers when something isn't working.
Why doesn't 32 megabytes seem like enough?
Q: The short question is, "why does it seem that a SPARC 4
running Solaris 2.4 w/ 32 megabytes of RAM has too little memory?"
I have such a system and it seems almost hopeless to have
emacs, gdb, netscape, and g++ running -- unless you like
to hear the disk spinning (paging).
I think your columns are great, so thanks!
--Dave, (firm indeterminate)
A:
See Adrian's May column for an indirect answer to this
question.
Is there a way to measure the amount of CPU used by AIO "waiting" methods?
Q: Based on the way kaio works in Solaris, the question has
come up regarding the possibility of being CPU bound because
you are i/o bound. The thought process on this is based on
the assumption that Solaris is asynchronously "waiting" for
the i/o to complete by doing a SIGIO with a polling timeout
versus the synchronous method of using the aiowait function.
Can you help to clear this up? Is there a way to measure
the amount of CPU used by these "waiting" methods?
--Marty Carangelo, Amdahl
A: Solaris doesn't poll. The user application might if it was using
aio, but Solaris waits for the interrupt to wake up the thread that
issued the I/O.
How many syscalls are too many?
Q: I don't find a threshold mentioned for the three fault
categories. Here is a sample of output from an SC2000:
r b w swap free re mf pi po fr de sr s1 s1 s1 s3 in sy cs us sy id
0 0 0 2502796 1136148 0 0 0 0 0 0 0 0 0 0 0 172 103473 498 61 29 10
0 0 0 2502796 1136148 0 0 0 0 0 0 0 0 0 0 0 137 104002 487 62 29 9
0 0 0 2502796 1136148 0 0 0 0 0 0 0 0 0 0 0 154 104144 463 62 29 9
0 0 0 2502796 1136148 0 0 0 0 0 0 0 1 0 0 0 109 104189 467 63 28 9
It seems to me the fault/sy numbers are high. What do you consider high?
--Kenneth Woods, US Navy
A: Syscalls are a byproduct of an application doing work. The more the better,
as it means you are doing more work more quickly.
In absolute terms 100000 is quite reasonable for an SC2000, especially
since you have usr:sys times in a 2:1 ratio which is quite healthy.
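To check the usr:sys ratio on your own output, a small Python sketch like the one below will do. It assumes the vmstat layout shown in the question, where the last six columns are in, sy, cs, us, sy, and id; the function name is made up for illustration.

```python
def summarize_vmstat_row(row):
    """Pull syscall rate and CPU split from one vmstat data line.

    Assumes the last six fields are: in sy cs us sy id.
    """
    fields = row.split()
    if len(fields) < 6:
        raise ValueError("not a vmstat data line")
    _intr, syscalls, _cswitch, usr, sys_, idle = (int(f) for f in fields[-6:])
    ratio = usr / sys_ if sys_ else float("inf")
    return {"syscalls": syscalls, "usr": usr, "sys": sys_,
            "idle": idle, "usr_to_sys": ratio}


if __name__ == "__main__":
    # First data line from the question.
    row = "0 0 0 2502796 1136148 0 0 0 0 0 0 0 0 0 0 0 172 103473 498 61 29 10"
    info = summarize_vmstat_row(row)
    print(f"{info['syscalls']} syscalls/s, usr:sys = {info['usr_to_sys']:.1f}:1")
```

For the first line in the question this gives a usr:sys ratio of about 2.1:1, in line with the 2:1 figure quoted above.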
What can I do when the kernel memory button in ruletool goes black?
Q: We have a SPARCcenter 2000 with 8 CPUs, 3 GB of RAM, 150 GB of disk.
We are running a lot of processes, a dbms, and have quite a few users.
I tuned the system file as near as I can tell according to the
guidelines in your book and using the output of ruletool. The box keeps
coming to its knees with kmem allocation errors - no kmem available
(from ruletool). Of course - the kernel mem button in ruletool goes
black and the box can't recover. I end up having to stop-a and reboot.
Any help you can provide would be greatly appreciated as I am about to
be run out of Dodge by the local town folk.
--Ted Regan, EDS
A: This is fixed in Solaris 2.5.1 with an algorithm change and a bigger free list.
In the meantime set slowscan=500, and keep doubling
lotsfree and desfree until the alloc fails go away.
Try this first in /etc/system (assuming lots of RAM; you have 3 GB):
set slowscan=500
set lotsfree=4000
set desfree=2000
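The "keep doubling" advice can be laid out as a short Python sketch that lists the settings to try on successive attempts. The starting values are the ones given above; the function name and the number of steps shown are arbitrary choices for illustration, and you should stop as soon as the alloc fails go away.

```python
def doubling_schedule(lotsfree=4000, desfree=2000, slowscan=500, steps=3):
    """Yield successive groups of 'set' lines, doubling lotsfree and
    desfree each time while leaving slowscan fixed."""
    for _ in range(steps):
        yield [
            f"set slowscan={slowscan}",
            f"set lotsfree={lotsfree}",
            f"set desfree={desfree}",
        ]
        lotsfree *= 2
        desfree *= 2


if __name__ == "__main__":
    for attempt, lines in enumerate(doubling_schedule(), start=1):
        print(f"attempt {attempt}:")
        for line in lines:
            print("  " + line)
```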
How large can a process be in Solaris?
Q: You say a SPARCcenter 2000 can support 5 gigabytes of RAM; is there
still a 4-gigabyte per-process limit on this system?
Does Solaris 2.5 support more than 4 gig per process?
Does Solaris 2.5.1 support more than 4 gig per process?
--Chris Krebs, (firm indeterminate)
A: Note that the Enterprise 6000 machine can support 30GB of RAM, and both it
and the SC2000 are limited by DRAM density, not address space.
The SC2000 has a 32-bit virtual address space that maps to a 36-bit
physical address space. That is how lots of 4GB processes can share
a much larger amount of RAM.
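The arithmetic behind those address widths is easy to check. This Python sketch only computes addressing ceilings; it says nothing about how much RAM a given machine can hold, which is limited by DRAM density as noted above. The helper name is made up for illustration.

```python
GB = 2 ** 30  # bytes in a gigabyte (binary)


def address_space_bytes(bits):
    """Number of bytes addressable with the given number of address bits."""
    return 2 ** bits


if __name__ == "__main__":
    virtual = address_space_bytes(32)   # per-process virtual space
    physical = address_space_bytes(36)  # physical address ceiling
    print(f"virtual:  {virtual // GB} GB per process")
    print(f"physical: {physical // GB} GB addressable")
    print(f"ratio:    {physical // virtual} full-size processes' worth")
```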
The answers to questions posed here are those of the author, and do not represent the views of Sun Microsystems Inc.