MOO-cows Mailing List Archive

[janus@cam.org: [SERVER] [BUG] Waiting for network I/O?]



------- Start of forwarded message -------
Return-Path: nop@ccs.neu.edu
X-Sender: janus@198.168.100.7
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Date: Sun, 9 Mar 1997 14:13:37 PST
To: moo-cows@parc.xerox.com
From: Richard Godard <janus@cam.org>
Subject: [SERVER] [BUG] Waiting for network I/O?
Sender: MOO-Cows-Errors@parc.xerox.com
Precedence: bulk
Resent-From: clue-cows <nop@nop.com>
Errors-To: clue-cows <nop@nop.com>

Hi,

A few days ago a MOO got into a bad/strange state. (The server is 1.8.0p5
running on a Sparc Ultra with the latest version of the OS/patches/...; it
has a bunch of stuff in extensions.c but nothing that has to do with
networking, and no server patches.)
The MOO was up & running but it was not possible to log in or use the web
interface. Looking at the log:

[... stuff deleted ...]
Mar  3 17:32:28: DISCONNECTED: #-7509 on port 7777 from 128.18.20.14, port 1995
Mar  3 17:32:28: *** Waiting for network I/O: Bad file number
Mar  3 17:32:28: *** Waiting for network I/O: Bad file number
Mar  3 17:32:28: *** Waiting for network I/O: Bad file number
Mar  3 17:32:28: *** Waiting for network I/O: Bad file number
Mar  3 17:32:28: *** Waiting for network I/O: Bad file number
Mar  3 17:32:28: *** Waiting for network I/O: Bad file number
[...and so on...]

I don't have a clue what the "*** Waiting for network I/O: Bad file number"
message means or what it is related to.

Anyway, the server log file had grown to 500 MB and was filling the whole
partition, so I had to delete it. Last stats from top before I killed the
moo process:

last pid:  2360;  load averages:  0.05,  0.05,  0.05                22:26:02
41 processes:  39 sleeping, 1 running, 1 on cpu
Cpu states:  0.0% idle,  0.0% user,  1.4% kernel, 98.6% iowait,  0.0% swap
Memory: 30M real, 536K free, 50M swap, 76M free swap

  PID USERNAME PRI NICE  SIZE   RES STATE   TIME   WCPU    CPU COMMAND
  299 moo       33    0   76M   25M run   218:27  1.62%  1.28% moo_180p5_FUP1

I restarted the db from the last successful checkpoint and it seems to work
like a charm.

When the problem happened the server had been up & running for a month and
a half. The MOO was restarted five and a half days ago and no problem has
occurred since.

Does anyone have any suggestions about what it could be?

Thanks in advance.

Richard
------- End of forwarded message -------

