Troubleshooting FAQ

Send DMail Support the right things the FIRST time!

  1. The server is dying (core dump or DrWatson), what should I do?
  2. What to Send DMail Support?
  3. I tried to upgrade but it did not work . . .
  4. What does this log (error) message mean?
  5. I am having a problem with the users ...
  6. I got a bounce message from DSMTP ...
  7. What does this DPOP error message mean?
  8. 'Database Down' or 'Out of Sync' message with External User Database ...
  9. On Windows, DMAdmin just shows lines like 'Lost connection to DSMTP (Select failed () Connection Refused)'...
  10. What does the following System Administrator message mean?
  11. We are having problems with 'try again later' or 'too many simultaneous connections' messages in DSMTP ...





  1. The server is dying (core dump or DrWatson), what should I do?

    (or, what should I send to DMail Support?)

    If one of the DMail servers (DPOP, DSMTP or DList) is dying, it will be evident in a number of ways:

    • users cannot connect to the server
    • the dwatch resurrector may be emailing you, as the system administrator
    • 'core' files may appear in the server directories, e.g. /usr/local/dmail (UNIX platforms only)
    • a DrWatson window may pop up (Windows only)

    Note: dwatch is supposed to restart the servers when this happens, but by default it only does this 5 times and then stops watching that server.

    When one of the servers is dying, we at DMail Support certainly want to know about it, because it means that there is a serious bug in our software.

    See the next FAQ for suggestions on what to send DMail Support.

  2. What to Send DMail Support?

    Here is a list of the things that it might be appropriate to send us. Please don't send us a huge email with lots of large attachments; just pick the best information that you have. Your config file and a log or back trace are usually sufficient. Don't forget to tell us your platform and the version you are using.

    • Your dmail.conf file - almost always send us this
    • The log file (at log_level debug if possible, and perhaps with log_data true)
    • A 'ded' file from the dwatch directory
    • A DrWatson log file, e.g. \winnt\drwtsn32.log
    • A back trace from a core dump (don't send a core file)
    • A trace.log file from the dwatch_path directory (check that the date is valid)

    Email these to dmail-support@netwinsite.com.

    Here are some pointers on gathering the above information.

    • Set your logging level to debug as soon as you suspect that something is going wrong. To do this, edit your dmail.conf file
      (/etc/dmail.conf or \winnt\system32\dmail.conf)
      so that the log_level setting reads:
      log_level debug
      Save the file and then reload both DPOP and DSMTP (tellpop reload and tellsmtp reload).
    • If the bug has something to do with TCPIP connections to DSMTP, you may also want to set
      log_data true
      so that DSMTP logs all TCPIP connections.
      (On version 2.7 and above, use 'log_data some' instead, so that your log does not end up filled with attachment data.)
    • Send us the relevant log file, e.g. dsmtp.log, which you will find in the log_path directory.
    • Send us the relevant 'ded' file, e.g. d_1dsmtp.ded, from the dwatch_path directory. These are copies of the log files that dwatch makes when it notices that a server has died. If a server has crashed several times, send a couple of them: comparing the last lines of each tells us whether the server is dying on the same thing each time.
    • The most useful thing is a back trace. This shows us which function within our program was being run when it died.

      Getting a back trace on NT:
      DrWatson will create a back trace and write it to \winnt\drwtsn32.log if it notices any program dying. NB: it may ask you whether you want a log to be created; answer yes.
      DrWatson should be on by default, but you can turn it on in the DMAdmin utility: click Config Dwatch, select any server, and click the 'Set DrWatson as debugger' button in the pop-up window.
      NB: if the DrWatson pop-up box appears and waits for you to click OK, dwatch will not notice that the server has died, and so will not restart it until you click OK.
      So select 'don't popup window when any program dies'. DrWatson will then create the log file automatically and close the dying program, which allows dwatch to restart the server while you still get the log.

      Getting a back trace on UNIX based platforms:

      If one of our programs dies, you should find a file named 'core' (or core.program on some platforms) in the directory that the program was running in. So look in the following default server directories:
      /usr/local/dmail for DSMTP and DPOP
      /usr/local/dmail/dlist for DList
      /usr/local/dmail/dwatch for dwatch itself

      PLEASE do not send us the core file. Valid information can only be read from it by analysing it on the machine on which it was created.

      To analyse the core file and get the back trace, follow one of these common examples:

      Most boxes (using DSMTP as an example):
      1. cd to the program directory:
      cd /usr/local/dmail
      2. run gdb, giving it the program executable as the first argument and the core file as the second:
      gdb /usr/local/dmail/dsmtp /usr/local/dmail/core
      3. at the gdb prompt, enter:
      bt
      This should display a back trace. Send us a cut and paste of the whole gdb session, not just the back trace.
      4. enter quit to close gdb.

      On AIX:
      Same as above, but use 'dbx' instead of gdb. You can also use the '-a pid' option to attach to a running process.

      On Solaris:
      Same as above. Most customers seem to be able to install 'dbx' fairly easily, but it is also quite common to have 'adb', which has a '-c' option that may be the one to use.

       

      On some platforms, versions before 2.8 were built without the -g compile flag, so back traces from those versions will be useless; a message such as 'no symbols found' will appear.

      Sometimes it is useful to send us a truss of the program, because you can run truss while the program is still running (truss -p pid). Note that truss only shows us the system calls (such as disk access) that the server makes (as far as we know; do tell us if we are missing something), so it is not as good as a back trace.

       

    • Send us a trace.log file if the death is in DSMTP. This is a very basic back trace which DSMTP generates when it dies, but it is not nearly as good as a real back trace. DSMTP puts this file in the dwatch_path directory, usually /usr/local/dmail/dwatch or \dmail\dwatch. Note: delete the trace.log file as soon as you have copied it somewhere else, as DSMTP will not always overwrite it if the death happens again.
    • Lastly, check the dates on files and look inside them to make sure that they contain information from the time of the crash.
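    If you have several 'ded' files, one way to see whether the server is dying on the same thing each time, and to check their dates at the same time, is to list each file's modification time next to its last log line. Here is a minimal Python sketch of that check. It assumes the ded files are plain text with a '.ded' extension in your dwatch_path directory (/usr/local/dmail/dwatch by default); the function names are our own illustrative helpers, not part of DMail.

```python
import glob
import os
import time


def last_nonblank_line(path):
    """Return the last non-blank line of a text file ('' if there is none)."""
    last = ""
    with open(path, "r", errors="replace") as f:
        for line in f:
            if line.strip():
                last = line.rstrip("\r\n")
    return last


def summarize_ded_files(dwatch_path):
    """Return (path, mtime, last_line) for every *.ded file, oldest first."""
    rows = []
    for path in glob.glob(os.path.join(dwatch_path, "*.ded")):
        rows.append((path, os.path.getmtime(path), last_nonblank_line(path)))
    rows.sort(key=lambda row: row[1])
    return rows


def print_summary(rows):
    """Print each file's date and last log line for a quick visual comparison."""
    for path, mtime, last in rows:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(mtime))
        print(f"{stamp}  {os.path.basename(path)}")
        print(f"    {last}")
```

    For example, print_summary(summarize_ded_files("/usr/local/dmail/dwatch")) lists the files oldest first; if the last lines are all the same, the server is probably dying on the same thing each time.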


  3. I tried to upgrade but it did not work . . .

    Normally, if something does not upgrade correctly, it means that the installation utility, dmsetup, was not able to stop that part of the server in order to overwrite it with the new version.

    So, to complete the upgrade, you must stop that server or program and then manually copy the new executable over the old one. Make sure you find the correct old executable to overwrite!

    A few notes that might help:

    1. On NT, remember to exit from DMAdmin before you do the upgrade.

    2. On NT, if you want to stop the servers and DMAdmin is not responding, you must stop the DWatch service that controls them; you can do this from the 'Services' dialog in the Control Panel. If this doesn't work, disable the DWatch service (in the same dialog) and restart the machine, so that the servers are not running when it comes back up. At that point dmsetup should be able to upgrade everything without any problems.



  4. What does this log (error) message mean?

    See Deciphering Log Files.



  5. I am having a problem with the users ...

    The following list is of things to try, given that you are having problems with the user database. It assumes that you are using NWAuth, but most of what it says applies to whichever database you are using.

    I can add users (with NetAuth or whatever) but the servers don't recognise them:

    The most likely problem is that the users are being added in the wrong form, i.e. with the wrong prefix or suffix. You should open up your user database (for NWAuth that is nwauth.txt and/or nwauth.add) with a text editor and see the form of username that has been added there. Then compare that with the username that DSMTP and DPOP are looking up in the appropriate log file - obviously the two will need to match.

    To get the log entries that you need, edit the dmail.conf setting log_level to read:
    log_level debug
    Then reload the servers (tellsmtp reload and tellpop reload), and then send a message to that user, or log in to DPOP as that user. In the dsmtp.log file (in the log_path directory), look for the line:
    "lookup username ..."
    In the dpop.log file, look for:
    "check username ...". It is this username that should match the username in the user database.
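    On a busy server these lines are easy to miss, so a small script can pull the usernames out of the logs and compare them with the user database. The following Python sketch is illustrative only: it assumes the log lines contain 'lookup <username>' or 'check <username>' as shown above, and load_usernames assumes (unverified) that each line of your user database file starts with the username. Check both assumptions against your own files; all function names here are hypothetical.

```python
import re

# Assumed log shapes, based on this FAQ: DSMTP logs "lookup <username> ..."
# and DPOP logs "check <username> ...". Text before the keyword is ignored.
LOOKUP_RE = re.compile(r"\blookup\s+(\S+)")
CHECK_RE = re.compile(r"\bcheck\s+(\S+)")


def usernames_in_log(text, pattern):
    """Return the usernames captured by `pattern`, in order, without duplicates."""
    seen = []
    for match in pattern.finditer(text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


def load_usernames(path):
    """Hypothetical loader: ASSUMES the username is the first whitespace-separated
    field on each non-blank line of the user database file."""
    names = set()
    with open(path, "r", errors="replace") as f:
        for line in f:
            fields = line.split()
            if fields:
                names.add(fields[0])
    return names


def unmatched(log_names, db_names):
    """Usernames the servers asked for that do not appear in the database."""
    return [name for name in log_names if name not in db_names]
```

    For example, pass the text of dsmtp.log to usernames_in_log with LOOKUP_RE, then pass the result and load_usernames("nwauth.txt") to unmatched; any names it returns are being looked up in a form that is not in the database, which usually points to a prefix or suffix setting.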

    If they don't match, there are a number of settings in dmail.conf that affect the prefix and suffix of a username in the user database: either vdomain (the prefix parameter) and vdomain_separator, OR authent_domain. The NetAuth manual has a 'Mail Server authentication setup' section with all of the possible settings and what you have to set in each product. Note: if you are using DMAdmin to enter the usernames, you have to enter them exactly as you want them to appear in the nwauth.txt file.

    If the usernames do match, either NWAuth is returning a bad response, or DPOP and DSMTP are not running the same NWAuth as you are.

    To find out which, run NWAuth from the command line. For example, assuming that you have a user called bob whose password is 'pass', and that your authent_process setting in dmail.conf is c:\dmail\nwauth.exe, enter:
    c:\dmail\nwauth.exe
    lookup bob
    check bob pass
    exit
    The response should be '+OK ...' in each case.
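    If you need to repeat this test often, it can be scripted. The Python sketch below assumes, as the session above suggests, that NWAuth reads commands on standard input and writes one reply line per command; run_auth_session and all_ok are hypothetical helper names, not part of NWAuth.

```python
import subprocess


def run_auth_session(command, user, password, timeout=30):
    """Send the lookup/check/exit sequence from this FAQ to an authentication
    process on stdin and return its non-blank output lines."""
    script = f"lookup {user}\ncheck {user} {password}\nexit\n"
    result = subprocess.run(
        command, input=script, capture_output=True, text=True, timeout=timeout
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


def all_ok(replies):
    """True if there is at least one reply and every reply starts with '+OK'.

    If your module prints something other than '+OK' in reply to 'exit',
    drop that line before calling this.
    """
    return bool(replies) and all(reply.startswith("+OK") for reply in replies)
```

    For example, all_ok(run_auth_session([r"c:\dmail\nwauth.exe"], "bob", "pass")) should be True if both the lookup and the check succeed.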

    You can check that DSMTP is running the authentication process that you have just run by entering,
    tellsmtp config authent_process
    It should respond with the value of that setting.

  6. I got a bounce (Delivery Status Notification) message from DSMTP ...

    DSMTP creates a number of messages for sending back to the sender of a message, explaining a delivery problem or notifying of delivery success. These are called DSN (Delivery Status Notification) messages, and can generally be identified by the fact that their sender is 'postmaster@your_domain'.

    There is a section of the manual on these, 'Bounces and DSNs'.

    Here is the start of a list explaining some common ones ... (ask us to add to this list if you get a DSN that is not listed)

  7. What does this DPOP error message mean?

    DPOP returns quite a small set of error messages when it does not allow a user to log in. Good email clients pass these messages on to the user, but some do not. Therefore, you should always check the dpop.log file to see the real reason that a user cannot connect to the POP server.

    NB: a number of the DPOP error messages are simply the messages returned by an external authentication module - this should be obvious in the dpop.log file if it is the case. We're happy to edit the error responses of any of our authentication modules if you wish to make suggestions.

    Here is the start of a list explaining some common ones ... (please ask us to add to this list if you get a message that is not listed)

    • Cannot open or lock drop file
      This is a general error message, meaning that DPOP cannot take control of the user's mailbox (drop file). It is normally caused by some sort of file access problem. If it happens only occasionally, just ignore it. If it happens consistently, or every time that a user tries to retrieve their mail, then examine the dpop.log file (found in the log_path directory) for details. Remember to set log_level debug to get more detailed information in the log file. If the answer is not obvious, email the log file, along with your dmail.conf file, to DMail Support.

      NB: a common problem in an old version of DPOP was that it did not create the directories leading up to the user's drop file. So if you are getting this error, try sending a message to the user, because DSMTP will then have created the path for that user.

  8. 'Database Down' or 'Out of Sync' message with External User Database ...

    An error message in the dsmtp.log file such as,
    ...Out of sync reply from external auth (bob) isn't (fred)...
    or similarly in a bounce message or server connection error message,
    ...User database is down

    indicates that DSMTP (or DPOP) thinks that the replies from your authentication module are out of step with its requests, e.g. it looked up bob and thinks that it received back a response for fred.

    The most likely reason for this is that your authentication module was delayed in responding to the lookup request, so that DSMTP sees the response to that request when it goes looking for the response to the following request.
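    To see why a single slow reply can cause a whole series of errors, consider the following toy Python model. It is not DSMTP's code: it assumes only that replies come back in order over one pipe and that each reply identifies the user it is for (which the 'Out of sync' log message suggests). resync_lookup is a hypothetical recovery strategy shown for illustration; it is not necessarily what DSMTP does.

```python
from collections import deque


class AuthPipe:
    """Toy model: the auth module's replies queue up, in order, on one pipe."""

    def __init__(self):
        self.replies = deque()

    def module_replies(self, username, status="+OK"):
        """The auth module writes a reply naming the user it looked up."""
        self.replies.append((username, status))

    def read_reply(self):
        """Next queued reply, or None if nothing has arrived (a 'timeout')."""
        return self.replies.popleft() if self.replies else None


def naive_lookup(pipe, username):
    """Take the next reply and assume it answers this request."""
    reply = pipe.read_reply()
    if reply is None:
        return "timeout"
    if reply[0] != username:
        return f"out of sync: asked for {username}, got reply for {reply[0]}"
    return reply[1]


def resync_lookup(pipe, username):
    """Hypothetical recovery: skip stale replies meant for earlier requests."""
    while True:
        reply = pipe.read_reply()
        if reply is None:
            return "timeout"
        if reply[0] == username:
            return reply[1]
```

    In the naive version, once bob's reply arrives late, the lookup for fred reads bob's reply, the next lookup reads fred's, and so on, until something discards the stale reply.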

    The time that it waits is set by,
    authent_timeout
    which takes a timeout setting in seconds.

    Also the settings,
    tcp_timeout (DSMTP)
    and
    pop_timeout (DPOP)
    set the timeout on TCPIP connections for DSMTP and DPOP respectively.

    Check that your authent_timeout setting is long enough to allow for any normal slow lookups by your authentication module, e.g. if your database regularly goes offline for a few minutes each day. If you are unsure what to set it to, I would suggest 30 seconds.

    You also need to check that your tcp_timeout and pop_timeout settings are larger than your authent_timeout setting. If they are not, the servers can drop the connection before they have finished allowing the authentication module to do the user lookup. This can cause very strange behaviour. We recommend that you leave both tcp_timeout (default 5 mins) and pop_timeout (default 10 minutes) at their default values.
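    The rule above (both connection timeouts larger than authent_timeout) is easy to check mechanically. This Python sketch works in seconds throughout; since this FAQ quotes the tcp_timeout and pop_timeout defaults in minutes, convert whatever values you have to seconds before calling it. timeout_problems is a hypothetical helper name.

```python
# Defaults quoted in this FAQ, converted to seconds.
DEFAULT_TCP_TIMEOUT = 5 * 60    # tcp_timeout (DSMTP), default 5 minutes
DEFAULT_POP_TIMEOUT = 10 * 60   # pop_timeout (DPOP), default 10 minutes
SUGGESTED_AUTHENT_TIMEOUT = 30  # suggested authent_timeout, in seconds


def timeout_problems(authent_timeout,
                     tcp_timeout=DEFAULT_TCP_TIMEOUT,
                     pop_timeout=DEFAULT_POP_TIMEOUT):
    """Return a warning for each connection timeout that is not larger than
    authent_timeout. All values are in seconds."""
    problems = []
    for name, value in (("tcp_timeout", tcp_timeout), ("pop_timeout", pop_timeout)):
        if value <= authent_timeout:
            problems.append(
                f"{name} ({value}s) should be larger than "
                f"authent_timeout ({authent_timeout}s)"
            )
    return problems
```

    With the defaults and the suggested 30 second authent_timeout, timeout_problems(30) returns an empty list.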

    In version 2.7n (2.7q is the corresponding release version) we made changes so that, in this situation, DSMTP can 'get back in sync'.

    Therefore, if you are using an older version you may want to upgrade to at least version 2.7q.

  9. On Windows, DMAdmin just shows lines like 'Lost connection to DSMTP (Select failed () Connection Refused)' ...

    The messages you are seeing in DMAdmin indicate that the administration utility cannot connect to the DSMTP (and/or DPOP) server(s).

    It is important to realise that DMAdmin is just an administration utility that connects to the servers when they are running. It may well be that they are running, but DMAdmin cannot talk to them for some reason.

    You can check whether the servers are really running by entering at a command prompt,
         telnet localhost 110
         quit
    to which the DPOP server should respond if it is running.

    Similarly, entering
         telnet localhost 25
         quit
    checks for DSMTP.

    If the servers are not running, please send DMail Support your configuration file,
         dmail.conf
    (typically c:\winnt\system32\dmail.conf or /etc/dmail.conf)
    and the following log files,
         dsmtp.log
         dpop.log
    from the log_path directory (specified in dmail.conf).

    NB: the most common cause of this is that another mail server is running! So please check that you do not have another SMTP or POP server running. When you do the telnet tests above, the DSMTP and DPOP servers respond with a line including the word 'DSMTP' or 'DPOP' respectively, so you can tell that they are the server responding. Other servers respond with similar lines, but of course will not mention the names of our products.
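    The telnet checks above can also be run from a script, for example on a machine without a telnet client. This minimal Python sketch connects, reads the greeting line, sends QUIT, and looks for 'DSMTP' or 'DPOP' in the greeting, as described above. read_banner and identify are our own illustrative names.

```python
import socket


def read_banner(host, port, timeout=10.0):
    """Connect, read the greeting line, send QUIT and close, like the telnet test."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while b"\n" not in data:
            chunk = sock.recv(1024)
            if not chunk:
                break  # server closed the connection
            data += chunk
        try:
            sock.sendall(b"QUIT\r\n")  # SMTP and POP3 both accept QUIT
        except OSError:
            pass  # server already closed the connection
    return data.split(b"\n", 1)[0].decode("latin-1").strip()


def identify(banner):
    """Name the DMail product in a greeting line, or 'other server'."""
    for product in ("DSMTP", "DPOP"):
        if product in banner:
            return product
    return "other server"
```

    For example, identify(read_banner("localhost", 25)) should return 'DSMTP' if DSMTP is the server answering on port 25, and 'other server' if something else is; use port 110 for DPOP.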

    If other servers are running, shut them down and re-run the dmsetup installation utility (this time doing an upgrade, option 2). You will find dmsetup in the dmtemp directory.

    If the servers are running, and they are indeed the DMail servers, then DMAdmin is probably just having trouble connecting to them. In that case, send DMail Support the same files as above, plus the dwatch.log file (from the log_path directory). You can also tick the 'debug output' check box on the dwatch tab in DMAdmin and send us the resulting dmadmin.log file (DMAdmin logs the name of the file it is using to the screen).

  10. What does the following System Administrator message mean?

    Many System Administrator messages are simply copies of bounce or DSN messages, so in addition to any messages listed below, check the FAQ above:
    I got a bounce message from DSMTP ...

    (No System Administrator messages are documented at present; email DMail Support if you want one explained.)

  11. We are having problems with 'try again later' or 'too many simultaneous connections' messages in DSMTP ...

    > we are having problems with people getting 'try again later' SMTP
    > errors. I know there is a setting somewhere to limit the number of
    > connections, but is there anything I should be watching out for so that I can be sure
    > I'm not just addressing the symptom of a greater problem?

    The setting you want to change is,
         tcp_max
    (default is 200)
    from which you need to take away half of
         max_send
    (default is 10)

    DSMTP will never allow more incoming channels than,
         tcp_max - (max_send/2)
    so by default,
         200 - (10/2) = 195
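    The limit above is simple arithmetic, sketched here in Python so you can plug in your own settings. This sketch assumes that half of max_send is rounded down when max_send is odd; this FAQ only gives the even default, so treat that case as unconfirmed. tcp_max_for is a hypothetical helper for working backwards from the number of incoming channels you want.

```python
def max_incoming_channels(tcp_max=200, max_send=10):
    """Most incoming channels DSMTP will allow: tcp_max - (max_send / 2).

    Rounding max_send / 2 down for odd values is an assumption.
    """
    return tcp_max - max_send // 2


def tcp_max_for(incoming, max_send=10):
    """Hypothetical inverse: the tcp_max needed to allow `incoming` channels."""
    return incoming + max_send // 2


print(max_incoming_channels())  # default settings -> 195
```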

    You are correct that you need to monitor the use of your channels closely, rather than just raising tcp_max.

    The first thing to do is grep your logs for the error message that corresponds to,
         try again later
    which the users see. When I looked this up, I found various errors with that ending, and all the ones with 'try again later' (as opposed to 'try later') seem to be caused by authentication module problems. So do check the log files to see the reason for the rejections.

    For the rejection message,

    435 Sorry only %d simultaneous users permited, try later
    the log line is,
    User rejected because too many users already connected
    The other place to check is the end of the tellsmtp status output, where DSMTP displays counts of various error conditions. The text to look for there is,
    Connection refused: too many simultaneous connections
    and the number beside it is a 'count' of how many times DSMTP has given that response.
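    Rather than grepping for each message separately, you can count them all in one pass. This Python sketch counts log lines containing each of the text fragments quoted above. The fragments come from this FAQ, but the exact log wording may differ between versions, so adjust PATTERNS to match what you see in your own dsmtp.log; the function names are illustrative.

```python
from collections import Counter

# Text fragments quoted in this FAQ; adjust to match your own dsmtp.log.
PATTERNS = {
    "try again later": "try again later",  # mostly authentication module problems
    "try later": "try later",
    "too many users": "User rejected because too many users already connected",
}


def count_rejections(log_text):
    """Count log lines containing each fragment in PATTERNS."""
    counts = Counter()
    for line in log_text.splitlines():
        for label, fragment in PATTERNS.items():
            if fragment in line:
                counts[label] += 1
    return counts


def lines_with(log_text, fragment):
    """Return the full log lines containing `fragment`, to read the reasons."""
    return [line for line in log_text.splitlines() if fragment in line]
```

    For example, pass the contents of dsmtp.log to count_rejections, then use lines_with(log_text, "try again later") to read the full lines and see which reason is behind the rejections.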

    Also, if you give the command,

    tellsmtp showchans
    then DSMTP will list all channels that are or have been active. You may want to set up a cron job to capture that output a few times a day, and/or save the output to a file (e.g. tellsmtp showchans > showchans.txt) when you know DSMTP is rejecting connections.
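    For the cron job, a small script that saves each showchans listing to its own timestamped file makes it easy to compare periods of rejection with normal ones. This Python sketch assumes tellsmtp is on the PATH of the user running it; save_showchans and snapshot_name are illustrative names, not DMail tools.

```python
import datetime
import pathlib
import subprocess


def snapshot_name(when):
    """Timestamped file name that sorts in time order."""
    return when.strftime("showchans-%Y%m%d-%H%M%S.txt")


def save_showchans(out_dir, command=("tellsmtp", "showchans"), when=None):
    """Run the showchans command and save its output; return the file's path."""
    when = when or datetime.datetime.now()
    result = subprocess.run(list(command), capture_output=True, text=True, check=True)
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    path = out / snapshot_name(when)
    path.write_text(result.stdout)
    return path
```

    For example, a script that cron runs a few times a day could call save_showchans("/var/tmp/showchans") (an example directory; use any location you like).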

    You may be able to see problems in the showchans outputs, but feel free to send them to us to be checked. The same goes with the dsmtp.log files.