Too many open files on Tomcat
The other day, one of my websites was not available anymore. Looking at the log files, I found the following exception:
Dec 7, 2011 1:22:39 AM org.apache.jk.common.ChannelSocket acceptConnections
WARNING: Exception executing accept
java.net.SocketException: Too many open files
    at java.net.PlainSocketImpl.socketAccept(Native Method)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:384)
    at java.net.ServerSocket.implAccept(ServerSocket.java:450)
    at java.net.ServerSocket.accept(ServerSocket.java:421)
    at org.apache.jk.common.ChannelSocket.accept(ChannelSocket.java:307)
    at org.apache.jk.common.ChannelSocket.acceptConnections(ChannelSocket.java:661)
    at org.apache.jk.common.ChannelSocket$SocketAcceptor.runIt(ChannelSocket.java:872)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:690)
    at java.lang.Thread.run(Thread.java:595)
This was the first time I had seen this exception. What is even stranger is that I hadn’t changed anything in this application for quite a while!
Anyway, I first did what I usually do with Tomcat: restart it! This fixed the issue but only for a few hours before it crashed again.
After some investigation, it turned out that Tomcat was reaching the limit of open file descriptors allowed on this machine (1024 in my case).
To display the maximum number of open file descriptors for the current shell, simply type the following command:
ulimit -n
It is possible to increase this value by editing the file /etc/security/limits.conf and adding the new limit for the user running Tomcat. However, this is not recommended as 1024 should be sufficient.
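Keep in mind that ulimit -n reports the limit of the shell it is typed in, which is not necessarily the limit of the running Tomcat process. On Linux, the kernel exposes the limits and the descriptors of a running process under /proc. The two helpers below are only a sketch: soft_nofile_limit and open_fd_count are names made up for this example, and they assume the usual layout of /proc/<PID>/limits and /proc/<PID>/fd.

```shell
#!/bin/sh
# Print the soft "Max open files" limit found in a Linux limits file.
# $1 = path to a file in the /proc/<PID>/limits format.
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Print how many entries a /proc/<PID>/fd style directory contains,
# that is, how many descriptors the process currently holds.
# $1 = path to the directory.
open_fd_count() {
    ls "$1" | wc -l | tr -d ' '
}
```

With TOMCAT_PID standing for the real process ID (a placeholder), the two values can be compared like this:
soft_nofile_limit /proc/$TOMCAT_PID/limits
open_fd_count /proc/$TOMCAT_PID/fd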
The second thing I did was to check the list of open files used by the Tomcat process (replace <PID> with the ID of that process):
lsof -p <PID>
What I found by running this command was a bit odd. Tomcat had a multitude of open connections to one of the web services used by the application. So it looked like the connections between my website and the web service were never closed!
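To see at a glance which remote endpoint is holding the connections, the lsof output can be grouped by remote address. The function below is only a sketch: count_by_remote is a name made up for this example, and it assumes the default lsof layout, in which a TCP connection appears as local->remote.

```shell
#!/bin/sh
# Count TCP connections per remote endpoint from lsof output read on stdin.
# Any whitespace-separated field containing "->" is treated as "local->remote".
count_by_remote() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if (index($i, "->") > 0) {
                split($i, ends, "->")   # ends[1] = local side, ends[2] = remote side
                count[ends[2]]++
            }
        }
    }
    END { for (remote in count) print count[remote], remote }' | sort -rn
}
```

It could be used as follows, where TOMCAT_PID is a placeholder for the real process ID. The -n and -P options stop lsof from resolving host and port names, so the addresses stay numeric:
lsof -nP -a -p "$TOMCAT_PID" -i TCP | count_by_remote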
Because I hadn’t changed the code on my side, I asked the third party who owns the web service to check their code. I don’t know what the root cause of the problem was, but they fixed it on their side and it is now working fine.
In conclusion, if you get the same exception, try to find where the problem is coming from before increasing the maximum number of open file descriptors.
Samba access problem with Mac OS X 10.6+
This is a problem I encountered when I upgraded Mac OS X from version 10.5 (Leopard) to 10.6 (Snow Leopard). One of my friends also got a similar problem when she upgraded to version 10.7 (Lion).
This issue was affecting the access to the network shares set up with Samba (version 3.0.24) on my D-Link DNS-323. For some reason, I wasn’t able to authenticate on the shares as soon as I upgraded to Snow Leopard!
Here is the error message I was getting:

After browsing a few forums on the web, I finally found a solution.
I simply had to change the security mode in the Samba configuration file (smb.conf) to read:
security = USER
Note that this property can be found under the [global] section.
For more information about the Samba security mode, please read the following article by Jack Wallen:
Understanding Samba security modes
S3 command failed if the time is not synced
This is already the second post about the s3sync Ruby program. The first article was focused on monitoring s3sync with Zabbix.
In this one, I will talk about an error I got when running the S3 synchronisation:
S3 command failed:
list_bucket prefix /data max-keys 200 delimiter /
With result 403 Forbidden
S3 ERROR: #
s3sync.rb:290:in `+': can't convert nil into Array (TypeError)
    from s3sync.rb:290:in `s3TreeRecurse'
    from s3sync.rb:346:in `main'
    from ./thread_generator.rb:79:in `call'
    from ./thread_generator.rb:79:in `initialize'
    from ./thread_generator.rb:76:in `new'
    from ./thread_generator.rb:76:in `initialize'
    from s3sync.rb:267:in `new'
    from s3sync.rb:267:in `main'
    from s3sync.rb:735
As you can see, this error is not very human-friendly!
The only thing we know is that the S3 command failed because of the error can't convert nil into Array. It looks to me like an internal error within s3sync…
But after some investigation, it appears that it was simply because the system date on the server was not correct: Amazon S3 rejects requests whose timestamp is too far from its own clock, hence the 403 Forbidden. I cannot tell you how much time I spent on this one!
Anyway, if you are doing automatic backups as described on John Eberly’s blog, you need to add the following code at the top of your upload.sh script:
# update the system date
/usr/sbin/ntpdate 3.uk.pool.ntp.org 2.uk.pool.ntp.org 1.uk.pool.ntp.org 0.uk.pool.ntp.org
NB: please find below the command lines I use to install ntpdate on a Debian server:
apt-get install ntpdate
dpkg-reconfigure tzdata
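If you would rather stop the backup than run it with a wrong clock, a small guard can compare the local time with a reference timestamp before s3sync is called. This is only a sketch: clock_within_skew is a made-up name, the reference timestamp has to come from a source you trust (for example the Date header of an HTTP response), and the default of 900 seconds assumes the 15-minute window that S3 tolerates.

```shell
#!/bin/sh
# Succeed (exit status 0) when two Unix timestamps are close enough.
# $1 = local time, $2 = reference time, $3 = allowed skew in seconds.
# The default of 900 s is an assumption about the window S3 tolerates.
clock_within_skew() {
    skew=$(( $1 - $2 ))
    if [ "$skew" -lt 0 ]; then
        skew=$(( 0 - skew ))    # absolute value
    fi
    [ "$skew" -le "${3:-900}" ]
}
```

In upload.sh it could be called like this, where REFERENCE_TS is a placeholder for the reference timestamp:
clock_within_skew "$(date +%s)" "$REFERENCE_TS" || exit 1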
Monitor s3sync with Zabbix
s3sync is a ruby program that easily transfers directories between a local directory and an S3 bucket:prefix. It behaves somewhat, but not precisely, like the rsync program.
I am using this tool to automatically back up the important data from Debian servers to Amazon S3. I am not going to explain here how to install s3sync, as that is not the purpose of this article. However, you can read this very useful article from John Eberly’s blog: How I automated my backups to Amazon S3 using s3sync.
If you followed the steps from John Eberly’s post, you should have an upload.sh script and a crontab job which executes this script periodically.
From this point, here is what you need to do to monitor the success of the synchronisation with Zabbix:
- Add the following code at the end of your upload.sh script:
# report the result (must come right after the s3sync command so that $? holds its exit code)
RETVAL=$?
[ $RETVAL -eq 0 ] && echo "Synchronization succeed"
[ $RETVAL -ne 0 ] && echo "Synchronization failed"
- Log the output of the cron script as follows:
30 2 * * sun /path/to/upload.sh > /var/log/s3sync.log 2>&1
- On Zabbix, create a new item which will check the existence of the sentence “Synchronization failed” in the file /var/log/s3sync.log:
Item key: vfs.file.regmatch[/var/log/s3sync.log,Synchronization failed]
- Still on Zabbix, define a new trigger for the previously created item:
Trigger expression: {Template_AmazonCloud_Debian:vfs.file.regmatch[/var/log/s3sync.log,Synchronization failed].last(0)}=1
With these few steps, you should now receive Zabbix alerts when a backup on S3 fails.
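The first step can also be written as a small wrapper, so that the message is always tied to the command it reports on and the exit code is passed on to cron. This is only a sketch: run_and_report is a name made up for this example, and the two messages are the ones the Zabbix item above relies on.

```shell
#!/bin/sh
# Run the command given as arguments, print the status line that the
# Zabbix item looks for, and return the exit code of the command.
run_and_report() {
    "$@"
    retval=$?
    if [ "$retval" -eq 0 ]; then
        echo "Synchronization succeed"
    else
        echo "Synchronization failed"
    fi
    return "$retval"
}
```

In upload.sh, the s3sync command line would then simply be prefixed with run_and_report.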
Refresh GeoIP automatically
GeoIP is a very useful tool provided by MaxMind. It can determine which country, region, city, postal code, and area code the visitor is coming from in real-time. For more information, visit the MaxMind website.
This tool also comes with an Apache module that can redirect users depending on their location. For example, we could redirect all users from France to the French home page of a multi-language website, or we could block the traffic from users of a specific country.
To install this module on a Debian server, you simply need to run the following command:
apt-get install libapache2-mod-geoip
But how does this module work? How does it know where the user comes from?
It is actually quite simple: GeoIP uses a file that maps IP addresses to countries. On Debian, this file is stored in the folder /usr/share/GeoIP and is named GeoIP.dat.
However, IP address allocations change all the time, so this file gets out of date very quickly. This is why MaxMind provides an updated file at the beginning of each month for free. To read the installation instructions, please click the following link: http://www.maxmind.com/app/installation
This is all well and good, but who will remember or even have the time to update this file every month? And imagine if you have to do this on hundreds of servers?
The solution is to use a shell script which will download, extract and install the updated GeoIP file automatically once a month:
#!/bin/sh
# Go in the GeoIP folder (stop here if it does not exist)
cd /usr/share/GeoIP || exit 1
# Remove the previous GeoIP file (if present)
rm -f GeoIP.dat.gz
# Download the new GeoIP file (stop here if the download fails, so the current file is kept)
wget http://geolite.maxmind.com/download/geoip/database/GeoLiteCountry/GeoIP.dat.gz || exit 1
# Remove the previous GeoIP backup file
rm -f GeoIP.dat.bak
# Backup the existing GeoIP file
mv GeoIP.dat GeoIP.dat.bak
# Extract the new GeoIP file
gunzip GeoIP.dat.gz
# Change the permission of the GeoIP file
chmod 644 GeoIP.dat
# Reload Apache
service apache2 reload
You can save this script as /root/update_geoip.sh, make it executable and set up the following crontab job:
0 0 3 * * /root/update_geoip.sh
This will execute the script automatically on the third day of every month.
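Since the script runs unattended, it is worth making sure that a truncated or corrupted download can never replace a working GeoIP.dat. The function below is a minimal sketch of that idea, not part of the script above: install_geoip is a made-up name, and it assumes the archive has already been downloaded to a local path.

```shell
#!/bin/sh
# Install a downloaded GeoIP archive only if it is a valid gzip file.
# $1 = path to the downloaded .gz file, $2 = destination .dat file.
install_geoip() {
    archive=$1
    target=$2
    # gzip -t checks the integrity of the archive without extracting it
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "Invalid archive: $archive" >&2
        return 1
    fi
    # Keep a backup of the current file, if there is one
    if [ -f "$target" ]; then
        cp -p "$target" "$target.bak" || return 1
    fi
    # Extract next to the target, then rename the new file into place
    gunzip -c "$archive" > "$target.new" &&
        chmod 644 "$target.new" &&
        mv "$target.new" "$target"
}
```

The existing file is only replaced by the final mv, so Apache keeps a usable GeoIP.dat if anything fails before that point.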
Convert SQL Server to MySQL
Some time ago, I had to convert a Microsoft SQL Server database to a MySQL database. The main reason was the cost of the SQL Server license for such a small database.
Looking on the web, I found the following open source application:
| Name: | mssql2mysql |
| URL: | http://sourceforge.net/projects/mssql2mysql/ |
| Description: | mssql2mysql is a python script used to create a SQL dump from a Microsoft SQL server that is ready to use with MySQL. Supports Schema and data dumping, including Primary Keys for each table, allows to dump all data or just a small portion of it. |
To be honest, this tool wasn’t working as expected but it was a very good start!
Below is the list of changes I’ve made:
- Add support for the bool data type;
- Add support for the datetime data type;
- Only convert tables and views of the ‘dbo’ owner;
- Add the test columnas[6]==True on the primary key;
- Add support for the uniqueidentifier data type;
- Add support for the tinyint data type (SQL Server is tinyint unsigned by default, but not in MySQL!);
- Add support for the default column values;
- Add support for the bit data type (use tinyint instead of bit, go to this page for more information).
To download the amended script, please click on the following link: mssql2mysql.tar.bz2
This script worked perfectly fine for me. However, please note that my interest was focused on converting the database structure but not the content!
Make image backgrounds transparent with tolerance
In one of the projects I am working on at the moment, I needed to convert the background colour of an image to be transparent so the image looks better on a non-white background.
Looking on the Web, I found the following article from Dustin Marx:
Making White Image Backgrounds Transparent with Java 2D/Groovy
If the link is broken, please download his code from the following link: ImageTransparency.java
The method which makes the background colour transparent is called makeColorTransparent. This method works pretty well, except in some cases as shown in the example below:
| Original image | Converted image using Dustin’s method |
This is actually quite normal. Indeed, his code is converting a specific colour (#FFFFFF in our case) to be transparent. But what if the background is not homogeneous?
This is the reason why I had to modify his method to add a new parameter called tolerance:
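// Requires java.awt.{Color, Image, Toolkit} and
// java.awt.image.{BufferedImage, FilteredImageSource, ImageFilter, ImageProducer, RGBImageFilter}
private Image makeColorTransparent(final BufferedImage im, final Color color, int tolerance) {
    if (tolerance < 0 || tolerance > 100) {
        System.err.println("The tolerance is a percentage, so the value has to be between 0 and 100.");
        tolerance = 0;
    }
    // Convert the percentage into the maximum difference allowed on each colour channel (0-255)
    final int channelTolerance = tolerance * 0xFF / 100;
    final ImageFilter filter = new RGBImageFilter() {
        // The color we are looking for (white), split into its red, green and blue channels
        private final int markerRed = color.getRed();
        private final int markerGreen = color.getGreen();
        private final int markerBlue = color.getBlue();

        @Override
        public final int filterRGB(final int x, final int y, final int rgb) {
            final int red = (rgb >> 16) & 0xFF;
            final int green = (rgb >> 8) & 0xFF;
            final int blue = rgb & 0xFF;
            // A pixel matches when every channel is within the tolerance of the marker colour
            if (Math.abs(red - markerRed) <= channelTolerance
                    && Math.abs(green - markerGreen) <= channelTolerance
                    && Math.abs(blue - markerBlue) <= channelTolerance) {
                // Mark the alpha bits as zero - transparent
                return 0x00FFFFFF & rgb;
            } else {
                // Nothing to do
                return rgb;
            }
        }
    };
    final ImageProducer ip = new FilteredImageSource(im.getSource(), filter);
    return Toolkit.getDefaultToolkit().createImage(ip);
}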
As in Photoshop, the tolerance is a percentage value between 0 and 100. The higher the tolerance is, the bigger the range of colours that will be made transparent.
Let’s take our previous example and apply a 50% tolerance:

That looks much better, doesn’t it?
AVG function returns 0.9999
This one is a very odd bug!
For some reason, the AVG function from MySQL always returns the value 0.9999 instead of the average value. I experienced this on MySQL 5.5.12 hosted on Amazon RDS (Relational Database Service).
However, the exact same query executed on MySQL 5.1.28 returns the right values.
Why is that? Is it a bug in MySQL 5.5.12?
I searched the Internet and couldn’t find anything about it. So to be honest, I am not sure what this bug is or even if MySQL is aware of it.
Anyway, if you encounter the same problem, you can simply replace the AVG function with the combination SUM/COUNT. Counting the column itself, rather than using COUNT(*), skips NULL values in the same way AVG does.
For example, the following query:
SELECT student_name, AVG(test_score) FROM student GROUP BY student_name;
can be replaced by the one below:
SELECT student_name, SUM(test_score)/COUNT(test_score) FROM student GROUP BY student_name;
Differences between Amazon instances
I have now been using Amazon Cloud for website hosting for more than a year. I have to admit that it works pretty well and I am quite happy with their services.
However, I got a strange problem a few days ago.
I was deploying an application on two different large instances of Amazon Cloud, one would be the UAT (User Acceptance Testing) server and the other one would be the Production server.
Strangely enough, the version running on the Production server was running slower than on the UAT server!
Why?
At first, I thought I had missed something in the server configurations. But no, everything was absolutely identical! So where did this difference in speed come from?
After some more investigation, I had the brilliant idea to execute the following command:
cat /proc/cpuinfo
On the Production server, this command was returning the following:
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2218 HE
stepping : 3
cpu MHz : 2599.998
cache size : 1024 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5202.40
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 65
model name : Dual-Core AMD Opteron(tm) Processor 2218 HE
stepping : 3
cpu MHz : 2599.998
cache size : 1024 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu tsc msr pae mce cx8 apic mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow up pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5202.40
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc
And on the UAT server, it was returning the following:
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 10
cpu MHz : 2666.762
cache size : 6144 KB
physical id : 0
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc pni ssse3 cx16 lahf_lm
bogomips : 5337.55
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:

processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 23
model name : Intel(R) Xeon(R) CPU E5430 @ 2.66GHz
stepping : 10
cpu MHz : 2666.762
cache size : 6144 KB
physical id : 1
siblings : 1
core id : 0
cpu cores : 1
fpu : yes
fpu_exception : yes
cpuid level : 13
wp : yes
flags : fpu de tsc msr pae cx8 apic sep cmov pat clflush acpi mmx fxsr sse sse2 ss ht syscall nx lm constant_tsc up pni ssse3 cx16 lahf_lm
bogomips : 5337.55
clflush size : 64
cache_alignment : 64
address sizes : 38 bits physical, 48 bits virtual
power management:
As you can see, these two servers actually don’t have the same CPU model name. But the biggest difference is probably the CPU MHz (2666.762 on UAT and 2599.998 on Production) and the cache size (6MB on UAT and only 1MB on Production).
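Reading the full output on each server is tedious, so the three values compared above can be extracted with a short function. This is only a sketch: cpu_summary is a name made up for this example, and it assumes the "key : value" layout of /proc/cpuinfo shown above.

```shell
#!/bin/sh
# Print the distinct model name, clock speed and cache size found in a
# cpuinfo file. $1 = path to the file (defaults to /proc/cpuinfo).
cpu_summary() {
    awk '
        /^(model name|cpu MHz|cache size)[ \t]*:/ {
            key = $0
            sub(/[ \t]*:.*$/, "", key)         # text before the colon
            value = $0
            sub(/^[^:]*:[ \t]*/, "", value)    # text after the colon
            print key ": " value
        }' "${1:-/proc/cpuinfo}" | sort -u
}
```

Running cpu_summary on both servers and comparing the two outputs shows the hardware difference in three lines.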
So what does that mean? Do two large instances of Amazon Cloud actually not have the same power?
The answer to this question is actually in the Amazon instance types description (http://aws.amazon.com/ec2/instance-types/) under the ‘Measuring Compute Resources’ chapter:
Amazon EC2 uses a variety of measures to provide each instance with a consistent and predictable amount of CPU capacity. In order to make it easy for developers to compare CPU capacity between different instance types, we have defined an Amazon EC2 Compute Unit. The amount of CPU that is allocated to a particular instance is expressed in terms of these EC2 Compute Units. We use several benchmarks and tests to manage the consistency and predictability of the performance of an EC2 Compute Unit. One EC2 Compute Unit provides the equivalent CPU capacity of a 1.0-1.2 GHz 2007 Opteron or 2007 Xeon processor.
In conclusion, two identical instance types of Amazon Cloud have the same number of EC2 Compute Units, but because an EC2 Compute Unit is equivalent to a CPU capacity between 1.0 GHz and 1.2 GHz, the actual speed of the instances can be slightly different!
Mystery solved.