Gmail Corrupting Attachments

I recently received a report that attachments sent to Gmail from some servers were being corrupted. At first, I assumed that the reporter was mistaken, or that perhaps the problem was with the sender’s mail client or server. One of my colleagues had already conducted some tests of his own and found that PDFs and TIFFs he tested with were indeed being corrupted. I had to investigate. Some quick tests proved that the reporter and my colleague were correct.

Below is detailed information about the tests I conducted and my findings.

The Tests

The Servers

For my tests, there are three groups of servers involved: my personal mail server (we’ll call this the “PWB server”), my employer’s mail servers (the “LT servers”) and Google’s mail servers (the “Gmail servers”). The PWB server’s MTA is Postfix. The LT servers include Postfix relays and Kerio Connect mail servers. Mail sent out from the LT servers is first handled by Kerio Connect, then relayed to the outside world by Postfix.

The Attachment

I decided to limit my test to a single attachment – a TIFF file I picked out of convenience. This file is named eyes_color.tif and it is 196926 bytes in size.

The Emails

I conducted several tests, but limited this analysis to a representative batch:

  • Test 1 – An email sent from LT to PWB. The attachment arrived intact. The result, as saved from PWB, is in the file test-good-lt_2_pwb.mbox.
  • Test 2 – An email sent from PWB to Gmail. The attachment arrived intact. The result, as saved from Gmail’s web interface, is in the file test-good-pwb_2_gmail.mbox.
  • Test 3 – An email sent from LT to Gmail. The attachment was corrupted. The result, as saved from Gmail’s web interface, is in the file test-bad-lt_2_gmail-1.mbox.
  • Test 4 – Another email sent from LT to Gmail. The attachment was corrupted, but not in the same way as Test 3. The result, as saved from Gmail’s web interface, is in the file test-bad-lt_2_gmail-2.mbox.

The Results

From all four tests, I extracted the base64-encoded attachment. The results from Test 1 and Test 2 matched, and decoding those gave back the original TIFF, which I verified by comparing SHA1 hashes. This correct base64 content is in good.base64. Both Test 3 and Test 4 included corrupted bytes – just a few each. The extracted base64 content was the correct size, but each had a few bytes replaced with non-ASCII characters. The corruption was different between the two and seemingly random. Test 3’s extracted base64 content is in bad-1.base64, while Test 4’s is in bad-2.base64. Diffing each corrupted attachment against the correct base64 content yielded bad-1_v_good.patch (Test 3) and bad-2_v_good.patch (Test 4).
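
For anyone who wants to repeat this kind of comparison, here is a minimal sketch of how it could be done in shell. These are not necessarily the exact commands used for the tests above. The helper names are made up for illustration, and the sketch assumes GNU coreutils (base64, sha1sum) plus cmp.

```shell
# Hypothetical helper: succeed only if the decoded base64 file has the
# same SHA1 hash as the original file.
# $1: file holding the extracted base64 text, $2: original attachment
matches_original() {
    decoded_hash=$(base64 -d "$1" 2>/dev/null | sha1sum | cut -d' ' -f1)
    original_hash=$(sha1sum < "$2" | cut -d' ' -f1)
    [ "$decoded_hash" = "$original_hash" ]
}

# Hypothetical helper: count the byte positions at which two files differ.
# cmp -l prints one line per differing byte, so counting lines is enough.
count_differing_bytes() {
    { cmp -l "$1" "$2" || true; } | wc -l | tr -d ' '
}
```

With the files from this post, `matches_original good.base64 eyes_color.tif` should succeed, while `count_differing_bytes good.base64 bad-1.base64` should report the handful of corrupted bytes.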

When viewed in Gmail’s web interface, the attachments from Test 3 and Test 4 fail to show previews and, when downloaded, are not viewable in an image viewer. The attachments downloaded from the web interface do not match the original file sent in those emails, verified by SHA1 hashes.

The Findings

These tests are representative of all of the tests I conducted. Gmail seems to corrupt attachments sent from some, but not all, servers. I do not know why, and I do not see a pattern in how the attachments are corrupted. When the same sending servers deliver mail to other servers, the attachments arrive in perfect condition. The corruption seems to be conditional upon the sending server, as I get consistent results with repeated tests from any given account. I have only used a few accounts on each server to conduct my tests, so it may be conditional upon the sending mail account, but this seems less probable.

In each corrupted attachment, a different handful of seemingly random bytes have been replaced with non-ASCII characters. I know that this corruption affects multiple file types, but I do not know if all file types are affected.
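
Valid base64 text contains only letters, digits, “+”, “/”, “=” and line breaks, so any byte outside the ASCII range is a sure sign of damage. A quick way to check an extracted attachment is sketched below. The helper name is made up for illustration, and the sketch assumes a POSIX tr.

```shell
# Hypothetical helper: count the bytes in a file that fall outside the
# 7-bit ASCII range. tr deletes every ASCII byte (octal 000-177) and
# wc counts whatever is left over.
count_non_ascii() {
    LC_ALL=C tr -d '\000-\177' < "$1" | wc -c | tr -d ' '
}
```

A result of 0 for good.base64 and a small non-zero number for bad-1.base64 or bad-2.base64 would match what I describe above.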

Update 1

With assistance from folks at Google, I have identified a probable source of the corruption in the network path between the affected sending servers and the Gmail servers. I do not yet know why other receiving servers are unaffected, but it may be a difference in error detection and correction behavior (like TCP checksum behavior) or a performance difference that affects the chances of corruption. If the affected network provider gets their problem fixed, I will conduct further testing.

How Not to Email Your Customers

There are many different ways that organizations can manage customer lists and deliver email to all of their customers at once. Some mailers will generate a unique email to each customer, possibly replacing fields in a form letter, while others will basically use the “BCC” field to send one email to many recipients. An important characteristic of these methods is that the recipients will not be able to see each other’s email addresses.
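
To illustrate the first method, here is a rough sketch of a form-letter loop in shell. Everything in it is hypothetical: the helper names, the subject line and the “address name” input format are assumptions for illustration, and a real mailer would add From, Date and other headers and handle bounces. The point is only that each message carries exactly one address in its “To:” field.

```shell
# Hypothetical helper: print one complete message addressed to a single
# recipient. $1: recipient address, $2: recipient name
render_message() {
    printf 'To: %s\n' "$1"
    printf 'Subject: News from Example Store\n'
    printf '\n'
    printf 'Hello %s,\n\nThank you for being a customer.\n' "$2"
}

# Read "address name" pairs from stdin, one per line, and pipe each
# rendered message to a delivery command. The command must be a single
# word and defaults to cat, so nothing is actually sent.
send_to_each() {
    deliver=${1:-cat}
    while read -r address name; do
        render_message "$address" "$name" | "$deliver"
    done
}
```

Running `printf 'alice@example.com Alice\nbob@example.com Bob\n' | send_to_each` prints two separate messages, neither of which mentions the other recipient.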

Today, Cardstore.com sent out an email to customers without using either of the above methods. This email contained thousands of Cardstore.com’s customer email addresses in the email’s “To:” field, meaning that every recipient of the email could see every other email address that the message was sent to.

Database breaches have become extremely common. LinkedIn and Last.fm are both recent examples of popular websites to suffer database breaches that exposed customer details. These breaches have been the result of hackers, but Cardstore.com cut out the middleman and just sent out their customer list themselves. This kind of breach is unacceptable and should never have been allowed to happen. Great care should always be taken in the handling of customer information, and checks should be in place to make sure that errors like this are avoided.

Something Amazing Just Happened

Something amazing happened on Wednesday. It’s doubtful you missed it, but you might not have recognized just how amazing it really was. The unprecedented Internet blackout showed us something incredible:

Major websites demonstrated the ability to quickly sway political dialogue.

It has been easy to see for many years that big media can influence political dialogue. Media slant, sometimes the result of unintentional bias and sometimes the result of direct influential efforts, has had an impact on many political discussions and legislative proceedings over the years. The impact of lobbyists employed by big media companies is even easier to identify.

On Wednesday, something happened that has never happened before: a coordinated effort by several of the most heavily trafficked websites changed the course of political dialogue. Following Wednesday’s blackout protests, SOPA has been pulled from consideration and PIPA is effectively dead in the water. Politicians were quick to react when overwhelmed by feedback from their constituents. This is a strong message that shows just how much power new media now wields. It took a unique circumstance to bring these major Internet players together behind a common cause, but now we know that it can happen.

As much as I would like to see the general public exercise greater everyday vigilance toward lawmaking, I do not think we will see much change there as a result of this event. What we might see is a stronger and more unified political voice from new media companies interested in protecting the greatest tool for information exchange ever created. I certainly hope that this will at least remind lawmakers that they should seek more input from major Internet organizations when crafting laws affecting the Internet. I do not doubt this event will impact legislative discourse in the future.

It may prove difficult to trace the effects, but the world did change on the 18th of January, 2012, and I suspect for the better.

On SOPA, PIPA and the Internet Blackouts

Today, many websites are participating in a blackout inspired by the Stop Online Piracy Act (“SOPA”, H.R.3261) and PROTECT IP Act (“PIPA”, S.968) bills currently being considered by Congress. These two bills, which are very similar to one another, are intended to extend copyright protections and enable better defenses against copyright infringement by international websites.

These bills have caused significant uproar among Internet companies and technologists, as they raise a number of concerns with regard to Internet freedom, censorship and security. Major Internet players that have announced their opposition to these bills include AOL, eBay, Facebook, Google, imgur, LinkedIn, Mozilla, Reddit, Twitter, Wikipedia, Yahoo! and Zynga. Technology experts have also expressed concerns over potential problems with the implementation of the bills’ measures.

SOPA and PIPA are intended to give copyright holders the tools they need to bring down and block access to websites hosting copyright-infringing materials. While it is easy to see how blocking copyright-infringing websites would be desirable, concerns include that these tools may be too broad, that they may be abused, and that the burden on websites to avoid infringing or linking to infringers could be too great.

Today’s blackout is intended to raise awareness about these two bills. Visitors to Google will see a large black box over the Google logo, accompanied by a link to information placed on the homepage. Wikipedia has blocked access to most of its English-language pages. Reddit, imgur and others have completely gone dark, replacing their homepages with messages about these bills. Today’s blackout is unprecedented in the history of the Internet.

Unanswered Questions

Could Google be expected to find and remove all of the infringing websites among the over one trillion URLs it has indexed, and to continue monitoring each and every one of them for new infringement? Many infringers are going to try hard to avoid scrutiny, and there is a very large grey area where it might be hard to decide what constitutes infringement. What happens when they accidentally identify false positives? When they miss some infringement? These are the concerns that search engines face.

What about sites like Facebook, Twitter and Wikipedia, which depend upon user-submitted content? They cannot possibly filter every single URL that passes through, and they could be deluged with takedown demands if they do not. The free flow of information and ideas that normally takes place on social networks would be stifled, and communities like Wikipedia that are driven by user contributions could be overburdened by the administrative demands.

However legitimate websites might be forced to filter their content, we are facing a form of censorship never before seen on the Internet. In spite of whatever good intentions might be behind SOPA and PIPA, the burden they place on legitimate websites and the threat they present to the freedom of the Internet cannot be ignored.

The Danger to Small Businesses and Individuals

Small businesses and individuals running websites would be most vulnerable to unintended consequences of SOPA and PIPA. Even small websites, blogs and forums could be forced to censor content or face being shut down.

Funding for small websites that host user-generated content, whether in the form of comments and discussions, videos, articles or anything else, would be harder to come by when those websites could be held liable for infringing content. Operators of such websites would also face an increased risk of lawsuits, justified or not, which could prove too expensive to fight.

Other tools used by small businesses could also be endangered. Mailing lists, code repositories, VPNs and more could pose liability concerns.

The Need

Copyright infringement is an expensive problem for American businesses. Content producers and publishers lose a great deal of money to piracy every year. Estimates on the actual losses vary greatly, but “many billions” is a good guess. Many websites that host the infringing material are outside of the United States, often in places that do not offer strong protection for intellectual property. This presents a challenge for American businesses, as it can be impossible to sue the infringers or their hosting providers, and the Internet as it is does not offer a way to shut down these sites.

SOPA and PIPA are intended to answer this need by giving copyright holders a way to cut infringing sites off from the traffic that sustains them. Most of the opponents of SOPA and PIPA recognize that defending intellectual property is important – they often depend upon it themselves. The objection is over how these bills propose to cut infringing sites off.

Google and other opponents of these bills do have an alternative in mind: the OPEN Act. The OPEN Act aims to cut off infringing sites by stopping the flow of money to them.

Status

SOPA has been temporarily halted in the House, with discussion expected to resume next month. PIPA is expected to go before the Senate on the 24th of January.

Fedora 15 with stock kernel on Rackspace Cloud

I recently found myself needing a more bleeding-edge cloud server than the Fedora 14 servers I have been running on Rackspace Cloud. Rackspace is not yet offering a Fedora 15 image for new servers, so I needed to start with a Fedora 14 system and upgrade it. I also needed the kernel to be more current than the 2.6.34.1 kernel Rackspace currently uses with Fedora images, and I am not sure the upgraded userspace would work with that kernel and init image pair anyway. This meant I needed to use PV-GRUB to use the stock Fedora kernel. What follows is a description of the process I used to get Fedora 15 running with the stock Fedora kernel on Rackspace Cloud.

Rackspace Cloud, like Amazon AWS and many other VPS providers, uses the Xen hypervisor. Under a typical configuration, custom kernel and init images are used for each VPS, rather than images stored within the VPS. The kernel used with Rackspace’s Fedora 14 image is a custom kernel built on Ubuntu. Thankfully, Rackspace does allow operators to use PV-GRUB to load other kernels.

If you are considering trying this process on a production system, seek therapy. Start with a new server and migrate your services once you’re done. If this doesn’t go smoothly, you could be left with a server that will not boot and no recourse but to restore a backup.

Quick Overview

The general process goes something like this:

  • Set up a new server with Rackspace’s Fedora 14 image
  • Configure the system to run as a Xen domU loaded by PV-GRUB
  • Install the Fedora kernel
  • Contact Rackspace to enable PV-GRUB
  • Upgrade the system to Fedora 15

Before the Kernel

The first step is to set up a server loaded from Rackspace’s Fedora 14 image to run as a Xen domU loaded by PV-GRUB.

cat >> /etc/modprobe.d/domu.conf << EOF
alias eth0 xennet
alias eth1 xennet
alias scsi_hostadapter xenblk
EOF
sed -i 's/sda/xvda/g' /etc/fstab
echo "hvc0" >> /etc/securetty
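
Before moving on, it may be worth confirming that those edits took effect. The following is a minimal sketch of such a check, not part of the original procedure. The helper name is made up, and the file paths default to the ones used above.

```shell
# Hypothetical helper: verify the domU preparation steps.
# $1: fstab path (default /etc/fstab)
# $2: securetty path (default /etc/securetty)
check_domu_prep() {
    # After the sed command, no /dev/sda device names should remain.
    if grep -q '/dev/sda' "${1:-/etc/fstab}"; then
        echo "fstab still references /dev/sda" >&2
        return 1
    fi
    # The Xen console must be listed for root logins on hvc0 to work.
    if ! grep -qx 'hvc0' "${2:-/etc/securetty}"; then
        echo "hvc0 is missing from securetty" >&2
        return 1
    fi
}
```

Calling `check_domu_prep` with no arguments checks the live system files and returns non-zero if either step was missed.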

Install the Kernel

The next step is to install the Fedora stock kernel.

yum install kernel

Take note of the kernel version. You’ll need it in the next step. You can get the exact version string you need with this command:

rpm -q kernel --qf '%{version}-%{release}.%{arch}\n'
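
If more than one kernel package is installed, that query prints one line per package. A small sketch for capturing the newest one in a shell variable follows. The helper name is made up, and it relies on GNU sort’s -V option, which only approximates RPM’s own version ordering, so treat it as a convenience rather than a guarantee.

```shell
# Hypothetical helper: read version strings on stdin, one per line,
# and print the highest one according to GNU sort's version ordering.
newest_version() {
    sort -V | tail -n 1
}

# Usage on the server (requires rpm):
#   KERNELVERSION=$(rpm -q kernel --qf '%{version}-%{release}.%{arch}\n' | newest_version)
#   echo "$KERNELVERSION"
```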

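Time to create the configuration file required by PV-GRUB. We’ll create this as grub.conf and create a symlink at menu.lst, which is how the grub configuration is normally created on Fedora. In the following, replace “$KERNELVERSION” with the correct kernel version string. Alternatively, set a shell variable named KERNELVERSION to that string before running the commands: the here-document delimiter is unquoted, so the shell expands the variable as it writes the file. If you do neither, the variable expands to nothing and the resulting configuration will point at files that do not exist.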

mkdir -p /boot/grub
cat >> /boot/grub/grub.conf << EOF
#boot=/boot/grub/stage1
default=0
timeout=1

title Fedora ($KERNELVERSION)
        root (hd0)
        kernel /boot/vmlinuz-$KERNELVERSION ro console=hvc0 root=/dev/xvda1 SYSFONT=latarcyrheb-sun16 LANG=en_US.UTF-8 KEYTABLE=us
        initrd /boot/initramfs-$KERNELVERSION.img
EOF
ln -s ./grub.conf /boot/grub/menu.lst
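
A wrong path in this file means the server will not boot, so a quick sanity check is worthwhile. The sketch below uses a made-up helper name and assumes, as in the configuration above, that the kernel and initrd paths are relative to the system’s root filesystem. It reports any file named in grub.conf that does not exist.

```shell
# Hypothetical helper: confirm that every kernel and initrd path named
# in a legacy GRUB config exists.
# $1: path to grub.conf
# $2: directory to treat as the filesystem root (default: the real /)
check_grub_paths() {
    status=0
    for path in $(awk '$1 == "kernel" || $1 == "initrd" { print $2 }' "$1"); do
        if [ ! -e "${2:-}$path" ]; then
            echo "missing: ${2:-}$path" >&2
            status=1
        fi
    done
    return "$status"
}
```

On the server, `check_grub_paths /boot/grub/grub.conf` should print nothing and return zero.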

Since we're going to use the stock Fedora kernel, we can set things up for yum to handle kernel updates and have the grub.conf file updated automatically. The commented-out boot directive in our grub.conf is part of this configuration. To finish this setup, we'll need to install grub and grubby and create a few more configuration items.

yum install grub grubby
cp /usr/share/grub/x86_64-redhat/stage1 /boot/grub/stage1
ln -s ../boot/grub/grub.conf /etc/grub.conf
cat >> /etc/sysconfig/grub << EOF
boot=/boot/grub/stage1
forcelba=0
EOF
cat >> /etc/sysconfig/kernel << EOF
# UPDATEDEFAULT specifies if new-kernel-pkg should make
# new kernels the default
UPDATEDEFAULT=yes

# DEFAULTKERNEL specifies the default kernel package type
DEFAULTKERNEL=kernel
EOF

Test that grubby detects grub.

grubby --bootloader-probe

Enable PV-GRUB

At this point, it's time to contact Rackspace Support to enable PV-GRUB. They will enable PV-GRUB and reboot your server. With a little luck, your server will start up with your stock kernel. Reconnect to your server and verify.

uname -a
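
To make that check less of an eyeball exercise, the running kernel release can be compared against the version string noted earlier. This is a small sketch with a made-up helper name. It assumes that `uname -r` on a stock Fedora kernel prints the same version-release.arch string that the rpm query produced, which is my understanding but worth confirming on your own system.

```shell
# Hypothetical helper: succeed only if the running kernel release
# matches the expected version string.
# $1: expected version, e.g. the string printed by the rpm query
running_kernel_is() {
    [ "$(uname -r)" = "$1" ]
}

# Usage (KERNELVERSION is assumed to hold the string noted earlier):
#   running_kernel_is "$KERNELVERSION" && echo "stock kernel is running"
```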

At this point, you should consider creating an on-demand backup.

Upgrade to Fedora 15

Now that you are running on the stock Fedora 14 kernel, you are ready to upgrade your server to Fedora 15 (and its kernel).

The general Fedora guidance is here:

https://fedoraproject.org/wiki/Upgrading_Fedora_using_yum

Because we're dealing with a cloud server with PV-GRUB, there will be a few differences. You won't be switching to a text console or changing runlevels. You will not want to install the other Base packages. You cannot write a new MBR with grub-install. Really, our process is much simpler. To be safe, we'll use screen to launch our upgrade to mitigate the risk of a disconnect. (I recommend always using screen when using yum or doing anything else critical over SSH.)

yum install screen
screen -h 10000 -S yum

This screen command will increase the default history size and name the session for easy access. If you become disconnected during the upgrade, connect to the server again and run the following command to attach to your screen session:

screen -rx yum

Time to perform the upgrade.

rpm --import https://fedoraproject.org/static/069C8460.txt
yum update yum
yum clean all
yum --releasever=15 --disableplugin=presto distro-sync
cd /etc/rc.d/init.d; for f in *; do /sbin/chkconfig "$f" resetpriorities; done
ln -sf /lib/systemd/system/multi-user.target /etc/systemd/system/default.target

Make sure that the new Fedora 15 kernel is configured and set as default in /boot/grub/grub.conf.
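
One way to check this without reading the file by eye is sketched below. The helper name is made up, and the sketch assumes the simple layout used in this post, where each title entry has exactly one kernel line and entries are numbered from zero. On a system with grubby installed, `grubby --default-kernel` should give the same answer.

```shell
# Hypothetical helper: print the kernel path of the default entry in a
# legacy GRUB config. Assumes one "kernel" line per title entry.
# $1: path to grub.conf
default_kernel_path() {
    awk '
        BEGIN { want = 0 }                       # GRUB defaults to entry 0
        /^default=/ { split($0, kv, "="); want = kv[2] + 0 }
        $1 == "kernel" { paths[count++] = $2 }   # entries numbered from 0
        END { if (want in paths) print paths[want]; else exit 1 }
    ' "$1"
}
```

After the upgrade, `default_kernel_path /boot/grub/grub.conf` should print a path ending in the Fedora 15 kernel version rather than the Fedora 14 one.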

Profit

Reboot. A little more luck and your server will come back up running the new Fedora 15 kernel and userspace. You may want to create an on-demand backup at this point. You can reuse this image to create more Fedora 15 servers, but do not forget to contact Rackspace Support to enable PV-GRUB for each instance. If you created an on-demand backup after getting Fedora 14 running with PV-GRUB, you can now delete that image.

Have I missed anything? Let me know.