Thursday, February 11, 2010

The Technical Side of FTP

by Noah Fiedel, Blogger Engineering Tech Lead

Many of you have posted asking why Google doesn't continue supporting FTP publishing indefinitely or even charge money to those who would like this support. I'd like to give you some insight into our decision making and technology.

The Protocol

FTP is one of the earliest protocols on the Internet. It was drafted in 1971 before security was a concern, as the Internet at the time connected universities and research labs. Unlike nearly all other Internet protocols, it uses two insecure and unencrypted ports simultaneously. This makes securing FTP effectively impossible on both the server and network levels. FTP servers at ISPs are therefore vulnerable to attack, and your password can be 'sniffed' by anyone with access to the traffic to or within your ISP. sFTP, while more secure than FTP, still requires us to store your user credentials — which itself is undesirable from a security perspective.
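To make the two-port design concrete, here is a minimal sketch (an illustration only, not Blogger's code) of how a client learns the second port in passive mode. Commands travel over the control connection, conventionally port 21, and the server's `227` reply names a separate host and port for the data connection. The reply string and the address, taken from a documentation-only range, are made up for this example, and `parse_pasv_reply` is a hypothetical helper name.

```python
import re
from typing import Tuple

CONTROL_PORT = 21  # conventional FTP control port; commands and replies go here


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract the data-connection host and port from a 227 passive-mode reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a passive-mode reply: {reply!r}")
    h1, h2, h3, h4, p1, p2 = (int(group) for group in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


# Hypothetical reply, using a documentation-only address.
data_host, data_port = parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,195,80)")
print(f"control port {CONTROL_PORT}, data connection {data_host}:{data_port}")
```

Both connections are plain text, which is why anything sent over either one, including the password, can be read by whoever can see the traffic.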

Compare this to HTTP, drafted 20 years later in 1991. FTP has no mechanism to discover whether a server is up, down, slow, or temporarily unavailable; HTTP supports all of these and more, and is now the basis for nearly all activity on the Internet.
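As a rough sketch of what that looks like on the HTTP side, a client can tell these states apart from the response status, the standard `Retry-After` header, and how long the response took. The function name, the wording of the returned labels, and the five-second "slow" threshold below are all assumptions chosen for illustration.

```python
from typing import Optional


def describe_http_health(
    status: int,
    retry_after: Optional[str] = None,
    elapsed_s: float = 0.0,
    slow_threshold_s: float = 5.0,  # assumed cutoff for "slow"
) -> str:
    """Classify a server's state from an HTTP response (illustrative only)."""
    if status == 503:
        # 503 Service Unavailable may carry a Retry-After hint.
        if retry_after:
            return f"temporarily unavailable (Retry-After: {retry_after})"
        return "temporarily unavailable"
    if 500 <= status < 600:
        return "down or failing"
    if 200 <= status < 400:
        return "up but slow" if elapsed_s > slow_threshold_s else "up"
    return "reachable, but the request was rejected"


print(describe_http_health(200))
print(describe_http_health(503, retry_after="120"))
print(describe_http_health(200, elapsed_s=9.0))
```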

Due to FTP's weaknesses, many ISPs restrict access to their FTP servers. They do this by limiting your FTP account to a list of approved Internet addresses (via an IP whitelist), which makes your account less likely to be hijacked. In the next section you will see why this also makes it difficult for Blogger to reliably provide FTP publishing.

Google Datacenters

Google runs many datacenters, and Blogger runs in several of them. Each datacenter has a different Internet address (a.k.a. "IP address") when it connects to your FTP server, so if your hosting service requires an IP whitelist, you would have to list all of the IP addresses associated with each of our active datacenters. We really don't like having a "primary" datacenter for anything, and instead prefer to let our traffic flow to the most efficient and lowest latency datacenter for our users. This makes the IP whitelist problem even worse, as some ISPs only allow a single (or very few) IPs to be whitelisted. Your ISP would need to whitelist all of Blogger's datacenters. Since they change regularly, your ISP's whitelist would have to be updated as well. This leads to a significant amount of user frustration, and regularly results in blogs failing to publish successfully. Diagnosing these issues has taken up a large part of our engineering and support team's time.
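The whitelist problem can be sketched in a few lines. This is an assumed model of how a hosting provider might check incoming FTP connections, not any particular ISP's implementation; the addresses come from documentation-only ranges and `is_allowed` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable


def is_allowed(source_ip: str, whitelist: Iterable[str]) -> bool:
    """Return True if source_ip matches any whitelisted address or network."""
    addr = ipaddress.ip_address(source_ip)
    return any(
        addr in ipaddress.ip_network(entry, strict=False) for entry in whitelist
    )


# A host that permits only a single whitelisted address.
isp_whitelist = ["198.51.100.7"]

old_datacenter_ip = "198.51.100.7"   # the address the user originally listed
new_datacenter_ip = "203.0.113.25"   # traffic now arrives from a different datacenter

print(is_allowed(old_datacenter_ip, isp_whitelist))  # True
print(is_allowed(new_datacenter_ip, isp_whitelist))  # False
```

Nothing changed on the user's side, yet the second publish attempt is refused until someone updates the whitelist.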

FTP Web Hosting Providers

Late last year, during scheduled maintenance on a datacenter, we moved FTP publishing to another datacenter and updated our publicly posted IPs for your ISP's whitelists. Even after doing this, there were considerable complaints from users unable to publish via FTP to thousands of ISPs. Many of these ISPs maintain their own IP whitelists, often in an undocumented way. Troubleshooting this is extremely difficult and time-consuming for us (and for you), as it's rarely clear where the underlying issue is. Our engineering, product and support teams often ended up directly contacting ISPs and waiting on hold, frequently without resolution, all to support a single user's report of "can't publish via FTP".

In many cases the user or ISP simply entered the IP whitelist incorrectly. In other cases the hosting service's FTP server was unreachable. On more than one occasion, the ISP had set up "staging" and "production" environments without telling their users what was happening, so while Blogger was successfully publishing (to the staging server), the posts were not visible on the web and the user had no idea why their posts weren't showing up. In a great many cases, FTP publishing works but is extremely slow because shared hosting plans have slow or limited network or disk capacity per user. We have seen full FTP republishes take over a month, entirely due to the FTP server being slow.

What about sFTP?

If sFTP addresses some of the concerns with FTP, why are we shutting it down too? Fewer than 15% of users have adopted sFTP as their publishing mechanism, and many of the same challenges apply to both sFTP and FTP. Even with sFTP, a republish of a blog can take longer than a month to complete. Not all ISPs support sFTP and of those that do, many lock down FTP and sFTP with the same IP whitelist.

Engineering Effort

Blogger's current FTP support was 100% re-written for stability and maintainability in 2008. We added redundant queues on our side to make sure we never missed a file. We spent a significant amount of engineering time improving FTP support in the last two years, not including support and troubleshooting user issues. Even after this effort, approximately 10% of all FTP publishes fail.

As the original blog post mentioned, Google infrastructure is changing and would require us to again rewrite a major portion of our FTP support. Even after that rewrite, things would be no better than they are today in terms of stability, and we would be running just to stay in place.

Decision

I hope this post helps you understand the issues we face, and why it is not simply a question of money or a small bit of time. We want to deliver a best-in-class product experience for all users. Supporting a protocol with known security vulnerabilities and dependencies on downstream ISPs was preventing us from delivering the stable, reliable, and functional product we want and our users demand. It was also preventing us from doing more for the 99.5% of users who host their blogs with us, either on their own domain or on blogspot.com. While we deeply regret the impact this has on some of our users, many of whom have relied on Blogger for years, we remain confident that this was the right decision.

23 comments:

  1. Sounds reasonable to me. I do have one question. Must I wait until the Migration Tool is available in order to have the traffic from my old (FTP) blogs automatically redirected to the new (custom domain) locations, or can this tool be used after the fact? I have a lot of blogs to switch over, and the sooner I can get started the better, but not at the expense of losing Page Rank, getting penalized for duplicate content, and not having the old blog posts redirected.

  2. This comment has been removed by the author.

  3. You keep mentioning 99.5% and .5% but the .5% do 50% of the business with REAL blogs and REAL content.

    If you wanted to provide an unsupported solution there's plenty of webmasters and business-people here who would understand that Daddy Google won't hold their hand.

  4. Tell the 10% who fail that it's unsupported now. The OTHER 90% who have NO PROBLEMS with ftp will thank you for it.

  5. I am just about done with my migration. I moved temporarily to a holding spot so I could keep all my images. Created a new blog, hacked the template and have migrated over half the pictures. 670 posts was a lot of posts to recreate the pictures for but I get about 25% of my traffic from google image searches. Back up and running now on a custom domain here: Book Reviews and More. Not 100% satisfied with the new layout but getting there. But would have preferred to just keep posting via FTP.

    Steven

  6. First of all, please update the FAQ more often, in addition to the scattered responses to comment posts.

    Second of all, I know this isn't the most relevant post to comment on, but going for the most recent.

    I've been trying to catch up on the forced move, and I've gone through a lot of the comments, so my bad if I've missed it -- but please clear up some things for me:

    I'm fine moving www.olddomain.com/~me/blog/ to blog.newdomain.com, but if I do so, I want a clean break and I don't want to have to deal with olddomain any more *AT ALL*, and so no ugly missing hosts file for me, thanks -- I'm not that popular, so I'm not too worried about my pagerank, etc... a simple temporary redirect will work fine -- I'd rather keep all my files together in one place.

    I'm also fine with blogspot.com hosting all the image (and possibly other) files for me.

    a) Will there be an option in the migration tool to upload all my image files from old-domain to be hosted at google/blogspot, and have the img tags, etc. fixed in old posts automatically? This seems like a big missing question from the FAQ.

    b) if not, will I at least be able to manually do a mass upload to blogspot of all my images at once?

    c) What about other file types (.pdf, .mp3, .wmv files, etc) -- it doesn't look like blogspot allows those, correct?

    d) if so, it looks like blogspot uses some long url for all the uploaded files http://ugly.blogspot.com/really/ugly/hash/filename.jpg -- will there be a deterministic way to determine these URLs in advance?

    Where I'm going with this is that I can write my own script to do an [[ export blog posts -> script to change link urls -> import blog posts ]] but I want to know if that's feasible.

    thanks

  7. Hi

    Unfortunately, I have to take my blog elsewhere - thank you for the free service over the past 6 years. One of the things that would help with my migration is knowing the algorithm you use to generate the URL of a blog post please.

    Some things are obvious:
    1. It substitutes spaces " " with dashes "-"
    2. After that step if the total length of the string is more than 39 it truncates the end, ensuring the URL ends on a complete word
    3. Sometimes it removes filler words, e.g. "a"

    It would be great to have the full algorithm to help us migrate please.

  8. FTP is time tested software.

    Hard to believe the mighty Google is afraid of FILE TRANSFER PROTOCOL.

    Still not too late to come to your senses. Not only will I de-Google my Blog but I'll de-Google everything -- except Gmail I suppose, since that would be too onerous even for me.

    And I'll urge others to do the same.

    Because if you screw us once, you're bound to screw us again and anytime you feel like it.

  9. So you're solving problems I might have on my end (i.e. my ISP & FTP) by just eliminating what could cause the problem despite the problem not being on your end. Yeah, thanks.

    I've moved to WordPress.

  10. Since I have no choice I need some answers as I can't understand some of the jargon mentioned in the comments of previous blog posts.

    So my blog is now on my main domain, as my index, so mydomain.co.uk.

    By using your migration tool it'll do everything for me? Like upload the blog to this new blog.mydomain.co.uk?

    How does one even get a domain like that? If I go to that link the page does not exist obviously but if I wanted that domain how would I create it? Does the migration tool do it?

    Basically I just want a clean jump. With only 54 posts and hardly any comments and a custom design.

    Also does that mean we have to use your webspace for anything? I'm happy hosting my stuff (images, webpages, mp3's, psd's etc) and I'd rather not mix and match some of it on your server and some on mine. I want it all on mine.

    Is that going to be possible? You see after all this news I tried another blog but it requires MYSQL which would require me to switch to a linux hosted server and that means backing loads of stuff up and reuploading which I'd rather avoid if your migration tool makes this clean and easy.

    Can you assure me this will be clean and easy? If not I want you personally to walk me through if I run into problems.

    Thanks.

  11. @Dave Tomkins: Unfortunately, no - Blogger does not host files, only blog posts. The purpose of the missing files host is to allow you to continue to host those files with the current directory structure elsewhere, so links to those files won't break. There is no automatic way to auto-migrate arbitrary files (pdfs, gifs, jpgs, etc.) to a Google-controlled host, so you'll need to maintain those elsewhere.

    @Dr Moondog: You can use htaccess to redirect traffic from the old site to the new site. Because the specifics of htaccess are often unique to the server implementation on your webhost, we can't provide generic instructions for all cases; you'll need to work with your webhost or a programmer to ensure that you do that correctly.

    ===

    Apologies for not being here the last few days; a bout of the flu followed by a family commitment had me out for several days. Other team-members (most notably Noah and Shameela) were active in the comments, but I'm doing a sweep through all posts this morning and will be updating the FAQ later today and through the weekend.

  12. @Habboi Creating a sub-domain is a straightforward process documented here. The migration tool explains how to do that, but you will need to work that out with your registrar.

  13. To break it down into non-technical terms it sounds like it wasn't worth the time investment nor the money to continue to support or adapt the infrastructure to support FTP in this capacity. I believe it's an unfortunate decision, and the ones who will be truly affected are the more technical types.

    Due to the nature of this change and the fact that it doesn't suit my personal needs I've since moved on to Wordpress. However, I haven't paid a cent to use the Blogger service all these years and I thank the Blogger team for their hard work and effort.

    I will however be keeping an eye out now for changes to other Google services. I hope this isn't indicative of future changes to other services I use and have grown to depend on.

  14. @Rick Klau... thanks for clearing that up... Don't assume us (long time) FTP'ers know how Blogger is set up... It says right there on the main intro that there are free photos in Blogger... I see now that it's automated through Picasa... a service I've never used...

    So from now on, I'll be able to put photos in Picasa via the blogger interface, but what if I want my old photos in the same place?

    so *I* would presume that a 'migration tool' would have the option of taking all my old images, putting them into a Picasa album for me, and then updating the links and tags accordingly.

    This would be wishful thinking ?!?

    Okay... so assuming that's what I want to do... what's the best route for me to do this?

    again... more stuff for the FAQ

  15. Thanks for the guide for creating a sub-domain. I use 1and1 and I was fortunate to have an easy to follow guide.

    I followed the guide to the dot and typed www for the sub-domain so does that mean my blog will still be on my main page? Still don't fully understand and I realise you're writing an FAQ soon so I hope it will answer that.

  16. Hi all - just noting that the FAQ is updated, addressing a number of recent questions raised on this and other posts.

  17. It's a pity you're removing Blogger's killer feature (for me at least), because 10% of that feature's users had problems and your infrastructure made it worse. Really, why not migrate 10% instead of cutting the rest of us off? If you invested all that time in the last couple of years it seems a pity to lose the effort.

    FTP allowed us to set up blogs using all the resources of a real domain - htaccess for neat URLs and hotlink protection, custom 404s, server stats, authentication for test blogs, the list goes on.

    The hosted solution doesn't provide all these benefits; so you're not just removing one feature, you're removing a whole set of features.

    Really, I understand where you're coming from - it's annoying to support FTP and the Blogger team is sick of it. Fair enough.

    I just think the replacement is missing a lot of key features enabled by FTP. Add in server-based stats (JS-based stats are incomplete); take off the 10-page limit; give us access to our subdomain's htaccess (or add the key features to the Blogger interface); allow templating for error pages; and Blogger would be an effective package again.

  18. Hi, Rick, I live in Japan and I won't be able to create a sub-domain. I don't think I can get my provider/registrar to understand this issue, as my command of the language is not strong enough in technical terms such as CNAME and FTP.

    I'll have to move out to WP most probably, as I'll lose both access to blogger and to my old blog on my personal domain name.

    Thank you any way for 5 years of free web editing tools and free ftp.

    Eryn, totally lost in this process.

  19. The tragedy of this imminent move is that many blogs currently accessible to readers in China no longer will be.

    The great firewall does IP based blocking as well as many other kinds, has been blocking blogspot for years, but they can't block all the ftp bloggers in one fell swoop, because it simply isn't technically feasible.

    Those blogs are now going to end up on google servers, which means it will be much simpler to block them all - and this will happen in due course, as certainly as our sun will turn into a black dwarf. This means that less information unprocessed by the central government will be available to your average Chinese person, which is basically a bad thing for humanity.

    Please rethink this strategy, for the sake of Chinese people who are trying to live informed lives.

  20. Rick, would it not be possible simply to remove support for FTP, as opposed to removing FTP completely?

    To me it makes sense to simply publish a FAQ similar to what has been posted above, but keep FTP running. Then the 90% of keen FTPing bloggers who don't have any problems can appreciate the risks of sticking with it, but at least have a choice.

  21. @Kiell: As stated in the original announcement, we'd have to completely rewrite the code that uses FTP in order for it to remain functional beyond the infrastructure change (which we've postponed to May 1). That's not an investment we're going to make, for the reasons originally stated.

  22. Note: Comments are closed on this post. Please direct general questions to the FAQ page, and specific problems with the migration tool to the issue tracker. More details about support for the FTP migration are here.
