Scribd CTO: “We Are Scrapping Flash And Betting The Company On HTML5” (techcrunch.com)
299 points by jasonlbaptiste on May 5, 2010 | 120 comments


I've worked for a big Scribd competitor (bigger than Scribd by some measures) and I am blown away that they would move in this direction. Those saying that they are not taking a great risk are missing the point in a serious way. In this business, the viewer is the company. It is your whole competitive advantage. Taking apart a PDF is nothing - putting it back together in Flash with the right appearance, cropping, individual character metainfo, hyperlinks, caching, progressive loading, etc., etc., is science. To do that you need to be very long on Flash, and they just threw away all that inventory and expertise.

Which is, of course, a killer move if they can actually pull it off as they will be the first and only company to do it (again, comparing with Google's Javascript PDF viewer misses the point, as that 'just' renders images). I am just dumbstruck to see a company do such a 180 degree turn and they deserve all the credit in the world for having the guts.


Won't everybody using Flash be able to do the same thing with Flash CS5's <canvas> + FXG support?

Isn't the point of this move to support the iPad, the iPhone, and other non-Flash environments?


Great. Now I might not cringe every time I see a Scribd link.

A flash PDF reader is worse than useless. I can't use flash on my iPhone, and the Scribd reader used to break back when I used an Ubuntu machine. And it certainly never ran great on my Mac. When you consider that I already had perfectly fine, and free!, PDF readers on all three devices, Scribd was actively harmful.

But an HTML 5 PDF reader could actually be useful! It may be more lightweight than downloading a whole PDF. It will certainly be no worse than what I can currently use to render and read a PDF.

This is good. Scribd may change their image as a widely hated startup.


I get it, we'll ignore the fact that Scribd makes the web a worse place when the founders are in the thread.


Well that's a pretty ridiculous thing to say. How can a website make the web a worse place? A website either provides something you can't get anywhere else, which is intrinsically valuable, or you can just ignore the site.


>How can a website make the web a worse place?

It's easy. Someone sends you a Scribd link. Scribd is written in Flash so maybe you can't read the damn thing. Before Scribd, people would just send you PDF links. Everybody can read PDFs.

Scribd harms the web by reducing the number of people who can view its content. That's what I call "providing negative value".


I'll be interested to see how they will deal with the licensing issues regarding font embedding. Right now, there aren't a whole lot of non-free fonts that allow @font-face embedding, because embedding a font with @font-face makes the font downloadable to anyone with a little digging through CSS files for its URL.

I hope a solution is found for this that doesn't require specific allowance of CSS embedding in the license. If it does, some fonts will never be legal to embed.
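
To make the "little digging" concrete, here is a rough sketch of how exposed an @font-face URL is. The function names and the example.com URL are made up for illustration; this is not how Scribd or any font vendor handles it, just a demonstration that the font location sits in plain text in the stylesheet.

```typescript
// Build an @font-face rule for a font hosted at a (hypothetical) URL.
function fontFaceRule(family: string, url: string): string {
  return `@font-face { font-family: "${family}"; src: url("${url}"); }`;
}

// The "digging": collect every url(...) found inside @font-face blocks.
function findFontUrls(css: string): string[] {
  const urls: string[] = [];
  const blockPattern = /@font-face\s*\{([^}]*)\}/g;
  let block: RegExpExecArray | null;
  while ((block = blockPattern.exec(css)) !== null) {
    const urlPattern = /url\(\s*["']?([^"')]+)["']?\s*\)/g;
    let match: RegExpExecArray | null;
    while ((match = urlPattern.exec(block[1])) !== null) {
      urls.push(match[1]);
    }
  }
  return urls;
}

// Anyone who can read the stylesheet can recover the font file location.
console.log(findFontUrls(fontFaceRule("Demo Serif", "https://example.com/fonts/demo-serif.ttf")));
```

Nothing here is clever, which is the point: any license that forbids redistribution is hard to square with a mechanism that publishes the file URL to every visitor.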


The fonts are what interest me too. Not just the licensing, though (that never stopped YouTube with its videos - at least not for a few years!) but with having to convert a myriad of fonts embedded in documents into SVG fonts (otherwise they won't work on the iPad).. that's no mean feat.


Why does everyone worry about fonts so much? You can download a copyrighted image just as easily, and you can copy paste the copyrighted text too. So what's the big deal that you can copy the font?

And the most ridiculous thing is that fonts aren't even copyrightable. (You can copyright the actual file with the fonts, but not the shape of them.)


Embedding in general will be interesting... being able to ctrl+c/ctrl+v anything you want straight to your blog/website/spam site has to impact their reach.


Thank god. I like their product, but it's such a resource hog. Here's hoping that HTML5 Scribd will be a little more lightweight.


"Betting the company"? Sounds a little dramatic. It's HTML5 not Silverlight. How much risk is there really?


There's always risk in changing the technology platform of your core product.

Things to consider:

- What development tools will devs use?

- How long will it take devs to get up to speed on them?

- How does debugging work?

- How do you handle losing breakpoints?

- What does the new build process look like?

- How are you going to handle graceful degradation?

- How does front-end error reporting work?

- How long will it take to make the migration?

- What business metrics are you tracking to acknowledge you made the right user choice? The right technology choice?

- How soon will we see the needle move on those business metrics?

- How will this affect growth in the short term?

- Do your hardcore Flash employees want to be working in HTML5? What will you do if they don't?

Major technology shifts are a lot more complicated than just "code it and see what happens" when you have an established product and team.


Losing breakpoints? Why would they lose breakpoints? Maybe I misunderstand but you can debug JavaScript with breakpoints.


My thoughts exactly. HTML5 is going to be (or is already) everywhere, so where's the big risk?


Investment risk. Scribd is a small company, and putting a whole battalion of developers on a PDF to HTML5 conversion for half a year without even knowing whether that'll turn out to be possible (as in, good enough for 10,000,000 documents) is a scary move. That being said, check out our upcoming engineering blog for technical details about how we convert to HTML5 now!


"putting a whole battalion of developers on a PDF to HTML5 conversion for half a year without even knowing whether that'll turn out to be possible"

The Google pdf viewer seems good enough to me. It'll be interesting to see what you can improve on that.


Actual pdf readers like Preview seem good enough to me. I don't understand why I would need to take a PDF and view it as HTML.

I do see the value for formats like .pptx where I may not have a reader, but PDF is already a "lowest common denominator" like HTML.


I've occasionally been thankful for Google's PDF conversion. PDF isn't a lowest common denominator - there are computers without that capability (kiosks and terminals, for example). What's more, PDF viewers can be pretty heavy, especially for fancy documents, and their browser integration tends to suck even when it exists.

Sure, I like reading my PDFs with evince, but sometimes HTML is just preferable.


Downloading a separate file and then launching an external application is obnoxious, particularly when it creates a new window. I know a lot of people who actively avoid and refuse to click on PDF links for that reason.

I've always thought that the best solution would be for browsers to just include PDF rendering support alongside HTML. PDF is a popular enough format on the web that it should really be handled by the browser as a core feature, not by some add-on or external application. I really liked Apple's Safari for this reason (although I dumped it for Firefox because of AdBlockPlus).

Done right it would seem like perhaps some of the higher-level rendering layers of the browser could be reused, regardless of whether the underlying content was PDF or HTML. (In fact isn't this how Safari works, with Quartz?)

No offense to Scribd, but I'd like to see the need for their service go away by in-browser support for progressively-downloaded PDFs.


In ChromeOS you'll need some kind of pdf viewer that runs in the browser. And not needing a native app is pretty nice in general in terms of bookmarking possibilities.


Doesn't ChromeOS load PDFs into the Google PDF viewer?


Well, two of many reasons are adobe is run by morons incapable of making a secure or performant product. Unless you missed the wave of acrobat exploits? 'Cause my gf's laptop sure didn't :(

Removing yet another plugin from your browser shrinks the attack surface, and as a side benefit, reduces use of one of the worst IMO bits of web tech.


stay tuned.


html >> images


The change isn't as great as it seems (at least at first). Right now their setup probably looks something like this:

PDF -> Internal Representation -> Flash viewer

In principle they should only need to change to

PDF -> Internal Representation -> HTML5 viewer

That alone should make a huge difference in terms of time needed.
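
A minimal sketch of that shape, assuming the pipeline really is split this way. Every name below (PageModel, ViewerBackend, parseToModel, and both backends) is hypothetical, and the "parser" just splits plain text on form feeds as a stand-in for real PDF parsing.

```typescript
// Hypothetical internal representation; field names are illustrative only.
interface PageModel {
  pageNumber: number;
  lines: string[];
}

// Every viewer backend consumes the same representation.
interface ViewerBackend {
  name: string;
  render(pages: PageModel[]): string;
}

// Stand-in for the PDF parsing stage, which is shared by both backends.
function parseToModel(source: string): PageModel[] {
  return source.split("\f").map((pageText, index) => ({
    pageNumber: index + 1,
    lines: pageText.split("\n"),
  }));
}

// Stand-in: a real Flash backend would emit data for a SWF viewer.
const flashBackend: ViewerBackend = {
  name: "flash",
  render: (pages) => `flash-viewer-data:${JSON.stringify(pages)}`,
};

// HTML backend (no escaping here; this only illustrates the structure).
const htmlBackend: ViewerBackend = {
  name: "html",
  render: (pages) =>
    pages
      .map((page) => {
        const paragraphs = page.lines.map((line) => `<p>${line}</p>`).join("");
        return `<section id="page-${page.pageNumber}">${paragraphs}</section>`;
      })
      .join(""),
};

// Swapping the viewer means swapping only the last argument.
function convert(source: string, backend: ViewerBackend): string {
  return backend.render(parseToModel(source));
}
```

With this split, `convert(doc, flashBackend)` and `convert(doc, htmlBackend)` share all of the parsing work, which is the time saving being argued for.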


Most things certainly are simple in principle.


I'm not saying it's trivial... just that they are not exactly restarting from scratch.

Never having done something similar myself, I would expect the hardest part to be correctly parsing the original PDFs and dealing with the variety of PDF generators out there.


Small in size but still one of the top 200 (150?) web sites.



I'll believe it when I see it.

To me, this feels like desperation. It seems a little late to stop depending solely on Flash. Also, why not say that you're moving after you partially move? The companies I admire the most brag about distant features the least.


"Distant features"? It's launching tomorrow.


Yes, but that is 86400 seconds from now ..


Performance maybe - but it's hard to imagine any HTML5 implementation being slower than Flash is. Also, HTML5 isn't arriving on IE for some time yet, and even then it may take years for the general public to update to something that can render HTML5 properly.

I wouldn't use any of the more complex/less-supported features of HTML5 on any live site today, that's for sure.


Really?

I always thought that Flash was quite performant compared to browser-based Javascript. Can you point to any benchmarks?


Flash performance varies widely. Flash on Mac OSX is generally regarded as terrible, while Flash on Windows isn't as bad.


If by generally regarded, you mean literally blows chunks... ffs, VLC is open source and it performs better on my macbook pro. Adobe is too incompetent even to copy freely available code.


Clearly video and PDF display are two different animals. Saying that performance blows on video and then using VLC as a comparison for JavaScript vs ActionScript performance is completely beside the point.


Yeah, sure, "blows chunks" is a perfectly fair assessment. I was just trying to be somewhat diplomatic.

I work with a hardcore Mac zealot and it's a running joke in the office that he will complain about Flash performance every single day.


"Betting the company" was clearly designed to get upvotes and media interest.


Exactly. This seems like a nice little PR move to have everyone talking about Scribd while this Flash-HTML5/Adobe-Apple thing is red hot.

Now, people who had never heard of this company are aware of them, and I am 100% sure their hit rate went way up today.


Yep. It was clever of us to invest months of engineering time on this while also arranging for the Apple/Adobe schism to continue to worsen, and talk Steve Jobs into posting his thoughts at just the right time to frame our launch.

We're just that amazing. ;]


What you did was more than a PR stunt. The "betting the company" line is just dramatic hyperbole. I consider "betting the company" to be saying - OK, if X happens I'll shut down the company, if !X happens you'll buy my company for 100M dollars.

In reality what you are saying is that if X happens, you waste a lot of time, if !X happens then there is still no guaranteed outcome.


I believe I understand your perspective. I think you're saying that "Betting The Company" == "Taking Mortal Risk", and I agree with that definition.

I don't agree that the odds are fixed, nor that it's all-or-nothing-to-the-end. For any startup, you take mortal risk as infrequently as you can; you hedge those bets as much as you can; and, since you can invest effort to change the game in reaction to the market, you work to improve your odds over time.

So in the end, I believe you present a false dichotomy. It's rarely $100MM or death, and there's never a guaranteed outcome.


Seriously? You're whining about the CTO calling investing 6 months of three quarters of their dev team into a risky project betting the company? While it isn't literally a 100 percent make-or-break project, Scribd does have competitors who haven't been kind enough to sit on ass for 6 months. They need to have something with some business benefit to show for that 6 months or it's going to be extraordinarily painful.


My concern would be this:

1) Flash is prevalent today. No matter how much people are bashing it on HN, it's widely used, especially on the sites which are incredibly popular.

2) HTML5 adoption will be slow. Look at how long it took IE6 to die and you'll have an idea of how long it will take HTML5 to become prevalent.

3) You're taking a technology that allows you to be accessible today and trading it for a technology which will allow you to be accessible at some unknown time in the future, basically removing any barrier to entry for your competitors.

The hyperbole makes this seem worse than it is, I'm sure. You don't really have motivation to ditch your current Flash implementation. You'll just put it into a "support as needed" mode.


As the article points out, they are only using parts of HTML5 that are supported by older browsers, including IE6.


HTML5 must be some kind of wundertechnology. Reached draft in 2008, supported on browsers released in 2001.

Come on, it's great that they're ditching Flash, but if it really works on IE6 calling it "HTML5" is nothing but link/pageview-bait. The parts of HTML5 that are supported by IE6 are called HTML4 and Javascript.


Actually, browsers often implement features that aren't in a standard, just because they think it's a good idea. Some of those end up being memorialized by the W3C and part of HTML. Many of the features of HTML5 started that way, including, in this case, web fonts (part of CSS3).


Internet Explorer...


FYI, this will work in IE.


Which versions? If it's IE6 then it can't be terribly HTML5-ish.


IE6+. In less-compatible browsers there are workarounds to implement the same features, so the UX is basically the same.


How big of a role did SEO play in this decision?


Good question, but none, really. It won't affect our SEO - we did this to improve the user experience on the site.


While it may not have played a major role in your decision, you have to know that this will dramatically improve your SEO. Come on. :)


Why? Google already indexes Flash, and you can include meta-tags in Flash as well. As long as you are doing it right, it can be just as informative as HTML.


You're right in general with the flash/seo stuff, but they appear to be already extracting all text anyway to present to search engines so maybe there isn't really much advantage:

http://webcache.googleusercontent.com/search?q=cache:ggG39gK...

One thing that does occur to me is that copy and paste auto attribution link stuff that's getting popular on news sites, something like that could hold more value / less overhead / easier to spread than the awkward embedded widget approach.


Actually - I heard it did


You work for scribd too ?


Do you really have to work for scribd to see the benefit of millions of pages of text?

Edit: it turns out they already present the text at least some of the time, but markup and text portability still adds some value (see my comment above).


Just in case anyone's wondering -- it's not just converting each page to an image. It's all HTML5 text, graphics, and images where appropriate.


Is there anything specific to HTML 5 there?


The new viewer doesn't use the full spectrum of HTML 5 features, to maintain compatibility with older browsers, but it would not be possible before HTML 5.


Can anybody name a single feature exclusive to HTML5 that it is actually using? TFA says:

"Friedman estimates that 97 percent of browsers will be able to read Scribd’s HTML5 documents"

That pretty much counts IE6 into the picture, so I'm really wondering exactly what "HTML5" features IE6 supports!


Using the HTML5 doctype lets you use HTML5 tags and custom data attributes and have a valid document. New HTML5 form fields, custom attributes, and markup elements are usable in IE6 mainly because it just doesn't really bother to explode when it encounters them. Form fields just show up as text boxes, custom data attributes are only used in JS anyway, and new structural elements are usable and styleable in IE6 just by adding JS that does a document.createElement().

HTML5 isn't something that just came around. It's been in the works by browser makers for quite a while, which is refreshing. Rather than it being a spec made up in a purely academic environment (XHTML 2), it's something that's made up of technologies that have already been used by one or more browser makers (and often, developers on real sites.)

Also, using the HTML5 doctype in IE6 causes IE6 to go into standards mode, which is just pure luck.

You can do a lot of good for users if you start using some of the HTML5 features right now, even if it's not apparent. If you use the type="email" for your forms when you ask for an email, the ipod and ipad will bring up the Email keyboard layout. That alone is kinda cool.
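
For anyone who hasn't seen the createElement() trick, here is a hedged sketch. The function names and the tag list are mine, and the ElementFactory interface only exists so the snippet type-checks and runs outside a browser; in a real page you would pass `document`.

```typescript
// Minimal slice of the DOM we need, so the sketch also runs outside a browser.
interface ElementFactory {
  createElement(tagName: string): unknown;
}

// A few of the new HTML5 structural elements (not an exhaustive list).
const HTML5_ELEMENTS: string[] = ["article", "aside", "footer", "header", "nav", "section"];

// Calling createElement once per unknown tag is what lets old IE
// treat that tag as a real, styleable element afterwards.
function registerHtml5Elements(doc: ElementFactory, tags: string[] = HTML5_ELEMENTS): number {
  tags.forEach((tag) => {
    doc.createElement(tag);
  });
  return tags.length;
}

// Browsers that don't know an input type fall back to a plain text box.
function effectiveInputType(requested: string, supported: string[]): string {
  return supported.indexOf(requested) >= 0 ? requested : "text";
}
```

In a page this would be called as `registerHtml5Elements(document)` from a script in the head, before any of the new elements are parsed. The second function just models the fallback described above: `type="email"` costs nothing on browsers that ignore it.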


Why? What do you need from HTML5 to render a static document?? The AUDIO tag? VIDEO? websockets?


custom fonts, for starters :)


Anything else besides custom fonts? Unless there's a subset of html5 features that I'm completely unaware of, I don't see how html5 brings anything useful to a text viewing app like scribd that wasn't already possible before...


text rotation, shadow or indenting, etc. Lots of good 'print' looking stuff that was done with images or in Flash before. Now Scribd gets to do it in straight HTML, making it easier to index as well (I know Flash is indexable, but I'm pretty sure there is a preference for text).


True, but not convinced enough people care about custom fonts over rendering an image. It ends up being pretty much the same experience for them. (Actually custom fonts may well load slower for users, so it's a worse experience in some ways).


But Google can't crawl images...


Google already crawls the original .pdf files. I'm not sure we need every conversion of a pdf file indexed as well.


But how else will people find their way to Scribd's site to click the ads?


You're seriously so pissy about scribd that you're currently objecting to their using standard html instead of images? Get a life.


I'm trying to figure out why they're bothering to solve something that's already been solved pretty well.

eg:

http://docs.google.com/gview?url=http://infolab.stanford.edu...

Users won't see any difference, or care.

I do think scribd up to now has been pretty bad for the web, locking plain text documents and images up in their walled garden. Maybe they can change that, but what value can they actually add? What problem are they solving?


And of course, you know that users won't see any difference or care, despite never having seen, let alone used it. Just because you think the problem has been solved well enough, doesn't mean everyone does. After all, who needs a refrigerator when you have an ice box?


No offense to Scribd, but what is wrong with viewing a document in your browser using something like Safari? I realize that not all machines are capable of this, but they could pretty easily. Someone asked the other day, "Why doesn't Windows have a native PDF reader?". Surely it's possible for all browsers to quickly and properly render a PDF, with easy controls to navigate.

I understand the added benefits of being able to comment, discuss, share, etc.. your documents with Scribd, but honestly why the need for HTML5 or anything at all? PDF's are viewable just fine in a simple PDF viewer.


Well, they don't. Not currently anyway.


this is great news. flash was just really overbearing and felt heavy. im sure it was also a bitch to deal with on the back end. Can't wait to play around with it.

ps- i win newsyc bingo: YC company, techcrunch article, HTML5.


Bzzzt! You're missing: Apple, Facebook. Try again.


He got Adobe Flash and HTML 5. That's tangential enough to almost include Apple.

Needs:

- Facebook integration (or even better for HN cred - removing facebook connect)

- "Now works on iPhone/iPad"


Find out how a 30 year old lean startup, Apple, made billions by opposing facebook privacy issues and furthering HTML5 on the iPad by acquiring Kiko (YC S05). (techcrunch.com)


Almost perfect. I just realized we missed Erlang.


Doesn't Scribd use Erlang?


If this doesn't rely on HTML5 features, why call it HTML5?

DHTML or javascript would have been good enough then, wouldn't it?


if they are using html5 features and degrading gracefully, as most html5 does by default anyway, why not say you are using html5?

for one it might help push a few people into using more html5 capable browsers.


They only use features of HTML5 that are also supported by older browsers:

From the article:

"Friedman estimates that 97 percent of browsers will be able to read Scribd’s HTML5 documents because those parts of the standard are older and more widely adopted."

I don't read that as 'graceful degradation', but as a subset based on older tech.


We create basically the same experience across browsers, but use the latest tech that the browser supports. For instance, things will render much faster in Chrome than in IE6.

However, the documents will basically look the same across all of our supported browsers.


I don't normally like the "betting the company" metaphor as I feel it's overused and over-dramatic. There's also a sense of motivating staff with fear, in that if you don't work hard enough or if you make any mistakes, we'll lose this bet and the company will fail and everyone here will be out of work. No pressure, or anything, though :)

When I met Microsoft's Chief Software Architect at PDC last year, he kindly thanked me for "betting on Azure" and I thought to myself "I'm just experimenting with a new technology that may make my life easier, I try my best not to ever bet on anything". Not wanting to be a smart-ass jerk, I just talked about lolcats and 4chan instead.

Reading about what Scribd is doing, it seems that the metaphor is pretty appropriate.


I hope they succeed and HTML5 replaces Flash, but this seems like a very risky move.


Google already does a great job of showing PDF files using HTML and JavaScript. I don't think it is a huge risk.

Plus the upside is their stuff will then work on mobile devices too.


Google rasterizes the PDF and streams it to you as an image. Scribd will be converting documents to HTML and CSS while maintaining a near perfect facsimile of the original document.


Google does a lot more than that. For example copy and paste works if you select some text and copy it out. That's non trivial.


That is true, their conversion understands text regions and various other things. However, what makes Scribd's viewer more sophisticated is that it will actually use structured HTML to render the document content. This is more than just putting on a layer that specifies regions in the document, it will actually just be a normal HTML document, made of divs, text, images, etc.

Plus, it will maintain the fidelity of the document -- meaning that even PDFs with complicated layouts will be rendered properly in HTML. No trivial task.
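
As a rough illustration of what "a normal HTML document, made of divs, text, images" could look like, here is a sketch that lays out page items with absolute positioning. The PageItem type, the pixel units, and the function names are assumptions for the example, not the actual converter output.

```typescript
// Hypothetical page items: positioned text, or an image for non-text content.
type PageItem =
  | { kind: "text"; text: string; left: number; top: number; fontSize: number }
  | { kind: "image"; src: string; left: number; top: number; width: number; height: number };

// Escape characters that would otherwise be read as markup.
function escapeHtml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Text stays real, selectable text; everything else becomes an <img>.
function renderItem(item: PageItem): string {
  const position = `position:absolute;left:${item.left}px;top:${item.top}px`;
  if (item.kind === "text") {
    return `<div style="${position};font-size:${item.fontSize}px">${escapeHtml(item.text)}</div>`;
  }
  return `<img src="${escapeHtml(item.src)}" style="${position};width:${item.width}px;height:${item.height}px">`;
}

// One relatively positioned container per page keeps the layout fixed.
function renderPage(width: number, height: number, items: PageItem[]): string {
  const body = items.map(renderItem).join("");
  return `<div class="page" style="position:relative;width:${width}px;height:${height}px">${body}</div>`;
}
```

Because the text ends up as ordinary DOM text rather than pixels, copy and paste, search, and screen readers work without an extra overlay layer.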


What will be the main advantage?

It's a great technical challenge, but will users notice the difference?


Users will be able to easily view PDFs on the web from any device. I primarily read HN on my phone. Any time I see a [Scribd] link, I forget about the story because I can't read it. I have also had several occasions where I needed to read a PDF on the go. I had to email it to a friend and then call them, dictating the pertinent bits over the phone.

Once this is live, Scribd will gain at least myself as a user, and I suspect many more.

(Nesting is too deep to reply to axod, so: the fact that Google is doing this should be reason enough. Scribd exists as a place to publish material. That material should be reachable by as many users as they can manage.)



Although the semantic web is still pie-in-sky-land, having your content structured instead of a big block o' text is always better.

Especially if your users are disabled.


This is a little indirect, but presumably the googlebot will find it easier to index scribd pages, which might help you as a user if you're searching for something that's hosted on scribd.


True, although .pdf files are already indexed. Do we also need to have html5 versions of those same pdfs indexed? I'm not sure we do.


The Google PDF has an OCRed text component to it, Google Books is a real bear to use on the iPhone. On the iPad, depending on the book, it can be acceptable (when compared to the pain of trying it on the iPhone!).

I don't know how Scribd is going to carry this off, what with people sometimes uploading outright scans of books. I mean, Scribd is not Scribd without the stuff put there by people -- like with Youtube.


If they really can display PDFs accurately using HTML5, that's great, and plenty of other companies (including CrocoDoc) would benefit from following suit. But PDF still does a lot of things that HTML doesn't, like first-class font embedding (@font-face won't cut it). I'm not sure whether this is the right time to make the switch, though me and my iPad wish them luck.


This is a good move. HTML is a better and more natural feeling format for long documents than Flash (and PDF readers) ever were.


If they are moving from flash to html5 then how are they drawing?

You can get a lot of drawing flexibility if you use Canvas, but performance of any Canvas-bridge for IE makes it not worth using.


I might be missing your point, but Scribd does not do any drawing. Any non-text elements are represented as images.


Hmmm.... ok, that makes sense. Thanks!


Seems like a route to a possible acquisition by Adobe or Google. The technical constraints of HTML5 make it interesting.


I don't get it. It's a shift of technology from one platform to the next. Content, such as ads, is going to be the same. So one gee-whiz thing to the next is just going to result in the next generation gee-whiz ad-blocker.


HTML5 really makes sense for the online viewing. Guess PDF will stay as popular offline reading format. At least until we get some universal cross-platform/cross-browser webarchive format. Could be an idea for a start-up ...


"Now any document can become a Web page."

Now if only I didn't have to log in to read it.


you have to log in to read documents on Scribd? i've never had to.


Was it to view them as the source PDF rather than their Flash approximations, then? I can't remember, but they /did/ get on my nerves more than once requiring an account to get at the hosted content.


Yes, it was for the pdf. Clutter the web with SE spam, and then make you log in to get the real document. Gee, thanks.


I'm not happy with the way Apple's been throwing its weight around on this issue but all the same I'd much rather see this kind of thing implemented in HTML than in Flash.


Ironically the page is causing me a stack overflow in IE8.


Cool. Maybe now I'll actually be able to log in and view a document?

Because I never have been able to before.


It looks like it's still flash. Why announce this before it's done?


To get at least two waves of attention: (1) announcement; (2) delivery.


Nice. Thank you, it's about time.



