How Big is Big? A 200TB contentbase?

Many customers have large MarkLogic contentbases; scaling is something we do quite well. I thought I'd share a few numbers one presenter gave today at the user conference.

The speaker described a project with:

  • 2 petabytes of total storage. Get ready, because that's 15 zeros: 2,000,000,000,000,000 bytes of data, if I counted right. (For more on how big a petabyte is, see here.)
  • 200 terabytes of content that will go into MarkLogic, requiring about 482 TB of total disk space once indexes are included (the sketch after this list runs the arithmetic). That's nearly 25 times the text content of the Library of Congress, per this post. Were it deployed today, it would be the 3rd largest database in the world, per the same post.
  • Approximately 1,200 terabytes of associated binary data that will be stored on the file system.
  • 4 billion documents, growing to 7 billion over time.
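
If you want to check the arithmetic yourself, here's a quick back-of-envelope sketch in Python. It assumes decimal (SI) units (1 TB = 10^12 bytes, 1 PB = 10^15 bytes) and uses only the figures from the list above; the ~2.4x index-expansion ratio, ~50 KB average document size, and 1.75x growth factor are derived by me, not quoted from the talk.

```python
# Back-of-envelope check of the figures above.
# Assumption (mine, not the speaker's): decimal/SI units,
# so 1 TB = 10**12 bytes and 1 PB = 10**15 bytes.

TB = 10**12
PB = 10**15

total_storage = 2 * PB       # 2,000,000,000,000,000 bytes -- 15 zeros
content = 200 * TB           # raw content loaded into MarkLogic
on_disk = 482 * TB           # content plus indexes on disk
binaries = 1_200 * TB        # binary data kept on the file system

index_expansion = on_disk / content    # ~2.41x raw-to-indexed ratio
docs_now, docs_later = 4 * 10**9, 7 * 10**9
avg_doc_size = content / docs_now      # ~50 KB per document
growth = docs_later / docs_now         # 1.75x document growth

print(f"total storage:   {total_storage:,} bytes")
print(f"index expansion: {index_expansion:.2f}x raw content")
print(f"avg doc size:    {avg_doc_size / 1000:.0f} KB")
print(f"doc growth:      {growth:.2f}x (4 billion -> 7 billion)")
```

The derived ratio is the useful planning number: at least for this project, every byte of raw content costs roughly 2.4 bytes of disk once indexes are built.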

It’s a good thing Christopher Lindblad was thinking “Internet scale” in terms of target contentbases when he designed the system. There are customers with projects that require it.
