How Big is Big? A 200TB contentbase?

Many customers have large MarkLogic contentbases, because scaling is something we do quite well. I thought I’d share a bit of what one presenter described today at the user conference.

The speaker’s project involves:

  • 2 petabytes of total storage. Get ready, because that’s 15 zeros: 2,000,000,000,000,000 bytes of data, if I got the zeros right. (For more on how big a petabyte is, see here.)
  • 200 terabytes of content that will go into MarkLogic, requiring about 482 TB of total disk space, including indexes. That’s nearly 25 times the text content of the Library of Congress, per this post. Were it deployed today, it would be the 3rd largest database in the world, per the same post.
  • Approximately 1,200 terabytes of associated binary data that will be stored on the file system.
  • 4 billion documents initially, growing to 7 billion over time. (These figures get a quick sanity check in the sketch after this list.)
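
For anyone who wants to check the zeros, here’s a quick back-of-the-envelope pass over those figures. This is a minimal Python sketch, and the ~2.4x index expansion ratio in it is simply the 482 TB figure divided by the 200 TB figure from the talk, not an official MarkLogic sizing formula:

    # Quick sanity check of the numbers above (all figures in terabytes).
    content_tb = 200        # raw content loaded into MarkLogic
    on_disk_tb = 482        # content plus indexes, per the talk
    binaries_tb = 1200      # associated binary data kept on the file system

    # Index expansion ratio: how much the content grows once indexed.
    print("index expansion: %.2fx" % (on_disk_tb / content_tb))    # ~2.41x

    # Indexed content plus binaries, compared against the 2 PB total.
    total_pb = (on_disk_tb + binaries_tb) / 1000.0
    print("indexed content + binaries: %.2f PB of the ~2 PB total" % total_pb)

    # And the zero count: 2 petabytes really is a 2 followed by 15 zeros.
    print("2 PB = {:,} bytes".format(2 * 10**15))

Run it and you get an expansion of about 2.4x and roughly 1.68 PB accounted for, which squares with the 2 PB total once you leave headroom for everything else on those disks.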

It’s a good thing that Christopher Lindblad was thinking “Internet scale” when he designed the system, because there are customers whose projects require it.
