Category Archives: OSINT

Harvesting Deep Web Content for Open Source Intelligence (OSINT)

Frequent “KellBlog” (just trying it on for size, see previous post) readers should know that I have a keen interest in open source intelligence (OSINT), both because we do a fair bit of business within the intelligence community at Mark Logic and because I’m inherently fascinated by the notion of things hiding in plain sight. It must be the math puzzle guy in me, but there is something really cool about linking together pieces of public information and transforming them into actionable intelligence.

I think Malcolm Gladwell’s New Yorker article, Open Secrets, about which I blogged here, is a fascinating piece of work and a must-read for anyone with an interest in OSINT. More recently, I did a post on an OSINT article in, of all places, USA Today.

The purpose of this post is to highlight an upcoming webinar entitled Harvesting Deep Web Content for OSINT, in which we are participating along with BrightPlanet. The webinar features Mark Logic Federal CTO Chris Biow and William Bushee, VP of development at BrightPlanet.

While I’ve never seen William present, I can say that Chris Biow is both an excellent presenter and a highly knowledgeable individual in search, database, text mining, and semantic web-ish technologies.

The webinar is on June 30, 2009 at 1:00 PM Eastern Time. For (a little bit) more information, go here.

Director Hayden's Speech at the DNI Open Source Intelligence Conference

Central Intelligence Agency Director Michael Hayden recently gave a speech at the DNI Open Source (Intelligence) Conference.

Open source intelligence is a topic I’m interested in and have blogged about before. See Open Secrets (my favorite), Intelligence 2.0, or USA Today Article on OSINT for background.

Rather than take excerpts, I’ll just link to the full text of the speech here.

USA Today Article on Open Source Intelligence

If you’ve not read my previous post on Open Source Intelligence, known as OSINT in the Intelligence Community, then I’d go read it now. It’s called Open Secrets, and it’s about a delightfully well-written article by Malcolm Gladwell on the topic of deriving intelligence from public information.

Provided you’ve read that post, then you should find this recent USA Today story, entitled Today’s Spies Find Secrets in Plain Sight, of interest as well.


… the President’s Daily Brief and other crucial intelligence reports often rely less on secrets from risky espionage missions than on material that’s available to just about anyone.

Intelligence officers have gleaned insights on Iran’s nuclear capabilities from photos on the Internet. They’ve scooped up documents, including a terrorist training manual, at international conferences and public forums. They’ve found information in foreign university libraries and newscasts.


Open sources can provide up to 90% of the information needed to meet most U.S. intelligence needs, Deputy Director of National Intelligence Thomas Fingar said in a recent speech. Harnessing that information “is terribly important,” he said. “It ought to be a normal part of what we do, not being fixated on secrets dribbling into the computer’s in-box.”

Intelligence 2.0

Today’s New York Times had an article, entitled Logged In and Sharing Gossip, Er, Intelligence, that I think is well worth reading. The article describes Web 2.0 style initiatives in the Intelligence Community (IC) designed to improve the quality of intelligence and promote information sharing.


In December, officials say, the agencies will introduce A-Space, a top-secret variant of the social networking Web sites MySpace and Facebook. The “A” stands for “analyst,” and where Facebook users swap snapshots, homework tips and gossip, intelligence analysts will be able to compare notes on satellite photos of North Korean nuclear sites, Iraqi insurgents and Chinese missiles.

A-Space will join Intellipedia, the spooks’ Wikipedia, where intelligence officers from all 16 American spy agencies pool their knowledge. Sixteen months after its creation, officials say, the top-secret version of Intellipedia has 29,255 articles, with an average of 114 new articles and more than 4,800 edits to articles added each workday.


“We see the Internet passing us in the fast lane,” said Mike Wertheimer, of the office of the Director of National Intelligence, who is overseeing the introduction of A-Space. “We’re playing a little catch-up.”

Personally, I’m glad to see the government using Web 2.0 style initiatives to try to improve the intelligence process, and I think the article (e.g., the headline) is overly negative in tone. I do understand that virtually all technology changes are not about the technology alone; they’re about people (culture), process, and technology together. I wouldn’t expect things to be any different in the Intelligence Community than in finance or pharma in that regard. People are people; organizational behavior is organizational behavior.

While I’ve never worked in the IC, it interests me for both personal and professional reasons. See this post, entitled Open Secrets, which covers what’s called open source intelligence (OSINT) and is based on a delightful article by Malcolm Gladwell, or read this book, The Puzzle Palace, a classic that describes the history of the National Security Agency (NSA).

Open Secrets

I recently found and greatly enjoyed this New Yorker article, entitled “Open Secrets: Enron, Intelligence, and the Perils of Too Much Information,” by Malcolm Gladwell (of The Tipping Point and Blink fame). See here for my review of Blink.

Open Secrets is a long (7,000-word) article that goes into considerable depth on the topic of open source intelligence: basically, finding things hidden in plain sight.

Early in the article, Gladwell introduces the distinction between puzzles and mysteries:

The national-security expert Gregory Treverton has famously made a distinction between puzzles and mysteries. Osama bin Laden’s whereabouts are a puzzle. We can’t find him because we don’t have enough information. The key to the puzzle will probably come from someone close to bin Laden, and until we can find that source bin Laden will remain at large.

The problem of what would happen in Iraq after the toppling of Saddam Hussein was, by contrast, a mystery. It wasn’t a question that had a simple, factual answer. Mysteries require judgments and the assessment of uncertainty, and the hard part is not that we have too little information but that we have too much.

He then proceeds to deftly argue that the Enron debacle could easily be mistaken for a puzzle, when in reality it was a mystery. Most or all of the information needed to recognize that Enron was at risk (e.g., the complex special-purpose entities) was disclosed in public documents. The problem with Enron, Gladwell convincingly argues, wasn’t too little information but too much.

He goes on to describe the Screwball Division, a US World War II intelligence outfit that relied entirely on public information:

The analysts listened to the same speeches that anyone with a shortwave radio could listen to. They simply sat at their desks with headphones on, working their way through hours and hours of Nazi broadcasts. Then they tried to figure out how what the Nazis said publicly—about, for instance, the possibility of a renewed offensive against Russia—revealed what they felt about, say, invading Russia.

One journalist at the time described the propaganda analysts as “the greatest collection of individualists, international rolling stones, and slightly batty geniuses ever gathered together in one organization.” And they had very definite thoughts about the Nazis’ secret weapon.

That secret turned out to be the V-1 rocket, and most of the inferences the analysts made turned out to be correct. Another excerpt:

The political scientist Alexander George described the sequence of V-1 rocket inferences in his 1959 book “Propaganda Analysis,” and the striking thing about his account is how contemporary it seems. The spies were fighting a nineteenth-century war. The analysts belonged to our age, and the lesson of their triumph is that the complex, uncertain issues that the modern world throws at us require the mystery paradigm.

The article ends with an example that cleverly and clearly supports Gladwell’s hypothesis about Enron. In 1998, six Cornell business school students decided to do a term project on Enron for an advanced financial analysis class:

It was about a six-week project, half a semester. Lots of group meetings. It was a ratio analysis, which is pretty standard business-school fare. You know, take fifty different financial ratios, then lay that on top of every piece of information you could find out about the company …

The students’ conclusions were straightforward … There were clear signs that “Enron may be manipulating its earnings.” … The report was posted on the Web site of the Cornell University business school, where it has been, ever since, for anyone who cared to read twenty-three pages of analysis.

The students’ recommendation was on the first page, in boldfaced type: “Sell.”

Gladwell is a delightful writer and this is a topic that should be of interest to virtually everyone. So my recommendation: this is a must-read article.