The issue of semantics has been around for a while, well before the recent advent of AI and Deep Learning around 2012. It is what the Semantic Web was about. Tim Berners-Lee, the creator of the Web as we know it, described it way back in 2001. A lot of time has passed since then, and it is abundantly clear that we are still very far from the ideal of the Semantic Web, even with all the advances in AI.
And it is not for a lack of effort. Google has been relatively open about its Knowledge Graph and Knowledge Vault. One can easily check them out by running a few searches and inspecting the knowledge boxes that often appear.
Microsoft has Satori, and there are even successful startups, such as Diffbot, competing and leading in this race to understand meaning. But what can we show today for all these efforts? Not much, as even Google, with its immense resources, is unable to put a significant dent in the problem. Check out those knowledge boxes and ask yourself how impressed you are with the results. And do not think this is only an academic question, as all the uproar around fake news and inappropriate YouTube videos is painfully demonstrating to Google.
So what is the problem? Why is it so difficult, even for Google, to figure out this semantic graph? The answer lies in the scope and size of the best knowledge graphs, which are still very limited. They are built to a large extent from human-curated efforts, such as Wikipedia and Freebase (which Google acquired), and the nodes in those graphs are very sparse and haphazard. They still revolve around basic concepts such as persons, movies and places, and the most basic facts about them. The automated efforts that have been springing up are all about extracting facts from web pages, along with features that describe what those pages are about.
What is badly missing is a much more comprehensive set of semantic nodes upon which we can build more complex and important relationships, beyond who directed a movie or what the capital of a country is.
At its core, Google’s impressive information empire has been built on syntactic concepts such as keywords and N-grams. It is no secret what they are and how many of them are out there. It has always been fascinating how the number of distinct N-grams starts leveling off as the number of words in them increases, instead of growing in a naive combinatorial explosion. But what are the meanings of all those N-grams, or even just some of them? How do we know which are more important than the others? And what are the key relationships connecting them?
These are not just rhetorical questions. There is a great answer in the form of an enormous graph: the Reasoning Graph. This is what Waevio is building and what we will be talking about more in the future.