SharePoint has come a long way since its earliest incarnation. What started as a way to improve on file-sharing technology has evolved into a complex and powerful collaboration environment. With each new generation, there are new problems. Power typically comes with complexity: complexity in administration, design, development, deployment and the day-to-day management of the non-technical aspects of the environment. Change management, user adoption and governance, the glue that holds the other pieces together, are the Achilles’ heel of just about any technology.
Every generation is meant to solve problems, but the nature of the beast is that it begets more problems. Therein lies the dilemma: One can only take advantage of new capabilities if one is not too far behind the learning curve. If there is no maturity around content curation (i.e., the “content hygiene” of applying some metadata, either automatically or through a defined process, and checking quality), then there really is no point in trying to take advantage of the term store, managed navigation or content types.
What does this have to do with semantics and search? First, let’s start with a definition. What are semantics? Or do we say, “What is semantic”? Simply put, it is the study of meaning. When people talk about semantic technology, this refers to programs that can interpret the things that humans are good at and computers are bad at: meaning, nuance and context. Computers are good at the things that humans are not: long lists of information, large numbers and calculations—all of which can be dealt with through pure logic. In contrast, computers struggle with language and interpretation that relies on subtle differences in context or usage.
Human knowledge, and the language used to communicate it, is ambiguous and messy. Take the term diamond. It could refer to jewellery or sports. If I am searching for “Mercury,” is it the planet, the element or the car? We see this problem in business every day. People use different terms to describe the same things and the same terms to describe different things. They also use vague terms, such as searching for a “solution” or looking for an “agreement” or “deliverable.” Legal agreements are all about reducing ambiguity, and of course, the legal industry thrives on the inability to do so.
The majority of organizations that have problems with search have them not because the search engine is inherently flawed. Search engines do what they are told: they look for word occurrences. Humans and human language are the bigger problem. Yet, Microsoft found a way around this problem: Define the terms, and add structure and context to the ambiguous, messy information. Add some order and intelligence, and this works perfectly, at least for those organizations mature enough to understand these approaches and with the process, discipline, accountability and feedback mechanisms to apply them correctly and consistently.
However, most organizations do not have their acts together when it comes to the core foundation for these processes: enterprise taxonomy and information architecture. In fact, a recent survey of practitioners and executives from the industry community (283 responses from enterprises of various sizes) found that over two-thirds of respondents (68%) rated their organizations as “unmanaged or limited” in terms of their maturity in enterprise taxonomy. The point is that without the core of intelligent classification, it is not possible to be successful in content organization, and therefore, search will not be as effective as it could be.
As previously suggested, each generation attempts to solve shortcomings of the prior solution (or actually the problems caused by the prior solution), and SharePoint 2013 does an admirable job of correcting for the sins of the organization and its prior incarnations (both version predecessors and implementations). For example, imagine that you don’t have metadata, and the user is looking for content that pertains to the company’s Deluxe Widget. In this case, we’ll search for Acme’s Deluxe Widget. There are many documents that contain “Acme,” many that contain “Deluxe” and many that contain “Widget,” but if the three terms appear together, then one can be reasonably assured that this is product information. The SharePoint 2013 environment has the ability to develop vocabularies, which are authority files (containing names for auto-classification or entity extraction), and to derive facets based on term occurrences and rules. There are a couple of other powerful approaches, including query processing, result blocks and content preview, that add to this capability.
- Query Processing: Queries can be subjected to a range of pre-processing steps that identify the user, recognize the term, interpret that term based on a use case and associate it with terms that are auto-discovered. We can use query rules to change the nature of a term (to align more closely with what the user is asking for). This effectively means that the search engine can recognize the problem that the user is trying to solve and can present results depending on who the user is and what task they are trying to accomplish.
- Grouping of Search Results: Sometimes, different kinds of information should be grouped as a block to make related information come up in a search. In this way, we can have a video URL come up along with a summary of the video, or, perhaps, related documents, like a proposal and a case study, can come up and be ranked together for easy access—like being handed a file with multiple documents. However, this file is dynamic and what goes into it can be driven by various use cases, content models and even by different roles.
- Content Preview and Interaction: One really interesting way of interacting with information is when you don’t have to download and open a document but can see what is in it just by highlighting it. This nifty little feature is called document preview. Beyond this ability, you can do things in the document like edit, save, forward, share, collaborate and more—right from the search results.
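To make the first two ideas concrete, here is a minimal sketch in plain Python. It does not use the SharePoint API; the vocabulary, the role table, and the function names (`rewrite_query`, `group_into_blocks`) are all hypothetical, invented for illustration. It shows a known phrase being recognized and turned into a metadata filter, a role adding context to the query, and related hits being grouped into a block.

```python
from dataclasses import dataclass, field


@dataclass
class RewrittenQuery:
    text: str                                    # free text left for the engine
    filters: dict = field(default_factory=dict)  # constraints added by the rules


# Hypothetical authority file: known phrase -> (metadata field, canonical value).
VOCABULARY = {
    "acme deluxe widget": ("product", "Acme Deluxe Widget"),
}

# Hypothetical role preferences: role -> content type to favor.
ROLE_CONTENT_TYPE = {
    "sales": "Proposal",
    "support": "Troubleshooting Guide",
}


def rewrite_query(query, role):
    """Apply two simple rules: phrase recognition, then role-based context."""
    remaining = " ".join(query.lower().split())
    filters = {}

    # Rule 1: a known phrase becomes a metadata filter instead of
    # three independent word matches.
    for phrase, (field_name, value) in VOCABULARY.items():
        if phrase in remaining:
            filters[field_name] = value
            remaining = " ".join(remaining.replace(phrase, " ").split())

    # Rule 2: add a preferred content type based on who is asking.
    if role in ROLE_CONTENT_TYPE:
        filters["content_type"] = ROLE_CONTENT_TYPE[role]

    return RewrittenQuery(text=remaining, filters=filters)


def group_into_blocks(results):
    """Group hits that share a 'topic' value so they are shown together."""
    blocks = {}
    for hit in results:
        blocks.setdefault(hit["topic"], []).append(hit)
    return blocks
```

Under these assumptions, a salesperson typing "Acme Deluxe Widget pricing" ends up with the free text "pricing" plus a product filter and a preference for proposals, which is the kind of interpretation a query rule is meant to supply.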
Each of these by itself seems simple, almost trivial. However, just as simple building blocks can be assembled into complex constructs, these search components (and a few others) can be combined with various rules and sources to create something called a search-based application (SBA). SBAs are not magic, and in fact, they have been around for many years. However, for the first time, extremely powerful applications can be developed at a much lower price point than in prior generations of search tools, like FAST. The search architecture has been completely rebuilt from the ground up. In addition, many powerful functions have been introduced and are extremely useful for enterprises developing next-generation search applications. From connectors that allow crawling of structured and unstructured data sources, to promoted results, query transform rules and configurable ranking algorithms, it is possible to develop intelligence that will compensate for the problems of poorly curated content.
In SBAs, users don’t search for content just through a generic white box. Rather, the application integrates a search engine, data source connectors, document processing and search index enhancement to provide the right content when it is needed in the context of a task or work process. In other words, content is integrated into the user experience and woven into business processes.
SBAs can be defined by having some or all of the following characteristics:
- The use of a search engine to dynamically drive the information processing interaction
- Triggering of information retrieval through a key term or phrase query that returns additional parameters used to pull information from other sources
- The use of search engine connectors to bring data from its sources
- Processing of user search terms either before the query is sent to the engine (pre-processing) or after the results are returned (post-processing)
- Enhancement of the index (by adding more metadata, changing that metadata or removing metadata)
- Combining, ranking and formatting result sets
- Integrating information from disparate or similar sources
- Performing document processing (previewing content, segmenting content or recombining content)
- Allowing for interaction with content or data
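Several of these characteristics (connectors, index enhancement, and combining and ranking results from disparate sources) can be illustrated with a toy pipeline. This is a sketch only, written in plain Python rather than against any SharePoint interface; the two connectors, the sample documents and the word-count scoring are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    doc_id: str
    source: str
    text: str
    metadata: dict = field(default_factory=dict)


def crm_connector():
    """Hypothetical connector for a structured source."""
    return [Document("crm-1", "crm", "Acme Deluxe Widget renewal for Example Corp")]


def file_share_connector():
    """Hypothetical connector for an unstructured source."""
    return [
        Document("fs-1", "files", "Deluxe Widget installation guide"),
        Document("fs-2", "files", "Holiday schedule"),
    ]


def enrich(doc, product_terms):
    """Index enhancement: tag the document with products named in its text."""
    text = doc.text.lower()
    doc.metadata["products"] = [t for t in product_terms if t.lower() in text]
    return doc


def build_index(connectors, product_terms):
    """Pull documents from every connector and enrich each one."""
    return [enrich(doc, product_terms) for connect in connectors for doc in connect()]


def search(index, query, product=None):
    """Rank by number of query words present; optionally filter on product."""
    words = query.lower().split()
    hits = []
    for doc in index:
        if product and product not in doc.metadata["products"]:
            continue
        doc_words = doc.text.lower().split()
        score = sum(1 for w in words if w in doc_words)
        if score:
            hits.append((score, doc))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in hits]


index = build_index([crm_connector, file_share_connector], ["Deluxe Widget"])
results = search(index, "widget guide", product="Deluxe Widget")
```

In this sketch the product tag is added at indexing time, so documents that never had metadata can still be filtered by product. A real ranking model would be far richer than counting matching words.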
SharePoint 2013 is a platform for developing semantic search and search-based applications. These applications represent the next stage in tuning search results very specifically for end users, giving people what they need in the context of what they are trying to accomplish. SBAs that use semantic relationships in SharePoint 2013 require the ability to model people and processes. This can result in very intelligent search engines that understand the organization and the needs of various user types and return content in the context of their goals. The search tool can be developed to think like the user. By embedding users’ vocabularies, terminology, knowledge of complex relationships and, really, the fabric of enterprise information and processes into the core search engine, we capture the mental model of the user and embed it in search processes that anticipate what users need and give them what they want. This is the next generation of enterprise information management.
Many organizations are building “Centers of Excellence” to bring best practices to different parts of the enterprise. These act as internal consulting practices that help with design, adoption and operationalization. However, it is important to get these groups on a solid footing. An enterprise search and content maturity model assessment can help identify the areas in greatest need of remediation and where an intervention will produce the most valuable outcome. By applying best practices that fully leverage the capabilities of SharePoint as a search application platform, organizations can achieve a very powerful result, which breeds a habit of success.