I am working with a client to assist in the creation of an enterprise-wide portal environment. As part of that effort, I was working on defining the information architecture for the portal environment. Several months into the effort the customer asked me exactly what “information architecture” was and why it was needed.

I had assumed that most people knew what information architecture was. Obviously I was mistaken. This article summarizes the various parts of the information architecture my colleagues and I defined for the client, what capabilities the pieces enable, how they are implemented in SharePoint, and how these parts come together to form a cohesive information architecture.

Ontology

Ontology is the means by which we classify items. An ontology allows us to define the types of entities that comprise the set of information with which our system will deal, for example a Document entity type, an Expense Report entity type, and a Fit Rep entity type. An ontology also allows us to define inheritance relationships between the entity types. For example, an Expense Report entity type is a type of Document entity type, and a Fit Rep entity type is also a Document entity type.

The ontology entity inheritance allows us to implement Metadata in an organized, intelligent manner. Instead of having an enormous pool of all possible metadata attributes to choose from for every entity type, we are now able to associate metadata attributes with specific entity types. For example, the Title, File Name, and Created Date attributes are valid attributes for the Document entity type. Because the Expense Report and Fit Rep entity types inherit from the Document entity type, these entity types inherit the Title, File Name, and Created Date attributes. But, because the Expense Report and Fit Rep entity types are distinct entity types, they are also able to define additional attributes. So, the Expense Report entity type defines an Expense Total attribute, while the Fit Rep entity type defines the Report Date attribute.
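The attribute inheritance described above can be sketched in a few lines of Python. This is only an illustration of the idea, not how SharePoint stores it: the `EntityType` class and its `all_attributes` and `is_a` methods are hypothetical names introduced here, while the entity types and attribute names come from the example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntityType:
    """One entity type in the ontology, with an optional parent type."""
    name: str
    own_attributes: list[str]
    parent: Optional["EntityType"] = None

    def all_attributes(self) -> list[str]:
        """Return inherited attributes first, then this type's own."""
        inherited = self.parent.all_attributes() if self.parent else []
        return inherited + self.own_attributes

    def is_a(self, other: "EntityType") -> bool:
        """True if this type is `other` or inherits from it."""
        node = self
        while node is not None:
            if node is other:
                return True
            node = node.parent
        return False


# Entity types and attributes taken from the example in the text.
document = EntityType("Document", ["Title", "File Name", "Created Date"])
expense_report = EntityType("Expense Report", ["Expense Total"], parent=document)
fit_rep = EntityType("Fit Rep", ["Report Date"], parent=document)

print(expense_report.all_attributes())
print(fit_rep.all_attributes())
print(fit_rep.is_a(document))
```

Each subtype lists only the attributes it adds, so a change to the Document attributes is picked up by every type that inherits from it.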

A well-defined ontology allows users to perform queries like “find all Expense Reports where the Expense Total is greater than $1000” or “find all Tasks assigned to me”.
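To make those two queries concrete, here is a minimal sketch that assumes the items live in a plain in-memory list. The class names, the `find` helper, the field names, and the sample data are all invented for illustration. A real portal would run such queries through its search service rather than a list comprehension.

```python
from dataclasses import dataclass


@dataclass
class Document:
    title: str


@dataclass
class ExpenseReport(Document):
    expense_total: float


@dataclass
class Task:
    # Task is modeled as a standalone type here; the text does not say
    # where it sits in the ontology.
    title: str
    assigned_to: str


def find(items, entity_type, predicate):
    """Return the items of the given entity type that satisfy the predicate."""
    return [i for i in items if isinstance(i, entity_type) and predicate(i)]


# Made-up sample data.
items = [
    ExpenseReport("Conference travel", 1450.00),
    ExpenseReport("Office supplies", 82.50),
    Task("Review portal design", "alice"),
    Task("Draft site map", "bob"),
]

# "Find all Expense Reports where the Expense Total is greater than $1000."
large_expenses = find(items, ExpenseReport, lambda r: r.expense_total > 1000)

# "Find all Tasks assigned to me", with "me" assumed to be the user alice.
my_tasks = find(items, Task, lambda t: t.assigned_to == "alice")

print([r.title for r in large_expenses])
print([t.title for t in my_tasks])
```

The point is that both queries only make sense because the entity type tells us which attributes exist: Expense Total is meaningful for an Expense Report and meaningless for a Task.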

SharePoint implements ontologies using Content Types. A SharePoint Content Type implements a single entity type in an ontology. Each Content Type has a set of metadata, implemented via SharePoint Site Columns, that defines the valid attributes for a particular entity type.

Taxonomy

Taxonomy is a structure for categorizing objects, often referred to as a taxonomy tree. An object is an instance of an entity type defined by the ontology. While ontologies allow us to organize entity types, taxonomies allow us to organize instance objects of entity types. A taxonomy consists of a tree structure where each branch and leaf of the tree is a category. Any given category may have sub-categories. A taxonomy tree starts at the root, which encompasses all object instances. Categories closer to the root are broader in scope, with each sub-category becoming more and more specific. For example, the taxonomy path Root -> All Projects -> Active Projects shows a progression from the very general (All Projects) to the more specific (Active Projects).

It may be useful to think of a taxonomy as a folder structure in which objects reside. However, unlike a folder structure which permits an object to reside in only one location at a time, taxonomies allow objects to reside in multiple locations at once. For example, while a project may reside in the path Root -> All Projects -> Active Projects, the project may also reside in the path Root -> All Projects -> Outsourced Projects. This is because a project may be categorized as an Active Project and an Outsourced Project at the same time.

In addition to allowing one object instance to reside in multiple locations within one taxonomy, there may be (and often are) several taxonomies defined over the same data. So, an object instance may be in multiple locations within multiple taxonomies at the same time. For example, the project in the previous example may reside in the paths Root -> All Projects -> Active Projects, Root -> All Projects -> Outsourced Projects, Root -> Organization -> IT, and in Root -> Organization -> Finance. This is because a project may be categorized as an Active Project and an Outsourced Project, and may represent a collaboration effort between the IT department and the Finance department at the same time. This project has been categorized in two different taxonomies: a functional taxonomy useful for project management and a second taxonomy based on organizational structure.

A well-defined set of taxonomies allows users to perform queries like “find all Expense Reports where the Expense Total is greater than $1000 within the category Root -> All Projects -> Outsourced Projects” or “find all Expense Reports where the Expense Total is greater than $1000 within the category Root -> Organization -> IT”.
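A rough sketch of how one object can sit in several categories across several taxonomies, and how a category-scoped query might then work, follows. It assumes each category is represented as a path tuple and that an item simply carries a set of such paths. The `Item` class, the `in_category` and `query` helpers, and the sample expense reports are all hypothetical.

```python
from dataclasses import dataclass, field

Path = tuple[str, ...]

# Category paths from the two taxonomies used in the text.
ACTIVE = ("Root", "All Projects", "Active Projects")
OUTSOURCED = ("Root", "All Projects", "Outsourced Projects")
IT = ("Root", "Organization", "IT")
FINANCE = ("Root", "Organization", "Finance")


@dataclass
class Item:
    name: str
    entity_type: str
    expense_total: float = 0.0
    # Unlike a folder, an item may belong to many categories at once.
    categories: set[Path] = field(default_factory=set)


def in_category(item: Item, category: Path) -> bool:
    """True if the item sits in `category` or in any of its sub-categories."""
    return any(path[:len(category)] == category for path in item.categories)


def query(items, entity_type, min_total, category):
    """Names of matching items within a category, in input order."""
    return [
        i.name
        for i in items
        if i.entity_type == entity_type
        and i.expense_total > min_total
        and in_category(i, category)
    ]


# Made-up sample data.
items = [
    Item("Vendor invoice", "Expense Report", 2300.0, {ACTIVE, OUTSOURCED, IT, FINANCE}),
    Item("Team lunch", "Expense Report", 140.0, {ACTIVE, IT}),
    Item("Server purchase", "Expense Report", 5200.0, {ACTIVE, IT}),
]

print(query(items, "Expense Report", 1000, OUTSOURCED))
print(query(items, "Expense Report", 1000, IT))
```

Because membership is checked by path prefix, querying a broad category such as Root -> All Projects also returns items filed under any of its sub-categories.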

SharePoint implements taxonomies using a hierarchy of sites and sub-sites. A SharePoint site defines one node in a taxonomy. An item physically located within the site or one of its sub-sites, or referenced by a link in the site or sub-site, is said to be part of the taxonomy category represented by that site. The Site Directory is also used to assign categorization information to collaborative sites and workspaces.

Topology

Topology defines network locations where services reside. Each location within a given topology that provides services is referred to as a service node. Any given node within a topology may have one or more parent nodes and/or one or more child nodes. Topologies allow distributed systems to be designed and implemented in a modular fashion and operated in an agile manner. That is, service nodes can be provisioned when they are needed to expand the capabilities of the topology as a whole. Service nodes may also be moved from one location in the topology to another as the needs of the topology change.

Topologies are primarily used to limit the scope of data for which any given service node is responsible. These limits are referred to as service boundaries or service domains. When a service running on a service node receives a service request, the service may either answer the request if it has the data, or pass along the request to other instances of the service running on the service node’s parent or child nodes. How many nodes work together to answer a service request depends on the nature of the service request. For example, some service requests may only make sense when answered by the node that received the initial request. Other requests may be properly answered by the service node’s parent nodes, or the service node’s immediate child nodes, or some subset thereof. The service boundary is ultimately determined by the individual services and the types of requests they process. The topology simply enables the capability.
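The answer-or-forward behavior can be sketched as follows. This is a toy model under stated assumptions: the `ServiceNode` class, its `handle` method, the `scope` parameter (standing in for whatever boundary a real service would define), and the node names and data are all invented for illustration.

```python
class ServiceNode:
    """A node that answers requests from its own data or forwards them."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})
        self.parents = []
        self.children = []

    def add_child(self, child):
        """Link two nodes in both directions."""
        self.children.append(child)
        child.parents.append(self)

    def handle(self, key, scope="local"):
        """Answer locally if possible; otherwise forward within `scope`.

        `scope` is "local", "parents", or "children". Forwarded requests
        keep the same scope, so they continue in the same direction.
        Returns None if no node within the boundary has the data.
        """
        if key in self.data:
            return f"{self.name}: {self.data[key]}"
        if scope == "parents":
            neighbours = self.parents
        elif scope == "children":
            neighbours = self.children
        else:
            neighbours = []
        for node in neighbours:
            answer = node.handle(key, scope)
            if answer is not None:
                return answer
        return None


# Made-up three-level topology.
hq = ServiceNode("hq", {"policy": "Travel policy v3"})
it = ServiceNode("it", {"runbook": "Server runbook"})
helpdesk = ServiceNode("helpdesk", {"faq": "Password reset FAQ"})
hq.add_child(it)
it.add_child(helpdesk)

print(it.handle("policy"))             # local scope only
print(it.handle("policy", "parents"))  # forwarded up to hq
print(it.handle("faq", "children"))    # forwarded down to helpdesk
```

The same request gets a different answer depending on the scope, which is the sense in which the service, not the topology, decides where the boundary lies.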

While topologies can come in all shapes and sizes, there are several commonly implemented topologies:

* Centralized – A topology that consists of one service node.

* Mesh – A topology where a given service node may have one or more parent service nodes and one or more child service nodes. If the service node receives a request for information that it does not have, the service node may pass on the request to one of its parents or children. Mesh topologies imply a peer-to-peer relationship between all nodes within the topology. For example, the World Wide Web implements a Mesh topology by using hyperlinks on pages in order to transfer information requests from one service node to another.

* Tree – A variation of the Mesh topology where a given service node may have at most one parent. A Tree topology implies that nodes closer to the root of the tree are more authoritative than nodes further away from the root. For example, the Internet Domain Name System (DNS) uses a Tree topology in order to establish service boundaries for DNS queries issued to a given DNS server. A query for the address of the host in “http://www.blackbladeinc.com/” first goes to the com DNS server to get the DNS server for blackbladeinc.com, then to the DNS server for blackbladeinc.com to get the address of www.blackbladeinc.com.
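The Tree example can be illustrated with a small delegation walk. This is a simplified model, not a DNS implementation: the `Zone` class and `resolve` function are hypothetical, the walk starts at the root of the tree (as real DNS resolution does before it reaches the com servers), and the address 192.0.2.10 is a placeholder from a range reserved for documentation.

```python
class Zone:
    """A node in a tree topology: holds its own records and its child zones."""

    def __init__(self, name=""):
        self.name = name
        self.records = {}   # host name -> address held by this zone
        self.children = {}  # label -> child Zone (each child has one parent)

    def delegate(self, label):
        """Create and return a child zone responsible for `label`."""
        child = Zone(f"{label}.{self.name}" if self.name else label)
        self.children[label] = child
        return child


def resolve(root, hostname):
    """Walk from the root toward the zone that can answer for `hostname`.

    Returns (address or None, list of zone names visited).
    """
    zone = root
    visited = [zone.name or "(root)"]
    # Labels are read right to left: com, blackbladeinc, www.
    for label in reversed(hostname.split(".")):
        if hostname in zone.records or label not in zone.children:
            break
        zone = zone.children[label]
        visited.append(zone.name)
    return zone.records.get(hostname), visited


root = Zone()
com = root.delegate("com")
blackblade = com.delegate("blackbladeinc")
blackblade.records["www.blackbladeinc.com"] = "192.0.2.10"  # placeholder address

address, path = resolve(root, "www.blackbladeinc.com")
print(address)
print(" -> ".join(path))
```

Each zone only needs to know about its direct children, which is what keeps the service boundary of any single node small.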

Information Architecture

Information architecture is the collection of ontologies, taxonomies, and topologies used to organize the information in a given system and make that information accessible. In this case, that is the information managed by and accessible to the set of portal services. Information architecture enables metadata, meaningful enterprise search, drill-down browsing of information, information flow, selective data replication, records management, and many other capabilities essential to modern information systems.