We, as a species, kinda stumbled into this Internet thing. It’s all an inconceivably vast, unregulated, unhindered, unorganized cornucopia of digitized crap, and in order to deal with the fact that humans defecate in digital, we’ve had to invent stuff like Google to help us sift through all the … you get the point.
The way that we’ve evolved in our mode of seeking information has always been a function of the medium that holds what we’re looking for. Books added indexes, articles added abstracts, and libraries added compilations of indexed abstracts. Categorizable volumes fell prey to the formidable Dewey Decimal system, its taxonomic granularity thought un-improvable. But von Neumann screwed ’em, and abstract data types, bubble sorts, relational DBs, non-relational DBs and relevancy algorithms took over the world (rich white men are so passé).
At this point in history we search by guessing at words that might appear in the document we hope also contains the piece of information we’re actually after. When you think outside the search-box a little, this methodology doesn’t necessarily strike you as the ultimate form of human-understandable information retrieval, and in a certain light, it might even seem a bit archaic (okay, with nothing more modern to compare it to, I guess that would be faux-archaic).
Now I’m no futurist (though it seems a cushy job, so if any of my half-assed predictions come true, I’ll be charging $200 USD an hour from this point forward for any futuristic thinking you may need done), but I know things can’t stay this way forever: if the medium changes, the search will change, and the entire paradigm of what it means to ‘look something up’ could evolve. As Bill Hicks would say, we’re not growing any more thumbs, people; it’s time to evolve our ideas.
So, with that rather weighty introduction, which in retrospect seems not so ado, and with no further ado that I can do, here are some futurist (read: made-up) ideas about ways the Internet could change that would make ‘search’ as we know it obsolete.
1) Personalized Intelligent Information Agents
Do you remember that creepy scene in Minority Report where Mr. Katie Holmes was bombarded with personalized advertisements?
The idea of information flowing the other way, from the net to the individual without a search request, is certainly nothing new – in its most basic form I think I’d call that a newspaper. A slightly more abstract version of the concept is the vertical ‘portal’ (if you ever took the word ‘vortal’ seriously, you’ve played this game) and, more recently, the personalized home-page version of a portal.
These individualized portals, like iGoogle, deliver information automatically via user-chosen widgets, topics of interest, subscribed RSS feeds, etc., without the user having to perform a traditional ‘search’ for it. The concept can be extended into what are known as ‘smart agents’ or ‘intelligent information agents’, a fledgling sub-field of Artificial Intelligence in which the equivalent of personalized search robots scour the web, retrieving information based on a fairly sophisticated schema of … you, Mr. Katie Holmes.
Search may change in a couple of different ways if proactive AI retrieval becomes commonplace. It may become something we only reach for in a smaller subset of information-retrieval quests, because so many everyday questions are answered automatically. Or it may become something we apply to specific subsets of the information already being retrieved for us, as opposed to searching ‘the whole f@*%ing Internet’ every time we need to find something.
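To make the ‘schema of you’ idea a little more concrete, here’s a toy sketch of the push model in Python. It assumes a hand-built interest profile of weighted keywords and a naive whitespace tokenizer; the names (`Item`, `score`, `push_updates`) are made up for illustration, and a real intelligent agent would use a far richer model than keyword matching.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One incoming piece of content, e.g. a feed entry (hypothetical shape)."""
    title: str
    text: str


def score(item: Item, profile: dict[str, float]) -> float:
    """Sum the weights of profile keywords that appear as whole words in the item."""
    # Naive tokenizer: lowercase and split on whitespace; no stemming or punctuation handling.
    words = set((item.title + " " + item.text).lower().split())
    return sum(weight for term, weight in profile.items() if term in words)


def push_updates(items: list[Item], profile: dict[str, float],
                 threshold: float = 1.0) -> list[Item]:
    """Return items scoring at or above `threshold`, best first, with no query from the user."""
    scored = [(score(item, profile), item) for item in items]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in kept]
```

Nothing here is ‘searched’; the profile does the asking. Note that the threshold quietly decides what you never get to see, which is the flip side of having information delivered to you.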
2) The Valid Person & The Real Recommendation
The current state of the Internet is a precarious balance between anonymity and identification. Anonymity is required for any real purity of freedom, for the realization of a borderless, egalitarian digital world where every IP address and every byte is created equal. Identification adds value, some would argue, because any piece of content published under someone’s own name is, to put it bluntly, infinitely more likely to be a valid contribution to the canon of human knowledge.
There is a perfectly reasonable case to be made for both forms of information offering value: the searcher who wants the lowdown on what’s happening in Tehran is well served by anonymous sources, while the searcher who wants the lowdown on what sucks about the latest Apple gadget is better served by named ones.
In the beginning, there was darkness: a wholly anonymous Internet, full of uncredited, unaccredited information. Then there was light, in the most disgusting form you can imagine: Facebook. If people widely accept the idea of identifying themselves online, the World Wide Web will change, and along with it, our perceptions of search. A nameless, faceless website will no longer be adequate for the task of answering my questions, and Google has about as much clue as a shoe about how legitimate the information I’m reading is. When this mental shift happens, we’ll revert to searching for answers from trusted, verifiable sources, once again shrinking our world of influence to manageable levels, where we don’t find ourselves ‘learning things’ from anonymous websites that exist only to profit.
3) Specialized Databases & Expert Systems
The Internet is big, in a completely non-literal sense. I often question whether I really want to be searching, I’ll say it again, the whole fu**ing Internet every time I have a query. I also have a brain. I know what *kind* of information I’m looking for, and if given the option, I may even choose a search methodology that takes my previous knowledge into account. Since Google took over, this has only ever manifested as people trying to change their search methodology by fiddling with their query. If other options materialize, however, your brain might actually come back with the suggestion that you try searching a different source of data, maybe one that’s smaller or more specialized.
YouTube, for all intents and purposes, is simply a specialized search engine for videos. It revolves more around its search box than around any other feature. Google understands this, and before we were given a chance to think of YouTube as an independent search destination, Google both bought the company and began integrating YouTube results right into its main search engine. The universal search concept is, as much as anything, an attempt by Google et al. to reshape our concept of what search is, keeping us coming back to the one-search-box, one-universal-index mentality. I think it’s a move motivated by self-preservation.
Other proprietary, specialized databases exist, such as LexisNexis, the subscription-based search engine and data archive aimed mostly at legal research. Other private, super-high-bandwidth, globe-spanning networks exist, like Internet2. Perhaps more important are the private, proprietary expert systems (such as those used by large hospitals) built on semantic AI. Because their data sets are constrained and relational, these databases are light years ahead in functionality compared to where Google could realistically be (any time soon) when looking at the public Internet as a whole.
Even if Google manages to protect what we think of as ‘search’ by folding multimedia file types into its regular old search engine, it can’t monopolize how we think of different information types. File types are easy. Information types, and semantics, are not.
How much will high-quality, privately maintained specialty search engines and expert systems take over our mindset of what it means to find something accurate, quickly? My guess is that as the general quality of information published on the net becomes more diluted day by day, the value of maintained or exclusive indexes will grow accordingly.
4) Personalization Gone Mad
Google and the other big boys of search have been pushing the concept of ‘personalization’ of search results for more than a couple of years now, and while we have yet to see (or notice) much of any significant change in the way search engines treat us, the future may not be so friendly.
The issue here is that a great deal of the time, when someone searches they’re not looking for results that are in any way related to their life, hobbies, other websites they visit, or other searches they’ve made. They’re looking for *new* information. If personalization becomes a ‘standard’ feature of our search experience, as opposed to an option available for each query, then it is going to naturally decrease the variety of websites returned in my search results. This, taken to the extreme, is going to change our search experience away from something that can be thought of and acted upon as a shared experience. The notion of telling a friend about something via a search engine query, Googling it, could become antiquated. Right now, this is a pretty valid thing to say to a friend:
Hey I read a great article on n.n. last night, can’t remember the site but just search Google for ‘net neutrality in India’, you’ll find it.
But in a personalized search world, I may have only stumbled across that article because I’d been researching outsourcing companies in Bangalore recently, and Google promoted the same sites I’d visited last week when I included ‘India’ in my net neutrality search this week. Google thinks itself clever, but if I had only been at that site last week for an unrelated reason, then all Google has done in showing it to me again is falsely promote it. On top of that, I have lost the ability to use Google as the share-point for my information, and telling my friend to search Google may send them on a frustrating wild-goose chase. You might think this a subtle shift, but once we give up search as a shared experience, its nature changes. “Google it” could become meaningless as a command to verify a fact if the results that fill your first page contradict the results that fill mine.
In addition, fallacy or not, people draw some faith in a site’s validity from its ranking, and serving different results to different people will shake that (false) association, which is likely to affect the way the general public thinks about search. Because associating rankings with validity or trustworthiness is naive, this shift may very well be for the best (though I can’t see losing the association between ‘validity’ and its brand name being a good thing for Google, which may be impetus enough for it not to alter results from person to person too much, too quickly).
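Here’s a minimal sketch, in Python, of why two people typing the same query can end up with different first results. It assumes the simplest possible personalization scheme, a flat score boost for URLs already in the user’s history; the URLs, scores and the `personalize` helper are all hypothetical, and this makes no claim about how Google actually weights anything.

```python
def personalize(results: list[tuple[str, float]], visited: set[str],
                boost: float = 0.5) -> list[str]:
    """Re-rank (url, base_score) pairs, adding `boost` to any URL the user has visited before."""
    rescored = [(url, base + (boost if url in visited else 0.0)) for url, base in results]
    # sorted() is stable even with reverse=True, so ties keep the engine's original order.
    return [url for url, _ in sorted(rescored, key=lambda pair: pair[1], reverse=True)]


# Hypothetical base results for the query 'net neutrality in India'.
results = [
    ("a.example/nn-india", 0.9),
    ("b.example/outsourcing-bangalore", 0.7),
    ("c.example/nn-explained", 0.8),
]
```

With an empty history the engine’s own order wins; add a single visit from last week’s unrelated outsourcing research and that page jumps to the top of my results but not my friend’s, which is exactly what breaks “just search Google for it.”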
5) Net Neutrality
Net Neutrality is a topic far too expansive to be explained in a simple bullet point of a blog post, but in its essence it revolves around the concept of whether the information currently freely available on the Internet will be provided without regard for who is requesting it or from where.
At stake are things like whether bandwidth, transfer limits, open ports and the types of data being transferred are allocated differently for different people in different places. A neutral network would not discriminate with respect to what sites are accessible, what type of equipment is used to access them, etc., etc., etc.
Yeah, huge topic, and absolutely vital to the future nature of the Internet (hint: people with lots of money don’t want the same Internet we have now. Oh no…no they don’t). The ways a non-neutral Internet could affect search are varied, and would depend on how neutrality was compromised. If content or sites were restricted, as they are in China, then people who are ‘searching’ have to take this into account (if they’re aware of it): they must realize that the data set they are scouring is intentionally incomplete or obfuscated, and change the way they search accordingly.
6) Ubiquitous Access Provision
The world of Internet service providers has been shrinking since broadband access became commonplace. Cable television companies currently hold a near monopoly because they own the best ‘last mile’ cable from grid to house (just as phone companies had previously). This, again, is a paradigm that could shift with technology.
If broadband Internet access technology slips the leash, and Wi-Fi, cell networks or another medium takes over, the way we access the Internet may change, and may even boil down to something much more concentrated than Comcast: a single Internet provider present in every major city. If that were the case, the current model of paying for monthly access and/or data transfer rates and quantities might become obsolete.
A single point of access for getting online would not, in-and-of-itself, change the way we search, except, perhaps, if the entity providing that access were (buu buu BUUUM!) a search engine.
There have been reports for years of Google buying up dark fiber across North America (fiber optic cables laid city to city, but never the ‘last mile’ to each house, as that last-mile infrastructure is the most expensive). Given some modern Wi-Fi broadcast hardware, Google would be in the perfect position to offer ad-supported Internet access for free in these cities. When you’re the ISP you can do a lot to steer people towards your search engine. If it took off (and admit it, you’d use free Google wireless) it could choke out a large portion of the competition in local search, stifling innovation and preventing search as a concept from evolving naturally. It would change search by never allowing it to change from Google’s vision of what search is supposed to be.
7) The Spammers Win – Destruction of the Databases
Newsflash: Google isn’t stupid. Every major search engine before it, after a few years of consistent growth and expansion, collapsed in on itself in a pile of gelatinous SPAM. They were all public databases, and as an old-school SEO I can tell you from personal experience that those who ran Google’s earlier competitors did not know which side their bread was buttered on. Google took care of its index, the others did not, and the others failed.
But is Google infallible? It is just a database, after all. Up until now it has only had to deal with marketers who want a slice of the pie and are willing to exploit the search engine’s weaknesses to get it. That is child’s play. Has there been a concerted effort yet by an anti-Google group (we’ll leave potential motivations aside for now) to dilute the Google database? So far, the only intention behind polluting it has been black-hat attempts to get traffic from Google, but what happens when the right group of smart kids rebel, decide that Google has been a wolf in sheep’s clothing, and set out to destabilize the index on a larger scale?
Sound like unrealistic conspiracy theorizing? I’ve had this conversation, and trust me, the people who have cause to discuss it don’t do so in jest or with any hint of the casual, though the idea always seems to spiral so wide (Google is particularly skilled at the upgrade game) that it gets walked away from without resolution. But is it so far-fetched to think that I may not be the most resolute person in the world? (Hey, no laughing just because you happen to know me personally!)
What would happen if we were thrown back to 1999 style search result quality, where everything seems to be spam? We can’t just start over, delving randomly into petabytes of hedonistic data, looking for strings of characters via the distorted prism of a modern, near-meaningless, corrupt link graph… can we?
8) Semantics Start Making Some Sense
Semantics, the study of meaning in language, is the holy grail of search engine relevance: being able to understand something more of a search query than which words it contains, and to understand more of the documents in your index than simply the sum of their characters.
If you’ve ever queried a relational database with a structured query language (SQL), you have some understanding of how humans have traditionally, proactively created meaningful associations between sets of data. Once these connections are in place and recognized, you can use structured queries to retrieve very specific results or sets of data.
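For anyone who hasn’t touched SQL, here’s a small illustration using Python’s built-in `sqlite3` module. The relationship between articles and authors is declared up front, so the query can follow it instead of guessing at words. The tables and rows are invented purely for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE articles (
    id INTEGER PRIMARY KEY,
    author_id INTEGER NOT NULL REFERENCES authors(id),
    topic TEXT NOT NULL,
    year INTEGER NOT NULL
);
INSERT INTO authors (id, name) VALUES (1, 'Ada'), (2, 'Grace');
INSERT INTO articles (author_id, topic, year) VALUES
    (1, 'net neutrality', 2009),
    (2, 'semantic web', 2009),
    (1, 'semantic web', 2008);
""")

# The association articles.author_id -> authors.id was created deliberately,
# so a structured query can ask "who wrote about X, and how often?" directly.
rows = conn.execute(
    """
    SELECT authors.name, COUNT(*) AS n
    FROM articles JOIN authors ON articles.author_id = authors.id
    WHERE articles.topic = ?
    GROUP BY authors.name
    ORDER BY authors.name
    """,
    ("semantic web",),
).fetchall()
```

The query returns exactly the matching authors with their counts, not a ranked list of documents that merely contain the words ‘semantic web’.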
Google’s database is not relational, but some tentative attempts have been made at drawing semantic associations between the otherwise free-swirling data it contains. Google Squared lets you build sets of information that have meaningful relationships. Try playing with it to get a sense of its limitations and proneness to error.
More constrained data sets, such as the information in Wikipedia, are being organized and systematized, then placed into databases that support structured queries. DBpedia extracts structured facts from Wikipedia, publishes them using the Resource Description Framework (RDF), and allows access via a structured query language named SPARQL. I really consider this to be fake semantics, however, as it is a half-forced structuring of previously existing facts. Semantic interpretation of language is soooo much more than that, and could potentially rein in more meaningful answers to more subtle questions from a global data set like Google’s.
There are two ways in which the world has to change in order to make the semantic web a reality, each attacking what is missing, from opposite ends:
- Natural language processing and semantic interpretation have to advance for general web content, and who knows where the limits may lie. Google Squared and Wolfram Alpha are not accessible enough, but a clever way to use relational information to enhance current results could be possible.
- Publishers need to start standardizing any information they present that can be standardized (HTML5 structured markup such as microdata, and equivalents).
The second point speaks to publishers, many of whom are simply commercial entities with products that lend themselves to structured data (such as price, height, weight, color, etc.).
I believe searchers would be willing to adopt, and may even appreciate, the option of including some semantic-style directives in their searches. An evolved query, asked of Google’s index of the entire world’s data, might look something like this:
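As one illustration of what ‘standardizing’ product data might look like, this sketch emits a schema.org `Product` description as JSON-LD, one of the equivalents to HTML5 microdata. The product values are placeholders, and it covers only a handful of schema.org properties (`name`, `color`, `weight`, `offers`); the `product_jsonld` helper is hypothetical, so check the schema.org vocabulary before relying on any of it.

```python
import json


def product_jsonld(name: str, price: float, currency: str,
                   color: str, weight_kg: float) -> str:
    """Build a schema.org Product description wrapped in a JSON-LD <script> tag."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "color": color,
        # "KGM" is the UN/CEFACT unit code for kilograms.
        "weight": {"@type": "QuantitativeValue", "value": weight_kg, "unitCode": "KGM"},
        "offers": {"@type": "Offer", "price": f"{price:.2f}", "priceCurrency": currency},
    }
    return ('<script type="application/ld+json">\n'
            + json.dumps(data, indent=2)
            + "\n</script>")
```

Dropped into a product page, a block like this tells a crawler that ‘19.50’ is a price in a given currency, rather than leaving it to guess from the surrounding text.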
How many average sized Florida Oranges could fit into the flatbed of a Ford F150?
This is the type of query that could be answered accurately (though perhaps not with a high degree of confidence) if semantic associations were to be better identified and indexed. The general searching public may very well shift the way they compose their queries as a result of comprehending that they can request relationship data.
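Once the relevant facts are associated, the answer itself is a few lines of arithmetic. The sketch below assumes each orange is a sphere, that they settle at roughly the random-close-packing fraction of about 0.64, and that the bed is a plain rectangular box; the bed dimensions and orange diameter are deliberately left as inputs, because supplying them is precisely the semantic lookup the engine would have to do. The function name is hypothetical.

```python
import math

# Approximate fraction of a container filled by randomly poured equal spheres
# (random close packing, about 0.64). An assumption, not a measured value for oranges.
PACKING_FRACTION = 0.64


def oranges_that_fit(bed_length_m: float, bed_width_m: float, bed_depth_m: float,
                     orange_diameter_m: float, packing: float = PACKING_FRACTION) -> int:
    """Estimate how many spherical oranges fit in a rectangular truck bed."""
    bed_volume = bed_length_m * bed_width_m * bed_depth_m
    radius = orange_diameter_m / 2
    orange_volume = (4 / 3) * math.pi * radius ** 3
    # Only the packed fraction of the bed's volume is actually occupied by fruit.
    return math.floor(packing * bed_volume / orange_volume)
```

For a one-cubic-meter box and 10 cm oranges this gives 1222; the hard part of the original question is knowing the F150’s bed dimensions and an ‘average sized Florida Orange’, not the division.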
9) The Internet Becomes Self-Aware, Awakens, and Devours Us All.
Resistance is futile.




Hymmnnnnnnnnn !
Confused?
I haven’t an idea what I read here?
Thanks for the comment Ed. Sorry to confuse you, I’ve recently been given some advice on Brevity (previously, I’d been going around as “The Deuteronomer”, you see) and I was told I should really have gone with draft one:
Nine ways the Internet could change that would make Search as we know it Obsolete:
1) Boobs – big bouncing naked boobs take over all sites
2) Boobs – big stupid ugly boobs take over all content creation
3) Boobs – “Ghost in the machine” + “BS!”
4) Boobs – Television kills the Internet
5) Boobs – Have I failed to make my point yet?
As a librarian, your article made me smile, because almost all the issues you mention are core factors in our occupations. For us, search has little to do with buying or selling…it’s literally all about finding. So we’re very aware of when Google works, when it doesn’t, and when it doesn’t, what works better (and are often involved in creating those ‘better’ sources).
I’d like to add another variant to what could change search: what if search is no longer free? This could be a result of a bad economy, it could be the result of Google changing its policies, or energy scarcities could reduce access to search in ways we’re not used to (for example, what happens to search if large cities experience random brownouts…and how are priorities established as to who gets dibs on the electricity?)
This is at the forefront of what libraries do, and plays a part in everything from libraries’ involvement in the Google Books settlement, to negotiating public internet access, to doing basic search reference. It may seem futuristic, but that’s why your librarian has tattoos, and not a bun : )
Thanks for your comment Corina – I understand that librarians are basically the ones who traditionally have not only driven new methodologies for information organization (and of course complimentary human-search access to it), but also have been fundamental in disseminating an understanding of those methods to the public at large (I began in computers teaching novices and elderly people how to surf the internet and search, back in 1997, mostly at public libraries via a (Canadian) government funded program).
I would be more than a little interested in hearing about these issues from the librarian’s point of view. A similar post written from that perspective would be golden. If you feel like writing on the topic yourself and don’t have an outlet, e-mail me if you’d like to guest-post.
I’m thinking #9 is the most likely, LOL!
Your first few points seem to say that the current search engines are already indexing and retrieving pages from the entire Internet; far from it, as I’m sure you know.
So in a way they already are specialising, with popular, linked to content that they can see, ignoring a whole heap more cr@p out there!
I really liked your post (especially the pictures) but I have to say that although you are right in saying that the “semantic web” has a long way to go to be truly “semantic”, the “semantic web” != “semantics” – it’s more of an abstraction. When we refer to “containers” in coding, we don’t actually mean that they can hold liquid in them.
http://www.scienceforseo.com/semantic-web/semantics-web-semantics/
It’s natural language understanding (and generation) that has to develop for the language issue to be resolved. For that to happen, AI also has to progress. Using structured data Wikipedia-style and querying via SPARQL is fine; it’s what the Sem Web is about at its base. The web is unstructured and a notoriously difficult corpus to work with; that’s why we try to organise it using RDF and such things.
“Bringing meaning to the web” through the semantic web is not the same as “machine natural language understanding”.
Microsoft and Google will face tough competition in the next few years and will compete in information technology.
Great article, it’s not often you find this level of writing in a blog. The semantic web is the future.
Interesting stuff. I’ve taken for granted just how precise Google results can be. Having a Droid has further enhanced my dependency on accurate Google searches.