Tuesday, April 22, 2008

Network Diversity Index Redux

Thanks to Darren Draper for taking a look at a suggestion I had made for network analysis in a previous post. Hopefully this is not a breach of "blog etiquette", but my response to his comment was rather long, so I entered it as a post instead.
Here was Darren's comment:

OK, so I used the Shannon index calculator to learn that my H1 = 0.9088. But what does that mean?


I'm assuming that an H value of 1 means that your population is not diverse - at least not diverse when considering the different kinds of populations assigned (which are arbitrary and subject to bias).

Here's a screenshot of what I've entered (as you can see, I mostly use Twitter to connect with the ed-tech community). http://tinyurl.com/6lupxe


Darren, most biological communities will have a diversity index between 1.0 and 4.0. Your "community", with an index of 0.9088, indicates, on the surface, very little diversity. This would be what classical ecologists might call a "typal" community, like "grassland". In terms of your network, most of your information is coming from a single "species" called EdTech. Example: an established, mid-latitude ecosystem with limiting resources, most of which pass through a large number of individuals belonging to very few species, or even a single one. The other species in this community, and there may be many, are represented by maybe only a single individual in the sample. You might say, "Well yeah, it's an EdTech community!" Low diversity in a network, to me, equates with focused but low-quality (low-depth) information. Let's say you use WordPress for your CMS, so you have a number of EdTech people using WordPress in your network. If you added a few members of the WordPress Codex community, you might also pick up information that may be of use to you.
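For anyone who wants to see what the calculator is doing, here is a minimal sketch of the Shannon index, H = -sum(p_i * ln(p_i)), where p_i is the share of the network belonging to "species" i. It assumes the natural logarithm (the online calculator may use a different base), and every count except the 257 EdTech contacts is made up purely for illustration.

```python
import math

def shannon_index(counts):
    """Shannon diversity index H = -sum(p_i * ln(p_i)) over species counts."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("need at least one individual")
    h = 0.0
    for n in counts:
        if n > 0:  # a species with no individuals contributes nothing
            p = n / total
            h -= p * math.log(p)
    return h

# Hypothetical network: only the EdTech count (257) comes from the post;
# the other categories and numbers are invented for this example.
network = {"EdTech": 257, "Family": 12, "Local news": 8, "Developers": 5}

print(f"H = {shannon_index(network.values()):.4f}")
```

A network made up of a single species scores 0, and a network of S equally common species scores ln(S), so the index rises both with the number of species and with how evenly the individuals are spread among them.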

Two observations:
One. If we consider this assessment to be correct, then, in conjunction with your discussion of Twitter Set Theory, you should be able to reduce the number of individuals in your network without reducing information content. Your EdTech species has a population of 257 competing for a resource: your time. Assuming a 1 in 10 overlap in your EdTech set, you could effectively reduce the number of individuals in your EdTech population to 25-30, increasing efficiency without degrading information. You might say at this point, "I've come to rely on my connection to more than 30 individuals in this group. How can I eliminate any one?" This brings me to observation two.
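The arithmetic behind the 25-30 figure can be sketched as follows. This takes one reading of the "1 in 10" figure, namely that roughly one member in ten supplies information the others do not; that reading, and the function name, are assumptions made for the example.

```python
edtech_population = 257   # EdTech count from the post
distinct_share = 1 / 10   # assumption: about 1 member in 10 is a distinct source

def pruned_size(population, share):
    """Estimated number of members needed to keep the same information."""
    return population * share

estimate = pruned_size(edtech_population, distinct_share)
print(f"{estimate:.1f}")  # 25.7
```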

I believe your diversity is really higher than reported. I said "on the surface" earlier because I think the problem is in identifying a "species" in our analogy. If all the members in your EdTech population were giving you the same information, competition would have reduced their number before now (my guess is their number is growing). Case in point: three different species of anole lizard were observed in a certain tree on a Caribbean island. In theory this shouldn't happen, because similar species can't occupy the same niche for very long without competition favoring one over the other two. Closer inspection revealed that each of them was occupying a very specific part of the tree and feeding on very specific prey in that area. Thus, they were not in competition with each other, and each was occupying a different role (niche) in the community. I believe closer scrutiny of your EdTech population will reveal that very distinct "species" exist within this group.

Biologists identify species using a key based on a dichotomy (a dichotomous key). An organism is assessed as either having a described character, which places it into one group, or lacking that character, which places it in another group. A new character is then described, and the assessment continues in branching fashion until the "species" is identified (keyed out) by the set of accumulated characters. I've begun an attempt at this on a wiki, but this is a developing idea, much like the issue of "tagging". It will take time. One thing that might help is for people to give as much information in their profiles as they can comfortably give.
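To make the keying-out process concrete, here is a rough sketch of a dichotomous key as a small binary tree. The characters, the "species" names, and the profile fields are all hypothetical placeholders, not the ones from the wiki.

```python
from dataclasses import dataclass
from typing import Callable, Union

@dataclass
class Leaf:
    species: str  # the identified ("keyed out") species

@dataclass
class Couplet:
    character: str                        # description of the character
    has_character: Callable[[dict], bool]  # test applied to a profile
    if_present: Union["Couplet", Leaf]
    if_absent: Union["Couplet", Leaf]

def key_out(profile, node):
    """Follow the key from the root, recording each character assessed."""
    path = []
    while isinstance(node, Couplet):
        present = node.has_character(profile)
        path.append((node.character, present))
        node = node.if_present if present else node.if_absent
    return node.species, path

# Hypothetical key: two levels of yes/no characters.
KEY = Couplet(
    "works in education",
    lambda p: p.get("educator", False),
    Couplet(
        "writes about classroom technology",
        lambda p: "edtech" in p.get("topics", ()),
        Leaf("EdTech educator"),
        Leaf("Educator, other focus"),
    ),
    Couplet(
        "develops software",
        lambda p: p.get("developer", False),
        Leaf("Developer"),
        Leaf("Unclassified"),
    ),
)

species, path = key_out({"educator": True, "topics": ["edtech", "wordpress"]}, KEY)
print(species)  # EdTech educator
```

The recorded path is the set of accumulated characters, so two profiles that land on the same leaf share the same answers all the way down the key. Splitting a leaf into finer "species" only requires replacing it with a new couplet.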

Of course, most of this is hypothetical and may be based on untested assumptions, but, if networks are going to be an important part of how we use the technology, then I think some metrics need to be established for assessing them.

Thanks again, Darren, for the conversation.

Saturday, April 12, 2008

Network Analysis a la Drape's Takes

Drape's Takes: Twitter Set Theory & The Wisdom of the Group

Related to my previous post, here is another view of network analysis with specific reference to Twitter networks.