I gave a talk last week at the University of Luzon on Big Data Analytics upon the invitation of Dean Dads Caronongan. I used this opportunity to conduct a social experiment: I asked the students to fill out a short survey listing their top five SMS buddies and top five voice call buddies.
Using Gephi (a network graph visualization tool) with lots of ETL (extract, transform, load) help from Ms Rina and student assistants Jolo Cepeda and Jan Nel, I was able to generate a social network graph of the UL CS/IT student body, which is shown below:
The graph shows the student body as different communities (cliques/barkadas) highlighted in different colors, with the larger circles representing individuals who are more ‘central’ members of each community. The lines between individuals represent the SMS or voice links among members.
This social network graph can help explain how an idea/lesson/tsismis can spread throughout the student body. You can disseminate information more effectively by specifically targeting messages to the individuals who are more “central” to the community, as measured by Eigenvector, Betweenness and Degree centrality.
Since these individuals are “central”, they are most likely the leaders and key influencers/trendsetters of the community. They are more connected and can easily relay information to the rest of their friends in the social network.
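To make “central” a bit more concrete, here is a minimal sketch in plain Python of two of the measures named above, Degree and Eigenvector centrality. It is not how Gephi computes them internally; the names and links are invented for illustration, and the function names (`build_graph`, `degree_centrality`, `eigenvector_centrality`) are my own assumptions.

```python
from collections import defaultdict

# Hypothetical survey answers: (respondent, named buddy). All names are invented.
edges = [
    ("Ana", "Ben"), ("Ana", "Carl"), ("Ana", "Dina"),
    ("Ben", "Carl"), ("Dina", "Eli"), ("Eli", "Fay"),
]

def build_graph(edges):
    """Turn the pairs into an undirected adjacency map (a link counts both ways)."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph

def degree_centrality(graph):
    """Share of the other students that each student is directly linked to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}

def eigenvector_centrality(graph, iterations=100):
    """Power iteration: a student scores high when linked to high-scoring students.

    Each step adds the node's own score to its neighbours' scores (i.e. it
    iterates on A + I instead of A). That keeps the same ranking but avoids
    oscillation on some graph shapes.
    """
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new = {
            node: score[node] + sum(score[nbr] for nbr in graph[node])
            for node in graph
        }
        norm = sum(v * v for v in new.values()) ** 0.5
        score = {node: v / norm for node, v in new.items()}
    return score

graph = build_graph(edges)
degree = degree_centrality(graph)
eigen = eigenvector_centrality(graph)

# List students from most to least central by eigenvector score.
for name in sorted(eigen, key=eigen.get, reverse=True):
    print(f"{name}: degree={degree[name]:.2f} eigenvector={eigen[name]:.3f}")
```

In this toy network the student with the most links is also the one whose friends are themselves well connected, so both measures would point to the same person as the one to target first. Betweenness centrality (how often someone sits on the shortest path between two others) is left out to keep the sketch short.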
If you want to learn more, check out Lada Adamic’s Coursera offerings at https://www.coursera.org/instructor/ladamic
Interesting as this is, I wondered if we could also use the metadata of the survey to gain additional insights about the students. Metadata refers to data that gives information about other data. These are the parts of the survey that are typically ‘thrown away’, like the time the survey was taken, the IP (Internet Protocol) address, how long it took to complete the survey, the browser type, etc.
Apparently there might be something there.
By tracking the time elapsed between the “Announcement” and the time the students actually took the survey, we can get a proxy measure for technology adoption profiles. The lower left graph shows the days elapsed from the “Announcement” and the number of students who took the survey in each of those time periods.
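The days-elapsed count described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the announcement time and the submission timestamps below are invented, and `days_elapsed_counts` is a hypothetical helper name, not part of any survey tool.

```python
from collections import Counter
from datetime import datetime

# Invented values for illustration; the real survey timestamps are not reproduced here.
announcement = datetime(2024, 1, 8, 9, 0)
submissions = [
    datetime(2024, 1, 8, 10, 0),
    datetime(2024, 1, 8, 15, 30),
    datetime(2024, 1, 9, 11, 0),
    datetime(2024, 1, 9, 20, 0),
    datetime(2024, 1, 9, 21, 0),
    datetime(2024, 1, 10, 12, 0),
    datetime(2024, 1, 12, 9, 30),
]

def days_elapsed_counts(announcement, submissions):
    """Count respondents per whole day elapsed since the announcement."""
    return Counter((ts - announcement).days for ts in submissions)

counts = days_elapsed_counts(announcement, submissions)

# Print a crude text histogram, one row per day, including days with no responses.
for day in range(max(counts) + 1):
    print(f"day {day}: {'#' * counts[day]}")
```

The students in the earliest rows would be the candidates for the ‘Innovators’ end of the adoption curve, with later rows corresponding to the later stages.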
As expected, there was a general bell curve pattern that corresponds to the technology adoption curve on the upper left. Fourth year (graduating) students and 2nd year students were among the largest groups in the ‘Innovators’ stage.
We also know that a majority of the respondents took no more than 12 minutes to complete the survey (see upper right histogram). Using the IP address information, we can also track the network providers and infer the location of the students as they did the survey. I hope these are important insights for data analysts and marketing professionals.
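For the completion-time figure, a rough sketch of the arithmetic looks like this. The durations are invented, and the helper names (`share_within`, `histogram`) and the 5-minute bin width are my own assumptions for illustration.

```python
from collections import Counter

# Invented completion times in minutes; not the actual survey data.
durations = [4.5, 6.0, 7.2, 9.8, 11.0, 12.0, 15.5, 31.0]

def share_within(durations, limit_minutes):
    """Fraction of respondents who finished in at most limit_minutes."""
    return sum(d <= limit_minutes for d in durations) / len(durations)

def histogram(durations, bin_minutes=5):
    """Count respondents per bin, keyed by the bin's lower edge in minutes."""
    return Counter(int(d // bin_minutes) * bin_minutes for d in durations)

share = share_within(durations, 12)
bins = histogram(durations)

print(f"{share:.0%} finished within 12 minutes")
for start in sorted(bins):
    print(f"{start:>2}-{start + 5} min: {'#' * bins[start]}")
```

A long tail in the last bins usually means someone left the survey open in a browser tab, so those rows say more about browsing habits than about the survey's difficulty.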
P.S. My kids tell me that they find the graphs ‘intimidating’. And yes, I agree they are, but do take the time to understand them and you will see that they pack in a lot of information. Humans are said to recognize patterns in pictures 60x better than in raw numbers. I’m still struggling to balance brevity and clarity. If you have ideas, please do post a comment here.