Data science, big data, machine learning and deep learning: buzzwords or reality? Are scientists really able to predict what you have underneath your bedside table?
Nowadays, it is popular to claim that with enough personal data, scientists are able to understand and reveal every detail about your personality using modern machine learning techniques.
> “Wow! This seems like some kind of alien technology! Is that true?” Maybe
It depends mainly on the quantity and quality of the personal information available about you. Quantity matters, of course, but what about quality? This is where the real challenge lies. Quality is simply the strength of the connection between a piece of information and the prediction: the tighter the connection, the more accurate the prediction will be.
> “To predict what I currently have underneath my bedside table, is it important to know the exact temperature in Dubai?” Nope
> “And my age and gender?” Maybe. It’s not extremely important, but it’s a valuable piece of information to have
> “Do you need my exact age, or just a rough estimation?” The better the quality of data, the better the quality of the result
That’s a simplistic definition of data science: it exploits many bits of information (or features) to create a reliable prediction. Each bit of information can be directly connected to the prediction outcome, or simply used to support it.
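To make the idea of a "tight connection" concrete, here is a minimal Python sketch that scores two features against an outcome using the Pearson correlation coefficient. The data is invented purely for illustration, the names (`shoes_under_table`, `bought_shoes`, `dubai_temperature`) are hypothetical, and correlation is only one of many possible ways to gauge how useful a feature is.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Invented toy data: one entry per person (assumption, not real measurements).
# Outcome: 1 if there are shoes underneath the bedside table, 0 otherwise.
shoes_under_table = [1, 1, 0, 1, 0, 0, 1, 0]
# A plausibly relevant feature: the person recently posted about buying shoes.
bought_shoes = [1, 1, 0, 1, 0, 1, 1, 0]
# A plausibly irrelevant feature: the temperature in Dubai (degrees Celsius).
dubai_temperature = [38, 41, 39, 40, 40, 39, 40, 41]

for name, feature in [("bought_shoes", bought_shoes),
                      ("dubai_temperature", dubai_temperature)]:
    score = pearson(feature, shoes_under_table)
    print(f"{name}: correlation with outcome = {score:.2f}")
```

On this toy data, the purchase feature is strongly correlated with the outcome, while the temperature in Dubai is not correlated with it at all.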
> “Eventually, will scientists really be able to predict what I have underneath my bedside table?” If they have good information about you, they’ll be able to do that with a good degree of accuracy
So, the bits of information I’d use are:
- Your social network posts. Did you lose anything recently? Perhaps you vented about it online? Have you recently bought something that can fit underneath your bedside table?
- Your demographic information. Age, gender and location are important
- Your interests and affinities (courtesy of the Intent HQ Topic Graph). What is your favourite sport? And your favourite kind of music? Do you like jewellery?
Now, how do we aggregate this data and create a model? Well, that’s the data science secret sauce! There is no easy answer and, furthermore, the solution is not unique: it’s a combination of stats and math, which is a mixed blessing.
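As a rough illustration of what "aggregating features into a model" can look like, here is a minimal sketch of a logistic regression trained with plain gradient descent in pure Python. It is one possible approach among many, not necessarily the one described in the book. The three features (`bought_recently`, `likes_jewellery`, `under_30`), the toy rows and the labels are all invented for this example.

```python
from math import exp


def sigmoid(z):
    """Squash a real-valued score into a probability between 0 and 1."""
    return 1.0 / (1.0 + exp(-z))


def predict_proba(weights, bias, features):
    """Probability that the item is under the bedside table."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


def train(rows, labels, learning_rate=0.5, steps=2000):
    """Fit weights and bias with full-batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(steps):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for features, label in zip(rows, labels):
            error = predict_proba(weights, bias, features) - label
            grad_b += error
            for j, x in enumerate(features):
                grad_w[j] += error * x
        bias -= learning_rate * grad_b / n
        weights = [w - learning_rate * g / n for w, g in zip(weights, grad_w)]
    return weights, bias


# Invented toy dataset (assumption): each row is
# [bought_recently, likes_jewellery, under_30], all encoded as 0 or 1.
people = [
    [1, 0, 1], [1, 1, 0], [0, 1, 1], [0, 0, 0],
    [1, 1, 1], [0, 1, 0], [1, 0, 0], [0, 0, 1],
]
# Toy labels: 1 if the item is under the bedside table. They are built to
# follow the first feature, so a sensible model should lean on it.
has_item = [1, 1, 0, 0, 1, 0, 1, 0]

weights, bias = train(people, has_item)

someone = [1, 0, 1]  # recently bought something, no jewellery, under 30
print(f"Estimated probability: {predict_proba(weights, bias, someone):.2f}")
```

In practice you would use many more features and examples, hold out data to check the accuracy, and probably rely on an established library rather than hand-written gradient descent.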
I recently spent some time writing a book all about data science, outlining the essential parts of this complex subject. It’s very practical, so the reader will have a complete hands-on experience of the essence of data science.
It’s organised in 6 chapters … the key points of data science:
- First steps: how to set up your computer
- Data munging: how to download and handle data
- The data science pipeline: how to prepare your dataset for modelling
- Machine learning: how to create the model
- Social network analysis: how to analyse a social network
- Visualisation: how to visualise the key information of the model
> “Ok ok, but what do I actually have underneath my bedside table?” Surely dust :) But, give me more data and I’ll be more accurate