Dreaming of Data Science
Note: Not edited for brevity. This post is about the journey, and like most journeys, it meanders and wanders.
Dreams and Data Science
I am in pursuit of a dream. That dream is data science. I’ll share a bit of the current state of the pursuit, but first I’d like to talk about the dream.
Think about dreams in the literal sense, the kind you have when you are asleep. Some dreams are more vivid than others. Some can be recalled very precisely when awake, and others not at all. The people in the dreams can be specific real-life individuals or an amalgamation of a few, or can change from one to another. The same goes for the setting: it can be based on something real or something made up, and can even shift from one moment to the next. All this to say, dreams can be vague and specific, static and fluid. They hold opposite qualities at once, as does most of life: dreams (literal and figurative), data science, and my dream of data science.
Now a bit about data science. It’s not dissimilar to actuarial science or athletics: broad categories that can mean many things. There are many different types of data scientists, actuaries, and athletes. As can be said about any category or label, it can be further defined and specified. It can tell you something, but not everything. Thus, it is simultaneously vague and specific.
A data scientist works with data and is a practitioner of science. Using scientific rigor, a data scientist seeks to understand, analyze, and use the data to develop insights, predictions, and, at the extreme, scientific laws. (From how gravity works to how people behave to how games are played and won.) There are a great many fields and applications - specialties, if you will - in the data science world. Like a dream, specific and yet vague.
Another thing about dreams: literal dreams can be repeated, or they can continue and even build from one to the next. So any one dream can be part of a larger dream, built on a sequence of (sometimes non-sequential) dreams. Figurative dreams are typically expressed as a result, such as dreaming of owning a home. But the difference between a wish and a dream is that a wish is only about the result happening, while a dream is something that must be worked towards; it can be made to happen. In essence, the pursuit of a dream is really part of the dream, because without the pursuit, a dream is just a wish. Therefore, dreams can be made of smaller dreams and smaller parts.
I once had a dream to study data science. That much I have accomplished, through an intensive data science bootcamp.
The Data Science Bootcamp
So what’s a data science bootcamp, when data science is specific and yet vague?
Well, a bootcamp is a new-ish form of education in which the entire program is condensed into a few short months. (The term is borrowed from the more established military bootcamps, defined as a short, intensive, and rigorous course of training.) Bootcamps became especially in vogue for coding, where software developers can quickly learn the basics of the trade rather than study computer science at university. The success of these programs really comes in new industries, where formal post-secondary education (e.g. universities) is slower to adopt contemporary curricula.
The data science bootcamp is focused on curriculum for the data science industry. Data science requires a combination of computer programming, statistics, and some specific domain knowledge of the data. In practice, that meant learning programming languages (Python, SQL) and other software[1]; machine learning and statistical techniques (regression, classification, unsupervised learning, deep learning); and exploring various specific domains such as Natural Language Processing.
The bootcamp I attended was Metis, which structured their curriculum around five projects. Their intention is to “combine traditional instruction in theory and technique with a real-world project-based approach.” The program is appropriately named after a Greek goddess of wisdom, Metis. And perhaps the best description of the word metis comes from William Eamon in his book “Science and the Secrets of Nature”:
“The Greeks called this type of knowledge metis, by which they meant the kind of practical intelligence based upon an acquired skill, experience, subtle wit, and quick judgment: in short, cunning. Metis, or cunning intelligence, was entirely different from philosophical knowledge. It applied in transient, shifting, and ambiguous situations that did not lend themselves to precise measurement or rigorous logic.”
I am very proud to have completed the bootcamp and to have picked up some data science metis. This foundation is key to delving into the ever-evolving data science industry, which is continually exploding with new techniques, software, and applications.
Frustrations at Bootcamp
I do have to admit - this bootcamp was one of the hardest things that I have ever done. There were so many moments of frustration, stress, and depression.
An initial source of frustration was the machine learning vs. statistics perspective[2]. (Long story short: machine learning = predict the right thing; statistics = understand why). Having studied statistics in college, my natural leaning is as a statistician. However, this clashed with what I sensed from the bootcamp. The nature of the work, framing of the problems, and general attitudes were closer to machine learning. Once I finally recognized this, I was able to accept it and move on.
The biggest source of frustration was the fast pace and broad emphasis of the bootcamp. It pushed through material quickly and broadly. The lectures served more as an introduction to the topics. And with the speed at which we moved from one to another, it was hard to go truly in depth and learn the material beyond what was necessary for implementation.
The quick turnaround in producing projects while learning the material induced a great deal of stress. It was a constant struggle between learning and doing. Time I spent learning more about what I was doing and how to do it was time I was not spending doing the work. And the timelines were extremely aggressive.
Exacerbating the difficulty of learning the machine learning models was my lack of programming skills. I spent most of my time fumbling around trying to figure out how to code or use certain software, rather than understanding the data and models or developing insights and presentations.
All those things broke me down. Over and over. I stopped exercising, I binge-ate all the food, I binge-watched all the television. I would be moody. I was all-consumed, and for much of the time I was not the most pleasant person to be around. At one point, I was debating whether or not to drop out...
And what made it even harder was the fact that it was all remote (due to COVID). So it was much harder to make those bonds and commiserate with fellow students. Not that we didn’t support each other and help each other out. We did. It just was not as in depth as it would have been otherwise.
Why I Would Do It All Over Again
Yet - I would do it all over again. Once I accepted the bootcamp for what it was, and put aside my preconceived notions of how education should be, I truly appreciated it and the growth it pushed in me. Looking back, the growth achieved in just three short months was incredible.
First, it was necessary to see and feel the cultural differences between machine learning and statistics, and to understand that data science has room for both. I could still learn data science skills while approaching things how I saw fit.
Truly, the goal of the project-based, intensive bootcamp was to learn by doing. And I certainly learned more in a condensed time than I possibly could have otherwise. I learned about the machine learning models and all the basic data science skills. But more than that, I was able to practice taking a project from end to end: formulating the problem; collecting, cleaning, and storing the data; exploring the data, building models, performing analysis, and making visualizations; and drawing conclusions, making presentations, and producing a web app.
I also learned how to learn. This field is quickly changing, and having the grit to figure things out and learn things on my own is crucial. This bootcamp forced me to learn how to seek information and learn for myself. This, combined with asking for help, is how I made progress.
But perhaps the most valuable meta-skill was understanding scoping and time management for a data science project. Understanding the process, having a feel for how long each step could take, when to reduce scope to move on... There is really only one way to develop this understanding: experience. The bootcamp provided that in spades.
Another big thing is that I have a much better sense for the field of “data science”. That is hard to come by, because the term is vague and ambiguous and means different things to different people. I now have a sense for what those different things can mean. And much of “data science” has been demystified for me.
I wish I could go back in time to tell my past self how to prepare. I’d have told him to get much better at programming in Python beforehand, to really practice and understand Git, and to read more about the data science field and what specific problems were being tackled. I would share what challenges and frustrations lay ahead.
But you know what, I don’t regret anything. I got so much out of the bootcamp. And even if I wish I had been better prepared back then, I am now this much more capable and knowledgeable. I can continue to learn and work on data science projects. And hey, I now have four personal projects completed. That’s huge!
What’s Next
I would say that I accomplished my dream of studying data science. And I would take it a step further to say that I accomplished my dream of being a data scientist.
But here’s another wrinkle to this tortured metaphor of dreams. What does it mean when the dream is a state of being? To be a data scientist in a field that is continually advancing requires continuous growth, learning, and practice. There’s still so much more for me to learn, not only to catch up (whatever that means), but to keep up while the field keeps advancing. In other words, being a data scientist is not just a destination, but the journey itself.
Here’s my commitment to the journey. I will keep on learning and will pursue different avenues of learning (structured, self-study, projects, employment). A few of my first next steps: publish a data science website to showcase my portfolio; finish the fast.ai deep learning course; find a job where I can put my skills to use.
This is a dream that I will keep on catching.
BRB… job hunting and sciencing the data!
---
[1] Just a ragtag list as a note for myself, not all-encompassing: Git, GitHub, SQL, pgAdmin 4, Google Cloud Platform, CLI, Heroku, Python (scikit-learn, NumPy, pandas, PyTorch, fast.ai, BeautifulSoup, matplotlib, seaborn), Tableau, etc…
[2] The single article that helped me reconcile my feelings was: https://www.svds.com/machine-learning-vs-statistics/ For pedantic clarification, I’m sure there are data scientists who align more with the statistician perspective. Still, I find the distinction and the discussion of the differences helpful for anyone who feels a culture clash.