The Science of Data: Cleaning Out Noise

Jan. 21, 2019, 12:57 a.m.



When I first began the data science tutorials on Udemy, I felt excited to be getting into this "cutting edge" technology of the future. After a few hours, I felt much less enthusiastic. I plodded through the videos, feeling I had one foot in and one out. My lack of advanced statistical knowledge meant I was mostly testing out these Python libraries with sample data. This field looked so vast and complex, and I understood probably less than 50% of the lingo being thrown at me.

Luckily, my work experience in accounting and with Excel made some of the data cleaning and visualization exercises feel less daunting. These two Data Science/Machine Learning courses are still a work in progress for me, as I moved on to focus on other aspects of coding for the time being. I want to go through them again in the future with more understanding. The instructors, Frank Kane and Jose Portilla, sound like they know the ins and outs of not only Python but advanced probability as well. That, combined with Nassim Nicholas Taleb's obsession with probability and my new-found enthusiasm for machine learning and data science, has kept me from giving up hope of understanding this field just yet.

Now, the data structures and algorithms courses and books were just what I needed. Finally, the missing link. The method to the madness! When I checked my university's computer science program as a guide to how I could structure my learning to capture the key parts of the field, I noticed that, rather than "Intro to Java" or "PHP for Dummies 101", the most general courses were on data structures and algorithms. In fact, they had Parts I and II for both. That didn't seem like an accident.

I dived into the Data Structures and Algorithms courses and found supplemental books and YouTube videos to work through the structures in Python specifically. I understood the overall logic of how and why these structures were used, but it took some repetition for the details to become clearer. Even now I have to keep practicing and reviewing these structures. My biggest worry is the dreaded whiteboard test on implementing these structures, or some curveball implementation question. But that's all part of the prep.

The conflicting advice out there muddies the waters further.

"Just apply for jobs, you don't need to know everything!"
"No, man, you need 200 hours on LeetCode and be able to code any common algorithm in your sleep."
"Dude, just build an Instagram clone and put it on GitHub. Boom. Passion = Profit."

I'm sure the truth is buried somewhere in those statements. I won't know until I see for myself.

Learning the ins and outs of Python opened the door to dabbling in machine learning, data science, and, most importantly for me now, data structures and algorithms: the engines of the code. I am still in the process of reviewing my data structures and algorithms. What will fate bring during the interview? Will I be asked about linked lists or thrown to the wolves with Dijkstra's Ternary Search Acyclic Tree Tour?

At this point, I saw a giant gap in my web development knowledge that needed filling. The self-learning continues...