Your learning workflow as a Data Scientist


Hey! I really wanted to write this guide. I will tell you only the things you need to know to become a Data Scientist. There are just a few, but listen carefully, because they can save you a lot of time and put you on the fast track to the top. As always, I don’t want to drag out the introduction, so here we go.

Working and Networking: Key concepts

My experience as a freelancer has opened my eyes. You really need a job. Or a project. Or some way to apply what you’ve learned. You don’t want to stay stuck in courses, collecting extra information and paying for the privilege of staying where you are. I say this because, in data science (DS) at least, there are plenty of businesses and websites built around the idea of preparing you for the job market. But often there is no real market behind them. Even when they promise you an interview, it has been pre-arranged with partner companies, which at most agree to hire the couple of people they actually need.

The key concepts are:

  • Working: Nowadays, companies expect Data Scientists to have some experience before hiring them. Experience is a must to get a job. This is because supply has caught up with demand. In the early days, demand was so high that even mathematicians, physicists, and people from other tech-related careers were applying for DS jobs. But as demand kept growing during the 21st century, the supply of candidates grew to match it. Now the bottleneck is standing out among all the candidates who apply to each position. Details matter, a lot! Don’t let some company fool you into wasting your time and money. Earn money while you grow and build an interesting portfolio. That’s the way.

  • Networking: The key to becoming an effective learner in Data Science (and, I suspect, in most other fields) is networking. Get involved in the communities: read posts, contribute, and so on. You never want to do this alone, and you are not alone at all. Networking is also a great way to make contacts, hear about new jobs and projects, and keep yourself up to date. Never underestimate it.

Communities: Top 5

  1. Reddit: Reddit can be considered the MetaCommunity, that is, the community of communities. It has everything you want to know, people are kind, and the best answers get upvoted and stay on top, while weak ones simply remain as ordinary comments without being punished.

    In fact, what makes Reddit unique is its spirit of discussion. When you are a beginner at something and really want to learn it, browse the related subreddits and look for information in the top posts.

    Example: You need to learn R, and you know nothing about it. You go to r/Rlanguage, sort by Top posts, and very quickly you find a free online book called R for Data Science, in my opinion the best book out there to start learning and a reference whenever you need to know about R in general (although its authors point to more advanced books for deeper topics). There you have it!

  2. StackOverflow: StackOverflow can be considered the default Q&A (Questions and Answers) community for programmers. Posts are required, and rewarded, to be clearly written, and answers are expected to address the whole question. There is a very effective system for penalizing partial answers, link-only answers, harassment, etc. As a consequence, building up reputation is a bit harder. If you don’t know the complete answer to something, it’s better not to post, because you will likely receive downvotes and, for repeated low-quality posts, even bans.

    Use the built-in search to look through your topics and quickly find questions that help you out. There are a number of operators you can use as filters in the search bar, such as [tag], user:, answers:, score:, and many more.

  3. StackExchange: As I said, StackOverflow became the top site in the programming world, and its popularity grew enormously thanks to its system and ease of use. This led the StackOverflow team to expand the community to other topics, keeping StackOverflow.com as the programmers-only site. This is how StackExchange was born. It combines the breadth of topics you find on Reddit with StackOverflow’s Q&A format, search bar, and reward/penalty system, all integrated with StackOverflow.com. To be honest, I haven’t searched through it much, but I recognize it is quite useful for splitting topics and searching efficiently.

  4. GitHub: OK, you can turn off your computer, go get a glass of water, and come back. What you have read so far is nothing. Just go to GitHub and start there. GitHub is a must among programmers. Here are the advantages of knowing GitHub and the version control system behind it, git:

    • Version control for your projects.
    • Unlimited private repositories, i.e. free storage for your code and files (within GitHub’s size limits).
    • Free to use.
    • The largest community, and the standard place to communicate, share projects, find other projects and software (even from big companies), and fork them.
    • Store scripts and snippets in your Gists.
    • Star and follow people and projects so you can easily come back to them in the future.
    • Tracking of issues, development, and more, backed by community support.

    Chances are that much of what you need is hosted in a public GitHub repository: books, resources, lists, programs, extensions, topics, users…

    Just use GitHub.

  5. Google: As every good tech person says, “Google is your best friend”, or at least “Google is your friend, not your enemy”. The only problem with Google is that it is so “f**” big, yet it still manages to get things done for you. In my experience, the first 10 results of a Google search contain what you need roughly 80% of the time. And if something is hard to find, you can refine your queries with operators (e.g. “allinurl:”, “filetype:”, “site:”, etc.).

    Another aspect of Google that people tend to underestimate is its versatility. Data Scientists should know that Google is a huge company that provides much more than a web search engine:

    • Google Colaboratory: A hosted Jupyter Notebook service that gives you free (limited) GPU/TPU usage and lets you share notebooks and store them in Google Drive.
    • Google Scholar: A personal favorite of mine, Google Scholar is a great way to keep up to date with academic research and progress in science and technology. Easily and quickly search across masses of scholarly literature from one place.
    • Google Drive: OK, we already have GitHub for storage, but Google Drive is much more visual and straightforward to manage, and it integrates fully with other Google services.
    • Google Maps: For services, geolocation, recommendations… Google Maps is absolutely amazing, a unique product that only Google could build. Just crazy stuff over there. Like many other Google products, it offers APIs so you can use the service in your own programs and websites.
    • Google Cloud Platform: A cloud platform that lets you buy services at a reasonable price. You can deploy Kubernetes clusters, VMs, and networks, run your applications, and much more…
    • Google Mail: Yes, Gmail is a very good e-mail provider. It integrates well with many other tools and can host apps (Zoom, Hangouts, Calendar) to make things easier for you. It also provides an API you can use to store and back up your emails, among other things… Thanks, Google.
    • Google Chrome: A minimalist web browser. Personally, I use a tuned-up Firefox, but Google Chrome will always have super-Google behind it, and you never regret calling super-Google to the rescue.
    • Google Play Services: Apps for everyone on Android devices. Your smartphone gets smarter.
    • YouTube: Google bought it. Nothing more to say. Look for tutorials and talks, but don’t waste time on it!

    And much more: List of Google products.
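The search operators mentioned above for StackOverflow ([tag], score:) and Google (site:, filetype:) are just text you append to a query, so they are easy to combine. Here is a minimal Python sketch of that idea; the helper names (`stackoverflow_query`, `google_query`, `search_url`) are hypothetical, the URLs assume each site’s standard `search?q=` endpoint, and the exact semantics of each operator should be checked in each site’s own search help.

```python
from urllib.parse import urlencode


def stackoverflow_query(terms, tags=(), min_score=None):
    """Combine free text with StackOverflow search operators (hypothetical helper)."""
    parts = [f"[{tag}]" for tag in tags]    # [tag]: restrict results to a tag
    if min_score is not None:
        parts.append(f"score:{min_score}")  # score:N: filter by minimum score
    parts.append(terms)
    return " ".join(parts)


def google_query(terms, site=None, filetype=None):
    """Combine free text with Google search operators (hypothetical helper)."""
    parts = [terms]
    if site:
        parts.append(f"site:{site}")          # only results from this domain
    if filetype:
        parts.append(f"filetype:{filetype}")  # only this file type, e.g. pdf
    return " ".join(parts)


def search_url(base, query):
    """URL-encode the query string and attach it to a search endpoint."""
    return base + "?" + urlencode({"q": query})


so = stackoverflow_query("merge two data frames", tags=["r", "dplyr"], min_score=5)
print(so)  # [r] [dplyr] score:5 merge two data frames
print(search_url("https://stackoverflow.com/search", so))

g = google_query("R for Data Science", filetype="pdf")
print(g)  # R for Data Science filetype:pdf
print(search_url("https://www.google.com/search", g))
```

Building queries this way is overkill for a one-off search, but it helps when you repeat the same kind of search often, for example bookmarking a filtered tag search.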

My final advice

In descending order of importance:

  1. Abstract everything. Focus on your job and what you are actually doing. There are plenty of programs and tools out there that abstract the underlying technology away from the task itself. If something gets complicated, try a simpler, higher-level solution that works for you.
  2. Don’t reinvent the wheel. Someone out there has probably already found a solution and posted it on some website or community. Google has most likely indexed it for you. Also check GitHub and Reddit for this purpose.
  3. Don’t be greedy. Trying to understand everything and/or always pick the best option is just a pain in the ass. We are human: we can’t understand everything, and we can’t always communicate things effectively. Don’t try to build the ultimate mega-cluster with everything included. It simply doesn’t work, and even if you pull it off, someone else will build the same thing, put it on a website, get acquired by Google, and serve it for free.
  4. Search only for what you need, and use only what you must. Keep things simple. Use your mastery of searching and symbolic linking.
  5. Respect communities. People have spent a lot of time helping you, and more. Respect community members and listen to their knowledge and experience. This guide is an example of that. Respect it.
  6. Be official. Instead of searching for everything in web posts, follow point 3 and learn to read the official documentation and how to use it. It gives you a better way to search and a better overview of things, and it keeps track of the differences between versions. If you need more details or want to understand something better, I suggest going to Reddit, StackOverflow, or YouTube. Slides are so useful for conveying concepts, and talks are useful for getting a general idea of how things work.
  7. Be human. What computers need is a human. It is good to be human and to think like one. Computers are dumb machines, even with ML/DL: they only do what they are told, but they do it extremely well. The last thing you want is to become half-human, half-machine, because then you are not giving the machines the kind of support they need to reach their full potential.
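To make points 1 and 2 concrete, here is a small Python sketch (the CSV data is hypothetical) contrasting a hand-rolled parser with the standard library. Splitting lines on commas looks simple, but it breaks as soon as a quoted field contains a comma, while `csv.DictReader` and `statistics` already handle those details for you.

```python
import csv
import io
import statistics

# Hypothetical sample data: one name contains a comma, so the CSV quotes it.
raw_text = 'name,score\nana,7\n"lopez, bob",9\ncarla,8\n'

# Reinventing the wheel: a hand-rolled parser that splits on every comma.
naive = [line.split(",")[1] for line in raw_text.splitlines()[1:]]
print(naive)  # ['7', ' bob"', '8'] -> the quoted comma broke it

# Reusing the wheel: csv.DictReader handles the header row and quoting.
rows = list(csv.DictReader(io.StringIO(raw_text)))
scores = [float(row["score"]) for row in rows]
print(scores)                     # [7.0, 9.0, 8.0]
print(statistics.mean(scores))    # 8.0
print(statistics.median(scores))  # 8.0
```

The same reasoning scales up: before writing your own parser, model, or pipeline, check whether a well-tested library already does it.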

Best of luck!