Treasure Hunting

 


Exploring Data Science and all its related technologies on the web can be quite the quest these days, especially when you have a 9-5 job, kids and a household to run. I mean, just figuring out where to start feels like facing an infinite universe. Another challenge is finding a reputable source. There are so many unreliable sources and hard-to-navigate sites that it can sometimes be discouraging. Something I have found very useful when looking for advice or discussions from like-minded people is a blog. The number of Data Science blogs one can come across is mind-blowing. Like I said, not all blogs are to be trusted, and some can be very deceiving.

To start my career path in Data Science, I did some intensive research on what is required of a Data Scientist. I studied many different job requirements, read articles from various authors in different fields that give tips on how to become a Data Scientist, and looked at well-known companies to understand their goals. It was clear to me that a BSc degree was the foundation for landing my dream job. Hence, I enrolled for a BSc degree in IT Data Science. It was during the first weeks of attending my online classes that I discovered just how amazing this field is. I was inspired to apply my knowledge for the greater good.

In my current role within Supply Chain, I identified certain processes where AI and Machine Learning could take over work that is currently done manually. Automating these tasks with suitable algorithms could reduce the risk of human error and save time. Planning upcoming projects using historic data would also help a company plan future endeavors and cost savings more effectively and efficiently.

 

Explaining AI:

I recently found a blog called GPTech, which breaks down valuable information about AI for newbies and more advanced users alike. Seeing that I am a brand-new veteran in this field, I find blogs like these very beneficial.

 

 

In the blog post Five Diagrams to Understand AI, Mary Newhauser has put together a comprehensive collection of diagrams to better understand AI. These diagrams are referenced from several other sources. As Newhauser explains, there is so much information across the web that fully understanding all the terms, and meeting the expectations surrounding the use of AI, can be overwhelming (Newhauser, 2023: para. 2).

AI Terminology and Large Language Models (LLMs)

In her post, Newhauser refers to a blog post by Tobias Zwingmann. Zwingmann provides an overview of important AI terminology that can be used when collaborating with others. According to Zwingmann, AI terminology can be very confusing, mostly because of the complex and evolving nature of AI and how it is used differently across industries (Zwingmann, 2023: para. 5). The topics discussed in his post cover the basics of the terms AI (Strong and Narrow AI), Machine Learning, Deep Learning using Artificial Neural Networks (ANN), Generative AI (Transformers), Large Language Models (LLMs) and more.

The video from Andrej Karpathy provided under topic "8. GPT-4" was something I found particularly interesting, as it provides an overview of how the LLMs mentioned are trained.

The video is divided into two sections. The first section explains how GPT Assistants are trained; the second explains how to use these models effectively. In the first section, Andrej provides an overview of the recipe that is used to train the model. The recipe is divided into four stages: Pre-training, Supervised Finetuning, Reward Modelling and Reinforcement Learning. In the Pre-training stage, large amounts of data are gathered from sources like CommonCrawl, C4, GitHub, Wikipedia, Books and more. He explains how these models are trained over a period of approximately 21 days to provide the more accurate and specific responses that users require. Andrej also explains the difference between base models and assistants: base models don't want to answer questions but simply want to complete documents. They can, however, be tricked into acting as assistants and answering your questions if the user is specific about their prompts. He explains how to use LLMs optimally to achieve the best results and provides guidance on when to use them. Andrej reminds users that these models still have flaws. LLMs can be biased, hallucinate information and make reasoning errors, and they are vulnerable to attacks (Karpathy, 2023).
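For readers curious what "tricking" a base model might look like in practice, here is a minimal Python sketch. It only builds the prompt text: a few question-and-answer pairs are laid out like a document, so that a model which simply completes documents would naturally continue with an answer. The function name and the example questions are my own assumptions for illustration and are not taken from Karpathy's talk, and no real model is called.

```python
def build_few_shot_prompt(examples, question):
    """Lay out Q&A pairs as a document that ends with an unanswered question.

    Hypothetical helper for illustration only: a base model that completes
    documents would be expected to continue the text after the final "A:".
    """
    lines = ["Below is a list of questions and their answers.", ""]
    for example_question, example_answer in examples:
        lines.append(f"Q: {example_question}")
        lines.append(f"A: {example_answer}")
        lines.append("")  # blank line between pairs
    lines.append(f"Q: {question}")
    lines.append("A:")  # left open for the model to complete
    return "\n".join(lines)


# Made-up example pairs, purely for demonstration.
examples = [
    ("What is the capital of France?", "Paris"),
    ("What is the capital of Italy?", "Rome"),
]

prompt = build_few_shot_prompt(examples, "What is the capital of Japan?")
print(prompt)
```

The printed text could then be sent to a base model of your choice. The point is that the question is dressed up as part of a document rather than asked directly.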

 

 

Another topic from Newhauser's blog that I found very informative was how to get the most out of LLMs (Newhauser, 2023: para. 8). This topic refers to a post by Ben Lorica, Maximizing the Potential of Large Language Models. Lorica (2023) uses a diagram to explain how teams take pretrained LLMs and adapt or configure them to produce the desired outcome. He goes on to say that, to select the appropriate model, teams need to assess the compatibility of the model's license with their requirements, as well as the quantity of domain-specific data.




My Thoughts

The topics covered in this post are merely a drop of water in the ocean. However, I found that the information provided really helped me improve my understanding of these subjects, even though I do not have any programming background. I can honestly say that the highlight from these articles is that AI Technologies are an ongoing learning experience. That is why I felt compelled to share the pot of gold I discovered when I went treasure hunting. If you are like me, interested in a career path in Data Science, you should be prepared to learn and grow continuously. You can never assume you know it all. I would really suggest reading some of the work done by these bloggers, as they provide great insight and encourage their readers to explore various sources.

(Source list)

I would love to hear your thoughts. Perhaps you have also discovered a pot of gold that you would like to share with me in the comments 😊


