From their early days at MIT, and even before, Emma Liu ’22, MNG ’22; Yo-whan “John” Kim ’22, MNG ’22; and Clemente Ocejo ’21, MNG ’22 knew they wanted to do computational research and explore artificial intelligence and machine learning. “Since high school, I’ve been interested in deep learning and have been involved in projects,” says Kim, who attended a Research Science Institute (RSI) summer program at MIT and Harvard University and went on to work on action recognition in videos using Microsoft’s Kinect.
As students in the Department of Electrical Engineering and Computer Science who recently graduated from the Master of Engineering (MEng) thesis program, Liu, Kim, and Ocejo have developed the skills to help lead application-driven projects. Working with the MIT-IBM Watson AI Lab, they improved text classification with limited labeled data and designed machine learning models for better long-term forecasting of product purchases. For Kim, “it was a very smooth transition and … a great opportunity for me to continue working in the field of deep learning and computer vision in the MIT-IBM Watson AI Lab.”
Modeling video
Working with researchers from academia and industry, Kim designed, trained, and tested a deep learning model for recognizing actions across domains; in this case, video. His team specifically targeted the use of synthetic data from generated videos for training, and ran prediction and inference on real data, which is composed of different action classes. They wanted to see how models pretrained on synthetic videos, particularly simulations of, or game engine-generated, human or humanoid actions, would transfer to real data: publicly available videos scraped from the internet.
The motivation for this research, Kim says, is that real videos can come with problems, including representation bias, copyright issues, and/or ethical or personal sensitivities; for example, videos of a car hitting people would be difficult to collect, as would the use of faces, real addresses, or license plates without consent. Kim is experimenting with 2D, 2.5D, and 3D video models, with the goal of creating domain-specific or even large general synthetic video datasets that can be used in transfer domains where data are lacking. For instance, for applications in the construction industry, this could include running action recognition on a construction site. “I didn’t expect synthetically generated videos to perform on par with real videos,” he says. “I think that opens up a lot of different roles [for the work] in the future.”
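The pretrain-on-synthetic, evaluate-on-real idea can be illustrated with a toy sketch. This is not the team’s actual video pipeline: the real work uses deep video models, while the code below stands in with a plain softmax classifier over made-up feature vectors, purely to show why pretraining on a large synthetic set can beat training from scratch on scarce real data. All dataset sizes, dimensions, and noise levels here are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: "videos" are reduced to fixed-length feature
# vectors. Synthetic data is plentiful; labeled real data is scarce.
D, C = 16, 3                       # feature dim, number of action classes
W_true = rng.normal(size=(D, C))   # shared structure both domains follow

def make_data(n, noise):
    X = rng.normal(size=(n, D))
    logits = X @ W_true + noise * rng.normal(size=(n, C))
    return X, logits.argmax(axis=1)

X_syn, y_syn = make_data(2000, noise=0.5)   # large synthetic set
X_real, y_real = make_data(50, noise=0.5)   # small real set

def train_softmax(X, y, W=None, lr=0.1, steps=300):
    """Plain softmax classifier trained by batch gradient descent."""
    if W is None:
        W = np.zeros((X.shape[1], C))
    Y = np.eye(C)[y]
    for _ in range(steps):
        Z = X @ W
        P = np.exp(Z - Z.max(axis=1, keepdims=True))
        P /= P.sum(axis=1, keepdims=True)
        W -= lr * X.T @ (P - Y) / len(X)
    return W

def accuracy(W, X, y):
    return ((X @ W).argmax(axis=1) == y).mean()

# Baseline: train only on the scarce real data.
W_scratch = train_softmax(X_real, y_real)

# Transfer: pretrain on the large synthetic set, then fine-tune on real.
W_pre = train_softmax(X_syn, y_syn)
W_tuned = train_softmax(X_real, y_real, W=W_pre.copy())

X_test, y_test = make_data(1000, noise=0.5)
print("from scratch:", accuracy(W_scratch, X_test, y_test))
print("pretrained + fine-tuned:", accuracy(W_tuned, X_test, y_test))
```

The pretrained model starts from parameters shaped by thousands of synthetic examples, so the handful of real examples only needs to nudge it, which is the core bet behind using simulated videos when real footage is biased, copyrighted, or sensitive.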
Despite a rocky start to the project while collecting and generating data and running many models, Kim says he wouldn’t have done it any other way. “It was amazing how the lab members encouraged me: ‘It’s OK. You’ll have all the experiments and the fun part coming up. Don’t stress too much.’” It was this support that helped Kim take ownership of the work. “In the end, they gave me so much support and great ideas that helped me carry out this project.”
Data labeling
Data scarcity was also a theme of Emma Liu’s work. “The big problem is that there’s all this data out there in the world, and for a lot of machine learning problems, you need that data to be labeled,” says Liu, “but then you have all this unlabeled data that’s available that we’re not really leveraging.”
Liu, with guidance from her MIT and IBM group, worked to make use of that data, training semi-supervised text classification models (and combining aspects of them) to add pseudo labels to the unlabeled data, based on predictions and probabilities about which categories each piece of previously unlabeled data fits into. “Then the problem is that there’s been prior work that’s shown that you can’t always trust the probabilities; specifically, neural networks have been shown to be overconfident a lot of the time,” Liu points out.
Liu and her team addressed this by evaluating the accuracy and uncertainty of the models and recalibrating them to improve her self-training framework. The self-training and calibration step allowed her to have better confidence in the predictions. This pseudo-labeled data, she says, could then be added to the pool of real data, expanding the dataset; this process could be repeated in a series of iterations.
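One common recipe that matches the loop Liu describes is to calibrate a classifier’s probabilities (for example with temperature scaling) and then pseudo-label only the examples the calibrated model is confident about. The sketch below is a minimal, hypothetical version of that idea, not Liu’s actual framework: the logits are random stand-ins for a text classifier’s outputs, and the temperature and threshold values are invented.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z, T=1.0):
    """Softmax with temperature T; T > 1 softens overconfident scores."""
    z = z / T
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Hypothetical setup: raw logits from a text classifier over 3
# categories for a batch of unlabeled documents (values are made up).
logits = rng.normal(scale=4.0, size=(200, 3))

# Step 1: calibrate. Temperature scaling divides the logits by a
# temperature fit on a held-out labeled set; here we simply assume
# that fit returned T = 2.0.
T = 2.0
probs = softmax(logits, T=T)

# Step 2: pseudo-label only the examples the calibrated model is
# confident about.
CONF_THRESHOLD = 0.9
confidence = probs.max(axis=1)
keep = confidence >= CONF_THRESHOLD
pseudo_labels = probs.argmax(axis=1)[keep]

# These (example, pseudo_label) pairs would be added to the labeled
# set, the model retrained on the expanded set, and the loop repeated.
print(f"kept {keep.sum()} of {len(logits)} unlabeled examples")
```

Because dividing logits by a temperature above 1 pulls every probability toward uniform, calibration shrinks exactly the overconfident predictions Liu warns about, so fewer wrong pseudo labels clear the threshold.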
For Liu, her greatest takeaway wasn’t the product but the process. “I learned a lot about being an independent researcher,” she says. As an undergraduate, Liu worked with IBM to develop machine learning methods to repurpose drugs already on the market, honing her decision-making skills. After collaborating with academic and industry researchers to build skills (asking targeted questions, seeking out experts, digesting and presenting scientific papers for their relevant takeaways, and testing ideas), Liu and her cohort of MEng students working with the MIT-IBM Watson AI Lab felt they had the confidence in their knowledge, freedom, and flexibility to dictate the direction of their research. Taking on this key role, Liu says, “I feel like I have ownership over my project.”
Demand forecasting
Through his time at MIT and with the MIT-IBM Watson AI Lab, Clemente Ocejo also gained a sense of mastery, having built a strong foundation in AI techniques and time series methods beginning with his MIT Undergraduate Research Opportunities Program (UROP) project, where he met his MEng advisor. “You really have to be proactive in decision-making,” says Ocejo, “vocalizing it [your choices] as a researcher and letting people know that this is what you’re doing.”
Ocejo used his background in traditional time series methods in a collaboration with the lab, applying deep learning to better predict product demand in the medical field. Here, he designed, wrote, and trained a transformer, a specific machine learning model typically used in natural language processing and capable of learning very long-term dependencies. Ocejo and his team compared target demand forecasts across months, learning dynamic connections and attention weights between product sales within a product family. They looked at identifier features concerning the price and amount, as well as account features about who is purchasing the items or services.
“One product does not necessarily have an impact on the prediction made for another product at the moment of prediction. It only affects the parameters during training that lead to that prediction,” says Ocejo. “Instead, we wanted it to have a bit more of a direct impact, so we added this layer that makes this connection and computes attention between all of the products in our dataset.”
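The cross-product connection Ocejo describes can be sketched as a single self-attention layer in which every product’s representation attends to every other product’s. This is an illustrative toy, not the team’s actual architecture: the number of products, embedding size, and random projection weights below are all invented stand-ins for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical setup: each of P products in a family has a d-dimensional
# embedding summarizing its recent sales history (values made up).
P, d = 5, 8
product_states = rng.normal(size=(P, d))

# Learned projections (random here) map states to queries, keys, values.
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))

def cross_product_attention(X, Wq, Wk, Wv):
    """One self-attention layer over products: each output row is a
    weighted mixture of all products' value vectors, so every product's
    state directly influences every other product's forecast input."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(X.shape[1])         # (P, P) pairwise scores
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # each row sums to 1
    return weights @ V, weights

mixed, attn = cross_product_attention(product_states, Wq, Wk, Wv)
# attn[i, j] is how strongly product i attends to product j.
```

The attention matrix is exactly the “direct impact” in the quote: at prediction time, product j’s state flows into product i’s representation with weight attn[i, j], rather than influencing it only indirectly through shared training parameters.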
In the long run, over a one-year prediction, the MIT-IBM Watson AI Lab group was able to outperform the current model; more impressively, it did so in the short run (close to a fiscal quarter). Ocejo attributes this to the dynamic of his interdisciplinary team. “A lot of the people in my group were not necessarily very experienced in the deep learning aspect of things, but they had a lot of experience in the supply chain management, operations research, and optimization side, which is something I don’t have as much experience in,” says Ocejo. “They gave a lot of good high-level feedback on what to tackle next and … knowing what the industry wanted to see or was looking to improve, so that was very helpful in streamlining my focus.”
For this work, a deluge of data didn’t make the difference for Ocejo and his team, but rather its structure and presentation. Often, large deep learning models require millions and millions of data points in order to make meaningful inferences; however, the MIT-IBM Watson AI Lab group demonstrated that outcomes and technique improvements can be application-specific. “It just shows that these models can learn something useful, in the right setting, with the right architecture, without needing an excess amount of data,” says Ocejo. “And then with more data, it’ll only get better.”