Can Machine Learning Play a Role in Atomic Force Microscopy?
March 5, 2020
Presented by Dalia Yablon, Ph.D., Founder, SurfaceChar, LLC and Ishita Chakraborty, Ph.D., Data Scientist, Stress Engineering Services
This webinar will explore the avenues in which machine learning (ML) can improve operation and analysis in atomic force microscopy (AFM). We begin with a brief overview of what ML is, then explore three application areas involving specifically AFM: image recognition, particle analysis and autonomous operation.
Charles Zona: Okay, I think we’re ready to get started. Hello, and welcome to another McCrone Group webinar. My name is Charles Zona, and today we are happy to welcome back Dalia Yablon along with Ishita Chakraborty. Dalia and Ishita’s presentation today will answer the question: Can machine learning play a role in atomic force microscopy?
Before we get started, I would like to give you a bit of their backgrounds. Ishita holds BS, MS and PhD degrees in mechanical engineering. She has research and consulting experience spanning broad fields of data analytics, machine learning, material characterization with atomic force microscopy, and structural vibrations. She works as a consultant at Stress Engineering Services, Inc. in Houston, and in her current role she provides solutions to a variety of industrial problems through machine learning as well as physics-based modeling.
Dalia is the founder of SurfaceChar, an AFM consulting firm located in the Boston area. SurfaceChar specializes in surface and interface characterization measurement, along with education and training focusing on scanning probe microscopy. Prior to SurfaceChar, Dalia spent a good portion of her career at ExxonMobil research and engineering, working with the chemicals division to develop new AFM-based imaging methods. She has worked in a number of cross-sector areas including polymers, tribology, lubrication, corrosion, and unconventional gas resources, with a specific focus on developing new nanomechanical characterization methods. Dalia is also the editor of the book Scanning Probe Microscopy in Industrial Applications published in 2013 by Wiley Publishing.
Ishita and Dalia will field questions from the audience immediately following today’s presentation. This webinar is being recorded and will be available on The McCrone Group website under the webinars tab. And now, I will hand the program over to Ishita and Dalia.
Ishita Chakraborty (IC): So this is the outline of our talk today. I’ll start with a brief introduction to machine learning. This is not meant to be comprehensive, but I’ll give some examples, some terminology and an overview of the methods that we have used here. After that, Dalia will go through the application of machine learning to a few use cases with AFM, and then we’ll conclude with the summary.
A lot of terms float around when it comes to machine learning: AI (artificial intelligence), machine learning, and deep learning are some of the most commonly used. AI is the broad umbrella of which machine learning is a subtopic. Machine learning gives the computer the ability to learn without being explicitly programmed. So basically, you feed the computer a lot of data, and the computer learns from the data. There can be a number of machine learning models. A neural network is one type of machine learning model. A neural network can become very deep with multiple hidden layers, and that’s when we start calling it deep learning. Deep learning is basically a type of machine learning model, and it is very popular for image classification and a lot of other problems that have a huge amount of data and require a lot of computing power.
Machine learning can be broadly divided into supervised machine learning and unsupervised machine learning. Supervised machine learning has data that has a label to it; unsupervised machine learning doesn’t have labeled data. So again, supervised machine learning can be of two broad categories: one is regression type, and the other is classification type.
In regression models, we generally predict a value. For example, if we are trying to predict the temperature of a day, that will be a regression-type problem. And another very popular type of supervised machine learning is classification problems. Classification problems are models where we are trying to predict a class. For example, we have a bunch of images and we are trying to predict whether this image is a dog picture or a cat picture; that will be the classification type of problem.
Unsupervised machine learning doesn’t have labeled data, so we do not have to predict any label. It can be used for grouping some data, or finding an anomaly in the data—stuff like that. In this presentation, we will focus on supervised machine learning.
Moving forward, I’ll introduce a few more terms: training and test are two terms that you are going to hear very often in this presentation today. The data that you have will be randomly divided into training and test data. Training data will be the bigger portion of the data, like 75 to 80 percent of the data, and the model will be fed this training data set and the test data will be kept separate. That’s the blind holdout data that the model will be tested on later, after the training is complete. It is like when you take a course; the questions on the exam will not be the same as the questions that you practiced on. So the test data basically is a test for your model performance.
I will give an example of a data set in the context of AFM. On the right, there is an AFM image of a polypropylene and ethylene propylene rubber copolymer blend. Any point in this image can be thought of as a data point. So for example, if we take a scan point here, and it falls in the polypropylene domain, it will have an associated amplitude and a phase, and we can treat this amplitude and phase value as the input, and the output label will be polypropylene, here. Just remember, all these values I’m showing here are not real; I just made them up for illustration purposes. Once we have the data set, we feed that labeled input data to the computer and the computer gives us the machine learning model.
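As a rough illustration of the random training/test division described above, here is a minimal NumPy sketch. The function name `train_test_split_indices`, the sample count of 100, and the fixed seed are hypothetical choices for illustration; the talk does not say which tool was used for the split.

```python
import numpy as np

def train_test_split_indices(n_samples, test_fraction=0.2, seed=0):
    """Randomly split sample indices into a training set and a held-out test set."""
    rng = np.random.default_rng(seed)
    shuffled = rng.permutation(n_samples)          # random order of 0..n_samples-1
    n_test = int(round(n_samples * test_fraction)) # size of the blind holdout set
    test_idx = shuffled[:n_test]
    train_idx = shuffled[n_test:]
    return train_idx, test_idx

# Hypothetical data set of 100 labeled points, 80/20 split.
train_idx, test_idx = train_test_split_indices(100, test_fraction=0.2)
print(len(train_idx), "training points,", len(test_idx), "test points")
```

The test indices are never shown to the model during training, which is what makes them a fair "exam".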
So going back to the steps of building a machine learning model, you have the full data set and then you randomly divide the data set into training data, and then you keep the test data as a holdout data. Then you feed the training data in the computer and you get a model out. So after you build the model, you need to figure out how good the model is, so to test the model, you feed in the inputs of the test data, and then you get the labels out. And you can compare the labels predicted by your machine learning model to the actual labels of the test data. And your model’s performance will be evaluated by its performance in both the training data and the test data.
An important attribute of a good machine learning model is not overfitting or underfitting the data. I’ll take an example here where the y-axis is the output and the x-axis is the input. These blue dots are the training data, and if I fit a very complex high-order regression model, my model will look like this wiggly red line here. And then, when the test data comes in, which are these green dots, this very high-order regression model is not able to capture the test data or predict the test data correctly at all.
So the optimum machine learning model is shown by the green line here, which has similar performance on both the training and the test data. In the machine learning framework, there are various ways to reduce overfitting, such as introducing regularization in the model. We will touch on some of those in our later examples.
Later in the presentation, we are going to talk about machine learning models around image classification, so I want to introduce how an image can be thought about as a data point. For example, on the left you can see a dog picture with a 64 x 64 pixel size. This picture actually is an array of red, green and blue channels. It has three channels, each of which is a 64 x 64 matrix. So we can take all these three channels, flatten them, and put them into one single array. This blue part here comes from the blue channel, this green part here comes from the green channel, and the red part comes from the red channel. So in this input array, we will end up with around 12,000 points, and the output will be 1 or 0 depending on whether the picture is dog or not dog. We can think of an image as an array of data.
Here, I will introduce a machine learning model, logistic regression, as a precursor to neural networks. After we have extracted and flattened the array of pixel values from the dog picture, we transform it linearly to this value Z using the parameters W and b. This Z is then mapped to the final output ŷ (y-hat), which is the probability of the picture being a dog. Z is mapped to a probability between 0 and 1 by this nonlinear activation function. The training process finds optimum values of W and b, so that a dog picture will have an output ŷ close to one, and if it’s not a dog picture, it will have an output ŷ close to zero.
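The overfitting picture described above can be reproduced with a short sketch. The data here are synthetic (a straight line plus noise), and the polynomial degrees 1 and 9 are arbitrary stand-ins for the "simple" and "wiggly" models; none of these values come from the talk.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Synthetic input/output pairs: a linear trend plus random noise."""
    x = rng.uniform(-1.0, 1.0, size=n)
    y = 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y

x_train, y_train = make_data(12)   # plays the role of the blue training dots
x_test, y_test = make_data(12)     # plays the role of the green test dots

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on the given points."""
    return float(np.mean((model(x) - y) ** 2))

simple_model = Polynomial.fit(x_train, y_train, deg=1)   # low-order fit
wiggly_model = Polynomial.fit(x_train, y_train, deg=9)   # high-order fit

for name, model in [("degree 1", simple_model), ("degree 9", wiggly_model)]:
    print(name,
          "train MSE:", mse(model, x_train, y_train),
          "test MSE:", mse(model, x_test, y_test))
```

The high-order fit always matches the training points at least as closely as the low-order one; the sign of overfitting is a test error that is much larger than the training error.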
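A minimal sketch of the flattening step, assuming the image is held as a 64 x 64 x 3 NumPy array (the random pixel values below are a placeholder for a real picture):

```python
import numpy as np

# Placeholder for a 64 x 64 RGB picture; a real image would be loaded from disk.
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)

# Separate the three color channels, each a 64 x 64 matrix.
red, green, blue = image[:, :, 0], image[:, :, 1], image[:, :, 2]

# Flatten each channel and join them end to end into one input array.
x = np.concatenate([red.ravel(), green.ravel(), blue.ravel()])

print(x.shape)  # (12288,), i.e. 64 * 64 * 3, the "around 12,000 points"
```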
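The two steps of logistic regression described above (a linear map to Z, then a nonlinear map to a probability) can be sketched as follows. The sigmoid is the standard activation for logistic regression; the zero-valued W and b are untrained placeholders, not values from the talk.

```python
import numpy as np

def sigmoid(z):
    """Nonlinear activation that maps any real number to a value between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def predict_probability(x, W, b):
    """Logistic regression forward pass: linear step, then sigmoid."""
    z = np.dot(W, x) + b   # linear transformation of the flattened pixels
    return sigmoid(z)      # y-hat: probability that the picture is a dog

rng = np.random.default_rng(0)
x = rng.random(12288)      # placeholder flattened 64 x 64 x 3 image
W = np.zeros(12288)        # untrained weights (training would adjust these)
b = 0.0                    # untrained bias

y_hat = predict_probability(x, W, b)
print(y_hat)  # 0.5: with zero weights the model has no preference either way
```

Training consists of adjusting W and b until ŷ is close to 1 for dog pictures and close to 0 for everything else.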
A lot can be said about neural networks. Here, I’ll only give a brief overview comparing it to the previous example of logistic regression. Neural networks have a number of hidden layers with linear and nonlinear combinations. After it goes through all the hidden layers, towards the end, the functionality is very similar to the previously shown logistic regression example. So here again, from a linear function we map it to a probability, which in turn becomes a binary output ŷ, which for a binary classification problem can take a value of either 0 or 1. The hidden layers have two functions as I mentioned before, so it can have a linear function and then it goes to a nonlinear activation function.
There can be a number of hidden layers. Here, I have only shown two hidden layers, and there can be a whole lot of possible combinations of hidden units per hidden layer, so you can see that a neural network can be very powerful, but it can also be computationally very expensive.
Later in the presentation, we will talk about convolutional neural networks, or CNNs. CNNs are a very popular deep learning model used for applications where a large amount of data is present, and they also require a lot of computing power. They are popularly used for image recognition. In a CNN, instead of sending the first input that we have from an image directly to the neural network, we make it go through a number of convolutional layers. What the convolutional layers do is extract the important features of an image, like an edge or feature boundaries, and then those important features are stored in arrays that are different in dimensions. When the final array goes to the neural network layers (here, we call that the interconnected layer), it has reduced in dimension and has all the important features. So I will not go into any more detail than this. We will see an example of using a CNN for image classification later in the presentation.
After a model is trained, we will evaluate the performance of the model on the training set and also on the test set. One popular evaluation tool that we are going to use in our examples is the confusion matrix. In a confusion matrix, we are putting in the actual labels as the columns, and the predicted labels as the rows. For example, if we have built a dog classifier, and we are testing, say, around 100 images on that classifier, we want most of the images to lie in the diagonal terms of the confusion matrix. The non-diagonal terms are the number of images that are misclassified. A good model should have most of the images, or most of the data examples, classified on its diagonal terms.
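A minimal sketch of a forward pass through a network with two hidden layers, ending in the same linear-then-sigmoid step as logistic regression. The layer sizes (16 and 8 hidden units), the ReLU activation in the hidden layers, and the random initial weights are illustrative assumptions; the talk only says each hidden layer has a linear function followed by a nonlinear activation.

```python
import numpy as np

def relu(z):
    """Assumed hidden-layer activation: passes positives, zeroes out negatives."""
    return np.maximum(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_sizes, seed=0):
    """Create small random weights and zero biases for each layer."""
    rng = np.random.default_rng(seed)
    params = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        W = rng.normal(scale=0.01, size=(n_out, n_in))
        b = np.zeros(n_out)
        params.append((W, b))
    return params

def forward(x, params):
    """Hidden layers: linear step then ReLU. Output layer: linear step then sigmoid."""
    a = x
    for W, b in params[:-1]:
        a = relu(W @ a + b)
    W_out, b_out = params[-1]
    return sigmoid(W_out @ a + b_out)[0]   # probability y-hat as a scalar

# Input of 12288 pixels, two hidden layers (hypothetical sizes), one output unit.
params = init_params([12288, 16, 8, 1])
x = np.random.default_rng(1).random(12288)
y_hat = forward(x, params)
label = int(y_hat >= 0.5)   # threshold the probability to get a 0/1 class
print(y_hat, label)
```

Even this small network has close to 200,000 weights in its first layer alone, which gives a feel for why larger networks become computationally expensive.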
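To make the idea of a convolutional layer extracting an edge more concrete, here is a small NumPy sketch that slides one 3 x 3 filter over a toy image. The filter values and the 6 x 6 image are made up for illustration; in a real CNN the filter values are learned during training, and, as in most CNN libraries, the sliding operation below does not flip the kernel.

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding) and sum the elementwise products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Toy image: dark on the left half, bright on the right half (a vertical edge).
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hand-written vertical-edge filter: responds where brightness rises left to right.
kernel = np.array([[-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0],
                   [-1.0, 0.0, 1.0]])

feature_map = convolve2d_valid(image, kernel)
print(feature_map[0])  # [0. 3. 3. 0.]: nonzero only where the window straddles the edge
```

The output is smaller than the input (4 x 4 instead of 6 x 6), which is the reduction in dimension mentioned above.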
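A minimal sketch of building a confusion matrix with the layout described above (predicted labels as rows, actual labels as columns). The eight labels are made up for illustration. Note that some libraries use the transposed layout, so it is worth checking the convention before reading off the off-diagonal terms.

```python
import numpy as np

def confusion_matrix(actual, predicted, n_classes=2):
    """Count predictions: rows are predicted labels, columns are actual labels."""
    matrix = np.zeros((n_classes, n_classes), dtype=int)
    for a, p in zip(actual, predicted):
        matrix[p, a] += 1
    return matrix

# Made-up example: 1 = dog, 0 = not dog.
actual    = np.array([1, 1, 1, 0, 0, 0, 1, 0])
predicted = np.array([1, 1, 0, 0, 0, 1, 1, 0])

cm = confusion_matrix(actual, predicted)
accuracy = np.trace(cm) / cm.sum()   # diagonal terms are the correct predictions

print(cm)        # [[3 1]
                 #  [1 3]]
print(accuracy)  # 0.75
```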
Dalia Yablon (DY): Thank you so much, Ishita, for that excellent overview of just some basic machine learning concepts, and now the second part of this webinar is going to focus on how can we use some of these machine learning models and tools that Ishita just explained for scanning probe microscopy.
Probably the most natural area that one could think of to apply machine learning would be in areas of image processing and image analysis. And, in fact, that is our first application that we’re going to describe. It’s an application of image recognition, which is very common in machine learning. Ishita just went over her dog versus non-dog example, and that’s a classic example of image recognition. And our question was, can we train the computer using these models and algorithms to recognize AFM images depending on what material they are? And so our test case that we started with was two different kinds of polymer blends, images of two different kinds of polymer blends. Our first blend we’re calling Blend A, and this is a 20 micron by 20 micron image of Blend A, and you can see that it has a continuous phase or matrix in the background that’s dark red, and it’s been impregnated with these smaller domains, highlighted in white, of a different material. And these, by the way, are phase images, and all the images on this slide are AFM phase images collected in tapping mode. So, this is a 20 micron by 20 micron image of Blend A, and this is about a 1.5 x 1.5 micron image of same Blend A, where now you can zoom in and you can see that continuous phase in the background (purple), with some of these domains—smaller domains, a few hundred nanometers in size domains, inside—here, they’re yellow. So this is Blend A’s two components. On the right side, here, we have Blend B, which is a very different kind of blend than Blend A. Blend B actually has five different components to it. It involves these little particles that are surrounded by different kinds of materials, and then, also, you’ve got this background here that is yellow. Actually, there are two different kinds of components that go into this background continuous phase. So this is a 5 micron x 5 micron image, phase image, of Blend B. 
We zoom in to a few hundred nanometers on the side, and we can see we’ve got these little particles in here and they’re in a different phase, and then we’ve got this continuous matrix here. Now by eye, we have no problem differentiating Blend A from Blend B, and that goes to one of the guidelines when you’re trying to figure out where you can use machine learning. One of the guidelines is to use it to accomplish a task that takes us only a few seconds to do. So if it’s a task that takes us a minute, or a couple minutes, that we have to think about and think through, that may not be such a great task for machine learning. But something that only takes us a few seconds, like an image recognition problem, is an ideal test case for this tool.
We wanted to test these algorithms and see, can we train the computer to differentiate Blend A from Blend B. Now, we weren’t going to make it easy for the computer, so we picked images of very different scan sizes. For example, this 20 micron image and this couple micron image look really different. We also gave different resolutions, from 512 x 512 pixels down to even 128 x 128 pixels, and we even included images of different aspect ratios. So this has a length to width aspect ratio of 1:1, but, for example, this image has a 2:1 aspect ratio, which sometimes we want to do in AFM to speed up the imaging. So we kind of threw all of these kinds of images into the mix with the idea of trying to test, can the computer successfully differentiate them?
Our starting set was only 160 images, which is really, really small by machine learning standards. Typically, your data sets are thousands, or even tens of thousands. We’re starting with the small data set, trying to see if the computer can differentiate Blend A from Blend B when we’re using images of different sizes, different resolutions and different aspect ratios.
One trick in machine learning, because it is very data-intensive, is that when you don’t have enough data, can you augment your data? Can you increase the number of your data points? This is called data augmentation. And so, for example, we can do that with our AFM images; from one AFM image, we can take it and rotate it, or we can flip it along a different axis to create all these different images. So from one image on Blend A, we’ve actually created five images. And we can do the same thing with Blend B—play this game of rotating and flipping the images in order to increase the size of the data set. And so from our starting data set of 160 images, we very easily, now, augmented to 800 images, which is certainly a better data set size to be working with.
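The rotate-and-flip augmentation described above can be sketched in a few lines of NumPy. The specific set of transformations here (two rotations and two flips, plus the original) is an assumption chosen to give five images per original; the talk does not list exactly which rotations and flips were used.

```python
import numpy as np

def augment(image):
    """Return the original image plus four rotated or flipped copies."""
    return [
        image,                  # original
        np.rot90(image, k=1),   # rotated by 90 degrees
        np.rot90(image, k=2),   # rotated by 180 degrees
        np.flipud(image),       # flipped top to bottom
        np.fliplr(image),       # flipped left to right
    ]

# Small stand-in for an AFM phase image (a real one would be e.g. 512 x 512).
image = np.arange(12, dtype=float).reshape(3, 4)
augmented = augment(image)

n_original = 160
print(len(augmented), "images per original,",
      n_original * len(augmented), "images in total")  # 5 per original, 800 in total
```

A 90 degree rotation swaps the height and width of a non-square scan, so images would still need to be brought to a common shape before training.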
Alright, so we have our data, we’ve augmented it, and now how can we use these algorithms to see if we can differentiate between Blend A and Blend B? Now that we have our data set, let’s see how well we can train the computer to differentiate Blend A from Blend B. Our first attempt involved the use of the neural nets that Ishita described before, where we have a number of hidden layers that lead up to a linear function and then a probability where the probability would be defined, for example, that the image is Blend A. And then that maps to the output ŷ where the output is, is this image Blend A, or is it Blend B?
Of our 800 images in our data set, 80% were randomly selected to be the training data, and 20% was the test data. So we use a confusion matrix, also described by Ishita, to understand how well did our neural net do. Here we can see the results. Remember, Ishita said that the off-diagonal matrix elements in a confusion matrix would ideally be zero for a fully accurate model. This model did okay. What it shows, here, is that we had seven images of Blend A that were mistakenly identified as Blend B, and two images of Blend B that were mistakenly identified as Blend A, overall, giving us an accuracy of 94% using this model.
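The 94% figure follows directly from the numbers quoted above, assuming the test set is exactly 20% of the 800 images (160 images):

```python
n_images = 800
test_fraction = 0.20
n_test = int(round(n_images * test_fraction))   # 160 images held out for testing

# Off-diagonal terms of the confusion matrix quoted in the talk.
blend_a_called_b = 7
blend_b_called_a = 2
misclassified = blend_a_called_b + blend_b_called_a

accuracy = (n_test - misclassified) / n_test
print(f"{accuracy:.1%}")  # 94.4%
```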
Our next attempt involved the use of the convolutional neural nets, or CNNs, that Ishita also explained earlier, and here, you can see that our confusion matrix did much better. It actually achieved an overall accuracy of 100%, where the off-diagonal elements are, in fact, zero. Clearly, this was an improved and excellent result, and this model was very easily able, now, to differentiate Blend A versus Blend B, no matter what the scan size, resolution and aspect ratio were.
The second application involves an image analysis application in AFM, and specifically in an analysis of particle identification. This involved, actually, the same images from our Blend B, where we had—actually it has five components in it, but we have this matrix here, and then we have these little black particles that you can hopefully see here. Again, this is the AFM phase image, and they’re embedded in a kind of light purple matrix here, and here it’s a little bit of an orange matrix. And the question is can you isolate or identify these particles? So, for example, in these little blue circles, I’ve identified three particles here. Here in this area we probably have about five or six. The reason you would want to do this is you might want to do analysis on these particles in terms of their morphology, their dimensions, and analyze them.
Again, if we follow our guidelines for machine learning, is this a task that our eye could do in a few seconds? And certainly, our eye is remarkable at being able to distinguish and differentiate these little particles within this very complicated matrix. But using even traditional particle analysis software, it turns out that’s not such an easy problem; it’s very challenging. So we can use, here in our second application, machine learning for the ultimate customizable image analysis, because we can train the system to actually identify particles, not using some generic algorithms that we have to apply for all kinds of images for particle identification, but training it specifically for this kind of image. And we can do that using the images themselves, and train the system to identify the particles in this very specific niche class of images.
Here is how we set up this study to use machine learning to be able to identify particles in specifically this subset of images.
Another advantage of machine learning is that it can take advantage of multiple image channels, which AFM is very useful for. We’re not just getting one channel of information; AFM provides so many different kinds of contrast information, in addition to topography, depending on the mode that you use. Whether those properties are mechanical, electrical or magnetic, you can get a lot of information from many different channels in AFM, and you can integrate all of them into your analysis, which is very powerful, and a unique feature of machine learning. In this case, we chose to take advantage of both the height and the phase channel shown here.
When you overlay the phase onto the height, as you can see in this image, the differentiation and identification of these particles actually becomes pretty easy. It becomes much easier than just looking at one channel or the other. By combining the information, we can do a better job of identifying these particles. And so we actually pick this group right in here from this image which translates to this area right here.
We pick that as basically our training set for this application where we now identify the particles in this image, and we train the model on this image, and then said, okay, now can we extrapolate to the rest of the image and see if that model can pick out the particles in the rest of the image?
Here was how we created our model. The height and the phase channels combined here are the inputs into our machine learning model, and the output is basically a label of 0 or 1: 0 where it’s the base material or 1 where it’s the actual particle. So here, for example, if we take this pixel in the height image, h1, and the same pixel in the phase image, ϕ1, as our input into our model, the output is going to be a 0 in this case, because that’s our base material. That’s how we train the model based on just these five particles.
So how did it do? If we again start with the inputs to the model, which were our phase and our height, our training data is what you can see in this little blue square here. Then the question was how well could we apply this to the rest of the image to pick out the particles in the rest of the image? So this was the result for our model, and you can actually see it did quite a nice job picking these very difficult to pick out particles within this complicated purplish-orange matrix. Was it perfect? No, it was not perfect, but it was actually a pretty small training set here. We were only trying to do this for a proof of concept. On this one image, if we compare it with traditional particle analysis software out there as shown here, you can see that the machine learning actually did far better. It was much more accurate in terms of finding these little particles and identifying and separating them. The traditional particle analysis software did not do well even in the training region, and it had a very difficult time even though by eye we can clearly see all these different particles here. So this is a great initial step of machine learning in an application like this where the particle identification is very complicated. And again, it really does provide the ultimate customizable image analysis because we train the model not on generic images, but specifically on these types of images, which is obviously going to give it the most accurate performance for these kinds of images.
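The pixel-by-pixel setup described above can be sketched as follows: every pixel contributes one row of two inputs (height and phase) and one 0/1 label, a model is trained on a small crop, and then it is applied to the whole image. Everything concrete here is an assumption for illustration: the synthetic 32 x 32 "height" and "phase" images, the position of the training crop, and the use of logistic regression trained by gradient descent. The talk does not say which model was actually used for this application.

```python
import numpy as np

def stack_channels(height, phase):
    """Turn two H x W channel images into an (H*W, 2) input matrix, one row per pixel."""
    return np.column_stack([height.ravel(), phase.ravel()])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logistic(X, y, learning_rate=0.5, n_steps=2000):
    """Fit logistic regression by plain gradient descent on standardized inputs."""
    mean, std = X.mean(axis=0), X.std(axis=0)
    Xs = (X - mean) / std
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(n_steps):
        p = sigmoid(Xs @ w + b)
        w -= learning_rate * (Xs.T @ (p - y)) / len(y)
        b -= learning_rate * np.mean(p - y)
    return w, b, mean, std

def predict(X, model):
    """Label each pixel 1 (particle) or 0 (base material)."""
    w, b, mean, std = model
    return (sigmoid(((X - mean) / std) @ w + b) >= 0.5).astype(int)

# Synthetic stand-in for an AFM scan: two square "particles" in a flat matrix.
rng = np.random.default_rng(0)
mask = np.zeros((32, 32), dtype=int)
mask[4:8, 4:8] = 1        # particle inside the training crop
mask[20:26, 20:26] = 1    # particle the model never sees during training
height = 1.0 * mask + rng.normal(scale=0.05, size=mask.shape)
phase = -1.0 * mask + rng.normal(scale=0.05, size=mask.shape)

# Train only on a small labeled crop, like the blue square on the slide.
crop = (slice(0, 16), slice(0, 16))
model = train_logistic(stack_channels(height[crop], phase[crop]), mask[crop].ravel())

# Apply the trained model to every pixel of the full image.
predicted_mask = predict(stack_channels(height, phase), model).reshape(mask.shape)
pixel_accuracy = float(np.mean(predicted_mask == mask))
print("fraction of pixels labeled correctly:", pixel_accuracy)
```

Adding a third or fourth AFM channel only means adding more columns in `stack_channels`; the rest of the sketch is unchanged.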
Now our final application for how we can use machine learning to improve atomic force microscopy has nothing to do with image analysis or image processing, and, instead, is the application of autonomous operation of the instrument. This is a bit of a holy grail in the operation of AFM. There is a niche application where there is autonomous operation of the instrument. There are very large, very expensive instruments, built for the semiconductor industry, that go into the clean rooms in order for them to do quality control on a lot of their parts. This is a very focused application because it’s one kind of imaging on one kind of sample, so it’s fairly routine. It’s a fairly straightforward application of AFM, and there they do have the ability to run these instruments for hours and days without human intervention. But for our everyday use that we have in our research or industrial labs, we still don’t have the ability for autonomous operation because we throw so many different kinds of samples and so many different kinds of modes at the instrument. We don’t have the ability to set up the instrument on many different samples and different modes and then just let it go for a week and come back at the end of the week and have it collect all the data. That’s because there are so many things that can go wrong, things like the z-piezo drifting out of range, or your tip chunking up, contaminating, or wearing down. And on every image you have to optimize your imaging parameters: the feedback loop, your scan rate, and, if you’re operating in tapping mode, the frequency of your cantilever, which you often have to retune over the course of the experiment, your set point, and your free amplitude. There are a lot of parameters that would require continuous online monitoring and optimization in order for true autonomous operation to occur, so we don’t have that yet.
But certainly this is an area very much where machine learning can play a role because it can be trained to analyze the images and, when things start to go wrong, to start tweaking those parameters like we do, understanding what the problem is and in what direction to adjust. For example, we can even use machine learning to monitor for tip quality. This actually was done for scanning tunneling microscopy in this great publication by Bob Wolkow in ACS Nano, published in 2018, where he had a reference sample that the STM would image and check for tip quality, and he was able to train the computer to see when the reference image showed a worn-out tip. Then the tip would move to a different area and be conditioned (because in STM you have methods for in situ conditioning of the tip; we actually can’t do that yet in AFM), and then it would be brought back to reimage the reference sample, make sure it was okay, and then continue with the work. Certainly, machine learning will play a big role in the many, many steps that would be required to achieve autonomous operation, but again, that’s a bit of a holy grail. And then, of course, once you can get autonomous operation, it can be coupled with automated analysis procedures to really improve throughput and give us analysis capabilities that we don’t have right now.
Okay, so in summary, Ishita and I have shown that machine learning can successfully differentiate the AFM images of the two blends: Blend A and Blend B. We also used it to conduct particle identification in challenging images, which were actually images from Blend B, and another note I would make on that particular application is to remember that we integrated two channels to do that analysis. This is one of the huge benefits of machine learning because you can integrate so much more information than we would be able to process on our own. That was just an example using two channels, but you can imagine an example where we pull in three, four, maybe five channels, depending on what information we have, and try to integrate all of that to create a model to predict a particular output, which is the kind of computation that far exceeds our own human capabilities.
There are obviously still major limitations. Probably the biggest limitation, especially for AFM—I mean it exists for all machine learning and all characterization methods, but it’s a particular limitation for AFM—is that you need large data sets. Even the data set that we used arguably was very small. You ideally would be in the thousands, and really in the tens of thousands, of images for a data set. The reason that’s a challenge for AFM is that we have really long acquisition times for our images. A high-resolution image, let’s say 512 x 512, depending on what instrument you use, can still take a good few minutes—even up to close to eight, nine minutes per image—that throughput’s pretty slow if you want to collect so many images. We’re starting to go into archived data, what images do we have, whatever we’ve collected—seeing what we can glean from that and take advantage of that.
In addition, you really do need subject matter expertise, otherwise it’s a bit of a garbage in, garbage out exercise. That means not only understanding what channels to use and what your inputs should be, but also correctly interpreting the output. For example, in that particle identification image, what did the image actually mean? What are we trying to identify? What should the contrast be assigned to? Subject matter expertise definitely plays a very large role in any successful machine learning application.
Going forward, we still have a few tasks at hand and a lot of areas for improvement. We can always be improving our models. We saw a very easy example of that when we went from the neural net to the convolutional neural net. That was a big improvement, but there are so many other algorithms and so many different directions in which we can, and will, be taking these applications. And improving our data, like we mentioned. The more data the better.
This is just kind of an initial study that Ishita and I have embarked on, and we’re going to continue working and playing in this area. I think it has a lot of potential to improve our capabilities, improve our image analysis and processing, and hopefully give us new insights into data that we haven’t had before. And we’re going to continue sharing it with this community through webinars and through papers, and we look forward to getting feedback and hearing your questions.
Thanks so much for your attention.
CZ: Okay, that was some great information. Thanks Ishita and Dalia. We’re going to start answering some questions, so if you have some questions, just go ahead and type them into the questions field. Our first question is: How much computation time is required to run the models that you showed and what kind of computers can run it?
IC: Yes, so the most computationally expensive part of running a machine learning model is training the model. Of the examples that we presented today, the nanoparticle analysis example didn’t take a long time to train: maybe a few seconds, less than a minute. The image analysis, particularly for the CNN, took around 15 to 20 minutes to train on a CPU in a PC. So when the data sizes are much larger, we would definitely need GPU capability and very high computing power for that.
CZ: Okay. Our next question asks: Has machine learning been used in other microscopy methods?
DY: Yes. So the answer to that question is, certainly, yes. Machine learning has been used in other microscopy methods. I’m increasingly seeing it in the literature. It’s used to improve both image analysis and data acquisition, actually, in some of these other techniques. For example, I’ve certainly seen machine learning used to analyze different kinds of images in both optical and electron microscopy. Also in the field of electron microscopy, I’ve started to see it being used during acquisition as a form of real-time data compression to reduce the beam time that some samples are exposed to. It’s really starting to penetrate into the characterization world. It certainly is not as widespread a technique as in other areas of research, like drug design or materials discovery, but it has certainly entered the characterization world.
CZ: Next question is asking, is it a problem to have different scan sizes for image recognition?
IC: That’s an excellent question. When we have different scan sizes, differently zoomed images for the same sample, the algorithm can get confused. But one way out of it is to have enough images of differently zoomed views of the same sample so that the algorithm can train itself to know that this is a zoomed-out view and this is a zoomed-in view, and it is still the same sample. I just want to mention here that all the data processing from AFM for the examples that we showed here has been done using Python, and that can be used to resize/reshape images as well, and to do a lot of custom image analysis.
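As a rough illustration of the resizing step mentioned above, here is a NumPy-only sketch that brings scans of different pixel dimensions to one common shape by nearest-neighbor sampling. The target size of 128 x 128 and the function name `resize_nearest` are hypothetical; the talk only says that Python was used, not which routine or target size. In practice an image library with proper interpolation would likely be a better choice.

```python
import numpy as np

def resize_nearest(image, out_shape):
    """Resample a 2D image to out_shape by nearest-neighbor index lookup."""
    in_h, in_w = image.shape
    out_h, out_w = out_shape
    rows = (np.arange(out_h) * in_h) // out_h   # source row for each output row
    cols = (np.arange(out_w) * in_w) // out_w   # source column for each output column
    return image[rows[:, None], cols[None, :]]

# Stand-ins for scans at different resolutions and aspect ratios.
square_scan = np.arange(512 * 512, dtype=float).reshape(512, 512)   # 1:1 scan
wide_scan = np.zeros((256, 512))                                    # 2:1 scan

target = (128, 128)   # hypothetical common input size for the classifier
resized_square = resize_nearest(square_scan, target)
resized_wide = resize_nearest(wide_scan, target)

print(resized_square.shape, resized_wide.shape)  # (128, 128) (128, 128)
```

Forcing a 2:1 scan into a square shape distorts its features, which is one more reason the training set needs enough examples of each scan size and aspect ratio.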
CZ: Here’s another question: Can you give another example where machine learning can be used in AFM?
DY: Sure. Another area that we actually did not really touch on in this webinar, because it’s not an area that we have results in yet, is to use machine learning to explore correlations between morphology or parameters or features in the AFM image with some other property outside the AFM image. For example, Igor Sokolov of Tufts University published a very nice study last year where he looked at bladder cells, and tried to find correlations between features in the AFM image and whether the cell was malignant or benign. I think they looked at something like dozens of parameters that they could analyze off of the AFM image, and certain of them did have very strong correlations with that output. So this whole area of correlations, trying to find correlations between different features in the image and some other property, is certainly an area where machine learning can be used quite effectively to tease out what those correlations are.
CZ: Okay, I think that’ll do it for the questions. I’d like to thank Ishita and Dalia again for doing this presentation, and for all of those out there who tuned in today, we appreciate it, and we look forward to seeing you again at a future McCrone Group webinar. Thanks.