Building the KNN algorithm With JavaScript
k-Nearest Neighbor (KNN)
The KNN is a simple, fast, and straightforward classification algorithm. It is very useful for categorized numerical datasets, where the data is naturally clustered. It will feel similar in some ways to the k-means clustering algorithm; with the major distinction being that k-means is an unsupervised algorithm while KNN is a supervised learning algorithm.
If you wish to perform a KNN analysis manually, here’s how it should go: first, plot all your training data on a graph and label each point with its category or label. When you wish to classify a new, unknown point, put it on the graph and find the k closest points to it (the nearest neighbors).
The number k should be an odd number in order to avoid ties; three is a good starting point, but some applications will need more and some can get away with one. Report whatever the majority of the k nearest neighbors is classified, as that will be the result of the algorithm.
Finding the k nearest neighbors to a test point is straightforward, but you can use some optimizations if your training data is very large. Typically, when evaluating a new point, you would calculate the Euclidean distance (the typical, high school geometry distance measure) between your test point and every other training point, and sort them by distance. This algorithm is quite fast because the training data is generally not more than 10,000 points or so.
If you have many training examples (in the order of millions) or you really need the algorithm to be lightning-fast, there are two optimizations you can make. The first is to skip the square root operation in the distance measure and use the squared distance instead. While modern CPUs are very fast, the square root operation is still much slower than multiplication and addition, so you can save a few milliseconds by avoiding the square root.
The second optimization is to only consider points within some bounding rectangle of distance to your test point; for instance, only consider points within +/- 5 units in each dimension from the test point’s location. If your training data is dense, this optimization will not affect the results but will speed up the algorithm because it will avoid calculating distances for many points.
The following is the KNN algorithm as a high-level description:
- Record all training data and their labels
- Given a new point to evaluate, generate a list of its distances to all training points
- Sort the list of distances in the order of closest to farthest
- Throw out all but the knearest distances
- Determine which label represents the majority of your knearest neighbors; this is the result of the algorithm
A more efficient version avoids maintaining a large list of distances that need to be sorted by limiting the list of distances to k items. Now get started with your implementation of the KNN algorithm.
Building the KNN algorithm
Since the KNN algorithm is quite simple, you can build your own implementation:
- Create a new folder and name it Ch5-knn.
- Add the following jsonfile to the folder. Here, you have added a dependency for the jimp library, which is an image processing library:
{ "name": "Ch5-knn", "version": "1.0.0", "description": "ML in JS Example for Chapter 5 - k-nearest-neighbor", "main": "src/index.js", "author": "Burak Kanber", "license": "MIT", "scripts": { "build-web": "browserify src/index.js -o dist/index.js -t [ babelify --presets [ env ] ]", "build-cli": "browserify src/index.js --node -o dist/index.js -t [ babelify --presets [ env ] ]", "start": "yarn build-cli && node dist/index.js" }, "dependencies": { "babel-core": "^6.26.0", "babel-plugin-transform-object-rest-spread": "^6.26.0", "babel-preset-env": "^1.6.1", "babelify": "^8.0.0", "browserify": "^15.1.0", "jimp": "^0.2.28" } }
- Run the yarn installcommand to download and install all the dependencies and then create subfolders called src, dist, and files.
- Inside the srcfolder, create an js file and a knn.js file.
You will also need a data.js file. For these examples, a larger dataset has been used which is difficult to be printed here, so you should take a minute to download the Ch5-knn/src/data.js file from GitHub. You can also find the complete code for this article at https://github.com/PacktPublishing/Hands-On-Machine-Learning-with-JavaScript/tree/master/Chapter05/Ch5-knn.
- Start with the jsfile. You’ll need a distance-measuring function. Add the following to the beginning of knn.js:
/** * Calculate the distance between two points. * Points must be given as arrays or objects with equivalent keys. * @param {Array.<number>} a * @param {Array.<number>} b * @return {number} */ const distance = (a, b) => Math.sqrt( a.map((aPoint, i) => b[i] - aPoint) .reduce((sumOfSquares, diff) => sumOfSquares + (diff*diff), 0) );
If you really need a performance optimization for your KNN implementation, this is where you might omit the Math.sqrt operation and return just the squared distance. However, since this is such a fast algorithm by nature, you need to do this only if you’re working on an extreme problem with a lot of data or with very strict speed requirements.
- Next, add the stub of your KNN class. Add the following to js, beneath the distance function:
class KNN { constructor(k = 1, data, labels) { this.k = k; this.data = data; this.labels = labels; } } export default KNN;
The constructor accepts three arguments: the k or the number of neighbors to consider when classifying your new point, the training data split up into the data points alone, and a corresponding array of their labels.
- Next, you need to add an internal method that considers a test point and calculates a sorted list of distances from the test point to the training points. You can call this a distance map. Add the following to the body of the KNN class:
generateDistanceMap(point) { const map = []; let maxDistanceInMap; for (let index = 0, len = this.data.length; index < len; index++) { const otherPoint = this.data[index]; const otherPointLabel = this.labels[index]; const thisDistance = distance(point, otherPoint); /** * Keep at most k items in the map. * Much more efficient for large sets, because this * avoids storing and then sorting a million-item map. * This adds many more sort operations, but hopefully k is small. */ if (!maxDistanceInMap || thisDistance < maxDistanceInMap) { // Only add an item if it's closer than the farthest of the candidates map.push({ index, distance: thisDistance, label: otherPointLabel }); // Sort the map so the closest is first map.sort((a, b) => a.distance < b.distance ? -1 : 1); // If the map became too long, drop the farthest item if (map.length > this.k) { map.pop(); } // Update this value for the next comparison maxDistanceInMap = map[map.length - 1].distance; } } return map; }
This method could be easier to read, but the simpler version is not efficient for very large training sets. What you’re doing here is maintaining a list of points that might be the KNNs and storing them in map.
By maintaining a variable called maxDistanceInMap, you can loop over every training point and make a simple comparison to see whether the point should be added to your candidates’ list. If the point you’re iterating over is closer than the farthest of your candidates, you can add the point to the list, re-sort the list, remove the farthest point to keep the list small, and then update mapDistanceInMap.
If that sounds like a lot of work, a simpler version might loop overall points, add each one with its distance measurement to the map, sort the map, and then return the first k items. The downside of this implementation is that for a dataset of a million points, you’d need to build a distance map of a million points and then sort that giant list in memory.
In your version, you only ever hold k items as candidates, so you never need to store a separate million-point map. Your version does require a call to Array.sort whenever an item is added to the map. This is inefficient in its own way, as the sort function is called for each addition to the map. Fortunately, the sort operation is only for k items, where k might be something like 3 or 5.
The computational complexity of the sorting algorithm is most likely O(n log n) (for a quicksort or mergesort implementation), so it only takes about 30 data points for the sophisticated version to be more efficient than the simple version when k = 3, and for k = 5, this happens at around 3,000 data points. However, both versions are so fast that for a dataset smaller than 3,000 points, you won’t notice the difference.
- Finally, tie the algorithm together with the predict The predictmethod must accept a test point, and at the very least, return the determined label for the point. You can also add some additional output to the method and report the labels of the k nearest neighbors as well as the number of votes each label contributed. Add the following to the body of the KNN class:
predict(point) { const map = this.generateDistanceMap(point); const votes = map.slice(0, this.k); const voteCounts = votes // Reduces into an object like {label: voteCount} .reduce((obj, vote) => Object.assign({}, obj, {[vote.label]: (obj[vote.label] || 0) + 1}), {}) ; const sortedVotes = Object.keys(voteCounts) .map(label => ({label, count: voteCounts[label]})) .sort((a, b) => a.count > b.count ? -1 : 1) ; return { label: sortedVotes[0].label, voteCounts, votes }; }
This method requires a little bit of datatype juggling in JavaScript but is simple in concept. First, generate your distance map using the method you just implemented. Then, remove all data except for the k nearest points and store that in a votes variable. If you’re using 3 as k, then votes will be an array of length three.
Now that you have your k nearest neighbors, you need to figure out which label represents the majority of the neighbors. You can do this by reducing your votes array into an object called voteCounts. To get a picture of what you want voteCounts to look like, imagine that you’re looking for the three nearest neighbors and the possible categories are Male or Female. The voteCounts variable might look like this: {“Female”: 2, “Male”: 1}.
The job is still not done, however—after reducing your votes into a vote-count object, you still need to sort that and determine the majority label. You can do this by mapping the vote counts object back into an array and then sorting the array based on vote counts.
There are other ways to approach this problem of tallying votes; any method you can think of will work, as long as you can return the majority vote at the end of the day. That’s all you need to do in the knn.js file. The algorithm is complete, requiring fewer than 70 lines of code.
Now set up your index.js file and get ready to run some examples. Remember that you need to download the data.js file first. You can do this by downloading the file from https://github.com/bkanber/MLinJSBook. Now add the following to the top of index.js:
import KNN from'./knn.js'; import {weight_height} from'./data.js';
You can now try out the algorithm using a simple example.
Example – Height, weight, and gender
KNN, like k-means, can work on high-dimensional data—but, like k-means, you can only graph example data in a two-dimensional plane, so keep your example simple. The first question you’ll tackle is: can you predict a person’s biological sex given only their height and weight?
The data for this example has been downloaded from a national longitudinal survey on people’s perception of their weight. Included in the data are the respondents’ height, weight, and gender. This is what the data looks like when graphed:
Just by looking at the preceding charted data, you can get a sense as to why KNN is so effective at evaluating clustered data. It’s true that there’s no neat boundary between male and female, but if you were to evaluate a new data point of a 200 pound, 72 inches-tall person, it’s clear that all the training data around that point is male and it’s likely your new point is male, too.
Conversely, a new respondent at 125 pounds and a height of 62 inches is well into the female area of the graph, though there are a couple of males with those characteristics as well. The middle of the graph, around 145 pounds and 65 inches tall, is the most ambiguous, with an even split of male and female training points. Expect the algorithm to be uncertain about the new points in that area. As there is no clear dividing line in this dataset, you would need more features or more dimensions to get a better resolution of the boundaries.
In any case, try out a few examples. Pick five points that you may expect to be definitely male, definitely female, probably male, probably female, and indeterminable. Add the following code to index.js, beneath the two import lines:
console.log("Testing height and weight with k=5"); console.log("=========================="); constsolver1 = new KNN(5, weight_height.data, weight_height.labels); console.log("Testing a 'definitely male' point:"); console.log(solver1.predict([200, 75])); console.log("\nTesting a 'probably male' point:"); console.log(solver1.predict([170, 70])); console.log("\nTesting a 'totally uncertain' point:"); console.log(solver1.predict([140, 64])); console.log("\nTesting a 'probably female' point:"); console.log(solver1.predict([130, 63])); console.log("\nTesting a 'definitely female' point:"); console.log(solver1.predict([120, 60]));
Run yarn start from the command line and you should see the following output. Since the KNN is not stochastic that is it does not use any random conditions in its evaluation, you should see exactly the same output with the possible exception of the ordering of votes and their indexes, if two votes have the same distance.
If you get an error when you run yarn start, make sure that your data.js file has been correctly downloaded and installed.
Here’s the output from the preceding code:
Testing heightand weight withk=5 ====================================================================== Testing a'definitely male' point: { label: 'Male', voteCounts: { Male: 5 }, votes: [ { index: 372, distance: 0, label: 'Male' }, { index: 256, distance: 1, label: 'Male' }, { index: 291, distance: 1, label: 'Male' }, { index: 236, distance: 2.8284271247461903, label: 'Male' }, { index: 310, distance: 3, label: 'Male' } ] } Testing a'probably male' point: { label: 'Male', voteCounts: { Male: 5 }, votes: [ { index: 463, distance: 0, label: 'Male' }, { index: 311, distance: 0, label: 'Male' }, { index: 247, distance: 1, label: 'Male' }, { index: 437, distance: 1, label: 'Male' }, { index: 435, distance: 1, label: 'Male' } ] } Testing a'totally uncertain' point: { label: 'Male', voteCounts: { Male: 3, Female: 2 }, votes: [ { index: 329, distance: 0, label: 'Male' }, { index: 465, distance: 0, label: 'Male' }, { index: 386, distance: 0, label: 'Male' }, { index: 126, distance: 0, label: 'Female' }, { index: 174, distance: 1, label: 'Female' } ] } Testing a'probably female' point: { label: 'Female', voteCounts: { Female: 4, Male: 1 }, votes: [ { index: 186, distance: 0, label: 'Female' }, { index: 90, distance: 0, label: 'Female' }, { index: 330, distance: 0, label: 'Male' }, { index: 51, distance: 1, label: 'Female' }, { index: 96, distance: 1, label: 'Female' } ] } Testing a'definitely female' point: { label: 'Female', voteCounts: { Female: 5 }, votes: [ { index: 200, distance: 0, label: 'Female' }, { index: 150, distance: 0, label: 'Female' }, { index: 198, distance: 1, label: 'Female' }, { index: 147, distance: 1, label: 'Female' }, { index: 157, distance: 1, label: 'Female' } ] }
The algorithm has determined genders just as you would have done, visually, by looking at the chart. Feel free to play with this example more and experiment with different values of k to see how results might differ for any given test point.
If you found this article interesting, you can explore Burak Kanber’s Hands-on Machine Learning with JavaScript to gain hands-on knowledge on evaluating and implementing the right model, along with choosing from different JS libraries, such as NaturalNode, brain, harthur, and classifier to design smarter applications. This book is a definitive guide to creating an intelligent web application with the best of machine learning and JavaScript.
Customizing a Simple Social Media Application Using MERN
Learn how to use the MERN stack to build simple social media application features in this tutorial by Shama Hoque.
MERN Social
MERN Social is a sample social media application, used for the purpose of this article, with rudimentary features inspired by existing social media platforms such as Facebook and Twitter. In this article, you’ll learn how to update a user profile in MERN Social that can display a user uploaded profile photo and an about description.
The main purpose of this application is to demonstrate how to use the MERN stack technologies to implement features that allow users to connect and interact over content. You can extend these implementations further, as desired, for more complex features.
The code for the complete MERN Social application is available on GitHub in the repository at github.com/shamahoque/mern-social. You can clone this code and run the application as you go through the code explanations. Note that you will need to create a MERN skeleton application by adding a working React frontend, including frontend routing and basic server-side rendering of the React views for following these steps.
The views needed for the MERN Social application can be developed by extending and modifying the existing React components in the MERN skeleton application. The following component tree shows all the custom React components that make up the MERN Social frontend and also exposes the composition structure that you can use to build out extended views:
Updating the user profile
The skeleton application only has support for a user’s name, email, and password. However, in MERN Social, you can allow users to add their description and also upload a profile photo while editing the profile after signing up:
Adding an about description
In order to store the description entered in the about field by a user, you need to add an about field to the user model in server/models/user.model.js:
about: { type: String, trim: true }
Then, to get the description as input from the user, you can add a multiline TextField to the EditProfile form in mern-social/client/user/EditProfile.js:
<TextField id="multiline-flexible" label="About" multiline rows="2" value={this.state.about} onChange={this.handleChange('about')} />
Finally, to show the description text added to the about field on the user profile page, you can add it to the existing profile view in mern-social/client/user/Profile.js:
<ListItem> <ListItemText primary={this.state.user.about}/> </ListItem>
With this modification to the user feature in the MERN skeleton code, users can now add and update a description about them to be displayed on their profiles.
Uploading a profile photo
Allowing a user to upload a profile photo will require that you store the uploaded image file and retrieve it on request to load in the view. For MERN Social, you need to assume that the photo files uploaded by the user will be of small sizes and demonstrate how to store these files in MongoDB for the profile photo upload feature.
There are multiple ways of implementing this upload feature considering the different file storage options:
- Server filesystem: Upload and save files to a server filesystem and store the URL to MongoDB
- External file storage: Save files to an external storage such as Amazon S3 and store the URL in MongoDB
- Store as data in MongoDB: Save files of a small size (less than 16 MB) to MongoDB as data of type Buffer
Updating the user model to store a photo in MongoDB
In order to store the uploaded profile photo directly in the database, you can update the user model to add a photo field that stores the file as data of type Buffer, along with its contentType in mern-social/server/models/user.model.js:
photo: { data: Buffer, contentType: String }
Uploading a photo from the edit form
Users should be able to upload an image file from their local files when editing the profile. You can update the EditProfile component in client/user/EditProfile.js with an upload photo option and then attach the user selected file in the form data submitted to the server.
File input with Material-UI
You can use the HTML5 file input type to let the user select an image from their local files. The file input will return the filename in the change event when the user selects a file in mern-social/client/user/EditProfile.js:
<input accept="image/*" type="file" onChange={this.handleChange('photo')} style={{display:'none'}} id="icon-button-file" />
To integrate this file input with Material-UI components, you can apply display:none to hide the input element from view, then add a Material-UI button inside the label for this file input. This way, the view displays the Material-UI button instead of the HTML5 file input element in mern-social/client/user/EditProfile.js:
<label htmlFor="icon-button-file"> <Button variant="raised" color="default" component="span"> Upload <FileUpload/> </Button> </label>
With the Button‘s component prop set to span, the Button component renders as a span element inside the label element. A click on the Upload span or label is registered by the file input with the same ID as the label, and as a result, the file select dialog is opened. Once the user selects a file, you can set it to state in the call to handleChange(…) and display the name in the view in mern-social/client/user/EditProfile.js:
<span className={classes.filename}> {this.state.photo ? this.state.photo.name : ''} </span>
Form submission with the file attached
Uploading files to the server with a form requires a multipart form submission. You can modify the EditProfile component to use the FormData API to store the form data in the format needed for encoding type multipart/form-data.
First, you need to initialize FormData in componentDidMount() in mern-social/client/user/EditProfile.js:
this.userData = new FormData()
Next, you need to update the input handleChange function to store input values for both the text fields and the file input in FormData in mern-social/client/user/EditProfile.js:
handleChange = name => event => { const value = name === 'photo' ? event.target.files[0] : event.target.value this.userData.set(name, value) this.setState({ [name]: value }) }
Then, on clicking submit, this.userData is sent with the fetch API call to update the user. As the content type of the data sent to the server is no longer ‘application/json’, you’ll also need to modify the update fetch method in api-user.js to remove Content-Type from the headers in the fetch call in mern-social/client/user/api-user.js:
const update = (params, credentials, user) => { return fetch('/api/users/' + params.userId, { method: 'PUT', headers: { 'Accept': 'application/json', 'Authorization': 'Bearer ' + credentials.t }, body: user }).then((response) => { return response.json() }).catch((e) => { console.log(e) }) }
Now if the user chooses to upload a profile photo when editing profile, the server will receive a request with the file attached along with the other field values.
Processing a request containing a file upload
On the server, to process the request to the update API that may now contain a file, you need to use the formidable npm module:
npm install --save formidable
Formidable will allow you to read the multipart form data, giving access to the fields and the file, if any. If there is a file, formidable will store it temporarily in the filesystem. You need to read it from the filesystem, using the fs module to retrieve the file type and data and store it in the photo field in the user model. The formidable code will go in the update controller in mern-social/server/controllers/user.controller.js as follows:
import formidable from 'formidable' import fs from 'fs' const update = (req, res, next) => { let form = new formidable.IncomingForm() form.keepExtensions = true form.parse(req, (err, fields, files) => { if (err) { return res.status(400).json({ error: "Photo could not be uploaded" }) } let user = req.profile user = _.extend(user, fields) user.updated = Date.now() if(files.photo){ user.photo.data = fs.readFileSync(files.photo.path) user.photo.contentType = files.photo.type } user.save((err, result) => { if (err) { return res.status(400).json({ error: errorHandler.getErrorMessage(err) }) } user.hashed_password = undefined user.salt = undefined res.json(user) }) }) }
This will store the uploaded file as data in the database. Next, you’ll need to set up file retrieval to be able to access and display the photo uploaded by the user in the frontend views.
Retrieving a profile photo
The simplest option to retrieve the file stored in the database and show it in a view is to set up a route that will fetch the data and return it as an image file to the requesting client.
Profile photo URL
You need to set up a route to the photo stored in the database for each user and also add another route that will fetch a default photo if the given user has not uploaded a profile photo in mern-social/server/routes/user.routes.js:
router.route('/api/users/photo/:userId') .get(userCtrl.photo, userCtrl.defaultPhoto) router.route('/api/users/defaultphoto') .get(userCtrl.defaultPhoto)
You need to look for the photo in the photo controller method and if found, send it in the response to the request at the photo route; otherwise, you’ll need to call next() to return the default photo in mern-social/server/controllers/user.controller.js:
const photo = (req, res, next) => { if(req.profile.photo.data){ res.set("Content-Type", req.profile.photo.contentType) return res.send(req.profile.photo.data) } next() }
The default photo is retrieved and sent from the server’s file system in mern-social/server/controllers/user.controller.js:
import profileImage from './../../client/assets/images/profile-pic.png' const defaultPhoto = (req, res) => { return res.sendFile(process.cwd()+profileImage) }
Showing a photo in a view
With the photo URL routes set up to retrieve the photo, you can simply use these in the img element’s src attribute to load the photo in the view in mern-social/client/user/Profile.js. For example, in the Profile component, you’ll get the user ID from state and use it to construct the photo URL.
const photoUrl = this.state.user._id ? `/api/users/photo/${this.state.user._id}?${new Date().getTime()}` : '/api/users/defaultphoto'
To ensure that the img element reloads in the Profile view after the photo is updated in the edit, you need to also add a time value to the photo URL to bypass the browser’s default image caching behavior.
Then, you can set the photoUrl to the Material-UI Avatar component, which renders the linked image in the view:
<Avatar src={photoUrl}/>
The updated user profile in MERN Social can now display a user uploaded profile photo and an about description:
If you found this article helpful, you can explore Shama Hoque’s Full-Stack React Projects to unleash the power of MERN stack. This book guides you through MERN stack-based web development, for creating a basic skeleton application and extending it to build four different web applications.
What is Modernizr
Modernizr is a small piece of JavaScript code library for feature detection that detects the browser support next-generation web technologies in your user’s browsers. Modernizr uses feature detection to allow you to easily fit your website on user’s experiences based with the actual capabilities of their browser.
There are several JavaScript libraries are available now a days, which can help you detect features and browsers. Using these libraries we can detect whether the browser has support for that particular feature or not.
Modernizr gives the ability to detect the new features in the browsers that can render or utilize
Let’s take a example with plain javascript to detect canvas feature.
<script> $(document).ready(function(){ var canvasElement = ""; canvasElement = document.createElement('canvas'); if(canvasElement = = = undefined){ $("#draw").html("Canvas is not supported, use something else."); }else{ // canvas is supported by the browser. $("#draw").html("Canvas is supported"); $("body").append(canvasElement); } }); </script>
In this code, we tried to create a canvas element and then added it to body.
we could use the Modernizr.canvas property to test if the browser supported canvas or not:
<script> $(document).ready(function(){ var canvasElement = ""; canvasElement = document.createElement('canvas'); if(Modernizr.canvas){ $("#draw").html("Canvas is supported"); $("body").append(canvasElement); }else{ // canvas is supported by the browser. $("#draw").html("Canvas is not supported, use something else."); } }); </script>
Modernizr creates an object named Modernizr to the document, which contains all of the test results as Boolean properties. We can write a simple script to list all supported and unsupported features:
$(document).ready(function(){ for(var key in Modernizr) { if(typeof Modernizr[key] == 'boolean'){ console.log(key + "::" + Modernizr[key] ); if(Modernizr[key]){ $("#supported- features").append("<b>"+key+"</b><br/>"); }else{ $("#unsupported- features").append("<b>"+key+"</b><br/>"); } } } });
Polyfilling with YepNope,requireJS
As a Developer, we can load the JS as per the feature is supported by your browsers. We can use Modernizr.load() method to test if a certain feature is supported or not and load the appropriate JavaScript based on the result. Unfortunately Modernizr.load() is not provided in the modernizr.js file by default and was a part of Modernizr Version 2.8.3. But there are many other third party libraries are available to load resources like YepNope,requireJS,HeadJS etc.
Modernizr.load() is actually yepnope.js (http://yepnopejs.com/). yepnope.js was known as a conditional loader for polyfills, but now its deprecated after version 1.5.
Let’s take a example for this:
<script> Modernizr.load({ test: Modernizr.touch, yep : 'js/touch.js', nope: 'js/no-touch.js' }); </script>
In below code we are checking if touch is available in the browser then it will load touch.js, otherwise it will load the no-touch.js file into the browser.
For the newer versions of Modernizr, we can include yepnope.js to use Modernizr.load().
Developers can also wrote the previous example of code as below with RequireJS:
<script> if(Modernizr.touch){ require(['js/touch'], function() { //code to execute when touch..js is loaded }); }else{ require(['js/no-touch'], function() { //code to execute when our polyfill is loaded }); } </script>