April 12, 2023

    by Robert Komar

    Using artificial intelligence to automate the recruitment process

    At a software house like Scalac, an essential part of the company's operation is to match people with the right skills to the demands of a given project. To get the most out of the programmers on the bench, one should consider hirings and layoffs in relation to the anticipated workload. To gauge this workload, you need to know how complex a software project is, and the best way to deal with this challenge is to use data analysis.

    We have started work on an algorithm to automate the process of recruiting and evaluating employees. This process is not easy and can take a lot of time for a team of people with expertise in fields such as BD, HR, and PM. The goal is to minimize the time these people spend evaluating new and current employees and assigning them to new projects, so that they can focus on other tasks.

    In addition, using extensive data, we aim to produce matches at least as good as those a team of specialists would achieve, and to do so in no more than one minute. We know that this task is not easy and will take a lot of time, but we are motivated by the potential of having such a tool.

    1. The Data

    The data obtained in the process is not only relevant to the algorithm but is also important from the company's own perspective. Preparing tools to analyze the available data will make it possible to obtain quick information about any given employee on the basis of which the Talent Team could then develop a plan for the employee's further development. The company could use this data in negotiations, and the employees themselves would also be able to analyze their own strengths and weaknesses.

    1.1 Skill Matrix

    The basis of the algorithm is the so-called "Skill matrix," a matrix in which skills in given technologies are grouped into individual professions (e.g., Frontend, Backend, Data, etc.). In each profession, the data is divided into subgroups, each consisting of individual technologies. Each technology has a complexity defined by a specialist in the respective field.

    In addition, to achieve the best quality of evaluation, the process is supported by IR (Information Retrieval) techniques that make the evaluations as objective as possible. Moreover, each technology has fields such as knowledge and enjoyment, each rated on a 5-level scale, so every user in the system needs to state their level of knowledge and enjoyment for a given technology.
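
    The structure described above can be pictured with a minimal Python sketch. Everything concrete in it is an assumption made for illustration: the `SKILL_MATRIX` contents, the complexity weights, the 1 to 5 numbering of the scale, the `SkillRating` class, and the weighted-mean aggregation in `profession_score` are not taken from the actual system.

```python
from dataclasses import dataclass

# Hypothetical slice of a skill matrix: profession -> subgroup -> technology -> complexity.
# The complexity weights are invented; in the article they are defined by specialists.
SKILL_MATRIX = {
    "Backend": {
        "Languages": {"Scala": 0.9, "Python": 0.6},
        "Databases": {"PostgreSQL": 0.5},
    },
    "Frontend": {
        "Frameworks": {"React": 0.7},
    },
}


@dataclass(frozen=True)
class SkillRating:
    """One user's self-assessment for one technology, on assumed 1..5 scales."""
    knowledge: int
    enjoyment: int

    def __post_init__(self):
        for name in ("knowledge", "enjoyment"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")


def profession_score(ratings, profession):
    """Complexity-weighted mean knowledge over the rated technologies of a profession.

    The aggregation rule is an assumption, not the article's formula.
    """
    total = 0.0
    weight_sum = 0.0
    for subgroup in SKILL_MATRIX[profession].values():
        for tech, complexity in subgroup.items():
            if tech in ratings:
                total += complexity * ratings[tech].knowledge
                weight_sum += complexity
    return total / weight_sum if weight_sum else 0.0


ratings = {
    "Scala": SkillRating(knowledge=4, enjoyment=5),
    "PostgreSQL": SkillRating(knowledge=2, enjoyment=3),
}
backend_score = profession_score(ratings, "Backend")
frontend_score = profession_score(ratings, "Frontend")
```

    Keeping complexity in the matrix and the ratings with the user means that a specialist can revise a technology's weight without touching anyone's self-assessment.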

    1.2 User Data

    Although the "skill matrix" is the main component of the algorithm, it is just the tip of the iceberg. There is much more data to consider that is defined by each user in their profiles, such as education, seniority, nationality, known languages, job experience, roles in past projects, etc. All of these factors are taken into account in the algorithm to create as realistic an evaluation as possible. Additionally, the algorithm considers the working time the user has proposed in the system.

    The user gives two working-time windows: preferred and possible. The overlap between the project's working time and the times proposed by the user is calculated and used in the classification process. Sometimes the user's working time is not compatible with the project's, but even in such cases the user is not necessarily disqualified, since such matters are often negotiable, especially when it comes to working remotely.
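
    One way such an overlap could be computed is sketched below in Python. The representation of working time as `(start, end)` hours on a single day, the `possible_weight` discount, and both function names are assumptions for illustration; the article does not specify the formula.

```python
def overlap_hours(a, b):
    """Length in hours of the overlap between two (start, end) windows on the same day."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0.0, end - start)


def working_time_score(project, preferred, possible, possible_weight=0.5):
    """Fraction of the project's window covered by the user, in [0, 1].

    Hours covered only by the 'possible' window count for less (assumed weight).
    Assumes the preferred window lies inside the possible window.
    """
    project_len = project[1] - project[0]
    if project_len <= 0:
        raise ValueError("project window must have positive length")
    pref = overlap_hours(project, preferred)
    poss = overlap_hours(project, possible)
    extra = max(0.0, poss - pref)  # hours the user can work but would rather not
    return (pref + possible_weight * extra) / project_len


# Project runs 9:00-17:00; the user prefers 10:00-16:00 but could do 8:00-18:00.
score = working_time_score(project=(9.0, 17.0), preferred=(10.0, 16.0), possible=(8.0, 18.0))
no_overlap = working_time_score(project=(9.0, 17.0), preferred=(18.0, 20.0), possible=(18.0, 22.0))
```

    Because the result is a score rather than a yes/no filter, a user with no overlap gets a low value on this one criterion without being removed from consideration, which fits the point that working time is often negotiable.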

    1.3 Project Data

    In the project data, we take into account the time, budget, and resources of the project, as well as the technologies required. In addition, we consider the domains in which the projects operate. In some situations it is also important that the system can filter out banned countries, that is, countries from which the project cannot accept operatives. It is also important to take into account the operatives who are currently working on the project, as one part of the algorithm matches employees' soft skills to each other and, if possible, also to the project's overall national culture.

    1.4 Soft skills

    Soft skills are often overlooked, but in our opinion they are a very important part of the evaluation process. The reason is that employees need to work together: even if they don't share an office, they still need to cooperate with each other. Some examples of soft skills are:

    - Communication skills, Negotiation skills, Presentation skills, Teamwork, Problem-solving, Creativity, Persistence, Motivation, Flexibility, Interpersonal skills, Leadership, Time management

    2. The Algorithm

    2.1 Evolutionary Algorithms

    Recruitment and selection problems happen to be NP-hard. Therefore, we are using a family of optimization techniques known as evolutionary algorithms. Evolutionary algorithms are a subset of artificial intelligence and computational intelligence that use processes inspired by natural evolution to optimize a given function. These techniques are also known as evolutionary computation because they simulate the process of evolution.

    The main idea of evolutionary algorithms is to have a population of individuals that evolves over generations. We can interpret the result of the process of evolution as the solution to the problem we are solving. In the most common implementation, each individual in the population has a set of values. These values are called genes, and the individual is called a chromosome. The set of all possible individuals, that is, all possible combinations of gene values, is called the search space.

    The individuals are evaluated according to a fitness function, which tells us how good a given individual is. The fitness function is what we want to optimize, that is, maximize or minimize. The process of evolution is divided into generations. In each generation, individuals are selected according to their fitness. The selected individuals are then mutated or crossed over according to some rules, and the new individuals are evaluated with the fitness function. The process is repeated until some stopping criterion is met.
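
    The loop described above can be sketched in a few lines of Python. This is a toy single-objective example, not the algorithm used in the project: the bit-string chromosomes, tournament selection, one-point crossover, elitism, fixed generation count as the stopping criterion, and the `evolve` function itself are all assumptions chosen to keep the sketch short.

```python
import random


def evolve(fitness, n_genes, pop_size=30, generations=40, mutation_rate=0.05, seed=0):
    """Maximize `fitness` over bit-string chromosomes; returns (best, history).

    Requires n_genes >= 2 so that a crossover point exists.
    """
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_genes)] for _ in range(pop_size)]
    history = []  # best fitness seen at the start of each generation
    for _ in range(generations):  # stopping criterion: a fixed number of generations
        scored = sorted(population, key=fitness, reverse=True)
        history.append(fitness(scored[0]))
        next_gen = [scored[0][:]]  # elitism: keep the best individual unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of two random individuals becomes a parent.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            cut = rng.randrange(1, n_genes)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    best = max(population, key=fitness)
    return best, history


# Toy fitness: the number of ones in the chromosome.
best, history = evolve(fitness=sum, n_genes=20)
```

    In a team-matching setting, a chromosome would instead encode a candidate team and the fitness function would score it against the project, but the select, cross over, mutate, and evaluate cycle stays the same.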

    Since our problem is multi-objective, we are testing the multi-objective algorithms available in Pymoo, such as NSGA-II, NSGA-III, and MOEA/D, which look very promising. The goal is to find a Pareto front: a set of several team proposals, none of which can be improved on one of the predetermined objectives without getting worse on another.
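
    To make the idea of a Pareto front concrete, here is a small pure-Python sketch of a non-dominated filter. It is not how Pymoo computes its fronts, and the two objectives (`cost` and `skill_gap`, both minimized) and the candidate values are hypothetical.

```python
def dominates(a, b):
    """True if `a` is at least as good as `b` on every objective and strictly better on one.

    All objectives are assumed to be minimized.
    """
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Return the points that no other point dominates, in input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical candidate teams scored on (cost, skill_gap), both to be minimized.
candidates = [(10, 5), (8, 7), (12, 3), (11, 6), (9, 9)]
front = pareto_front(candidates)
```

    Here `(11, 6)` drops out because `(10, 5)` is better on both objectives, and `(9, 9)` drops out because of `(8, 7)`. The three remaining teams are genuine trade-offs, and choosing among them is left to a person.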

    2.2 Active Learning

    Active learning is a method of learning in which the learner is not passive but interacts with the environment to get the most out of the learning process. In our case, the learner is the algorithm, and the environment is a qualified person evaluating the same input vectors as the evolutionary algorithm. Based on this, it is possible to find the discrepancy between the populations created by the algorithm and the teams created by the evaluator, and thereby improve the algorithm.
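
    The article does not say how this discrepancy is measured. As one possible illustration, the Python sketch below uses the Jaccard distance between the set of people in a proposed team and the set chosen by the evaluator. The metric, the function names, and the example names are all assumptions.

```python
def jaccard_distance(team_a, team_b):
    """0.0 for identical teams, 1.0 for teams with no member in common."""
    a, b = set(team_a), set(team_b)
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def closest_proposal(proposals, expert_team):
    """Return (distance, proposal) for the proposal nearest to the expert's team."""
    return min(
        ((jaccard_distance(p, expert_team), p) for p in proposals),
        key=lambda pair: pair[0],
    )


# Hypothetical data: two teams proposed by the algorithm and one built by an evaluator.
proposals = [["ann", "bob", "cy"], ["ann", "dee", "eli"]]
expert_team = ["ann", "bob", "cy", "dee"]
distance, proposal = closest_proposal(proposals, expert_team)
```

    A distance that stays high across many evaluated projects would suggest that the objectives or their weights do not yet reflect what the evaluator values.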

    2.3 Indicators of success

    There are two main indicators of success in this project: 

    • The quality of the results - determined by the quality of the Pareto front that the algorithm produces.
    • The time it takes to get the results - determined by the time it takes to produce the Pareto front.

    We want the results to be at least as good as those that would be achieved by a team of specialists, and we want to get them in less than one minute.

    3. The testing tool

    To facilitate testing and improvement processes, we have developed a tool called Ml Tester. This is a lightweight application that allows an authorized person to create new prospects and evaluate them. The artificial intelligence algorithm is run automatically, and its results are corrected by the input entered by the user. If the user does not want to make changes to the algorithm, customized statistics on the quality of the algorithm's team matching are displayed instead.

    Such a system is important because every company may have slightly different expectations of the hiring process, so a simple way to reevaluate the classification results and improve the algorithm is needed.

    4. Conclusion

    The research on this algorithm is still in its initial phase, but we are already seeing promising results. The algorithm is constantly being improved, and the tool is being developed to make testing and improving it as efficient as possible. We are confident that it will become a valuable tool in the process of hiring and evaluating employees.


    The author

    Robert Komar

    Machine Learning Developer

    Data science enthusiast, particularly interested in the mathematics behind the latest machine learning solutions. The main topic I've been working on in recent months is MLOps and multi-criteria optimization using evolutionary algorithms. In my spare time, I work out at the gym and do random aerobic exercises.