Given our in-house expertise in machine learning and our partner network of high-quality software development engineers, we can actively support the development of your predictive analytics solution.

More specifically, our development expertise is centred on database technologies such as MySQL, MariaDB and MongoDB, and on Python 3.x and its ecosystem of libraries, which we use across the data science and machine learning pipeline as summarised below:

We specialise in Python libraries such as pandas, NumPy, scikit-learn (pipeline and pre-processing modules), Matplotlib and seaborn to manipulate and visualise data and to support tasks such as class balancing, data cleansing, attribute scaling, outlier removal, dimensionality reduction and data transformation.
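
By way of illustration, a minimal sketch of such a pre-processing pipeline might look as follows. The synthetic data, the step names and the parameter values are assumptions chosen for the example and do not describe any delivered solution.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data for illustration only: 200 samples, 10 attributes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
X[rng.random(X.shape) < 0.05] = np.nan  # simulate roughly 5% missing values

# Hypothetical step names; each step maps to a task named in the text.
preprocess = Pipeline(steps=[
    ("impute", SimpleImputer(strategy="median")),  # data cleansing
    ("scale", StandardScaler()),                   # attribute scaling
    ("reduce", PCA(n_components=3)),               # dimensionality reduction
])

X_reduced = preprocess.fit_transform(X)
print(X_reduced.shape)
```

Chaining the steps in a single Pipeline object means the same transformations, fitted on training data only, can later be applied unchanged to new data.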

We use scikit-learn and our bespoke libraries to support sample selection and manipulation and to implement a range of popular machine learning models such as random forests, CatBoost, XGBoost, linear regression, logistic regression, kernel regression, support vector machines, kNN classifiers, Naïve Bayes classifiers and k-means clustering. These libraries also enable effective model fitting and selection through methods such as bagging and boosting, cross-validation and grid-based searching. Appropriate model evaluation metrics are also supported (e.g. area under the curve, precision/recall and F1 measures). We apply the statsmodels Python library for time-series processing using traditional statistical models such as VAR, ARIMA and GARCH.
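
As a hedged example of the model selection approach described above, the sketch below combines a grid-based search with 5-fold cross-validation for a random forest classifier and then reports F1 and area under the ROC curve on held-out data. The synthetic dataset and the small parameter grid are assumptions made purely for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic binary classification problem (illustrative only).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; a real project would tune this to the data.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=5,          # 5-fold cross-validation on the training split
    scoring="f1",  # select the configuration with the best mean F1
)
search.fit(X_train, y_train)

# Evaluate the refitted best model on the held-out test split.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]
test_f1 = f1_score(y_test, y_pred)
test_auc = roc_auc_score(y_test, y_score)
print(search.best_params_, test_f1, test_auc)
```

Keeping a separate test split outside the search guards against an optimistic estimate, since the cross-validated score is itself used to choose the hyperparameters.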

For processing natural language texts (e.g. tweets or news data scraped from web pages), we have expertise in using the Natural Language Toolkit (NLTK) library to produce bag-of-words models and n-gram statistics and to perform parsing and semantic analysis.
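
A small sketch of bag-of-words and n-gram counting with NLTK is given below. The sample sentence is invented for the example, and a plain whitespace split is assumed in place of an NLTK tokenizer so that no additional NLTK data needs to be downloaded.

```python
from nltk import FreqDist
from nltk.util import ngrams

# Invented sample text (illustrative only).
text = "the market rallied and the market closed higher"

# Whitespace tokenisation is a simplifying assumption for this sketch.
tokens = text.lower().split()

# Bag-of-words model: token -> frequency.
bag_of_words = FreqDist(tokens)

# Bigram statistics: (token, next token) -> frequency.
bigram_counts = FreqDist(ngrams(tokens, 2))

print(bag_of_words["market"])            # 2
print(bigram_counts[("the", "market")])  # 2
```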

We use the PyTorch library to design and implement the following types of neural network models:

  • Multi-layer perceptrons with multiple non-linear hidden layers (with batch normalisation and dropout)
  • Deep auto-encoders (denoising, regularised)
  • Convolutional neural networks
  • Residual neural networks
  • Recurrent neural networks, such as long short-term memory (LSTM) and gated recurrent unit (GRU) models, for time-series forecasting and sequence learning
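
To illustrate the first item in the list above, a minimal PyTorch sketch of a multi-layer perceptron with batch normalisation and dropout is shown below. The class name, layer sizes and dropout rate are hypothetical values chosen for the example.

```python
import torch
from torch import nn


class MLP(nn.Module):
    """Two hidden layers, each followed by batch norm, ReLU and dropout."""

    def __init__(self, n_in, n_hidden, n_out, p_drop=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_in, n_hidden),
            nn.BatchNorm1d(n_hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(n_hidden, n_hidden),
            nn.BatchNorm1d(n_hidden),
            nn.ReLU(),
            nn.Dropout(p_drop),
            nn.Linear(n_hidden, n_out),  # raw logits; no final activation
        )

    def forward(self, x):
        return self.net(x)


torch.manual_seed(0)
model = MLP(n_in=10, n_hidden=32, n_out=2)

# eval() disables dropout and uses running batch-norm statistics.
model.eval()
with torch.no_grad():
    logits = model(torch.randn(4, 10))  # batch of 4 random input vectors
print(logits.shape)
```

During training the model would be switched back with model.train() so that dropout and batch statistics are active.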

The coordinated use of a collaborative development platform, such as GitHub or your organisation’s chosen version-controlled development platform, is actively supported and encouraged throughout the development process.