Data & AI
We offer custom software development, from turn-key projects at the national level to specific professional services, including ICT system design, implementation, support and quality assurance.
Data Engineering is the foundation of any data science effort, helping you to:
- collect the data – using various types of batch or streaming imports
- clean the data – removing any “junk” (e.g. setting all “None” values to NULL)
- transform the data – ensuring that data appears in consistent formats (e.g. parsing date and time values), flattening out nested structures and so on.
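The clean and transform steps above can be sketched in a few lines of Python (a minimal illustration using only the standard library; the record fields and date format are hypothetical):

```python
from datetime import datetime

def clean_record(record: dict) -> dict:
    """Replace junk placeholder strings with real nulls."""
    junk = {"None", "N/A", ""}
    return {k: (None if isinstance(v, str) and v.strip() in junk else v)
            for k, v in record.items()}

def transform_record(record: dict) -> dict:
    """Normalize the (hypothetical) 'created' field to a consistent ISO 8601 date."""
    out = dict(record)
    if out.get("created"):
        out["created"] = datetime.strptime(out["created"], "%d/%m/%Y").date().isoformat()
    return out

raw = {"user": "ana", "email": "None", "created": "05/03/2024"}
clean = transform_record(clean_record(raw))
# clean == {"user": "ana", "email": None, "created": "2024-03-05"}
```

In a real pipeline the same two functions would be applied to every record arriving from a batch or streaming import.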
The right Data Engineering solutions ensure that your data processing pipelines are efficient, secure and scalable.
Top-tier Data Engineering tools such as Apache Spark, Kafka, Airflow, Python, Meltano and Dataiku cover the full range of these needs.
Data Science involves the application of skills and knowledge from the following areas:
- Computer Science and programming
- Statistics and math
- Machine learning
- Data cleaning and transformation
- Predictive and prescriptive data analysis
- Data mining
- Story-telling through data visualization
It is a vast field. That is why data science means different things to different people. But they all share the same goal: extracting value from data.
In order to accomplish that goal efficiently, a good set of tools is necessary, including:
- Java, Python and R programming languages
- TensorFlow and Keras packages
- Jupyter notebooks
- Apache Spark
- Dataiku Data Science Studio
Machine Learning plays a big part in Data Science and AI. It is AI’s core component, comprising a large number of algorithms that discover hidden relations and dependencies in data. This helps you to make predictions about the future or draw conclusions about the present.
The machine learning algorithms can be classified in the following way:
- supervised algorithms – the algorithm learns how to assign values to data based on training data that is already labeled in some way
- unsupervised algorithms – the algorithm learns the patterns within the data on its own, without previous values attached to the data
- semi-supervised algorithms – some of the data is labeled and some is not
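The supervised/unsupervised distinction can be made concrete with a toy example in pure Python (a deliberately simplified sketch with made-up data, not a production algorithm):

```python
# Supervised: training data comes with labels ("valued in some way").
labeled = [(1.0, "small"), (1.2, "small"), (9.8, "large"), (10.1, "large")]

def predict(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    return min(labeled, key=lambda pair: abs(pair[0] - x))[1]

# Unsupervised: the same points with no labels; the algorithm must find
# structure on its own -- here by splitting the points around their mean.
unlabeled = [1.0, 1.2, 9.8, 10.1]
mean = sum(unlabeled) / len(unlabeled)
clusters = {x: ("low" if x < mean else "high") for x in unlabeled}

print(predict(2.0))   # "small" -- learned from the labels
print(clusters[9.8])  # "high"  -- discovered without any labels
```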
Supervised algorithms can be further classified based on the “target variable” – the value we are trying to predict or understand:
- regression algorithms – if the target variable is continuous (a real number)
- classification algorithms – if the target variable is discrete (we have a set of possible classes to which a datum can belong).
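To make the regression case concrete, here is a minimal least-squares line fit in pure Python (a sketch with invented data points; a classification model would instead return one of a fixed set of classes):

```python
# Training data: hours studied -> exam score (a continuous target variable).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]

# Ordinary least squares for a line y = a*x + b.
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

print(a, b)          # slope 2.0, intercept 0.0 for this perfectly linear data
print(a * 5.0 + b)   # prediction for 5 hours: 10.0 -- a real number, not a class
```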
We cannot finish our quick overview of machine learning without mentioning deep learning, which has been making waves over the past few decades. Deep learning uses concepts borrowed from biological neural systems to accomplish learning at several levels of abstraction at once. For example, by looking at images of faces, the algorithm learns details and contours, but also the “big picture” features that faces consist of. You might have seen the unbelievably realistic “deep fakes” for which deep learning is responsible. If you haven’t, search for them now. We’ll wait…
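The “several levels of abstraction” idea boils down to stacking layers, each one transforming the output of the previous one. Here is a minimal forward pass through a two-layer network in pure Python (the weights are made up for illustration; real networks learn them from data, typically with a framework like Keras or TensorFlow):

```python
def relu(v):
    """A common nonlinearity: negative values become zero."""
    return [max(0.0, x) for x in v]

def layer(inputs, weights, biases):
    """One dense layer: weighted sums followed by a nonlinearity."""
    return relu([sum(w * x for w, x in zip(row, inputs)) + b
                 for row, b in zip(weights, biases)])

# Layer 1 might learn low-level features (edges, contours)...
h = layer([0.5, -0.2], [[1.0, 0.5], [-0.3, 0.8]], [0.1, 0.0])
# ...and layer 2 combines them into higher-level, "big picture" features.
out = layer(h, [[0.7, -1.2]], [0.05])
print(out)  # a single activation, roughly 0.4
```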
A Machine Learning tool that can really help you is Dataiku – it is intuitive and easy to use. In the absence of Dataiku, you can use Apache Spark, Python with Keras and TensorFlow, R, and Jupyter notebooks.