Quickstart
==========

This page introduces the basic concepts of data analysis and processing for
clinical databases. The API was originally developed for various machine
learning tasks in the context of early drug development for cancer, but it is
intended to be flexible and adaptable to other similar problems.

The objective of the API is to provide the tools needed to address the issues
raised by the complexity of Electronic Health Records (EHR), which are
introduced in intro_.


.. _intro:

Introduction
------------

During the last decade, there has been tremendous progress in the field of
machine learning, driven by the availability of massive amounts of data and the
growing power of computers. This progress has impacted domains such as computer
vision, speech recognition and many others. More recently, researchers have
started to apply this knowledge to healthcare, using clinical data from
hospitals or from portable devices, and that is what this API focuses on.

EHR have been widely adopted around the world, which has led doctors and
statisticians to mine them to improve patient care. However, health records are
very difficult to work with, since they combine many of the difficulties
encountered in data analysis:

* **sparsity**
* **high cardinality categorical features**
* **unstructured data** (text, images)
* **temporality of the events**

All these issues make it hard to start a machine learning project using
clinical data. For this reason, we aim to provide the right tools to preprocess
such databases with *efficiency* and *simplicity*.