Coding Guidelines¶
As more time and effort is spent on maintenance rather than on writing the initial code, it is essential to keep the code base manageable. Any developer should be able to quickly understand the code written by another developer. By ensuring consistency in the code base you save time and effort for both yourself and others.
These guidelines cover our coding style along with more practical issues. To learn more about contribution process, see How to Contribute.
Coding Style¶
General Rules¶
Whitespaces¶
Do not leave trailing spaces in the end of a line.
Always put an empty line in the end of the file.
Surround operators with spaces:
x += 1
is preferred tox+=1
.Only use spaces, not tabs.
Keep the same indent in spaces: indent is equal to four spaces.
Tip
Configure your editor to produce spaces instead of tabs. This guarantees that source code always looks and prints fine regardless of any editor settings.
Functions¶
The order of function’s parameters: first inputs, then outputs.
C++ Files¶
Each C++ file should begin with the following:
a comment containing the name of the file
the copyright notice
a brief description of the file’s content
Attention
Code must compile without errors and warnings.
Header Files¶
A header file should include only the header files it needs.
Whenever possible put
include
s in .cpp files, not in header files.Do not use
#include
if a forward declaration is sufficient.
Naming Conventions¶
Rules that apply for a name of any entity:
Give descriptive and meaningful names.
Avoid ambiguity.
Do not abbriviate words.
Do not remove letters from words.
Not recommended
int n; // Meaningless
int nerr; // Ambiguous abbreviation
int nCompConns; // Ambiguous abbreviation
int wgcConnections; // Meaningless for others
int pcReader; // Ambiguous ("pc" can mean different things)
int cstmrId; // Missing letters
Good
int priceCountReader;
int numErrors; // "num" is a widespread convention
For more specific rules, see:
Naming conventions for distributed and streaming computational schemes
File Names¶
The name of a file:
should be in all lowercase
may include underscores (
_
)
algorithm/kmeans_assign.h
Type Names¶
The names of classes, structs, typedefs, and enums:
should start with a capital letter
should have a capital letter for each new word
should not contain underscores (
_
)
class MyClass;
struct MyStruct;
typedef int MyType;
enum MyEnum;
Function Names¶
The name of a function:
should start with a lowercase letter
should have a capital letter for each new word
should not contain underscores (
_
)
addTableEntry();
deleteTable();
Variable Names¶
The name of a local variable:
should start with a lowercase letter
should have a capital letter for each new word
should not contain underscores (
_
)
int numBlocks;
Class Member Names and Method Names¶
There is a difference in naming public
and private
class members and methods:
public
class members and methods are named in the same way as local variables and functions:private
class members and methods should have an additional underscore (_
) at the beginning.
class Table
{
public:
int colsInTable; // external
private:
string _tableName; // internal
}
Constant Names¶
Constants:
const int daysInAWeek = 7;
Enum members:
enum MyTableTypes {intType, doubleType};
Distributed and streaming computational schemes¶
Enum Names¶
For numeric tables: InputId
. For models: Input[Name]Id
remove
Distributed
from namesadd
Step?
remove
Numeric Table
andTable
from names
Name template is the following: [Prefix][Method][SbjName][Type]Id;
- Prefix
Step?Master
orStep?Local
– for distributed processing modeOnline
– for online processing modenone – for batch processing mode
- Method
SVD, Correlation, etc.
- SbjName
PartialResult, Input
- Type
Collection
orModel
none – for Numeric Table
Examples of renaming:
MasterInputId
->Step2MasterInputId
DistributedPartialResultStep3Id
->Step3LocalPartialResultId
PartialSVDTableResultId
->Step?LocalSVDPartialResultId
Step3LocalNumericTableInputId
->Step3LocalInputId
Step3LocalCollectionInputId
->Step3LocalInputCollectionId
Class Names¶
Name templates:
DistributedInput<step, method>
DistributedPartialResult<step, method>
- for distributed computationPartialResult<method>
- for online computation and distributed step1
- step
step2Master
,step3Local
, etc. (starts from 2)- method
default if there is only one method used
- PartialResult
is returned from 1st computation step
- DistributedPartialResult
is returned from steps 2 and higher
Declaration Order¶
Use the specified order of declarations within a class:
your class definition should start with its public:
section, followed by
its protected:
section, and then its private:
section. If any of these
sections are empty, omit them.
The declarations generally should be in the following order:
Typedefs and Enums
Constants (
static const
data members)Constructors
Destructor
Methods, including static methods
Data Members (except
static const
data members)
Important
Friend declarations should always be in the private
section.
Typedefs¶
Use typedefs for template-based classes. It improves code readability and ensures that the same type is used in all cases.
// typedefs
typedef hash_map<UrlTableProperties *, string> PropertiesMap;
Defines¶
#define DAL_MY_DEFINE // external
#define __MY_DEFINE // internal
#ifndef __FILE_NAME_H__
#define __FILE_NAME_H__
#pragma directive should be indented
Common definitions¶
defined(__x86_64__) // defined for Linux 64bit and MacOS 64bit OSes
defined(__linux__) // defined for Linux and MacOS
defined(__APPLE__) // defined for MacOS
defined(_WIN64) // defined for Winddows 64bit
defined(_WIN32) // defined for Winddows 32bit & Winddows 64bit
defined(__ICL) // defined for Intel compiler. Has numeric value.
defined(__INTEL_COMPILER) // defined for Intel compiler. Has numeric value.
defined(_MSC_VER) // defined for Intel and MS compilers. Has numeric value.
Statements¶
Each line should contain at most one statement:
Not recommended
// Not recommended a++; b++;
Good
a++; b++;
The statement that follows conditional statement should be on a separate line:
Not recommended
// Not recommended if (condition) statement;
Good
if (condition) statement;
Comments¶
Comments should provide information that is not readily understandable from the code itself.
Linus Torvalds says:
NEVER try to explain HOW your code works in a comment: it’s much better to write the code so that the _working_ is obvious, and it’s a waste of time to explain badly written code.
Generally, you want your comments to tell WHAT your code does, not HOW. Also, try to avoid putting comments inside a function body: if the function is so complex that you need to separately comment parts of it, you should probably split it into smaller functions. You can make small comments to note or warn about something particularly clever (or ugly), but try to avoid excess. Instead, put the comments at the head of the function, telling people WHAT it does, and possibly WHY it does it.
Function Comments¶
Always document functions that are a part of an API.
Document locally used functions when they require clarification.
Variable Comments¶
In most cases, the name of a variable should be descriptive enough to give a good idea of what the variable is used for. In certain cases, more comments are required.
Class Data Members Comments¶
Each data member of a class should have a comment describing what it is used for.
If the variable can take sentinel values with special meanings, such as a null pointer or -1, document this.
private:
// Keeps track of the total number of entries in the table.
// Used to ensure we do not go over the limit. -1 means
// that we don't yet know how many entries the table has.
int _nEntries;
Function Declarations Comments¶
Every declaration of a non-trivial function should be preceded by comments that describe what the function does and how to use it.
Templates declaration¶
Empty template classes should have a
{}
.Template<...>
should all be in one line.
Deprecated API¶
To mark API in oneDAL as deprecated, do the following:
Mark deprecated API with
DAAL_DEPRECATED
define. For virtual functions useDAAL_DEPRECATED_VIRTUAL
define.
DAAL_DEPRECATED class A // Deprecated class { public: DAAL_DEPRECATED A() {} // Deprecated class constructor DAAL_DEPRECATED void func() {} // Deprecated function in class DAAL_DEPRECATED int classMember; // Deprecated class member };
Add a special tag to documentation comment:
Use
\DAAL_DEPRECATED
when you remove something from API.Use
\DAAL_DEPRECATED_USE{ newFunction }
when you introduce a new function to use instead of a deprecated one and want to reference it.These tags add standard phrases about deprecation. They are defined in our doxygen configurations.
/** * Description for function that will be removed * \DAAL_DEPRECATED */ DAAL_DEPRECATED int removedFunction() {} /** * Description for function with old name * \DAAL_DEPRECATED_USE{ newFunction } */ DAAL_DEPRECATED int oldFunction() {} /** * Description for new function */ void newFunction() {}
Note
To mark enums and their elements as deprecated, you only need to put the appropriate documentation tag in a doc comment.
Mark deprecated code with
@Deprecated
annotation before:
@Deprecated public class A // Deprecated class { @Deprecated public A() {} // Deprecated class constructor @Deprecated public void func() {} // Deprecated function in class @Deprecated public int classMember; // Deprecated class member };
Add special tag to documentation comment:
Use
@DAAL_DEPRECATED
when you remove something from API.Use
@DAAL_DEPRECATED_USE{ newFunction }
when you introduce a new function to use instead of a deprecated one and want to reference it.These tags add standard phrases about deprecation. They are defined in our doxygen configurations.
Notes¶
You can generate a reference to another functionality without \ref
:
Add the appropriate tag to documentation comment :
Use
\DAAL_DEPRECATED_USE{ newFunction }
to reference global functions.Use
\DAAL_DEPRECATED_USE{ ClassName::newFunction }
to reference functions that are defined inside classes.Use
\DAAL_DEPRECATED_USE{ \ref daal::<all namespaces to ClassName>::ClassName::newFunction "newFunction" }
if a reference is not parsed correctly by doxygen (quotation marks are a part of a command).
Add
DAAL_DEPRECATED
andDAAL_DEPRECATED_VIRTUAL
after\return
tag.
Programming guidelines¶
Local Variables¶
To make it easier for the reader to find the declaration, see what type the variable is, and what it was initialized to:
Place a function’s variables in the narrowest scope possible.
Initialize variables in the declaration.
Use initialization instead of declaration and assignment.
Not recommended
int i; i = f();
Good
// Declaration includes initialization. int j = g();
Not recommended
vector<int> v; v.push_back(1); // Prefer initializing using brace initialization. v.push_back(2);
Good
// Good -- v starts initialized, brace initialization is used. vector<int> v = {1, 2};
When you work with variables needed for if
, while
, and for
statements:
If the variable is not an object, declare it within the statement to confine it to the scope of the statement:
while (const char* p = strchr(str, '/')) str = p + 1;
If the variable is an object, declare it outside the loop.
As object’s constructor is invoked every time the object is created or enters the scope, and its destructor is invoked every time the object goes out of scope, you make your code inefficient by placing variable’s declaration inside a loop.
Not recommended
// Inefficient implementation: for (int i = 0; i < 1000000; ++i) { Foo f; // My ctor and dtor get called 1000000 times each. f.DoSomething(i); }
Good
// More efficient Foo f; // My ctor and dtor get called once each. for (int i = 0; i < 1000000; ++i) { f.DoSomething(i); }
Constants¶
Use constant variables or functions returning the constant value instead of hard coded constant values:
Not recommended
foo("Hard coded string"); // ... bar("Hard coded string");
Good
const char* hardCodedString() { return "Hard coded string"; } foo(hardCodedString()); bar(hardCodedString());
Static and Global Variables¶
Caution
Static variables of class type may cause hard-to-find bugs due to the undetermined order of construction and destruction.
Avoid using static variables of class type.
Use static variables within a function scope instead.
Undetermined order of construction and destruction¶
The order in which class constructors and initializers for static variables are called is only partially specified in C++. It may even change from build to build, which causes bugs that are difficult to find.
On program termination, global and static variables are destroyed. The order in which destructors are called is defined to be the reverse of the order in which the constructors were called. Since the order in which constructors are called is undertermined, so is the order of destructors.
Consider the following examples:
At program-end time, a static variable is destroyed, but the code is still running in another thread, tries to access the destroyed variable and, therefore, fails.
The destructor for a static string variable might be run prior to the destructor for another variable that contains a reference to that string.
Static variables within a function scope¶
A static variable within a function scope may be initialized with the result of a function, since its initialization order is well-defined and does not occur until control passes through its declaration.
This is why we reccomend using a static variable within a function scope if you need a static or a global variable of a class type. Create a function that returns the variable you need, but make sure that the function call does not happen when the program is terminating.
Not recommended
static SingletonClass inst; // Bad – initialization order is undefined.
Good
SingletonClass& getInst()
{
static SingletonClass inst;
return inst;
}
// Good – initialization happens when the function is called
Usage of const
¶
The general rule is to use const
declaration whenever possible.
For details, see:
The advantages of using const
:
Implies semantic constraints handled by compiler
Adds a level of compile-time type checking
Helps to specify the logic of the code
Helps to make the code consistent and self-documented
Helps to find possible errors
// Good
void foo(Collection& c)
{
//we need to save initial size, then make sure it will not be modified
const size_t len = c.size();
}
Usage of const
in Functions¶
If a function guarantees that it will not modify an argument passed by
reference or by pointer, the corresponding function parameter should be
a reference-to-const (const T&
) or pointer-to-const (const T*
),
respectively.
void func(const Foo& foo)
{
// foo should not be modified by this function, 'const' states it clearly
}
Usage of const
Methods¶
Declare methods to be const
whenever possible:
Accessors should almost always be
const
.Other methods should be
const
if they do not modify any data members, do not call any non-const methods, and do not return a non-const pointer or non-const reference to a data member.
class Foo
{
public:
int getValue() const; // does not modify data members
void setValue(int); // can modify data members
protected:
//...
}
Usage of const
and mutable
Data Members¶
Consider making data members
const
whenever they do not need to be modified after construction.class Foo { public: Foo (int val): _val(val){} // never going to be changed protected: const int _val; }
const
usage is sort of is viral: once appeared in a function it
causes its propagation in related functions. But this is an excellent
feature!
If a class needs to modify its member in a
const
function (e.g. when implementing caching), then declare this data membermutable
.class Foo { public: Foo():_isInitialized(false){} int get () const { if(!_isInitialized) { _value = calcValue(); _isInitialized = true; } return _value; } private: mutable int _value; mutable bool _isInitialized; }
Caution
When you use
mutable
in multi-threaded program, make sure thread-safe access is provided.
Ownership (RAII – Resource Acquisition Is Initialization)¶
C++ provides constructor and destructor symmetry which can naturally be used to manage resource allocation and deallocation pairs, e.g. dynamically allocated memory. This bookkeeping technique is called Resource Acquisition Is Initialization (RAII).
Holding a resource is tied to the object’s lifetime:
Resource allocation (acquisition) is done by the constructor during object’s creation. More specifically, during initialization.
Resource deallocation (release) is done by the destructor during object’s destruction. If objects are destroyed properly, resource leaks do not occur.
class FileHolder { public: FileHolder (FILE* f): _file(f) { } ~FileHolder { if(_file) fclose(_file); } private: FILE* _file; }
Place resource acquisition in a separate instruction:
Not recommended
// Not recommended void foo(Foo* data, int priority); int priority(); foo(std::shared_ptr<Foo>(new Foo()), priority()); // if priority() throws exception then the memory allocated by new can be lost
Good
// Preferable void foo(Foo* data, int priority); int priority(); std::shared_ptr<Foo> ptr(new Foo()); foo(ptr, priority());
Smart Pointers¶
“Smart” pointers are partial case of RAII idiom. These classes act like pointers, e.g. by overloading the *
and ->
operators providing additional features, such as automatic memory management.
Some smart pointer types can be used to automate ownership bookkeeping.
Passing parameters of non-primitive types by reference¶
Parameters of complex types should be passed by reference.
Use const to emphasize they are not modified.
This makes code more efficient: copy constructors are not called. It also allows to avoid unwanted inexplicit casting of types.
Not recommended
// Not recommended class string { public: string(const string& other);// allocates memory buffer and copies ‘other’ content into it. }; void print(string s) {// Inefficient: string(string) is called std::cout << s; }
Not recommended
// Not recommended class Base { Base(const Base& other); virtual printMe() const { std::cout << "Base"; } }; class Derived: public Base { virtual printMe() const { std::cout << "Derived"; } }; void print(Base b); // Reference was forgotten!!! Derived d; print(d); // Logical error: Base was created, polymorphic behavior is lost!
Preprocessor Macros¶
Caution
Be very cautious with macros. Prefer inline functions, enums, and const variables to macros.
Macros mean that the code you see is not the same as the code the compiler sees. This can introduce unexpected behavior, especially since macros have global scope.
Use a
const
orenum
instead of#define NAME value
:const unsigned int MAX_CAPACITY = 1000;
If constants define a related set, make them an enumerated type:
class MyNet { public: enum { eMaxUsers = 255; } ... private: enum { eMaxServers = 10; } };
Namespaces¶
Use namespaces instead of name prefixes to prevent name conflicts.
Never put
using
directive before#include
or in header files outside of functions, methods, or classes.Caution
Since you do not know in what order directives may appear in the code below, this can leas to a name confict.
You may use a
using
declaration anywhere in a .cpp file.
Not recommended
// Not recommended
// Header foo.h
namespace foo
{
class Bar{}
}
using namespace foo;
Not recommended
// client_code.cpp
#include “foo.h”
struct Bar { int code; }
Bar res; // Error: ambiguous
Functions¶
Avoid multiple nesting levels, e.g. use small functions for complex conditions checking.
Never write code that depends on the order of function’s arguments.
Write short functions¶
Prefer small and focused functions, as they promote clarity and correctness. Long functions are sometimes appropriate, so no hard limit is placed on a function’s length.
Note
If a function is longer then 40 lines, consider breaking it into smaller and easier-to-maintain pieces of code. Make sure that by doing this you are not harming the structure of the program.
Linus Torvalds says:
Functions should be short and sweet, do just one thing and do that well.
The maximum length of a function is inversely proportional to the complexity and indentation level of that function. So, if you have a conceptually simple function that is just one long (but simple) case-statement, where you have to do lots of small things for a lot of different cases, it’s OK to have a longer function.
Another measure of the function is the number of local variables. They shouldn’t exceed 5-10, or you’re doing something wrong. Re-think the function, and split it into smaller pieces. A human brain can generally easily keep track of about 7 different things, anything more and it gets confused. You know you’re brilliant, but maybe you’d like to understand what you did 2 weeks from now.
Function’s arguments¶
Do not rely on the evaluation order of function’s arguments, as it is undefined. Directly specify evaluation order instead.
Not recommended
// Not recommended
void foo(int, int);
int count = 5;
foo(++count, ++count); // Error: arguments evaluation order is undefined
int inc(int &val) { return ++val; }
foo(inc(count), inc(count)); // Error: arguments evaluation order is undefined
Good
int count = 5;
int tmp = ++count;
foo(tmp, ++count);
Infix and postfix increment operators¶
Prefer infix increment operator to the postfix one.
When a variable is incremented (++i
or i++
) or decremented (--i
or i--
)
and the value of the expression is not used, one must decide whether to
pre-increment (pre-decrement) or post-increment (post-decrement).
When the return value is ignored, the “pre” form (++i
) is never less
efficient than the “post” form (i++
), and is often more efficient. This
is because post-increment (post-decrement) requires a copy of i
to be
made, which is the value of the expression. If i
is an iterator or other
non-scalar type, copying i
could be expensive. Since the two types of
increment behave the same when the value is ignored, why not just always
pre-increment?
Not recommended
for(size_t i = 0; i < n; i++) // Not recommended
Good
for(size_t i = 0; i < n; ++i) // Preferable
Assertions¶
Assert is a powerful tool. It makes code self-documented and self-checking in run-time. Some code quality standards define asserts usage paradigm “the more the better”, i.e. the number of asserts in the code characterizes its quality.
Use
assert
to check invariants and internal assumptions.Make sure that
assert
usage does not cause side effects:
Not recommended
// Not recommended
assert( ++i < limit); // Error: ++i happens in debug only
Classes¶
Use classes with strictly defined purposes.
Avoid fat-interface classes.
Do not add a new method to the class if it is not supposed to modify its protected data nor does it call its protected methods.
Use friends only when it is absolutely necessary.
Declare virtual destructor in a polymorphic base class.
Initialization¶
If you define member variables inside a class, provide an in-class initializer for each member variable or write a constructor (it can be a default constructor).
Caution
If you do not declare any constructors yourself, then the compiler will generate a default constructor for you. This might result in some fields not being initialized at all or initialized to inappropriate values.
Initializers¶
Use initializers list instead of assignment in constructors:
Not recommended
// Not recommended
class Foo
{
Foo(){
_string = "AAA";
}
// This is an equivalent to
// _Foo(): string() { _string = "AAA"; }
};
Good
// The following is more efficient:
class Foo
{
Foo():_string("AAA"){} // string(const char*) only
};
Exception: Acquisition of resource allocated by new
is preferable to perform as assignment in the constructor body:
Not recommended
// Not recommended
class Foo
{
Foo(): _ptrA(new A()), _ptrB(new B()){
}
};
// Memory can be allocated first, then passed to _ptrA and _ptrB.
// If B::B() throws exception, then memory allocated by new A() is lost
Constructors¶
Avoid doing complex initialization in constructors (in particular, initialization that can fail or that requires virtual method calls).
The problems with doing work in constructors are:
There is no easy way for constructors to signal errors, short of using exceptions.
If the work fails, you now have an object which initialization code have failed, so it may be an indeterminate state.
If the work calls virtual functions, these calls will not get dispatched to the subclass implementations. Future modifications of your class can quietly introduce this problem even if your class is not currently sub-classed.
If someone creates a global variable of this type (which is against the rules, but still), the constructor code will be called before
main()
, possibly breaking some implicit assumptions in the constructor code.
Constructors should never:
call virtual functions
attempt to raise non-fatal failures.
Note
If your object requires non-trivial initialization, consider using a factory function or init()
method.
Note
Destructors should not call virtual functions or throw exceptions either.
Constructors for non-trivial types¶
Use explicit copy constructors for non-trivial types:
class Base
{
explicit Base(const Base& other);
virtual printMe() const { std::cout << "Base"; }
};
class Derived: public Base
{
virtual printMe() const { std::cout << "Derived"; }
};
void print(Base b);
Derived d;
print(d); // Compile time error
This helps to prevent unwanted conversion to the base type. It also helps to indicate the places where duplication of the class data is required, which can probably be handled more efficiently.
Clone all parts of an object¶
Call base class copy constructor and copy constructors for all members of the class. The same goes for assignment operation.
Assignment¶
Operator =
should check for ‘this’ argument passed and return a reference to the object.
New and delete¶
Provide overloaded
new
anddelete
consistently: overloadedvoid operator new(params)
should be matched withvoid operator delete(void*, params)
, the same goes fornew []
,delete []
.Provide all standard versions of
new()
operator.
Access Control¶
Make data members private, and provide access to them through
set
andget
methods as needed.Do not open data except for static const values on a class. Exception: simple structures.
Template metaprogramming¶
Avoid complicated template programming. It allows extremely flexible interfaces that are type safe and high performance, but, at the same time, the techniques used in template metaprogramming are often obscure and hard to maintain.
Put considerable effort into minimizing and isolating the complexity.
Make sure that tricky code is especially well commented: carefully document how the code is used, say something about what the “generated” code looks like.
Pay extra attention to the error messages that the compiler emits when users make mistakes. The error messages are a part of your user interface, and your code should be tweaked as necessary so that the error messages are understandable and helpful.
Use class member types (
typedefs
) to increase readability of template classes:template<typename T> class Point { public: typedef T value_type; typedef std::shared_ptr<Point> pointer; ... static pointer create() { return pointer(new Point()); } };
Hiding the implementation in Model
classes¶
Separate the interface and the implementation in Model
classes. This
allows to change the implementation specifics without changing the public interface.
The implementation of the Model
class should consist of the following three parts:
a class with a public interface
an internal class with the implementation
an internal class that maps the interface to the implementation
Below you can find the details of what each class should consist of.
Public interface.
This class defines the public API of the model via purely virtual functions and should not provide any implementation.
namespace some_namespace { class Model : public daal::algorithms::Model { public: virtual int foo() = 0; }; }
Internal class with the implementation.
This class contains everything that is needed for implementation:
functionality of the public API of the model
training algorithm
prediction algorithm
namespace some_namespace { namespace internal { class ModelInternal { public: int fooImpl() {...} /* All the methods that are needed to implement training and prediction algorithms go here */ }; } }
Internal class that maps the interface to the implementation.
This class can be used inside the respective algorithm’s kernel. To unify the implementation of the methods of this class,
typedef ImplType
should be defined within this class and point to the type with the implementation.
namespace some_namespace { namespace internal { class ModelImpl : public some_namespace::Model, public some_namespace::internal::ModelInternal SDL{ public: typedef some_namespace::internal::ModelInternal ImplType; int foo() DAAL_C11_OVERRIDE { return ImplType::fooImpl(); } }; } }
SDL Requirements¶
SDL stands for Security Development Lifecycle and discusses the best known practices:
Handling Primitive Data Types¶
SDL requires to explicitly catch type conversion errors:
Detect and handle runtime errors that could occur upon conversion.
Detect invalid type conversions by using exceptions or post-validation code where applicable.
For oneDAL it means that every method where type conversion occurs
should return Status
explicitly or by reference.
Below we discuss conversion between integer types in different cases:
Code is performance-oriented (i.e. recursive functions, loops with indefinite or a large number of iterations)
Note
Conversion of floating point data types is not a security problem.
Important
It is required to add error code into error_handling.cpp.
Code is performance-oriented¶
In this case, it is a good practice to use DAAL_ASSERT
:
for (size_t i = 0; i < nrows; i += di)
{
if (i + di > nrows)
{
di = nrows - i;
}
for (size_t j = 0; j < ncols; ++j)
{
NumericTableFeature & f = (*_ddict)[j];
char * pc = (char *)_arrays[j].get();
DAAL_ASSERT(pc)
char * ptr = pc + (idx + i) * f.typeSize;
internal::getVectorUpCast(f.indexType, internal::getConversionDataType<T>())(di, ptr, lbuf);
for (size_t k = 0; k < di; ++k)
{
buffer[(i + k) * ncols + j] = lbuf[k];
}
}
}
Code is not performance-oriented¶
For code that is not performance-oriented, consider whether or not the conversion is safe.
Note
Type conversion is safe if the range of the destination type contains the range of the source type. For example, the following would be considered a safe conversion:
from
byte
toint
from
byte
tosize_t
from
short
toint
from
unsigned short
tounsigned int
See examples of safe conversions for more details.
Conversion is safe¶
SDL allows two options:
a comment justifying why the type converstion is safe for each instance where it is used
DAAL_ASSERT
Note
Using DAAL_ASSERT
is a better solution as macros is more informative than a comment.
Conversion is not safe¶
In this case, every method where type conversion occurs should return Status
.
Examples of type conversion verification¶
In the examples below, <error code>
depends on the context.
Conversion from int
to size_t
¶
int blockIndex;
// ...
DAAL_CHECK(blockIndex >= 0, <error code>)
size_t N = (size_t)blockIndex;
Conversion from size_t
to int
¶
size_t blockIndex;
// ...
DAAL_CHECK(blockIndex <= INT_MAX, <error code>)
int N = (int)blockIndex;
Examples of safe conversions¶
The conversion is safe if a variable being converted is declared as const
and the following is true for the value it is initialized with:
It is unrelated to the input data.
It is within the range of the destination type.
const int n = 1024 DAAL_ASSERT(n >= 0) size_t uN = (size_t)n;
The conversion is safe if the following is true for the value that a variable being converted is initialized with:
It is with the ragne of the destination type
It cannot leave that range during the workflow.
size_t getMaxElement(const int *elementsArray, size_t nElements) const { status = services::Status(); int max = 0; for(size_t i = 0; i < nElements; i++) { if (max < elementsArray[i]) { max = elementsArray[i]; } } // max is initialize by 0 and during workflow it can increase only, // so max is equal to 0 or more than 0, such conversion is safe DAAL_ASSERT(max >= 0) return (size_t)max; }
Working with Pointers¶
When you work with pointers, follow security best practices listed below:
Handle allocation errors¶
Always check the result of a memory allocation operation.
Explicitly handle all errors.
Handle errors¶
Methods that allocate memory should return Status
object.
services::Status initialize()
{
T * newData = static_cast<T *>(services::daal_malloc(_size * sizeof(T)));
DAAL_CHECK(newData, services::ErrorMemoryAllocationFailed);
// ...
services::daal_free(oldData);
return services::Status();
}
Do not access freed memory¶
To avoid accessing memory after it is released:
Explicitly set pointers stored in variables to
NULL
.Check against
NULL
when in doubt.
Be careful of race conditions when freeing memory and setting the
related pointer to NULL
asynchronously.
Avoid null pointer de-reference errors¶
To prevent reading or writing to memory with a null pointer, explicitly validate a pointer against NULL
before using it.
Avoid freeing the same buffer more than once¶
To avoid freeing buffers that have already been freed,
explicitly invalidate pointers to the buffers that have already been freed by setting pointer to NULL
.
Do not take the size of a pointer to determine the size of the object it points to¶
This mistake may go unnoticed as the outcome might be the same on certain architectures. Bear in mind that this might not be the case if the code is run in another environment.
Set up a pointer to zero after freeing memory buffer¶
According to SDL, you should set a pointer to zero after using it.
services::Status initialize()
{
T * data = static_cast<T *>(services::daal_malloc(n * sizeof(T)));
DAAL_CHECK_MALLOC(data)
// ...
services::daal_free(data);
data = NULL;
return services::Status();
}
Working with Buffers¶
When you work with buffers, follow security best practices listed below:
Use a special function for copying bytes between buffers¶
Use daal_memcpy_s
to copy memory:
int daal_memcpy_s(void *destination, size_t numberOfElements, const void *source, size_t count);
If its returned value is not equal to zero:
change the library status to
ErrorMemoryCopyFailedInternal
free the memory that was used for a copy
set the buffer to
NULL
Use a special function for memory allocation and zero-fill buffers¶
Use daal_calloc
or service_scalable_calloc
for memory allocation and zero-fill buffers.
Note
You may use daal_calloc
or service_scalable_calloc
in performance-oriented cases as well,
but make sure you initialize all data before using it.
Check buffer size¶
To prevent buffer overflow when performing multiplication operations on the size of the buffer, check its size:
DAAL_OVERFLOW_CHECK_BY_MULTIPLICATION(type, multiplier1, multiplier2)
Handling exceptions¶
Exceptions should be handled securely:
Provide messages and details that contain no secure information.
Use oneDAL recommended way for providing your exception messages.
Handling null byte injection¶
Check input strings on null byte injection to prevent possible security vulnerabilities.
Verify that third-party and system functions you use can appropriately handle overly-long, malformed, and non-printable symbols.
Caution
If you are working with user-defined data, be careful as you might encounter null characters there.
For example: after copying data from user-defined source into your internal buffer and inserting a null symbol in the end of it as a flag of the data ending there, you might encounter the user’s null symbol somewhere else in the data and take is for your flag.