Business Organisations all around the world are investing heavily in storing and maintaining their data. With the data volume growing like never before it is becoming increasingly complex to handle this data.
Apart from storage, this data is being used for Daily Operations and sometimes being stored with the aim to perform analysis and get important insights out of it. These are the use cases where Microsoft SQL Server and Microsoft SQL Server Integration Services come into the picture to provide solutions and facilitate data handling.
This article will help you understand what Microsoft SQL Server is, its key features, the functionality of SQL Server Integration Services [SSIS], related terminology, and components in SSIS, and will provide you with an understanding of the SSIS Development Environment.
Table of Contents
What is Microsoft SQL Server?
Image Source
Microsoft SQL Server is a Relational Database Management System (RDBMS).
Since it is an RDBMS-based system it can help users to store or retrieve data to the applications connected. The platform supports multiple utilities that help implement Transactional Processing, Business Intelligence. Data Analytics etc. This server platform is considered to be amongst the best in the industry along with Oracle Database and IBM DB2.
The platform supports two versions of Structured Query Language [SQL], namely ANSI [American National Standards Institute] SQL and proprietary Transact SQL [T-SQL]. The proprietary version has additional features that enable users to declare variables, handle exceptions, etc. The Development Environment in which the user interacts with the Microsoft SQL Server is called SQL Server Management Studio[SSMS].
The Key Features of Microsoft SQL Server
The key features of Microsoft SQL Server are as follows:
- Data Recovery at a Fast Pace
- Encryption of Stored Data
- Smarter Processing of Queries
- Support for Analytics
- Multiple Integrations
1. Data Recovery at a Fast Pace
Since there are situations like Hardware failure or Network failure that can lead to Server crashes during the processing of data. The Server and Database are required to be reverted back to the previous state before the crash to prevent the loss of important data.
Earlier this process was solved using proprietary and slow external methods, but now the Data Recovery can be implemented at an impeccably faster rate due to the availability of Microsoft SQL Server’s Accelerated Data Recovery Feature.
This solution is based on the redevelopment of Transaction logs which can lead to a drastic decrease in the time required for Data Recovery.
Image Source
2. Encryption of Stored Data
Privacy is a big concern for the enterprise community nowadays. Hence it is essential for database solutions to focus on the privacy and security of the data stored in the database.
With the updates in Microsoft SQL Server 2016, there was the inclusion of a new feature called Always Encrypted which achieves security by allowing transparent Column Encryption without giving Database Administrators access to the Decryption Keys for the database.
Image Source
3. Smarter Processing of Queries
Implementing Complex queries on large databases can be quite taxing, requiring immense processing power, and can even take a long time for execution. Microsoft SQL Server aims to solve this issue by using an in-built Query Optimizer that can generate an execution plan for the queries that are supposed to be executed.
The functions performed by the Query Optimizer include Dynamic Memory Grants for Rowstore Tables, Batch Mode processing in Rowstore, Table Variable Deferred Compilation, etc.
Image Source
4. Support for Analytics
Every business aims to utilize the data being stored and managed to generate useful insights for making future strategic decisions. Microsoft SQL Server facilitates this use case of analyzing the stored data in the database by leveraging the integration with Microsoft SQL Server Analysis Service (SSAS) which is a Data Mining and Online Analytical Processing tool.
5. Multiple Integrations
There are a lot of other applications for the data being stored in the database which could be required to perform various tasks such as Data Extraction, Transformation, Data Loading, Data Migration, etc. Microsoft SQL Server facilitates this functionality by using Microsoft SQL Server Integration Services (SSIS)
Hevo Data, a No-code Data Pipeline helps to transfer data from 100+ sources to a Data Warehouse/Destination of your choice and visualize it in your desired BI tool. Hevo is fully managed and completely automates the process of not only loading data from your desired source but also enriching the data and transforming it into an analysis-ready form without having to write a single line of code. Its fault-tolerant architecture ensures that the data is handled in a secure, consistent manner with zero data loss.
It provides a consistent & reliable solution to manage data in real-time and always have analysis-ready data in your desired destination. It allows you to focus on key business needs and perform insightful analysis using a BI tool of your choice.
Check out what makes Hevo amazing:
- Secure: Hevo has a fault-tolerant architecture that ensures that the data is handled in a secure, consistent manner with zero data loss.
- Schema Management: Hevo takes away the tedious task of schema management & automatically detects the schema of incoming data and maps it to the destination schema.
- Minimal Learning: Hevo, with its simple and interactive UI, is extremely simple for new customers to work on and perform operations.
- Hevo Is Built To Scale: As the number of sources and the volume of your data grows, Hevo scales horizontally, handling millions of records per minute with very little latency.
- Incremental Data Load: Hevo allows the transfer of data that has been modified in real-time. This ensures efficient utilization of bandwidth on both ends.
- Live Support: The Hevo team is available round the clock to extend exceptional support to its customers through chat, email, and support calls.
- Live Monitoring: Hevo allows you to monitor the data flow and check where your data is at a particular point in time.
Simplify your data analysis with Hevo today! Sign up here for a 14-day free trial!
What are Microsoft SQL Server Integration Services [SSIS]?
Image Source
Microsoft SQL Server Integration Service is a Data Warehousing tool that can help perform a wide range of integration tasks such as Data Extraction, Transformation, and Data Loading Transformations such as Cleaning, Aggregating, Merging Data, etc.
As the name implies it enables the user to load the data from one database to another such as SQL Server, Oracle, Excel file, etc. To ensure ease of use for the user, it also contains graphical tools and window wizard workflow functions such as sending email messages, FTP operations, and data sources.
This tool can be used to perform two main functions:
- Data integration from multiple sources to provide data in a central place to the users.
- Workflow management depends on various parameters to help in performing analytical tasks.
SQL Server Integration Services [SSIS] Terminology
Here are some of the key terms that will prove to be useful while working with SSIS:
- SQL Server Integration Services: Package
- SQL Server Integration Services: Task
- SQL Server Integration Services: Component
- SQL Server Integration Services: Execution
- SQL Server Integration Services: Deployment
- SQL Server Integration Services: Project
- SQL Server Integration Services: Solution
- SQL Server Integration Services: SSIS Runtime Engine
1. SQL Server Integration Services: Package
It refers to the central component of the SSIS codebase. It is the collection of one or more operations that might be invoked together.
2. SQL Server Integration Services: Task
It refers to a single operation within a Package. There are diverse types of Tasks depending upon the SSIS package being used.
3. SQL Server Integration Services: Component
It refers to any part of the Data Pipeline such as Source, Destination, Transformation Logic, etc.
4. SQL Server Integration Services: Execution
Execution refers to the act of invoking or running, the logic in an SSIS Package. These Packages can be directly executed from a properly configured instance of SQL Server.
5. SQL Server Integration Services: Deployment
It refers to the process of pushing a fully developed SSIS project from the development workstation to an instance of SQL Server. Deployment can take place when the fully-developed SSIS project is pushed from the development workstation to a SQL Server instance.
This is where it can be executed either through a scheduling tool such as SQL Server Agent or manually. Deployment is generally more complex than simply copying code from one machine to another, although SQL Server does an impeccable job of hiding this complexity for most of the deployments.
6. SQL Server Integration Services: Project
Projects refer to the functional units in which the SSIS source code is arranged. A Project may consist of multiple packages. Generally, when you deploy SSIS code, the entire project is deployed to the server.
7. SQL Server Integration Services: Solution
A Solution refers to the logical grouping of related projects.
8. SQL Server Integration Services: SSIS Runtime Engine
It refers to the logic that allows a package to run. The SSIS packages are run in SQL Server Data Tools [SSDT] they use SSIS runtime in the development machine. Once you’ve deployed the code to SQL Server, any execution run on that server will utilize the server’s SSIS runtime to function efficiently.
What are the Components of SSIS?
The following are components of SSIS architecture:
- Control Flow
- Precedence Constraints
- Task
- Containers
- Data Flow
- SSIS Packages
- Parameters
- Variables
- Expressions
1. Control Flow
Control Flow refers to the central logic of the SSIS Package. It houses Containers, Stores, and Tasks. This central logic helps decide the order of execution for all its components.
Control flow is the default view when you create or open a package. The Control Flow tab, at the top left of an opened package, displays the control flow surface. This would be blank on a new package. It is responsible for directing which tasks will execute and in what order.
Control flow with six tasks and a container is illustrated in the image below:
Image Source
Event Handlers are a special type of SSIS Control flow that has been designed to execute when a specific event occurs. The following figure shows an event handler set to execute when the executable (here, the packages) encounters an error (OnError is the default setting).
In this instance, the Event Handler logic has not yet been set up. To create an Event Handler for this event and executable, you can click on the hyperlinked message in the middle of the design surface. The SSIS toolbox for this tab depicts all the tasks commonly available in the control flow.
Event Handlers are useful, but make sure you don’t overuse them. Since the Event Handler logic can be set up individually for every container and task, the overuse of Event Handlers can create a logic mess that becomes very difficult to debug and maintain.
Image Source
2. Precedence Constraints
Image Source
These are the components that direct tasks to be executed in a particular predefined order. It runs the destination tasks based on the result of the previous task. It is also responsible to define the workflow of the entire SSIS package. If no precedence constraints are defined, all the containers and tasks in that package will be invoked at the same time.
It looks like a simple connector at first glance, however, it can do so much more than simply defining the order of operations. By double-clicking the precedence constraint, you can open the editor for that object, revealing more configuration options.
Image Source
Here are a few available configuration options:
- Value: This option can be used to set the type of constraint. By default, the constraint has been set to Success, which means that the downstream task will execute only after the successful execution of the upstream task. You can choose Completion or Failure if needed.
- Expression: If the evaluation operation has been set to use an expression as part of the constraint, you can define the expressions in the field.
- Evaluation Operation: This option lets you set this precedence constraint to be used solely as a constraint (the default). It can also be bound by expression.
- Multiple Constraints Behavior: If the downstream task has multiple constraints bound to it, this selection would define whether all those constraints are being met (the default option) or if only one of them must be met before the task gets executed.
Image Source
You can use precedence constraints and containers to modulate the complexity of the workflow of the package according to your needs.
3. Task
Task refers to an individual unit of work. In SSIS, the user is required to use the drag and drop interactive interface to design the process and logic surface and configure them.
SSIS offers dozens of in-built task types that have the capability of handling almost any ETL need. When you select the Control Flow tab, the SSIS toolbox lists all the available tasks. Here are a few of the most commonly used tasks:
- Execute SQL Task: This task lets you execute the SQL against a relational database.
- Script Task: For situations that need more flexibility than what is found in the built-in tasks, you can use the Script task to create highly customized ETL logic through VB.NET or C#.
- Data Flow Task: This is the most useful task of the lot, since it contains dozens of data transformation components.
- File System Task: You can use this task to interact with the file system, including moving, copying, or deleting directories or files.
- Execute Package Task: Packages can invoke other packages through the Execute Package Task.
- Execute Process Task: This task lets you trigger logic in a batch or executable file.
4. Containers
It refers to a unit that groups tasks together into a cluster of work. It provides visual consistency and allows you to declare variables and event handlers. The three types of containers in SSIS are:
Sequence Container
This container allows you to organize subsidiary tasks by grouping them together and allowing you to apply transactions or assign logging to the container.
For-Loop Container
This container is similar to a Sequence Container but additionally, it allows you to run tasks multiple times.
For-Each Loop Container
This container is similar to For-Loop Container but this loops over a set of objects like files in a folder. It comes in handy when processing different but identically shaped sets of data. For instance, when exporting or ingesting flat file data.
5. Data Flow
Data flow refers to the main use case for the SSIS tool i.e. extraction, transformation, writing data to the destination, etc. This task varies from the others in the list since it has its own child elements, called components, that can be connected to create end-to-end data flow operations.
You can access the Data Flow through the Data Flow tab in the package editor. If a package contains multiple data flows, you can utilize the Data Flow Task drop-down menu to select the data flow to edit.
Image Source
Here are a few of the many components available:
- Lookup: This can be used for comparing one set of data to another for key retrieval or validation.
- Derived Column: This comes in handy when you’re looking to perform lightweight data type changes, cleansing operations, or combining or splitting string values.
- Sort: This performs a sort operation against a set of data in the data flow.
- Union All: This brings together two or more inputs of data into a single stream, similar to the UNION ALL operator in T-SQL.
- ODBC Source and Destination: You can use this to retrieve data from or write data to an ODBC connection.
- OLE DB Source and Destination: This can be used to retrieve data from or write data to relational data stores by leveraging Microsoft OLE DB Driver.
- Script Component: This is a very versatile scripting environment that can be used to create customized transformation logic in VB.NET or C#.
- Conditional Split: This can be used to logically separate one set of data into two or more sets.
- Flat File Source and Destination: This component can be used to read from or write to flat files.
- Merge Join: Similar to the join operation in the Database engine, this component can take two streams of input and perform an outer, inner, or left join.
Image Source
6. SSIS Packages
SSIS refers to the collection of tasks and functions in which constraints help manage the order in which tasks are executed.
Image Source
7. Parameters
These are similar to variables but can be set outside a Package. These also store the information which is required to start a package.
8. Variables
Similar to any good programming environment, SSIS allows user-defined variables that can be dynamically manipulated or statically set during package runtime.
Variables can be defined at the package level, and by default, can be accessed by any task or component within the package. Every variable will have the following attributes:
- A Scope: By default, every variable is scoped at the package level. You can also limit the scope to a container or a task. But, this is a very rare situation that needs the variable scope to be changed.
- A Name: For identification.
- A Value: For some data types, like the string data type, you can leave the value blank. For others like the integer or DateTime data types, you need to provide a default value.
- A Data Type: Every variable in SSIS is created with a specific data type, and any values written to that variable must be aligned to the constraints of that variable. For instance, the variable might fail if you try to load a value of February 30th to a DateTime variable or if you unwittingly try to assign the value “red” to an Int32 variable. Every one of the SSIS variable data types has an analog in SQL Server.
- An Expression (optional): You can set variables to a static value or you can configure to use an expression for more dynamic behavior.
Image Source
9. Expressions
SSIS utilizes its own expression language to create dynamic behaviors. Nearly every component and task in SSIS can use expressions to replace otherwise static values with a little bit of code that gets interpreted at runtime. Here are a few practical uses of expressions in SSIS:
- Replacing an output file name with one that includes the current time and date in that file name.
- Substituting a hard-coded database connection string with an expression that utilizes a parameter that allows the connection string to be supplied at runtime.
- Replacing a static query with one that uses a customized WHERE clause.
- Creating a variable that utilizes an expression to combine several other variables together (e.g., a directory path combined with a relative filename).
Expressions can be used across SSSIS. Many properties of components, tasks, and containers can be modified to substitute expression values for static text.
If you wish to view the expressions available for a given component or task in SSIS, you can choose the item (indicated by the arrow in the following image) and find the Expressions item in the Properties Window (displayed inside the rectangle).
Image Source
Once you’ve selected the property and item in which the expression will apply, you can use the Expression Builder window, as depicted in the following image, to design the expression. The list of parameters, variables, and functions can offer a lot of help when building your expression.
Expressions in SSIS hold a lot of power, but it might take you some time to get used to them. You can use the Expression Builder to get acquainted with the finer details of creating SSIS expressions.
Image Source
There are two methods provided to work with SSIS packages namely SSDT and SSMS.
- SQL Server Data Tools (SSDT): It is used for developing the Integration Services packages that a business solution requires and it also provides the Integration Services project in which you can create packages.
- SQL Server Management Studio (SSMS): It is used for managing packages in a production environment.
SSDT tools help you perform multiple tasks such as running the SQL Server Import and Export Wizard to create basic Packages for copying and storing data from source to destination and for the creation of complex control and data flow, event-driven logic, and logging of data. They also help in testing and debugging Packages by using the troubleshooting features in SSIS.
You can save copies of Packages to SQL Server MSDB database and create a deployment utility that can install packages and their dependencies on other computers/servers.
Official Documentation regarding SSDT can be found here.
SSMS Studio provides the functionality to manage Packages, monitor running Packages, and determine impact and data lineage for Integration Services and SQL Server objects. Using the SSMS Studio you can create folders to organize packages, run them using Execute Package Utility, Import and Export packages to and from the SQL server msdb database.
Official Documentation regarding SSMS can be found here.
The SSIS Development Environment
SSIS Development Environment utilizes multiple-document interfaces like that of Visual Studio and enables users to open or terminate the SSIS packages independently.
The environment has the following components:
- Solution Explorer
- Properties Window
- SSIS Toolbox
- SSIS Package
- Connection Manager Tray
1. Solution Explorer
This file management system helps browse the project and all the associated files within it. This window allows you to delete, add, and rename files in the current solution seamlessly.
2. Properties Window
It is the Context-Aware Window that displays the properties for the currently selected item.
3. SSIS Toolbox
This Context-Aware Toolbox provides with a wide range of operations that are available in that SSIS package depending on the model that you are working in i.e. Control Flow or Data Flow.
4. SSIS Package
It displays the current package you are working on as the environment supports working with multiple packages.
5. Connection Manager Tray
It is the area where currently available connections are shown.
Image Source
What are Connection Managers?
Connection Managers are the logical representation of the connection. Properties of the Connection Manager can be adjusted and at run time, each instance of the Connection Manager creates a connection that has different attributes.
There are multiple types of Connection Managers that can be used, some are installed at Setup, and some can be downloaded from Microsoft’s Website. The user can also create their own Connection Manager.
A Connection Manager can be used to define the data structures that can be written to or read from during the execution of the package. A Connection Manager can refer to a database file (such as Microsoft Access), relational database, a web service, or a cloud structure (similar to blob storage).
Once it has been defined within an SSIS package, a Connection Manager can be used by any of the components or tasks within that package. The Connection Manager can be used to write data, retrieve data, and validate or manipulate the data as it passes through the package logic (transformation). Connection Managers stand out because they can be defined at the project level or the package level. The only difference between the two is the scope of visibility.
Package connections can be used within the package in which they are created, while project connections can be accessed by any package in the project. The latter comes in handy when creating a project that can use the same connection in various packages.
In the figure shown below, both the project and package connections can show up in the Connection Managers tray. Plus, any project connections are also shown in the Solutions Explorer under the Connection Managers folder. In the tray, any project-scoped connection will showcase a project prefix in the name.
Image Source
If you want more information on Connection Managers in Microsoft SQL Server you can find it here.
Image Source
Conclusion
In this article, you learned what Microsoft SQL Server is, its key features, SQL Server Integration Services [SSIS], related terminology, and components of SSIS, and developed an understanding of SSIS Development Environment.
Automated integration with your Data Warehouses/multiple data sources and the Analytics database can make your choice much simpler as a lot of necessary features can be integrated readily.
Integrating and analyzing data from a huge set of diverse sources can be challenging, this is where Hevo comes into the picture. Hevo Data, a No-code Data Pipeline helps you transfer data from a source of your choice in a fully automated and secure manner without having to write the code repeatedly. Hevo with its strong integration with 100+ sources & BI tools, allows you to not only export & load Data but also transform & enrich your Data & make it analysis-ready in a jiffy.
Get started with Hevo today! Sign up here for a 14-day free trial!