[Coursera Course Notes] - Explore Core Data Concepts in Microsoft Azure

Not all data is structured or even semi-structured. For example, audio files, video files, and binary data files might not have a specific structure. This is referred to as unstructured data. Depending on whether data is structured, semi-structured, or unstructured, it will be stored differently. Structured data is typically stored in a relational database such as SQL Server or Azure SQL Database. Azure SQL Database is a service that runs in the cloud. You can use it to create and access relational tables. The service is managed and run by Microsoft Azure. The act of setting up the database server is called provisioning. You can provision other services as well in Microsoft Azure. For example, if you want to store unstructured data such as video or audio files, you can use Azure Blob Storage. Blob is an acronym for Binary Large Object. If you want to store semi-structured data such as documents, you can use a service such as Azure Cosmos DB. After your service is provisioned, it needs to be configured so that users can be given access to the data.

The work performed by transactional systems is often referred to as online transactional processing, or simply OLTP. Analytical systems are concerned with capturing raw data and using it to generate insights. An organization can use these insights to make business decisions. Data ingestion is the process of capturing the raw data, and it will generally require what is referred to as ETL, or Extract, Transform, and Load. This data could be taken from devices measuring environmental information such as temperature and pressure, from point-of-sale devices recording the items purchased by a customer in a supermarket, from financial data recording the movement of money between bank accounts, or from weather data supplied by weather stations. Some of this data may originate from different systems with different formats. To process and analyze this data, you must first store it in a repository of some sort. The repository could be a file store, a document database, or even a relational database. During data transformation or data processing, the raw data might not be in a format that is suitable for querying. The data might contain anomalies that should be filtered out, or it may require transforming in some way. For example, dates or addresses might need to be converted into a standard format. After data is ingested into a data repository, you may want to do some cleaning operations to remove any questionable or invalid data, or perform some aggregations such as calculating profit margin and other key performance indicators (KPIs). KPIs are how businesses are measured for growth and performance. In data querying, after data is ingested and transformed, you can query the data to analyze it. You may be looking for trends, or attempting to determine the cause of problems in your systems. Many database management systems provide tools to enable you to perform ad hoc queries against your data and generate regular reports. In data visualization, data represented in tables such as rows and columns, or as documents, isn't always intuitive. Visualizing the data can often be useful as a tool for examining data. You can generate charts such as bar charts, line charts, and pie charts, plot results on geographical maps, or illustrate how data changes over time. Microsoft offers visualization tools like Power BI to provide a rich graphical representation of your data.
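To make the transformation step above concrete, here is a minimal Transact-SQL sketch of that kind of cleaning and aggregation. The table and column names (SalesRaw, SaleDate, Revenue, Cost) are hypothetical and only for illustration.

    -- Hypothetical example: standardize dates, filter out questionable rows,
    -- and compute a profit-margin KPI from raw sales data.
    SELECT
        CONVERT(date, SaleDate) AS SaleDay,                             -- normalize the date format
        SUM(Revenue)            AS TotalRevenue,
        SUM(Revenue - Cost)     AS TotalProfit,
        SUM(Revenue - Cost) * 1.0 / NULLIF(SUM(Revenue), 0) AS ProfitMargin  -- KPI: profit / revenue
    FROM SalesRaw
    WHERE Revenue IS NOT NULL AND Cost IS NOT NULL                      -- drop invalid records
    GROUP BY CONVERT(date, SaleDate);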

A primary use of relational databases is to handle transaction processing. A transaction is a sequence of operations that are atomic. This means that either all operations in the sequence must be completed successfully, or, if something goes wrong, all operations run so far in the sequence must be undone or rolled back. Bank transfers are a good example. You deduct funds from one account and credit the equivalent funds to another account. If the system fails and the transaction cannot be completed, the funds must be reinstated in the original account by rolling back the transaction. They mustn't be lost. You can then attempt to perform the transfer again. Similarly, you shouldn't be able to credit an account twice with the same funds. Each database transaction has a defined beginning point, followed by steps to modify the data within the database. At the end, the database either commits the changes to make them permanent or rolls back the changes to the starting point, after which the transaction can be tried again. A transactional database must adhere to the ACID (atomicity, consistency, isolation, durability) properties to ensure that the database remains consistent while processing transactions. Atomicity guarantees that each transaction is treated as a single unit, which either succeeds completely or fails completely. If any of the statements constituting a transaction fails to complete, the entire transaction fails and the database is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors, and crashes. Consistency ensures that a transaction can only take data in the database from one valid state to another. A consistent database should never lose or create data in a manner that can't be accounted for. In the bank transfer example described earlier, if you add funds to an account, there must be a corresponding deduction of funds somewhere, or a record that describes where the funds have come from if they have been received externally. You can't suddenly create or lose money. Isolation ensures that the concurrent execution of transactions leaves the database in the same state that would have been obtained if the transactions were executed sequentially. A concurrent process can't see the data in an inconsistent state (for example, where funds have been deducted from one account but not yet credited to another). Durability guarantees that once a transaction has been committed, it will remain committed even if there is a system failure, such as a power outage or a crash.
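The bank-transfer example can be written as a single atomic unit of work. Below is a minimal Transact-SQL sketch; the Accounts table and its columns are hypothetical, and a real system would add further checks such as verifying sufficient funds.

    -- Hypothetical transfer of 100 from account A to account B.
    -- Either both updates commit together, or everything is rolled back.
    BEGIN TRY
        BEGIN TRANSACTION;

        UPDATE Accounts SET Balance = Balance - 100 WHERE AccountId = 'A';  -- debit
        UPDATE Accounts SET Balance = Balance + 100 WHERE AccountId = 'B';  -- credit

        COMMIT TRANSACTION;       -- durability: the changes are now permanent
    END TRY
    BEGIN CATCH
        IF @@TRANCOUNT > 0
            ROLLBACK TRANSACTION; -- atomicity: funds are never lost or credited twice
        THROW;                    -- re-raise the error to the caller
    END CATCH;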

Many systems implement relational consistency and isolation by applying locks to data when it's being updated or modified. The lock prevents another process from reading the data until the lock is released. The lock is only released when the transaction commits or rolls back. Extensive locking can lead to poor performance, as applications wait for locks to be released. Distributed databases are widely used in many organizations.

A distributed database is a database in which data is stored across different physical locations. It may be held in multiple computers located in the same physical location, for example, a data center, or it may be dispersed over a network of interconnected computers. When compared to non-distributed database systems, any data update to a distributed database will take time to apply across multiple locations. If you require transactional consistency in this scenario, locks may be retained for a very long time, especially if there's a network failure between databases at a critical point in time. To counter this problem, many distributed database management systems relax the strict isolation requirements of transactions and implement eventual consistency. In this form of consistency, as an application writes data, each change is recorded by one server and then propagated to the other servers in the distributed database system asynchronously. While this strategy helps to minimize latency, it can lead to temporary inconsistencies in the data. Eventual consistency is ideal where the application doesn't require any ordering guarantees. Examples include counts of shares, likes or non-threaded comments in a social media system. 

Analytical workloads are typically read-only systems that store vast volumes of historical data or business metrics, such as sales performance and inventory levels. Analytical workloads are used for data analysis and decision-making. Analytics are generated by aggregating the facts presented by the raw data into summaries, trends and other kinds of business information. Decision-makers usually don't require all the details of every transaction, they want the bigger picture. An example of analytical information is a report on monthly sales. As the head of the sales department, you may not need to see all daily transactions that took place, transactional information, but you definitely would like a monthly sales report to identify trends and to make decisions, analytical information. Transactional information, however, is an integral part of analytical information. If you don't have good records of daily sales, you can't compile a useful report to identify trends. That's why efficient handling of transactional information is very important. 

Describe the difference between batch and streaming data

Which of the following are examples of batch processing? 

  • Election counting
  • Production line reporting
  • Credit card billing

Data Processing is simply the conversion of raw data to meaningful information through a process. Depending on how the data is ingested into your system, you could process each data item as it arrives or buffer the raw data and process it in groups. 

  • Processing data as it arrives is called streaming.
  • Buffering and processing the data in groups is called batch processing.

In batch processing, newly arriving data elements are collected into a group. The whole group is then processed at a future time as a batch. Exactly when each group is processed can be determined in a number of ways. For example, you can process data based on a scheduled time interval (for example, every hour), or processing could be triggered when a certain amount of data has arrived, or as the result of some other event. An example of batch processing is the way that votes are typically counted in elections. The votes are not entered when they are cast, but are all entered together at one time in a batch. Advantages of batch processing include the fact that large volumes of data can be processed at a convenient time, and that it can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight or during off-peak hours. Disadvantages of batch processing include the time delay between ingesting the data and getting the results. All of a batch job's input data must be ready before the batch job can be processed. This means the data must be carefully checked. Problems with data errors and program crashes that occur during batch jobs bring the whole process to a halt. The input data must be carefully checked before the job can be run again. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running. An example of effective use of batch processing would be a connection to a mainframe system: vast amounts of data need to be transferred into a data analysis system, and the data is not real-time. An example of ineffective batch processing would be transferring small amounts of real-time data, such as a financial stock ticker. In stream processing, each new piece of data is processed when it arrives. For example, data ingestion is inherently a streaming process. Streaming handles data in real time. Unlike batch processing, there's no waiting until the next batch processing interval, and data is processed as individual pieces rather than being processed a batch at a time. Streaming data processing is beneficial in most scenarios where new, dynamic data is generated on a continual basis. A financial institution tracks changes in the stock market in real time, computes value at risk, and automatically rebalances portfolios based on stock price movements. An online gaming company collects real-time data about player-game interactions and feeds the data into its gaming platform. It then analyzes the data in real time and offers incentives and dynamic experiences to engage its players. A real estate website tracks a subset of data from consumers' mobile devices and makes real-time recommendations of properties to visit based on their geolocation. Stream processing is ideal for time-critical operations that require an instant real-time response. For example, a system that monitors a building for smoke and heat needs to trigger alarms and unlock doors to allow residents to escape immediately in the event of a fire. Apart from the way in which batch processing and stream processing handle data, there are other differences. Data scope: batch processing can process all the data in the data set, while stream processing typically only has access to the most recent data received, or data within a rolling time window (the last 30 seconds, for example). Batch processing is suitable for handling large data sets efficiently, while stream processing is intended for individual records or micro-batches consisting of a few records. Performance: the latency for batch processing is typically a few hours, while stream processing typically occurs immediately, with latency in the order of seconds or milliseconds. Latency is the time taken for the data to be received and processed. Analysis: you typically use batch processing for performing complex analytics, while stream processing is used for simpler response functions, aggregates, or calculations such as rolling averages.
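As a hedged illustration of the kind of rolling-window calculation mentioned above, the sketch below uses the SQL-like query language of Azure Stream Analytics. The input and output names, the fields, and the 30-second window are assumptions made for the example, not part of the course.

    -- Hypothetical streaming query: average stock price per symbol
    -- over a 30-second tumbling window.
    SELECT
        Symbol,
        AVG(Price)         AS AveragePrice,
        System.Timestamp() AS WindowEnd
    INTO
        [OutputAlias]
    FROM
        [StockTicks] TIMESTAMP BY EventTime
    GROUP BY
        Symbol, TumblingWindow(second, 30)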

Knowledge Check

Tabular data that is represented by rows and columns in a database is referred to as?
Structured data

A key-value store is similar to a relational table except that each row can have any number of columns.
True

Unstructured Data can be made up of which of the following?
Images
Binary data
Video files


Structured data is typically stored in which of the following?
SQL Server or Azure SQL Database

The process of splitting data into a large number of narrow, well-defined tables with references from one table to another is referred to as normalization?
True

A transaction is defined as a sequence of operations that are atomic and the transactional database must adhere to the ACID properties. In this context, what does ACID stand for?
Atomicity
Consistency
Isolation
Durability


Which of the following are advantages of Batch Processing?
It can be scheduled to run at a time when computers or systems might otherwise be idle.
Large volumes of data can be processed at a convenient time.


Batch processing is suitable for handling large datasets efficiently while Stream processing is intended for individual records or micro-batches consisting of few records.
True

Which of the following is an example of a streaming dataset?
Data from sensors and devices.

Job roles in the world of data

There are a wide variety of roles involved in managing, controlling, and using data. There are three key job roles that deal with data in most organizations: Database administrators manage databases, assigning permissions to users, storing backup copies of data, and restoring data in case of any failures. Data engineers are vital in working with data, applying data cleaning routines, identifying business rules, and turning data into useful information. Data analysts explore and analyze data to create visualizations and charts that enable organizations to make informed decisions. An Azure Database Administrator is responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure Data Services and SQL Server. They're responsible for the overall availability and consistent performance and optimization of the database solutions. They work with stakeholders to implement policies, tools, and processes for backup and recovery plans to recover following a natural or human-made disaster. The database administrator is also responsible for managing the security of the data in the database, granting privileges over the data, and granting or denying access to users as appropriate. The Data Engineer collaborates with stakeholders to design and implement data-related assets that include data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads. They use a wide range of data platform technologies, including relational and non-relational databases, file stores, and data streams. They're also responsible for ensuring that the privacy of data is maintained within the cloud, spanning from on-premises to cloud data stores. They also own the management and monitoring of data stores and data pipelines to ensure that data loads perform as expected. A data analyst enables businesses to maximize the value of their data assets. They're responsible for designing and building scalable models, cleaning and transforming data, and enabling advanced analytics capabilities through reports and visualizations. A data analyst processes raw data into relevant insights based on identified business requirements.

Which of the following are tasks typically performed by a Data Engineer?
Aligning the data architecture with business requirements. 
Developing, constructing, testing, and maintaining databases and data structures
Using data to discover tasks that can be automated.

Data engineers are tasked with managing and organizing data, while also monitoring for trends or inconsistencies that would impact business goals. It's a highly technical position, requiring experience and skills in areas like programming, mathematics, and computer science. Some of the most common roles and responsibilities of a data engineer include: developing, constructing, testing, and maintaining databases and data structures; aligning the data architecture with business requirements; data acquisition; developing processes for creating and retrieving information from data sets; using programming languages and tools to examine the data; identifying ways to improve data reliability, efficiency, and quality; conducting research for industry and business questions; deploying sophisticated analytics programs, machine learning, and statistical methods; preparing data for predictive and prescriptive modeling; and using data to discover tasks that can be automated. To master data engineering, you need to be familiar with a range of tools that enable you to create well-designed databases optimized for the business processes that will be run. You must have a thorough understanding of the architecture of the database management system, the platform on which the system runs, and the business requirements for the data being stored in the database. If you're using a relational database management system, you'll need to be fluent in SQL. You must be able to use SQL to create databases, tables, indexes, views, and the other objects required by the database. Many database management systems provide tools that enable you to create and run SQL scripts. For example, SQL Server Management Studio, described earlier, lets you create and query tables visually, but you can also create your own SQL scripts manually. In some cases, you may need to interact with the database from the command line. For example, you can use the sqlcmd utility to connect to Microsoft SQL Server and Azure SQL Database and run ad hoc queries and commands. As a SQL Server professional, your primary data manipulation tool might be Transact-SQL. As a data engineer, you might use additional technologies, such as Azure Databricks and Azure HDInsight, to generate and test predictive models. If you're working in the non-relational field, you might use Azure Cosmos DB as your primary data store. To manipulate and query the data, you might use languages such as HiveQL, R, or Python.


Which of the following are tasks typically performed by a data analyst?
Making large or complex data more accessible, understandable, and usable.
Transforming, improving, and integrating data from many sources, depending on the business requirements.
Finding hidden patterns using data.

Data analysts are responsible for understanding what data actually means. A skilled data analyst will explore the data and use it to determine trends and issues, and to gain other insights that might be of benefit to the company. A large part of the data analyst role is concerned with communication and visualization. Data visualization is key to presenting large amounts of information in ways that are universally understandable or easy to interpret, and to spotting patterns, trends, and correlations. These representations include charts, graphs, infographics, and other pictorial diagrams. Data visualization analysts use visualization tools and software to communicate information in these ways, for clients or for their own company. A good data analyst requires experience and skills in reporting tools, such as Microsoft Power BI and SQL Server Reporting Services.

The primary functions of a data analyst usually include the following:

  • Making large or complex data more accessible, understandable, and usable.
  • Creating charts and graphs, histograms, geographical maps, and other visual models that help to explain the meaning of large volumes of data and isolate areas of interest.
  • Transforming, improving, and integrating data from many sources, depending on the business requirements.
  • Combining the data result sets across multiple sources. For example, combining sales data and weather data provides a useful insight into how weather influences sales of certain products, such as ice creams.
  • Finding hidden patterns using data.
  • Delivering information in a useful and appealing way to users by creating rich graphical dashboards and reports.

Traditionally, many data analysts used Microsoft Office apps, such as Microsoft Excel, for creating rich visual reports. Many analysts now use Microsoft Power BI, a powerful visualization platform, to create rich graphical dashboards and reports over data that can vary dynamically. Power BI is a collection of software services, apps, and connectors that work together to turn your unrelated sources of data into coherent, visually immersive, and interactive insights. Your data might be held somewhere local, such as an Excel spreadsheet, or in a collection of cloud-based and on-premises databases, or some other set of data sources. Power BI lets you easily connect to your data sources, discover what's important in that data, and share your findings with others in the organization. An analyst can create a dashboard using Power BI to examine retail sales data for items sold across multiple stores and districts. The metrics compare this year's performance to last year's for sales units, gross margin, and variance, as well as new store analysis.

Knowledge check

Which of the following are key roles related to dealing with data in an organization?
Data Analysts
Database Administrators
Data Engineers

Who is responsible for the design, implementation, maintenance, and operational aspects of on-premises and cloud-based database solutions built on Azure data services and SQL Server?
Database Administrators

Who is responsible for designing and building scalable models, cleaning and transforming data, and enabling advanced analytics capabilities through reports and visualizations?
Data Analysts

Azure Data Studio is a cross-platform database administration tool that provides a graphical user interface for managing many different database systems and provides connections to: on-premises SQL Server databases, Azure SQL Database, PostgreSQL, and Azure SQL Data Warehouse.
True

Which role is responsible for the following tasks?
Developing processes for creating and retrieving information from data sets.
Using programming languages and tools to examine the data.
Identifying ways to improve data reliability, efficiency, and quality.
Researching industry and business questions

Data Engineers

Which role is responsible for the following tasks:
Making large or complex data more accessible, understandable, and usable.
Creating charts and graphs, histograms, geographical maps, and other visual models
Transforming, improving, and integrating data from many sources
Combining the data result sets across multiple sources

Data Analysts

Microsoft Power BI is a powerful visualization platform, to create rich, graphical dashboards and reports over data that can vary dynamically.
True

SQL Server Management Studio enables you to generate Transact-SQL scripts allowing for the scheduling and automation of many common tasks.
True

Test prep

Semi-structured data can be held in which of the following formats?
Key Value Stores
JSON Format
Graph Databases

Which of the following statements describe Online Transactional Processing (OLTP)?
Systems that are often high-volume, sometimes handling many millions of transactions in a single day.

Which of the following is an example of Unstructured data?
Audio and Video files.

Collaborating with stakeholders to design and implement data-related assets that include data ingestion pipelines, cleansing and transformation activities, and data stores for analytical workloads, is the responsibility of which one of the following roles?
Data Engineers

Which role is responsible for the following tasks?
Enrolling users and maintaining system security.
Controlling and monitoring user access to the database.
Monitoring and optimizing the performance of the database.
Planning for backup and recovery of database information.

Database Administrators

Which of the following will allow for the creation of visualizations and reports?
Power BI

Which tool can be used to assist in creating custom Dashboards that will help in graphically organizing your resources in Azure?
Azure Portal

You have been given the task of determining what solutions or services best suit your business needs. You need to provision end-to-end solutions quickly and reliably, hosted in your own Azure environment. Which of the following should you use?
Azure Marketplace

In the Azure Marketplace you can find, try, buy, and deploy the software and services you need to build new solutions and manage your cloud infrastructure. You can provision end-to-end solutions quickly and reliably, hosted in your own Azure environment.

List of Azure services

Azure Services

This reading includes details of the most commonly used categories of Azure Services:
Compute
Web
Internet of Things (IoT)
Big data
AI
DevOps
Networking
Storage
Mobile
Databases

Compute

Service name Service function

Azure Virtual Machines                         Windows or Linux virtual machines (VMs) hosted in Azure
Azure Virtual Machine Scale Sets         Scaling for Windows or Linux VMs hosted in Azure
Azure Kubernetes Service                     Cluster management for VMs that run containerized services
Azure Service Fabric                             Distributed systems platform that runs in Azure or on-premises
Azure Batch                                     Managed service for parallel and high-performance computing applications
Azure Container Instances                    Containerized apps run on Azure without provisioning servers or VMs
Azure Functions                                    An event-driven, serverless compute service


Networking

Linking compute resources and providing access to applications is the key function of Azure networking. Networking functionality in Azure includes a range of options to connect the outside world to services and features in the global Azure data centers.

Service name     Service function

Azure Virtual Network                   Connects VMs to incoming virtual private network (VPN) connections
Azure Load Balancer                     Balances inbound and outbound connections to applications or service endpoints
Azure Application Gateway               Optimizes app server farm delivery while increasing application security
Azure VPN Gateway                       Accesses Azure Virtual Networks through high-performance VPN gateways
Azure DNS                               Provides ultra-fast DNS responses and ultra-high domain availability
Azure Content Delivery Network          Delivers high-bandwidth content to customers globally
Azure DDoS Protection                   Protects Azure-hosted applications from distributed denial of service (DDoS) attacks
Azure Traffic Manager                   Distributes network traffic across Azure regions worldwide
Azure ExpressRoute                      Connects to Azure over high-bandwidth dedicated secure connections
Azure Network Watcher                   Monitors and diagnoses network issues by using scenario-based analysis
Azure Firewall                          Implements high-security, high-availability firewall with unlimited scalability
Azure Virtual WAN                       Creates a unified wide area network (WAN) that connects local and remote sites

Storage

Azure provides four main types of storage services.

Service name Service function

Azure Blob storage Storage service for very large objects, such as video files or bitmaps
Azure File storage File shares that can be accessed and managed like a file server
Azure Queue storage A data store for queuing and reliably delivering messages between applications
Azure Table storage A NoSQL store that hosts unstructured data independent of any schema

Mobile

With Azure, developers can create mobile back-end services for iOS, Android, and Windows apps quickly and easily. Features that used to take time and increase project risks, such as adding corporate sign-in and then connecting to on-premises resources such as SAP, Oracle, SQL Server, and SharePoint, are now simple to include.

Other features of this service include:

Offline data synchronization.
Connectivity to on-premises data.
Broadcasting push notifications.
Autoscaling to match business needs.

Databases

Azure provides multiple database services to store a wide variety of data types and volumes. 

Service name Service function

Azure Cosmos DB                         Globally distributed database that supports NoSQL options
Azure SQL Database                      Fully managed relational database with auto-scale, integral intelligence, and robust security
Azure Database for MySQL                Fully managed and scalable MySQL relational database with high availability and security
Azure Database for PostgreSQL           Fully managed and scalable PostgreSQL relational database with high availability and security
SQL Server on Azure Virtual Machines    Service that hosts enterprise SQL Server apps in the cloud
Azure Synapse Analytics                 Fully managed data warehouse with integral security at every level of scale at no extra cost
Azure Database Migration Service        Service that migrates databases to the cloud with no application code changes
Azure Cache for Redis                   Fully managed service that caches frequently used and static data to reduce data and application latency
Azure Database for MariaDB              Fully managed and scalable MariaDB relational database with high availability and security

Web

Having a great web experience is critical in today's business world. Azure includes first-class support to build and host web apps and HTTP-based web services.

Service name Description

Azure App Service Quickly create powerful cloud web-based apps.
Azure Notification Hubs Send push notifications to any platform from any back end.
Azure API Management Publish APIs to developers, partners, and employees securely and at scale.
Azure Cognitive Search Deploy this fully managed search as a service.
Web Apps feature of Azure App Service Create and deploy mission-critical web apps at scale.
Azure SignalR Service Add real-time web functionalities easily

IoT

People are able to access more information than ever before. Personal digital assistants led to smartphones, and now there are smart watches, smart thermostats, and even smart refrigerators. Personal computers used to be the norm. Now the internet allows any item that's online-capable to access valuable information. This ability for devices to garner and then relay information for data analysis is referred to as IoT.

Service name Description

IoT Central Fully managed global IoT software as a service (SaaS) solution that makes it easy to connect, monitor, and manage IoT assets at scale

Azure IoT Hub Messaging hub that provides secure communications between and monitoring of millions of IoT devices

IoT Edge Fully managed service that allows data analysis models to be pushed directly onto IoT devices, which allows them to react quickly to state changes without needing to consult cloud-based AI models

Big data

Data comes in all formats and sizes. When we talk about big data, we're referring to large volumes of data. Data from weather systems, communications systems, genomic research, imaging platforms, and many other scenarios generate hundreds of gigabytes of data. This amount of data makes it hard to analyze and make decisions. It's often so large that traditional forms of processing and analysis are no longer appropriate.

Open-source cluster technologies have been developed to deal with these large data sets. Azure supports a broad range of technologies and services to provide big data and analytic solutions.

Service name Description

Azure Synapse Analytics Run analytics at a massive scale by using a cloud-based enterprise data warehouse that takes advantage of massively parallel processing to run complex queries quickly across petabytes of data.

Azure HDInsight Process massive amounts of data with managed Hadoop clusters in the cloud.

Azure Databricks Integrate this collaborative Apache Spark-based analytics service with other big data services in Azure.

AI

AI, in the context of cloud computing, is based around a broad range of services, the core of which is machine learning. Machine learning is a data science technique that allows computers to use existing data to forecast future behaviors, outcomes, and trends. Using machine learning, computers learn without being explicitly programmed.

Service name Description

Azure Machine Learning Service Cloud-based environment you can use to develop, train, test, deploy, manage, and track machine learning models. It can auto-generate a model and auto-tune it for you. It will let you start training on your local machine, and then scale out to the cloud.

Azure Machine Learning Studio Collaborative visual workspace where you can build, test, and deploy machine learning solutions by using prebuilt machine-learning algorithms and data-handling modules.

A closely related set of products is Cognitive Services. You can use these prebuilt APIs in your applications to solve complex problems.

Service name Description

Vision Use image-processing algorithms to smartly identify, caption, index, and moderate your pictures and videos.
Speech Convert spoken audio into text, use voice for verification, or add speaker recognition to your app.
Knowledge mapping Map complex information and data to solve tasks such as intelligent recommendations and semantic search.
Bing Search Add Bing Search APIs to your apps and harness the ability to comb billions of webpages, images, videos, and news with a single API call.
Natural Language Processing Allow your apps to process natural language with pre-built scripts, evaluate sentiment, and learn how to recognize what users want.

DevOps

DevOps brings together people, processes, and technology by automating software delivery to provide continuous value to your users. With Azure DevOps, you can create build and release pipelines that provide continuous integration, delivery, and deployment for your applications. 

Service name Description

Azure DevOps Use development collaboration tools such as high-performance pipelines, free private Git repositories, configurable Kanban boards, and extensive automated and cloud-based load testing. Formerly known as Visual Studio Team Services.

Azure DevTest Labs Quickly create on-demand Windows and Linux environments to test or demo applications directly from deployment pipelines.

Knowledge Check 

Cloud computing is the delivery of computing services using a pay-as-you-go pricing model. Which of the following statements are true of the pay-as-you-go pricing model?
        You can immediately stop paying for resources that are no longer needed.
        You rent compute power and storage from someone else's datacenter.

Cloud Computing provides several benefits over a physical environment. Which of the following are the benefits of cloud computing?
High availability
        Agility
        Elasticity

Which of the following options is not defined as a type of cloud deployment model?
        Private cloud

True or False?
The Azure Portal updates continuously and requires no downtime for maintenance activities.  
        True

True or False?
Serverless computing enables developers to build applications faster by eliminating the need for them to manage infrastructure.
        True

True or False?
The Azure Marketplace is a service on Azure specific to Microsoft Products.
False

Azure provides four main types of storage services. One of these services, Azure Blob storage, provides which of the following?
        Storage service for very large objects such as video files or bitmaps.

Which tool can you use to assist in creating custom Dashboards that will help in graphically organizing your resources in Azure?
   Azure Portal

True or False?
To create and use Azure services, you need an Azure subscription

True

Cloud computing provides computing services over the internet using a pay-as-you-go pricing model. With this model you typically only pay for the cloud services you use. Which of the following are benefits of the pay-as-you-go pricing model?
Lower operating costs.
You can run your infrastructure more efficiently. 
You can scale as your business needs change.

Knowledge Check 

A cloud environment has several advantages over a physical environment and these benefits become available to companies once they have migrated to the cloud. One of these benefits is referred to as agility.

Cloud-based resources can be deployed and configured quickly as your requirements change.

One of the advantages of cloud computing is scalability. Applications can be scaled vertically and horizontally. Do you think that the following statement is true or false?

Horizontal scalability is when computing capacity can be increased by adding additional RAM or CPUs to a virtual machine.

False

Cloud service providers operate on a consumption-based model. Which of the following are characteristics of a consumption-based model?

You only pay for additional resources when they are needed.

There is no need for companies to purchase and manage costly infrastructure that they may or may not use to its full capacity.

In cloud computing, there are two different types of expenses that should be considered, capital expenditure (CapEx) and operational expenditure (OpEx).

Which of these provides for the upfront spending of money on physical infrastructure, and then deducting that upfront expense over time?

Capital expenditure (CapEx)

Cloud service models define the different levels of shared responsibility between a cloud provider and cloud tenant.

In which cloud service model is the cloud provider responsible for managing the virtual machines and networking resources that the cloud tenant deploys their applications into?

PaaS

The PaaS cloud service model is a managed hosting environment. In this model the cloud provider manages the virtual machines and networking resources. The cloud tenant deploys their applications into this managed hosting environment.

Cloud service models define the different levels of shared responsibility that a cloud provider and cloud tenant are responsible for. 

In which model does the cloud provider keep the hardware up to date but operating system maintenance and network configuration are left to the cloud tenant?

IaaS

IaaS is the cloud service model that is closest to managing physical servers. In this model the cloud provider keeps the hardware up to date but operating system maintenance and network configuration are left to the cloud tenant. 

---------

When you create an index in a database, you specify a column from the table, and the index contains a copy of this data in sorted order, with pointers to the corresponding rows in the table. When the user runs a query that specifies this column in the WHERE clause, the database management system can use this index to fetch the data more quickly than if it had to scan through the entire table row by row. In this example, the query retrieves all orders for customer C1. The orders table has an index on the customer ID column. The database management system can consult the index to quickly find all matching rows in the orders table. You can create many indexes on a table. So, if you also wanted to find all orders for a specific product, then creating another index on the product ID column in the orders table would be useful. However, indexes aren't free. They inevitably come with some trade-offs. For example, an index might consume additional storage space, and each time you insert, update, or delete data in a table, the indexes for the table must be maintained. This additional work can slow down insert, update, and delete operations and incur additional processing charges. Therefore, when deciding which indexes to create, you must strike a balance between having indexes that speed up your queries and the cost of performing other operations. In a table that is read-only, or that contains data that is modified infrequently, more indexes may improve query performance. If a table is subject to a large number of inserts, updates, and deletes, such as a table involved in OLTP, then creating too many indexes on that table can slow your system down. Some relational database management systems also support clustered indexes. A clustered index physically reorganizes a table by the index key. This arrangement can improve the performance of queries still further, because the relational database management system doesn't have to follow references from the index to find the corresponding data in the underlying table. A table can only have a single clustered index. Let's talk about views now. A view is a virtual table based on the result set of a query. In the simplest case, you can think of a view as a window onto specified rows in an underlying table. You can query the view and filter the data in much the same way as a table. A view can also display data from tables joined together. To summarize, you can think of a view as a stored query. The user does not need to know where the data is stored or the complexity of the query; they have immediate access to the relevant data.
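A minimal Transact-SQL sketch of the orders example above follows; the table, column, and object names are illustrative rather than taken from the course.

    -- Hypothetical Orders table: an index to speed up lookups by customer,
    -- a second index for lookups by product, and a view acting as a stored query.
    CREATE INDEX IX_Orders_CustomerId ON Orders (CustomerId);  -- sorted copy of CustomerId with row pointers
    CREATE INDEX IX_Orders_ProductId  ON Orders (ProductId);
    GO

    CREATE VIEW CustomerOrderTotals AS
        SELECT CustomerId, COUNT(*) AS OrderCount, SUM(OrderValue) AS TotalValue
        FROM Orders
        GROUP BY CustomerId;   -- users query the view without needing to know this query
    GO

    -- The query from the example: the optimizer can use IX_Orders_CustomerId
    -- instead of scanning the whole table.
    SELECT OrderId, ProductId, OrderValue
    FROM Orders
    WHERE CustomerId = 'C1';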

Which of the following are features of the cloud-based approach to hosting?
There are no capital expenses.
Virtual technology is used to host the company’s applications offsite.
There are no operational expenses. 

Knowledge Check

The main characteristics of a relational database include:

All data is tabular. 

Entities are modeled as Tables.

A table can contain any number of rows.

True or False?

A foreign key uniquely identifies each row in a table. No two rows can share the same primary key.

False

A foreign key references rows in another, related table. For each value in the foreign key column, there should be a row with the same value in the corresponding primary key column in the other table.

Which of the following may have an adverse effect on your system by adding more indexes?

A table that is queried infrequently, but subject to a large number of inserts, updates, and deletes. 

True or False?

Some relational database management systems support clustered indexes that physically reorganize a table by the index key.

True

Which of the following are benefits of a cloud-based approach to Database Management Systems?

Scalability

Organizations only have to pay for the resources they use.

Data can be backed up regularly.

True or False?

Azure offers several PaaS solutions for relational databases including:

Azure SQL Database

Azure Database for PostgreSQL

Azure Database for MySQL

Azure Database for MariaDB

True

Which of the following is a characteristic of a relational database table?

A row in a table represents a single entity or record.

------------

Non-relational data generally falls into two categories: semi-structured and non-structured. A JSON document is enclosed in curly brackets. Each field has a name (a label), followed by a colon, and then the value of the field. Fields can contain simple values or sub-documents, each starting and ending with curly brackets. Fields can also have multiple values, held as arrays and surrounded with square brackets. Literals, or fixed values, in a field are enclosed in quotes, and fields are separated with commas. Fields might also have multiple sub-fields, also with names. When an application reads a document, it can use a JSON parser to break up the document into its component fields and extract the individual pieces of data.
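To show one concrete way of breaking a JSON document into fields, the sketch below uses SQL Server's JSON functions (JSON_VALUE and OPENJSON, available in SQL Server 2016 and later and in Azure SQL Database). The document and its fields are made up for the example, and this is only one of many JSON parsers an application might use.

    -- Hypothetical JSON document: simple fields, a sub-document, and an array.
    DECLARE @doc nvarchar(max) = N'{
      "id": "1",
      "name": "Contoso",
      "address": { "city": "Seattle", "country": "USA" },
      "tags": [ "retail", "cloud" ]
    }';

    -- Extract individual fields, including one from the sub-document.
    SELECT
        JSON_VALUE(@doc, '$.id')           AS Id,
        JSON_VALUE(@doc, '$.name')         AS Name,
        JSON_VALUE(@doc, '$.address.city') AS City;

    -- Break the array of values into one row per element.
    SELECT value AS Tag
    FROM OPENJSON(@doc, '$.tags');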

Other formats you might see include Avro, ORC, and Parquet. Let's take a brief look at each now. Avro is a row-based format. It was created by Apache. Each record contains a header that describes the structure of the data in the record. The header is stored as JSON. The data is stored as binary information. An application uses the information in the header to parse the binary data and extract the fields it contains. Avro is a very good format for compressing data and minimizing storage and network bandwidth. ORC, or Optimized Row Columnar format, organizes data into columns rather than rows. It was developed by HortonWorks for optimizing read and write operations in Apache Hive. Hive is a data warehouse system that supports fast data summarization and querying over very large datasets. Hive supports SQL-like queries over unstructured data. An ORC file contains stripes of data. Each stripe holds the data for a column or set of columns. A stripe contains an index into the rows in the stripe, the data for each row, and a footer that holds statistical information (count, sum, max, min, and so on) for each column. Parquet is another columnar data format. It was created by Cloudera and Twitter. A Parquet file contains row groups. Data for each column is stored together in the same row group. Each row group contains one or more chunks of data. A Parquet file includes metadata that describes the set of rows found in each chunk. An application can use this metadata to quickly locate the correct chunk for a given set of rows and retrieve the data in the specified columns for these rows. Parquet specializes in storing and processing nested data types efficiently. It supports very efficient compression and encoding schemes. Unstructured data is data that doesn't naturally contain fields. Examples of unstructured data include audio, video, and other media streams. Each item is an amorphous blob of binary data. You can't search for specific elements in this data. You might choose to store data such as this in storage that is specifically designed for the purpose. In Microsoft Azure, you would probably store video and audio data as block blobs in an Azure storage account. The term blob stands for binary large object. A block blob only supports basic read and write operations. You could also consider files as a form of unstructured data, although in some cases a file might include metadata that indicates what type of file it is (for example, a photograph, a document, an Excel spreadsheet, and so on). The file might also contain metadata, such as the owner and other elements, that could be stored as fields. However, the main content of the file is unstructured.

An application that retrieves data from a key-value store is responsible for parsing the contents of the values returned. Write operations are restricted to inserts and deletes. If you need to update an item, you must retrieve the item, modify it in memory in the application, and then write it back to the database, overwriting the original (effectively a delete and an insert). Azure Table storage is an example of a key-value store. Cosmos DB also implements a key-value store using the Table API. A document database represents the opposite end of the NoSQL spectrum from a key-value store. In a document database, each document has a unique ID, but the fields in the documents are transparent to the database management system. Some document database management systems, such as Cosmos DB, can only read and write entire documents; in these cases, an update replaces the entire document with a new version. This approach helps to reduce fragmentation in the database, which can in turn improve performance. Most document databases will ingest large volumes of data more rapidly than a relational database, but aren't as optimal as a key-value store for this type of processing. The focus of a document database is its query capabilities. The most widely used column family database management system is Apache Cassandra. Azure Cosmos DB supports the column family approach through the Cassandra API. Graph databases enable you to store entities, but the main focus is on the relationships that these entities have with each other. A graph database stores two types of information: nodes, which you can think of as instances of entities, and edges, which specify the relationships between nodes. Nodes and edges can both have properties that provide information about that node or edge, like columns in a table. Additionally, edges can have a direction indicating the nature of the relationship. The purpose of a graph database is to enable an application to efficiently perform queries that traverse the network of nodes and edges, and to analyze the relationships between entities. Azure Cosmos DB supports graph databases using the Gremlin API. The Gremlin API is a standard language for creating and querying graphs.

Knowledge Check

In a relational database system, you store the information for entities in collections or containers. 

False

Which of the following is true when implementing non-relational databases?

Non-relational databases are either schema-free or have relaxed schemas.

Non-relational data generally falls into which two categories. Which of the following are categories of non-relational data?

Non-Structured

Semi-Structured

The term JSON stands for JavaScript Object Notation. 

True

Which of the following are examples of unstructured data? 

Audio

Video

Messaging

A key-value store is a type of NoSQL database for inserting and querying data. Each data item in a key-value store has two elements, a key, and a value. 

The key uniquely identifies the item, and the value holds the data for the item.

Graph databases enable you to store entities, but the main focus is on the relationships that these entities have with each other.

Nodes are similar to instances of entities. Edges specify the relationships between the Nodes.

Which of the following services should you deploy if you need to implement a non-relational database?

Azure Cosmos DB

Data ingestion is the process of obtaining and importing data for immediate use or storage in a database.

True

In a data analytics solution, you combine this data and construct a data warehouse that you can use to ask and answer questions about your business operations. Building a data warehouse requires that you can capture the data that you need and wrangle it into an appropriate format. You can then use analysis tools and visualizations to examine the information and identify trends and their causes. It's important to note that wrangling is the process by which you transform and map raw data into a more useful format for analysis. It can involve writing code to capture, filter, clean, combine, and aggregate data from many sources. Let's examine two important stages in data analytics, data ingestion and data processing, and explore how the two stages fit together. First is the data ingestion stage, where raw data is ingested from many sources, such as critical data, financial transactions, and point-of-sale data. This raw data is then stored where it can be processed. The data processing stage cleans and transforms the data. The data can then be stored in a variety of formats where it can be made available for data exploration, such as queries and visualizations. Data ingestion is the process of obtaining and importing data for immediate use or storage in a database. The data can arrive as a continuous stream, or it may come in batches, depending on the source. The purpose of the ingestion process is to capture this data and store it.

A data analyst could use machine learning to help determine future trends based on these models. The data processing mechanism can take two approaches to retrieving the ingested data, processing this data to transform it and generate models, and then saving the transformed data and models. These approaches are known as ETL and ELT. ETL stands for Extract, Transform, and Load. The raw data is retrieved and transformed before being saved. The extract, transform, and load steps can be performed as a continuous pipeline of operations. It is suitable for systems that only require simple models with little dependency between items. For example, this type of process is often used for basic data cleaning tasks, deduplicating data, and reformatting the contents of individual fields. An alternative approach to ETL is ELT. ELT is an abbreviation of Extract, Load, and Transform. The process differs from ETL in that the data is stored before being transformed. The data processing engine can take an iterative approach, retrieving and processing the data from storage, before writing the transformed data and models back to storage. ELT is more suitable for constructing complex models that depend on multiple items in the database, often using periodic batch processing. ELT is a scalable approach that is suitable for the cloud because it can make use of the extensive processing power available. The more stream-oriented approach of ETL places more emphasis on throughput. However, ETL can filter data before it's stored. In this way, ETL can help with data privacy and compliance, removing sensitive data before it arrives in your analytical data models. Microsoft Azure provides several options that you can use to implement the ELT and ETL approaches. For example, if you are storing data in Azure SQL Database, you can use SQL Server Integration Services. Integration Services can extract and transform data from a wide variety of sources, such as XML data files, flat files, and relational data sources, and then load the data into one or more destinations. There are advantages to both ETL and ELT. ETL offers improved data privacy and compliance, while also not requiring specialist skills. On the other hand, ELT provides data lake support and is ideal for large volumes of data. Another, more generalized approach is to use Azure Data Factory. Azure Data Factory is a cloud-based data integration service that allows you to create data-driven workflows for orchestrating data movement and transforming data at scale. Using Azure Data Factory, you can create and schedule data-driven workflows, called pipelines, that can ingest data from disparate data stores. You can build complex ETL processes that transform data visually with data flows, or by using compute services such as Azure HDInsight Hadoop, Azure Databricks, and Azure SQL Database.
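To make the "transform after load" idea behind ELT concrete, here is a minimal Transact-SQL sketch. The staging and target tables are hypothetical, and in practice a pipeline like this would usually be orchestrated by a tool such as Azure Data Factory rather than run by hand.

    -- Extract + Load: assume raw data has already been bulk-loaded into staging.RawSales.
    -- Transform: clean and aggregate it inside the database, then load the target table.
    INSERT INTO dbo.DailySales (SaleDay, StoreId, TotalAmount)
    SELECT
        CONVERT(date, s.SaleTimestamp) AS SaleDay,     -- standardize the date format
        s.StoreId,
        SUM(s.Amount)                  AS TotalAmount  -- aggregate to the required grain
    FROM staging.RawSales AS s
    WHERE s.Amount IS NOT NULL                         -- filter out questionable rows
    GROUP BY CONVERT(date, s.SaleTimestamp), s.StoreId;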

The most common forms of visualizations are bar and column charts, line charts, matrixes, key influencers, treemaps, scatter charts, dot plots, and filled maps.

Let's explore these in a little more detail now. Bar and column charts enable you to see how a set of variables changes across different categories. Data is usually displayed as rectangular bars, the longer the bar the larger the value. For example, a bar chart can show how sales for a pair of fictitious retailers vary between two store sites. 

Next, we have line charts, they emphasize the overall shape of an entire series of values, usually over time. The data is depicted as lines and lines are differentiated by colour as they can overlap. 

Often, report designers include matrixes in reports and dashboards to allow users to select one or more elements (rows, columns, cells) in the matrix to cross-highlight other visuals on a report page. For example, a matrix table can analyze business performance for Q1 and Q2 from 2015 to 2019, making it possible to visualize and cross-reference the values for Q1 and Q2 revenues along with their corresponding year-to-date (YTD) revenue.

Next let's look at key influencers, a key influencer chart displays the major contributors to a selected result or value. Key influencers are a great choice to help you understand the factors that influence a key metric.

Treemaps are charts of colored rectangles with size representing the relative value of each item. They can be hierarchical with rectangles nested within the main rectangles. 

A scatter chart shows the relationship between two numerical values. 

A bubble chart is a scatter chart that replaces data points with bubbles, with the bubble size representing an additional, third data dimension.

Likewise, a dot plot chart is similar to a bubble chart and scatter chart, but can plot categorical data along the X-Axis. 

Lastly, we have the filled map. If you have geographical data, you can use a filled map to display how a value differs in proportion across a geography or region. You can see relative differences with shading that ranges from light (less frequent or lower) to dark (more frequent or higher).

Knowledge Check

The term wrangling refers to the process of transforming and mapping raw data into a more useful format for analysis purposes.

True

The process of taking data in its raw form, cleaning the data, and then converting it into a more meaningful format such as tables, graphs, documents, etc. is referred to by which of the following terms?

Data Processing

As part of data processing, two approaches can be undertaken. What is the process of retrieving the raw data and transforming it referred to as?

ETL

Which of the following analytics helps answer questions about what has happened based on historical data?

Descriptive analytics

Which one of the following visuals displays a tabular structure that summarizes data?

Matrix chart

Which one of the following visuals can be used to display how a value differs in proportion across a geography or region.

Filled map

Which one of the following visuals displays the major contributors to a selected result or value?

Key influencers chart

Test Prep

Which of the following are characteristics of a View?

A view is a virtual table based on the result set of a query.

You can query the view and filter the data in much the same way as a table.

What can an index on a table do?

Locate rows in a table quickly by referencing an indexed value.

When moving your databases to the cloud you can avail of which of the following cloud models?

Platform-as-a-service (PaaS)

Infrastructure-as-a-Service (IaaS)

Non-relational databases are suitable for which of the following scenarios?

Gaming. 

Web and mobile applications.

IoT and telematics.

Which of the following are features of Semi-structured entities?

You define the fields that you need on a per-entity basis.

Semi-structured data is data that contains fields.

Which of the following are categories of NoSQL?

Key-value stores

Document databases

Graph databases

Column family databases

Which of the following are features of a document database?

A document typically contains the entire data for an entity.

A document database can store data in a format such as JSON, XML, or YAML.

What is the process of obtaining and importing data for immediate use or storage in a database called?

Ingestion

Which one of the following terms refers to the technologies, applications, and practices related to the collection, integration, analysis, and presentation of business information?

Business Intelligence

Business Intelligence (BI) refers to technologies, applications, and practices for the collection, integration, analysis, and presentation of business information. The purpose of business intelligence is to support better decision-making.

Which of the following analytics helps answer questions about why things happened?

Diagnostic analytics


──────── Credits to: Coursera ────────

