Glossary
Big Data
Extremely large datasets that may be analyzed computationally to reveal patterns, trends, and associations, especially relating to human behavior and interactions.
Example:
Analyzing all the posts, likes, and shares across a social media platform to understand user engagement patterns involves processing Big Data.
Cleaning Data
The process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database.
Example:
Before analyzing survey results, a researcher might need to clean the data by fixing typos, standardizing responses like 'USA' and 'United States', or removing incomplete entries.
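The survey example can be sketched in a few lines of Python. This is only an illustration: the responses, the field names, and the `COUNTRY_ALIASES` mapping are all invented, and real cleaning rules depend on the dataset.

```python
# Hypothetical survey responses; names and values are invented for illustration.
raw_responses = [
    {"name": "Ana", "country": "USA"},
    {"name": "Ben", "country": "United States"},
    {"name": "Cy", "country": " usa "},
    {"name": "Dee", "country": ""},  # incomplete entry
    {"name": "Eli", "country": "Canada"},
]

# Assumed mapping from known spellings to one standard label.
COUNTRY_ALIASES = {
    "usa": "United States",
    "united states": "United States",
    "canada": "Canada",
}


def clean_responses(responses):
    """Drop incomplete rows and standardize the country field."""
    cleaned = []
    for row in responses:
        key = row["country"].strip().lower()
        if not key:
            continue  # remove entries with no country given
        # Fall back to the trimmed original if the spelling is unknown.
        standard = COUNTRY_ALIASES.get(key, row["country"].strip())
        cleaned.append({"name": row["name"], "country": standard})
    return cleaned


cleaned = clean_responses(raw_responses)
for row in cleaned:
    print(row["name"], "->", row["country"])
```

After cleaning, the three spellings of the same country count as one response category, and the blank entry no longer affects the totals.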
Correlation ≠ Causation
A principle stating that just because two variables appear to be related (correlated) does not mean one causes the other.
Example:
Finding that ice cream sales and shark attacks both increase in summer shows a correlation, but eating ice cream doesn't cause shark attacks.
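A small sketch can make the ice cream example concrete. The monthly figures below are invented for illustration, and the `pearson` helper is a plain implementation of the standard correlation coefficient, not a claim about how any particular study was done.

```python
from math import sqrt

# Invented monthly figures, for illustration only.
temperature_c = [12, 17, 22, 27, 31]
ice_cream_sales = [120, 180, 260, 340, 400]
shark_attacks = [1, 2, 2, 4, 5]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r_sales_attacks = pearson(ice_cream_sales, shark_attacks)
r_temp_sales = pearson(temperature_c, ice_cream_sales)
r_temp_attacks = pearson(temperature_c, shark_attacks)

print(f"sales vs attacks:       {r_sales_attacks:.2f}")
print(f"temperature vs sales:   {r_temp_sales:.2f}")
print(f"temperature vs attacks: {r_temp_attacks:.2f}")
```

All three coefficients come out strongly positive. The number alone cannot say which variable, if any, drives the others; here a third factor (warm weather) plausibly explains both sales and attacks.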
Data Bias
A systematic error in a data set that skews results in a particular direction, often due to flawed collection methods or inherent societal prejudices.
Example:
If a survey about smartphone preferences is only given to teenagers, the results will have data bias because they won't represent the preferences of all age groups.
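The effect of surveying only teenagers can be shown with a toy dataset. Everything here is made up for illustration: the age groups, the brand names `BrandA` and `BrandB`, and the counts.

```python
# Invented survey population: (age_group, preferred_phone). Purely illustrative.
population = (
    [("teen", "BrandA")] * 8
    + [("teen", "BrandB")] * 2
    + [("adult", "BrandA")] * 3
    + [("adult", "BrandB")] * 7
)


def share_preferring(people, brand):
    """Fraction of the given people who prefer the given brand."""
    return sum(1 for _, choice in people if choice == brand) / len(people)


# Biased sample: only teenagers were asked.
teens_only = [person for person in population if person[0] == "teen"]

biased_share = share_preferring(teens_only, "BrandA")
full_share = share_preferring(population, "BrandA")

print(f"BrandA share, teens only: {biased_share:.0%}")
print(f"BrandA share, everyone:   {full_share:.0%}")
```

In this toy data the teens-only sample overstates BrandA's popularity compared with the whole group, which is the kind of skew the definition describes.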
Data Center
A dedicated physical facility used to house computer systems and associated components, such as telecommunications and storage systems.
Example:
Major tech companies often build enormous data centers in remote locations to store and process vast amounts of user data securely.
Information
Data that has been analyzed and processed to reveal trends, connections, or solutions, making it meaningful and useful.
Example:
When a weather app processes raw temperature and humidity readings to tell you that it's going to rain tomorrow, that's information.
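As a rough sketch of the weather example, the code below turns raw readings into a short statement a person can act on. The readings are invented, and the humidity threshold is an arbitrary assumption chosen for illustration, not a real forecasting rule.

```python
# Invented raw sensor readings (the data).
readings = [
    {"temp_c": 18.0, "humidity_pct": 88},
    {"temp_c": 17.5, "humidity_pct": 92},
    {"temp_c": 17.0, "humidity_pct": 95},
]


def summarize(readings, humidity_threshold=85):
    """Turn raw readings into a human-readable statement (the information).

    The threshold is a made-up rule of thumb, not real meteorology.
    """
    avg_humidity = sum(r["humidity_pct"] for r in readings) / len(readings)
    if avg_humidity >= humidity_threshold:
        return "Rain is likely tomorrow."
    return "Rain is unlikely tomorrow."


forecast = summarize(readings)
print(forecast)
```

The list of numbers is data; the sentence produced from it is information, because it has been processed into something meaningful.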
Metadata
Data that provides information about other data, describing its content, context, and characteristics.
Example:
The date a photo was taken, the camera model used, and any tags added to it are all examples of metadata for that image file.
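One way to picture this is a record that keeps the image content and its metadata side by side. The structure, field names, and values below are hypothetical and chosen only to mirror the photo example; real image files store metadata in format-specific ways.

```python
# Hypothetical photo record; field names and values are invented for illustration.
photo = {
    "content": b"\xff\xd8\xff",  # stand-in for the image bytes (the data itself)
    "metadata": {  # data about the data
        "date_taken": "2024-07-04",
        "camera_model": "ExampleCam X1",
        "tags": ["beach", "family"],
    },
}


def has_tag(photo, tag):
    """Answer a question about the photo using only its metadata."""
    return tag in photo["metadata"]["tags"]


print(has_tag(photo, "beach"))
print(photo["metadata"]["camera_model"])
```

Note that `has_tag` never looks at the image bytes: metadata lets software search and organize files without examining their content.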
Scalability
A system's ability to handle a growing amount of work or its potential to be enlarged to accommodate that growth.
Example:
A popular online game needs good scalability to ensure it can handle a sudden surge of new players without crashing or slowing down.
Server Farm
A large collection of computer servers networked together to provide server functionality far beyond the capability of a single machine.
Example:
To handle millions of users simultaneously streaming videos, a company would rely on a massive server farm to process all the requests.